MLOps Model Deployment Automation: Building a CI/CD/CT Pipeline with GitHub Actions + Kubeflow
Honestly, I used to think, "How hard can model deployment be? Just spin up an API server, right?" But when you actually do it, a completely different world unfolds. When the data pipeline changes, the model has to be retrained. A model that worked fine yesterday suddenly starts producing strange predictions today. Something that worked on server A inexplicably fails on server B — one mystery after another.
By the end of this post, you'll be able to: ① understand the 5-stage MLOps pipeline structure, ② work with real-world GitHub Actions YAML code, and ③ integrate drift monitoring with Evidently AI — all ready to follow along immediately. MLOps is the methodology that brings order to this chaos. It ties together the development (Dev) and operations (Ops) of machine learning models into a single system, so that a model from the experimental stage flows reliably and repeatably all the way to production. Whether you're a backend developer, a data engineer, or someone just getting started with ML, this post is packed with content you can apply directly in practice.
Core Concepts
MLOps Maturity: Where Is Your Team Right Now?
The MLOps maturity model defined by Google Cloud is divided into three levels. This framework works well as a self-diagnostic tool for figuring out "where our team currently stands."
| Level | Name | Characteristics |
|---|---|---|
| Level 0 | Manual Process | Experiments in notebooks, manual deployment. No reproducibility |
| Level 1 | ML Pipeline Automation | Training pipeline automated, continuous training (CT) possible |
| Level 2 | CI/CD Pipeline Automation | Code changes trigger automatic pipeline build and deployment |
Most teams start at Level 0. The moment you think "we need to automate this" is the inflection point into Level 1, and Level 2 — automating the pipeline that builds pipelines — is a stage you see only in fairly mature ML organizations.
The Five Stages of an End-to-End Pipeline
A model deployment automation pipeline flows through roughly five stages.
```
[1. Data Collection & Preprocessing]
          ↓
[2. Model Training & Experiment Tracking]   ← MLflow / W&B
          ↓
[3. Model Validation & Registration]        ← Quality gate (AUC ≥ 0.85, etc.)
          ↓
[4. Deployment (Serving)]                   ← REST/gRPC API
          ↓
[5. Monitoring & Feedback]                  ← Drift detection → automatic retraining trigger
```

The most frequently overlooked parts of this flow are step 3, the quality gate, and step 5, monitoring. In particular, if you automate deployment without monitoring, you end up silently running a model whose performance is quietly degrading — and nobody knows. This is a situation you encounter often in practice, and the later the discovery, the greater the damage.
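The halt-on-failure behavior of this chain can be sketched in a few lines: each stage runs in order, and any failing stage stops everything downstream. This is a toy illustration, not tied to any specific orchestrator; the stage names and the `PipelineHalted` exception are hypothetical.

```python
# Toy sketch of a five-stage pipeline: any failing stage halts the rest.
# Stage names and the PipelineHalted exception are illustrative only.

class PipelineHalted(Exception):
    """Raised when a stage fails, stopping all downstream stages."""

def run_pipeline(stages):
    completed = []
    for name, stage in stages:
        if not stage():  # each stage returns True on success
            raise PipelineHalted(f"halted at: {name}")
        completed.append(name)
    return completed

stages = [
    ("data_validation", lambda: True),
    ("train_and_track", lambda: True),
    ("quality_gate",    lambda: 0.91 >= 0.85),  # e.g. an AUC check
    ("deploy_canary",   lambda: True),
    ("monitor",         lambda: True),
]

print(run_pipeline(stages))  # all five stage names, in order
```

Real orchestrators (Kubeflow, Airflow, GitHub Actions `needs:`) encode the same dependency semantics declaratively, but the failure behavior is the same: a failed quality gate never reaches deployment.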
Data Drift vs. Concept Drift: Confusing Them Leads to the Wrong Response
Data Drift: The statistical distribution of the input data itself changes. Example: consumer patterns that shifted completely after COVID-19.
Concept Drift: The relationship between inputs and outputs (the pattern the model learned) changes. Example: fraud patterns evolve to the point where the existing model can no longer catch them.
The two types of drift have different causes and require different responses. Data Drift calls for inspecting the data pipeline first; Concept Drift is fundamentally about retraining. And one of the main causes of both is Training-Serving Skew, explained below. These concepts may seem unrelated, but in real-world incidents they tend to cascade and compound.
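To make data drift concrete: a two-sample Kolmogorov–Smirnov statistic, the maximum gap between the empirical CDFs of the reference and current samples, is one of the standard per-feature checks (tools like Evidently use KS tests for numerical features by default on small datasets). A minimal pure-Python sketch, with toy numbers chosen only for illustration:

```python
def ks_statistic(reference, current):
    """Two-sample KS statistic: max gap between the two empirical CDFs."""
    values = sorted(set(reference) | set(current))

    def ecdf(sample, x):
        # Fraction of the sample that is <= x
        return sum(v <= x for v in sample) / len(sample)

    return max(abs(ecdf(reference, x) - ecdf(current, x)) for x in values)

reference = [10, 12, 11, 13, 12, 11, 10, 12]  # training-time distribution
shifted   = [18, 20, 19, 21, 20, 19, 18, 20]  # a post-COVID-style shift
same      = [11, 12, 10, 13, 12, 11, 10, 12]  # same distribution, reshuffled

print(ks_statistic(reference, shifted))  # 1.0 (completely separated distributions)
print(ks_statistic(reference, same))     # 0.0 (no drift)
```

Note that a KS test only sees the inputs, so it catches data drift; concept drift, a changed input-to-output relationship, requires monitoring model performance against ground-truth labels.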
Training-Serving Skew: The Most Dangerous Trap in Practice
Training-Serving Skew: A mismatch between the data processing logic used during model training and the preprocessing logic applied at serving time. A significant share of cases where the model seems fine but predictions come out wrong originates here.
I missed this once and lost two days because of it. In the training code, missing values were filled with the mean; in the serving code, they were being filled with 0. Managing feature engineering code as a single source of truth is the most reliable way to prevent this problem.
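The single-source-of-truth idea can be sketched as follows: the imputation statistic is fitted once on training data, persisted with the model, and the one shared function runs in both the training and serving paths. Function names here are hypothetical, not from any particular library.

```python
# Single source of truth for preprocessing: one function, fitted once,
# used identically at training and serving time. Names are illustrative.

def fit_imputer(rows):
    """Learn the imputation value (here: the mean) from training data."""
    observed = [v for v in rows if v is not None]
    return sum(observed) / len(observed)

def impute(rows, fill_value):
    """The ONLY imputation function, shared by train and serve code paths."""
    return [fill_value if v is None else v for v in rows]

# Training time: fit on training data, persist fill_value with the model
train_feature = [3.0, None, 5.0, 4.0]
fill_value = fit_imputer(train_feature)          # 4.0
train_ready = impute(train_feature, fill_value)  # [3.0, 4.0, 5.0, 4.0]

# Serving time: reuse the SAME function and the SAME persisted statistic.
# The two-day bug above was serving code filling with 0 instead of the mean.
serving_request = [None, 6.0]
print(impute(serving_request, fill_value))  # [4.0, 6.0], consistent with training
```

The same principle is what feature stores and serialized sklearn `Pipeline` objects enforce at scale: the fitted transformation travels with the model instead of being reimplemented in the serving codebase.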
Pros and Cons
Advantages
| Item | Details |
|---|---|
| Deployment Speed | 70%+ faster model deployment compared to manual processes (based on industry cases) |
| Reproducibility | Results can be reproduced at any time using the same pipeline |
| Collaboration | Clear role definition between data scientists ↔ infrastructure engineers |
| Stability | Safe, incremental rollouts via canary deployment and A/B testing |
| Experiment Tracking | 60% reduction in experiment tracking overhead when adopting MLflow (based on industry cases) |
| Business Impact | Financial industry case: time-to-production for models reduced from 6 months → 2 weeks |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Initial Build Cost | Platform setup can take anywhere from a few months to two years | Start with managed services (SageMaker, Vertex AI) |
| Organizational Culture Change | ML team and infra team collaboration structure needs restructuring | Designate an MLOps champion role; adopt incrementally |
| Lack of Standardization | 55% of companies cite lack of standardized practices as a major obstacle (ScienceDirect, 2025) | Start by documenting team conventions |
| Training-Serving Skew | Prediction errors caused by mismatched training and inference environments | Manage feature engineering code from a single source |
| Edge Deployment Complexity | Limited compute resources, unstable network environments | Lightweight models with TFLite or ONNX Runtime |
The Most Common Mistakes in Practice
- Implementing deployment automation without monitoring — Deployment is automated, but nobody notices when performance degrades. Drift detection and alerting should always be built alongside deployment automation. I once set up deployment automation first and later discovered a degraded model had been running for two weeks before anyone caught it.
- Thinking about rollback strategy after deployment — By the time something goes wrong and you're looking for a rollback plan, it's already too late. Decide on the right strategy for your team — blue-green, canary, or shadow deployment — during the deployment design phase, before anything is shipped. Anyone who has experienced a same-day rollback knows exactly how painful this lesson is.
- Not tracking data lineage — If you can't answer "what data was this model trained on?", auditing and regulatory compliance become extremely difficult. Using DVC or MLflow's dataset logging to keep records from the very beginning will save you a lot of pain down the road.
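If adopting DVC or MLflow dataset logging feels heavy on day one, even a content hash of the training data stored next to the model artifact answers "what data was this trained on?". A minimal stdlib-only sketch; the file name, model version string, and record fields are all hypothetical:

```python
# Minimal dataset-lineage record: fingerprint the training data and store
# the record alongside the model artifact. Field names are illustrative.
import hashlib
import json
from datetime import datetime, timezone

def dataset_fingerprint(data: bytes) -> str:
    """Content hash of the raw dataset bytes; changes iff the data changes."""
    return hashlib.sha256(data).hexdigest()

def lineage_record(dataset_name: str, data: bytes, model_version: str) -> dict:
    return {
        "dataset": dataset_name,
        "sha256": dataset_fingerprint(data),
        "model_version": model_version,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

record = lineage_record(
    "reference_dataset.parquet", b"...raw file bytes...", "fraud-detector/7"
)
print(json.dumps(record, indent=2))
```

This is essentially what DVC does for you (plus remote storage and git integration), so treat it as a stopgap that makes the eventual migration to a real lineage tool painless.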
Practical Application
The three examples below are not independent code snippets — they represent a single flow. The data-validation step in Example 1 calls the drift_monitor.py in Example 3; once model quality passes the gate, the model is registered in the MLflow registry from Example 2, and Seldon adjusts traffic accordingly. Keeping the full picture in mind as you read will make things much easier to understand.
Example 1: CI/CD/CT Pipeline with GitHub Actions + Kubeflow
This is the most commonly used pattern. A code push serves as the trigger, and everything from training to deployment flows automatically.
```yaml
# .github/workflows/mlops-pipeline.yml
name: MLOps CI/CD/CT Pipeline

on:
  push:
    branches: [main]
    paths:
      - 'src/models/**'
      - 'src/data/**'
  schedule:
    - cron: '0 2 * * 1'  # Scheduled retraining every Monday at 2 AM

jobs:
  data-validation:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Validate data schema and drift
        # drift_monitor.py is the Evidently AI-based script implemented in Example 3
        run: |
          python src/monitoring/drift_monitor.py \
            --reference data/reference_dataset.parquet \
            --current data/current_dataset.parquet \
            --threshold 0.05

  train-and-evaluate:
    needs: data-validation
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Train model
        run: python src/train.py --experiment-name ${{ github.sha }}
      - name: Evaluate quality gate
        id: eval
        run: |
          AUC=$(python src/evaluate.py --model-path outputs/model.pkl)
          echo "auc=$AUC" >> $GITHUB_OUTPUT
          # bc -l is Linux-specific, so compare in Python for portability
          python -c "import sys; sys.exit(0 if float('$AUC') >= 0.85 else 1)" || \
            (echo "Quality gate failed: AUC=$AUC" && exit 1)

  canary-deploy:
    needs: train-and-evaluate
    runs-on: ubuntu-latest
    steps:
      - name: Deploy canary (10% traffic)
        run: |
          kubectl apply -f k8s/seldon-canary-deployment.yaml
          # Running a 30-minute monitoring window as a blocking step inside an
          # Actions job causes cost and timeout problems. In practice, delegate
          # this to Argo Rollouts or a separate scheduler, or trigger
          # asynchronous monitoring as below.
          python scripts/trigger_canary_monitor.py --run-id ${{ github.run_id }}
```

| Stage | Role | Behavior on Failure |
|---|---|---|
| `data-validation` | Detect data drift | Halt the entire pipeline |
| `train-and-evaluate` | Training + AUC quality gate | Block deployment and send alert |
| `canary-deploy` | Canary deployment at 10% traffic | Trigger automatic rollback |
Canary Deployment: A deployment strategy that exposes a new version to only a portion of total traffic (e.g., 10%) first, verifies stability, then gradually increases the percentage. Because the blast radius is limited if something goes wrong, it is especially useful for ML model deployments.
Blue-Green Deployment: A strategy that keeps both the old version (Blue) and the new version (Green) environments running simultaneously, then switches traffic over all at once. Rollbacks are nearly instant, but infrastructure costs are doubled.
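The gradual ramp-up that defines a canary rollout can be sketched as a simple promotion loop: traffic increases step by step only while the canary's observed error rate stays under a budget, otherwise it rolls back. This is a toy model with illustrative numbers; real setups delegate this control loop to tools like Argo Rollouts or Seldon's traffic splitting.

```python
# Toy canary promotion loop: ramp traffic only while the canary's error
# rate stays under budget; otherwise roll back. All numbers illustrative.

TRAFFIC_STEPS = [10, 25, 50, 100]  # percent of traffic sent to the canary
ERROR_BUDGET = 0.02                # max tolerated error rate per step

def promote_canary(observe_error_rate):
    """observe_error_rate(traffic_pct) -> measured error rate at that step."""
    for pct in TRAFFIC_STEPS:
        if observe_error_rate(pct) > ERROR_BUDGET:
            return ("rolled_back", pct)  # stop the ramp at the failing step
    return ("promoted", 100)

# Healthy canary: errors stay below budget at every traffic step
print(promote_canary(lambda pct: 0.01))                          # ('promoted', 100)
# Unhealthy canary: error rate spikes once traffic reaches 50%
print(promote_canary(lambda pct: 0.05 if pct >= 50 else 0.01))   # ('rolled_back', 50)
```

The key property this models is the limited blast radius: a bad model that fails at the 10% step never touches the other 90% of traffic, which is exactly why canary suits ML deployments where offline metrics can look fine while live behavior does not.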
Example 2: Connecting Model Registry & Serving with MLflow + Seldon Core
This is the key link connecting experiment tracking to deployment. It's the pattern where Seldon Core automatically serves models registered in MLflow. The moment I first saw Seldon automatically adjust traffic right after transitioning an MLflow stage to Production — that was genuinely exciting.
```python
# src/train.py — MLflow experiment tracking and model registration
# See src/data/loader.py for the data-loading implementation
import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score


def train_and_register(X_train, y_train, X_val, y_val, params: dict):
    with mlflow.start_run() as run:
        mlflow.log_params(params)
        model = GradientBoostingClassifier(**params)
        model.fit(X_train, y_train)

        auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
        mlflow.log_metric("auc", auc)

        if auc >= 0.85:
            mlflow.sklearn.log_model(
                model,
                artifact_path="model",
                registered_model_name="fraud-detector"
            )
            client = mlflow.tracking.MlflowClient()
            # Look up the version by run_id to avoid race conditions when
            # experiments run concurrently; the
            # get_latest_versions(stages=["None"]) pattern is unsafe there.
            versions = client.search_model_versions(
                f"run_id='{run.info.run_id}'"
            )
            client.transition_model_version_stage(
                name="fraud-detector",
                version=versions[0].version,
                stage="Production"
            )
            print(f"[OK] Model registered -- AUC: {auc:.4f}")
        else:
            print(f"[FAIL] Quality gate failed -- AUC: {auc:.4f} (threshold: 0.85)")
        return auc
```

```yaml
# k8s/seldon-deployment.yaml — wired to the MLflow model registry
# Apply to the cluster with: kubectl apply -f k8s/seldon-deployment.yaml
# Note: maintenance of Seldon Core has been scaled back since 2024.
# KServe is currently more actively maintained by the community, so keep
# that in mind for new adoptions.
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: fraud-detector
spec:
  predictors:
    - name: default
      traffic: 90
      graph:
        name: classifier
        implementation: MLFLOW_SERVER
        modelUri: "models:/fraud-detector/Production"
    - name: canary
      traffic: 10
      graph:
        name: classifier-canary
        implementation: MLFLOW_SERVER
        modelUri: "models:/fraud-detector/Staging"
```

The key to this setup is that the model registry stages (Staging / Production) are directly linked to the canary traffic split. When you promote a stage in MLflow, Seldon automatically adjusts the traffic. SeldonDeployment is a Kubernetes CRD (Custom Resource Definition); you apply the YAML above to a cluster using `kubectl apply -f`.
Example 3: Automated Data Drift Detection with Evidently AI
The data-validation step in Example 1 calls this script. When drift is detected, it can be wired to trigger a Kubeflow pipeline retraining run or send a notification via a Slack webhook.
```python
# src/monitoring/drift_monitor.py
import os

import pandas as pd
import requests
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, ModelPerformancePreset


def trigger_retraining(reason: str):
    """Trigger retraining via the Kubeflow pipeline API or a Slack webhook."""
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
    if webhook_url:
        requests.post(webhook_url, json={"text": f"[MLOps] Retraining triggered: {reason}"})
    # Kubeflow pipeline integration example:
    # kfp.Client().create_run_from_pipeline_func(training_pipeline, arguments={})


def run_drift_report(reference_path: str, current_path: str) -> dict:
    reference = pd.read_parquet(reference_path)
    current = pd.read_parquet(current_path)

    report = Report(metrics=[
        DataDriftPreset(drift_share_threshold=0.3),  # warn when 30%+ of features drift
        ModelPerformancePreset(),
    ])
    report.run(reference_data=reference, current_data=current)
    result = report.as_dict()

    drift_detected = result["metrics"][0]["result"]["dataset_drift"]
    if drift_detected:
        trigger_retraining(reason="data_drift_detected")

    os.makedirs("reports", exist_ok=True)
    report.save_html("reports/drift_report.html")
    return {"drift_detected": drift_detected, "report": "reports/drift_report.html"}


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--reference", required=True)
    parser.add_argument("--current", required=True)
    # Accepted to match the CI invocation in Example 1
    parser.add_argument("--threshold", type=float, default=0.05)
    args = parser.parse_args()

    outcome = run_drift_report(args.reference, args.current)
    # Exit non-zero on drift so the data-validation job halts the pipeline
    raise SystemExit(1 if outcome["drift_detected"] else 0)
```

Closing Thoughts
The essence of an MLOps pipeline is "operating good models quickly, safely, and repeatably." Rather than trying to build the perfect system from day one, it's better to start by automating the single most painful point for your team right now.
Here are three steps you can start on immediately. These steps are not independent — each one naturally creates the need for the next.
1. Start experiment tracking with MLflow — After `pip install mlflow`, just add `mlflow.start_run()` and `mlflow.log_metric()` to your existing training code. Run `mlflow ui` to bring up the dashboard, and your experiment history will start becoming visible at a glance. As you use it, you'll naturally start thinking "we need a clear criterion for which model to deploy" — that's the signal to move to step 2.
2. Connect a quality gate with GitHub Actions — Write a workflow in `.github/workflows/model-eval.yml` that automatically evaluates the model when a PR is opened and blocks the merge if metrics like AUC fall below the threshold. This alone will prevent a significant number of "bad model slips into production" incidents. Once the quality gate is in place, the next natural question becomes "how do we catch performance degradation after deployment?" — which leads right into step 3.
3. Add production monitoring with Evidently AI — After `pip install evidently`, run a drift report comparing one week of serving data against training data as a weekly batch job. Even starting with just a weekly report will let you catch data quality issues much faster than before.
I should also be honest about what this post didn't cover. Feature Store integration, model explainability (SHAP, LIME), on-premises environment setup, and LLMOps — managing LLM fine-tuning and RAG pipelines — are each topics substantial enough to warrant their own dedicated post. I hope this article serves as a starting point for getting your bearings in the broader MLOps landscape: figuring out where you are right now, and where to head next.
Next post: A practical LLMOps guide to managing LLM fine-tuning and RAG pipelines in production — from prompt version control to building guardrails
References
- MLOps: Continuous delivery and automation pipelines in machine learning | Google Cloud
- MLOps in 2026: Best Practices for Scalable ML Deployment | Kernshell
- MLOps in 2026: What You Need to Know to Stay Competitive | Hatchworks
- The Evolution of MLOps: Rise of Automation | Pragmatic AI Labs
- MLOps Best Practices 2025: CI/CD & Model Monitoring | TensorBlue
- Samsung Tech Blog — Building an AI Development System with Kubeflow and MLflow (Korean)
- Samsung Tech Blog — Streamlining AI Model Development and Deployment with MLOps (Korean)
- Kubeflow MLOps: Automatic pipeline deployment with CI/CD/CT | Towards Data Science
- MLOps Pipeline with MLflow, Seldon Core and Kubeflow | Ubuntu
- What is MLOps? Benefits, Challenges & Best Practices | LakeFS
- MLOps best practices, challenges and maturity models | ScienceDirect (2025)
- MLOps Principles | ml-ops.org
- What is MLOps? | AWS