Tuesday, September 9, 2025

Machine Learning (ML) | Intro.

Machine Learning (ML) Intro - A progressive breakdown.

Intro:

    • Machine learning (ML) is a subfield of artificial intelligence (AI) that enables machines to automatically learn and improve from experience without being explicitly programmed. 
    • Instead of hard-coded instructions for every possible outcome, ML uses algorithms to analyze large datasets, learn from the insights gained, and then make informed decisions or predictions.
Core Concepts
    • Learning from Data: The fundamental principle of ML is that a system can identify patterns in data. 
    • The more relevant data provided, the better the model's performance typically becomes.
    • Generalization: A key goal of ML is for the model to generalize beyond the specific examples in the training data, applying its learned insights to new, unseen data.
    • Algorithms: A variety of algorithms are used in ML, ranging from simpler models like linear regression to complex ones such as neural networks used in deep learning. 
Types of Machine Learning

Machine learning algorithms generally fall into the following broad categories:
    • Supervised learning: The model is trained on labeled data, meaning each input has a corresponding output label. The goal is to learn a mapping from inputs to outputs (e.g., classifying emails as "spam" or "not spam").
    • Unsupervised learning: The model is given unlabeled data and must find patterns or structure within it on its own (e.g., grouping customers into different segments based on their purchasing behavior).
    • Reinforcement learning: An agent learns to make decisions by interacting with an environment, receiving rewards or penalties for its actions to maximize a cumulative reward.
    • Semi-supervised and self-supervised learning: These approaches combine aspects of supervised and unsupervised learning, often using large amounts of unlabeled data alongside smaller amounts of labeled data.
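A hedged toy sketch (not from the article; the synthetic dataset and model choices are illustrative) showing the supervised/unsupervised contrast in a few lines of scikit-learn:

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Supervised: inputs X come with labels y; learn a mapping X -> y.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)
train_acc = clf.score(X, y)  # accuracy on the labeled training data

# Unsupervised: same inputs, no labels; find structure (here, 2 clusters).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print("supervised train accuracy:", round(train_acc, 2))
print("cluster sizes:", sorted(Counter(km.labels_).values()))
```

The same data goes into both estimators; only the supervised one ever sees `y`.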
Applications of Machine Learning (ML)

ML is used in countless modern applications, including:
    • Fraud detection: Identifying suspicious financial transactions.
    • Healthcare: Assisting in disease classification and analyzing medical images.
    • Natural Language Processing (NLP): Powering conversational agents and text generation models.
    • Recommendation systems: Suggesting products or content based on user preferences.


1. Foundations of ML

  • Definition: ML is a subset of AI where systems learn patterns from data without explicit programming.
  • Categories:
    • Supervised Learning – labeled data → regression, classification.
    • Unsupervised Learning – no labels → clustering, dimensionality reduction.
    • Semi-Supervised Learning – mix of labeled + unlabeled data.
    • Reinforcement Learning (RL) – agents learn via reward signals in an environment.
  • Key Concepts:
    • Features & Labels
    • Training vs Testing data
    • Bias-Variance Tradeoff
    • Overfitting vs Underfitting
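Overfitting shows up directly when you compare train and test scores. A minimal sketch (assumed synthetic data with 20% label noise, scikit-learn): an unconstrained decision tree memorizes the noisy training set, while a depth-limited tree generalizes better.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 randomly flips 20% of the labels -> memorizable noise
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)          # no depth limit
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

deep_train, deep_test = deep.score(X_tr, y_tr), deep.score(X_te, y_te)
shal_train, shal_test = shallow.score(X_tr, y_tr), shallow.score(X_te, y_te)
print(f"deep tree:    train={deep_train:.2f} test={deep_test:.2f}")   # big gap = overfitting
print(f"shallow tree: train={shal_train:.2f} test={shal_test:.2f}")   # smaller gap
```

The train/test gap is the bias-variance tradeoff made visible: the deep tree has low bias but high variance.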

2. Mathematical Backbone

    • Linear Algebra: vectors, matrices, dot products (important for neural nets).
    • Probability & Statistics: distributions, Bayes’ theorem, likelihood, entropy.
    • Calculus: derivatives, gradients (used in optimization).
    • Optimization: Gradient Descent, Stochastic Gradient Descent (SGD), Adam, RMSprop.
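Gradient descent itself fits in a few lines. A toy sketch minimizing f(w) = (w - 3)^2, whose derivative is f'(w) = 2(w - 3), so the minimum is at w = 3:

```python
# Plain gradient descent: repeatedly step opposite the gradient.
w = 0.0       # initial parameter
lr = 0.1      # learning rate (step size)

for _ in range(100):
    grad = 2 * (w - 3)   # derivative of (w - 3)^2
    w -= lr * grad       # the update rule: w <- w - lr * grad

print(f"w after 100 steps: {w:.4f}")  # converges toward 3.0
```

SGD, Adam, and RMSprop are elaborations of this loop: SGD estimates the gradient from mini-batches, and Adam/RMSprop adapt the step size per parameter.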

3. Classical ML Algorithms

    • Regression: Linear Regression, Logistic Regression.
    • Decision Trees & Ensembles: Random Forests, Gradient Boosted Trees (XGBoost, LightGBM, CatBoost).
    • SVMs (Support Vector Machines)
    • Clustering: K-Means, DBSCAN, Hierarchical Clustering.
    • Dimensionality Reduction: PCA, t-SNE, UMAP.
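A quick hedged sketch chaining two of these, PCA then K-Means, on scikit-learn's iris dataset (the cluster count of 3 is an assumption matching iris's three species):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = load_iris().data                          # 150 samples, 4 features
X2 = PCA(n_components=2).fit_transform(X)     # project onto top 2 components
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X2)

print("reduced shape:", X2.shape, "| clusters found:", sorted(set(labels)))
```

Reducing dimensionality first is a common pattern: clustering in 2D is faster and the projection can be plotted for inspection.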

4. Neural Networks & Deep Learning

  • Artificial Neural Networks (ANNs): Inspired by biological neurons.
  • Architecture:
    • Input layer → Hidden layers → Output layer
    • Activation functions: ReLU, Sigmoid, Tanh, Softmax.
  • Deep Learning Specializations:
    • Convolutional Neural Networks (CNNs) – image recognition, object detection.
    • Recurrent Neural Networks (RNNs), LSTMs, GRUs – sequence modeling (NLP, time series).
    • Transformers – attention mechanism, foundation for GPT, BERT, etc.
    • Generative Models: GANs (Generative Adversarial Networks), VAEs (Variational Autoencoders).
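For intuition, the activation functions listed under Architecture can be written out in NumPy (a sketch, not a framework implementation):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)          # zero out negatives

def sigmoid(x):
    return 1 / (1 + np.exp(-x))      # squash to (0, 1)

def tanh(x):
    return np.tanh(x)                # squash to (-1, 1)

def softmax(x):
    e = np.exp(x - x.max())          # subtract max for numerical stability
    return e / e.sum()               # normalize to a probability vector

z = np.array([-2.0, 0.0, 2.0])
print("relu:   ", relu(z))
print("sigmoid:", sigmoid(z))
print("softmax:", softmax(z), "sums to", softmax(z).sum())
```

ReLU is the default for hidden layers; softmax is typically reserved for the output layer of a classifier, where its outputs are read as class probabilities.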

5. Modern ML Practices

    • Transfer Learning – using pretrained models (e.g., ImageNet, BERT) for new tasks.
    • Self-Supervised Learning – pretraining without labels (e.g., contrastive learning, masked prediction).
    • Foundation Models – GPT, LLaMA, PaLM.
    • Hyperparameter Optimization: Grid search, random search, Bayesian optimization, Hyperband.
    • Regularization: L1/L2, Dropout, BatchNorm, Data Augmentation.
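One of these, Dropout, is easy to sketch from scratch: "inverted dropout" zeroes activations at train time and rescales the survivors so the expected activation is unchanged (NumPy sketch; the drop probability of 0.5 is just an assumption for the demo):

```python
import numpy as np

def dropout(x, p_drop, rng):
    """Inverted dropout: zero with prob p_drop, scale survivors by 1/keep."""
    keep = 1.0 - p_drop
    mask = rng.random(x.shape) < keep   # True = neuron survives
    return x * mask / keep              # rescale so E[output] == x

rng = np.random.default_rng(0)
x = np.ones(100_000)
out = dropout(x, p_drop=0.5, rng=rng)

print("fraction zeroed:", round(float((out == 0).mean()), 3))  # ~0.5
print("mean preserved: ", round(float(out.mean()), 3))         # ~1.0
```

The 1/keep rescaling is why no correction is needed at inference time, when dropout is simply switched off.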

6. ML in Production (MLOps)

    • Data Engineering: pipelines, feature stores.
    • Model Training: distributed training (Horovod, DeepSpeed).
    • Model Deployment: APIs, batch inference, real-time serving.
    • Monitoring: drift detection, retraining pipelines.
    • Tools: MLflow, Kubeflow, SageMaker, Vertex AI.

7. Cutting Edge Areas

    • Large Language Models (LLMs) – GPT, Claude, LLaMA.
    • Multimodal Models – text + images + audio + video (e.g., CLIP, DALL·E).
    • Reinforcement Learning with Human Feedback (RLHF) – fine-tuning AI with preferences.
    • Federated Learning – training across decentralized devices while preserving privacy.
    • Causal Inference in ML – moving beyond correlation to causation.

8. Real-World Applications

    • Computer Vision: medical imaging, autonomous vehicles.
    • Natural Language Processing (NLP): chatbots, sentiment analysis, translation.
    • Recommendation Systems: Netflix, Amazon, Spotify.
    • Finance: fraud detection, algorithmic trading.
    • Healthcare: drug discovery, personalized medicine.

9. Challenges in ML

    • Data Quality: garbage in garbage out.
    • Bias & Fairness: ML models inherit human biases.
    • Explainability: black-box models vs interpretable AI.
    • Scalability: cost of training huge models.
    • Ethics & Safety: misuse of AI, deepfakes, autonomous weapons.

Hands-on (CLI) 

    •  A compact, practical deep dive with ready-to-run code (PyTorch first, then a matching TensorFlow/Keras version),
    • Plus best-practice tips for training, debugging, and production. 

An end-to-end example: 

data → model → training loop → validation → saving/loading → a transfer-learning recipe → mixed precision & tips.

Note: 

These examples are fully runnable on a machine with PyTorch (or TensorFlow), CUDA if available, and usual ML libs (torchvision, numpy). 

PyTorch — end-to-end (classification on CIFAR-10)

# file: twtech-train_pytorch_cifar10.py

import os

import math

import time

from pathlib import Path

import torch

import torch.nn as nn

import torch.optim as optim

from torch.utils.data import DataLoader

import torchvision

from torchvision import transforms, datasets, models

#  config 

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

OUT_DIR = Path("runs/cifar_experiment")

OUT_DIR.mkdir(parents=True, exist_ok=True)

BATCH_SIZE = 128

LR = 0.01

WEIGHT_DECAY = 1e-4

EPOCHS = 20

NUM_CLASSES = 10

NUM_WORKERS = 4

#  data 

train_tf = transforms.Compose([

    transforms.RandomCrop(32, padding=4),

    transforms.RandomHorizontalFlip(),

    transforms.ToTensor(),

    transforms.Normalize((0.4914, 0.4822, 0.4465),

                         (0.2023, 0.1994, 0.2010)),

])

val_tf = transforms.Compose([

    transforms.ToTensor(),

    transforms.Normalize((0.4914, 0.4822, 0.4465),

                         (0.2023, 0.1994, 0.2010)),

])

# NB: set download=True once to fetch data, then False on subsequent runs

train_ds = datasets.CIFAR10(root="data", train=True, transform=train_tf, download=True)

val_ds   = datasets.CIFAR10(root="data", train=False, transform=val_tf, download=True)

train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True, num_workers=NUM_WORKERS, pin_memory=True)

val_loader   = DataLoader(val_ds,   batch_size=256,       shuffle=False, num_workers=NUM_WORKERS, pin_memory=True)

#  Model (transfer learning with ResNet18) 

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)  # ImageNet weights (the old pretrained=True flag is deprecated)

# adapt final layer

in_feats = model.fc.in_features

model.fc = nn.Linear(in_feats, NUM_CLASSES)

model = model.to(DEVICE)

# optionally freeze early layers for faster convergence

for name, p in model.named_parameters():

    if "layer4" not in name and "fc" not in name:

        p.requires_grad = False

# Optimizer, loss, scheduler 

criterion = nn.CrossEntropyLoss()

optimizer = optim.SGD(filter(lambda p: p.requires_grad, model.parameters()),

                      lr=LR, momentum=0.9, weight_decay=WEIGHT_DECAY)

# cosine LR with warmup

def cosine_lr(epoch, base_lr=LR, T_max=EPOCHS, warmup=2):

    if epoch < warmup:

        return base_lr * (epoch + 1) / warmup

    return base_lr * 0.5 * (1 + math.cos(math.pi * (epoch - warmup) / (T_max - warmup)))

# Training & eval loops 

scaler = torch.cuda.amp.GradScaler(enabled=(DEVICE=="cuda"))  # mixed precision if CUDA available

def evaluate(model, loader):

    model.eval()

    correct = 0

    total = 0

    loss_sum = 0.0

    with torch.no_grad():

        for x, y in loader:

            x, y = x.to(DEVICE), y.to(DEVICE)

            logits = model(x)

            loss_sum += criterion(logits, y).item() * x.size(0)

            preds = logits.argmax(dim=1)

            correct += (preds == y).sum().item()

            total += x.size(0)

    return loss_sum / total, correct / total

best_val_acc = 0.0

for epoch in range(EPOCHS):

    model.train()

    epoch_loss = 0.0

    start = time.time()

    for g in optimizer.param_groups:

        g['lr'] = cosine_lr(epoch)  # cosine_lr already returns an absolute LR, not a multiplier

    for batch_idx, (x, y) in enumerate(train_loader):

        x, y = x.to(DEVICE), y.to(DEVICE)

        optimizer.zero_grad()

        with torch.cuda.amp.autocast(enabled=(DEVICE=="cuda")):

            logits = model(x)

            loss = criterion(logits, y)

        scaler.scale(loss).backward()

        scaler.step(optimizer)

        scaler.update()

        epoch_loss += loss.item() * x.size(0)

    train_loss = epoch_loss / len(train_loader.dataset)

    val_loss, val_acc = evaluate(model, val_loader)

    elapsed = time.time() - start

    print(f"Epoch {epoch+1}/{EPOCHS} | train_loss={train_loss:.4f} val_loss={val_loss:.4f} val_acc={val_acc:.4f} time={elapsed:.1f}s lr={optimizer.param_groups[0]['lr']:.3e}")

    # checkpoint best

    ckpt_path = OUT_DIR / "best.pth"

    if val_acc > best_val_acc:

        best_val_acc = val_acc

        torch.save({

            "epoch": epoch,

            "model_state": model.state_dict(),

            "optim_state": optimizer.state_dict(),

            "val_acc": val_acc,

        }, ckpt_path)

#  load & test 

print("Best val acc:", best_val_acc)

ckpt = torch.load(OUT_DIR/"best.pth", map_location=DEVICE)

model.load_state_dict(ckpt["model_state"])

test_loss, test_acc = evaluate(model, val_loader)

print(f"Test: loss={test_loss:.4f} acc={test_acc:.4f}")

Highlights & notes

    • Loads an ImageNet-pretrained ResNet-18 via torchvision and adapts the final layer — classic transfer learning.
    • Mixed precision (AMP) via torch.cuda.amp gives speed/memory wins on modern GPUs.
    • Freezes early layers to speed up training; fine-tune later if needed.
    • Cosine LR with warmup — simple but effective schedule.
    • Saves best checkpoint with torch.save(...).

Minimal PyTorch training loop (from-scratch MLP) — educational

# twtech-toy_mlp.py

import torch, torch.nn as nn, torch.optim as optim

from sklearn.datasets import make_classification

from torch.utils.data import TensorDataset, DataLoader

X, y = make_classification(5000, n_features=20, n_informative=10, n_classes=3, random_state=0)

X = torch.tensor(X, dtype=torch.float32)

y = torch.tensor(y, dtype=torch.long)

ds = TensorDataset(X, y)

loader = DataLoader(ds, batch_size=64, shuffle=True)

class MLP(nn.Module):

    def __init__(self, in_dim, hidden=128, out_dim=3):

        super().__init__()

        self.net = nn.Sequential(

            nn.Linear(in_dim, hidden),

            nn.ReLU(),

            nn.Linear(hidden, hidden),

            nn.ReLU(),

            nn.Linear(hidden, out_dim)

        )

    def forward(self, x):

        return self.net(x)

model = MLP(20).to("cpu")

opt = optim.Adam(model.parameters(), lr=1e-3)

crit = nn.CrossEntropyLoss()

for epoch in range(20):

    total_loss = 0

    for xb, yb in loader:

        logits = model(xb)

        loss = crit(logits, yb)

        opt.zero_grad(); loss.backward(); opt.step()

        total_loss += loss.item() * xb.size(0)

    print(f"Epoch {epoch+1} loss={total_loss/len(loader.dataset):.4f}")

TensorFlow / Keras equivalent (CIFAR-10, transfer learning)

# train_tf_cifar10.py

import tensorflow as tf

from tensorflow.keras import layers, models, optimizers, losses, callbacks, applications

BATCH = 128

EPOCHS = 20

AUTOTUNE = tf.data.AUTOTUNE 

# data

(x_train, y_train), (x_val, y_val) = tf.keras.datasets.cifar10.load_data()

x_train = x_train.astype("float32") / 255.0

x_val   = x_val.astype("float32") / 255.0

# preprocessing

# augment per-sample BEFORE batching: tf.image.random_crop expects an unbatched HWC image

train_ds = (tf.data.Dataset.from_tensor_slices((x_train, y_train))

    .shuffle(10000)

    .map(lambda x, y: (tf.image.resize_with_crop_or_pad(x, 36, 36), y), num_parallel_calls=AUTOTUNE)

    .map(lambda x, y: (tf.image.random_crop(x, (32, 32, 3)), y), num_parallel_calls=AUTOTUNE)

    .map(lambda x, y: (tf.image.random_flip_left_right(x), y), num_parallel_calls=AUTOTUNE)

    .map(lambda x, y: ((x - 0.5) / 0.5, y))

    .batch(BATCH).prefetch(AUTOTUNE))

val_ds = tf.data.Dataset.from_tensor_slices((x_val, y_val)).batch(BATCH).map(lambda x,y: ((x-0.5)/0.5, y)).prefetch(AUTOTUNE)

# model

base = applications.ResNet50(weights="imagenet", include_top=False, input_shape=(32,32,3), pooling="avg")

base.trainable = False

inputs = layers.Input(shape=(32,32,3))

x = base(inputs, training=False)

x = layers.Dropout(0.5)(x)

outputs = layers.Dense(10, activation="softmax")(x)

model = models.Model(inputs, outputs)

model.compile(optimizer=optimizers.SGD(learning_rate=0.01, momentum=0.9),

              loss=losses.SparseCategoricalCrossentropy(),

              metrics=["accuracy"])

cb = [callbacks.ModelCheckpoint("best_tf.h5", save_best_only=True, monitor="val_accuracy", mode="max")]

model.fit(train_ds, validation_data=val_ds, epochs=EPOCHS, callbacks=cb)

Practical tips & patterns (cheat-sheet)

Data

    • Normalize using dataset mean/std (or ImageNet stats for pretrained models).
    • Use prefetch, num_workers, and pinned memory for throughput.
    • Augment aggressively for small datasets (random crop, flip, color jitter, cutout).

Model & training

    • Start small: tiny model / small subset of data to verify pipeline correctness.
    • Use a sane baseline: pretrained model + small LR for fine-tuning.
    • Learning rate matters most; use an LR finder or quick sweeps.
    • Use gradient clipping if exploding gradients.

Optimization tricks

    • Auto-mixed precision (torch.cuda.amp or tf.keras.mixed_precision) for speed & memory.
    • Use cosine or piecewise schedules; warmup helps.
    • Use weight decay (L2) and momentum; Adam works well, but L2 interacts differently with it, so prefer AdamW for decoupled weight decay.

Debugging

    • Overfit a tiny dataset (e.g., 32 samples). If the model can't drive training loss to ~0, there's a bug.
    • Print shapes & dtypes; ensure labels are correct type (long for PyTorch CE).
    • Monitor gradient norms; e.g., for p in model.parameters(): if p.grad is not None: assert not torch.isnan(p.grad).any().
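The "overfit a tiny dataset" check can be sketched in scikit-learn (synthetic data and model are illustrative, not twtech's pipeline): a healthy setup should reach ~100% training accuracy on 32 samples, and failing to do so points at a data or label bug.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Tiny dataset: 32 samples, 10 features
X, y = make_classification(n_samples=32, n_features=10, random_state=0)

# A small MLP with enough capacity to memorize 32 points
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=3000, random_state=0)
clf.fit(X, y)

tiny_acc = clf.score(X, y)
print("train accuracy on 32 samples:", tiny_acc)  # should be at or near 1.0
```

The same idea applies verbatim to a PyTorch loop: slice 32 samples from the DataLoader and confirm the loss collapses before scaling up.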

Saving & reproducibility

    • Save state_dict, optimizer state, rng states (torch, numpy, python).
    • Log hyperparams & metrics (Weave/MLflow/Weights & Biases or simple CSV).

Production

    • Export to TorchScript / ONNX for fast inference.
    • Use batching & model quantization (INT8) for latency-sensitive deployments.
    • Monitor drift & set up automations for retraining.

Extra — small checklist to run experiments correctly

    1. Reproducibility: set seeds (torch, numpy, random).
    2. Profiling: measure data loading vs GPU utilization (nvidia-smi + torch.cuda.memory_stats).
    3. Hyperparameter search: start with coarse grid/random search, then local Bayesian search.
    4. Logging: use TensorBoard/Weights & Biases. 
    5. Save checkpoints & experiment metadata.
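Item 1 can be sketched as a small helper (the torch calls are left as comments so the snippet runs without PyTorch installed):

```python
import random

import numpy as np

def set_seed(seed: int) -> None:
    """Seed the common RNGs so runs are repeatable."""
    random.seed(seed)
    np.random.seed(seed)
    # torch.manual_seed(seed)            # uncomment when using PyTorch
    # torch.cuda.manual_seed_all(seed)   # and CUDA

set_seed(42)
a = np.random.rand(3)
set_seed(42)
b = np.random.rand(3)
print("identical draws after reseeding:", bool(np.allclose(a, b)))
```

Note that seeding alone does not guarantee bit-identical GPU results; some CUDA kernels are nondeterministic unless deterministic algorithms are explicitly enabled.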

 MLOps & scaling ML systems end-to-end

    • principles, reference architectures, 
    • concrete components, 
    • patterns for production, 
    • cost/ops trade-offs, 
    • sample CI/CD + serving snippets,
    • Best practices,
    • Tooling summaries (Feast, MLflow, KServe, Databricks/Neptune) that fit the current ecosystem.

1) High-level goals of MLOps

Treat ML systems like software plus data: 

    • Reproducibility, 
    • Automated delivery (retrain → test → promote), 
    • Observability (data + model),
    • Governance (lineage, access), 
    • Cost-efficient scaling. 

NB:

  • Organizations are increasingly unifying DevOps, DevSecOps, and MLOps practices across the software supply chain.

2) Core building blocks (reference map)

  1. Source control & experiment tracking
    • Git for code + infra.
    • Experiment tracking & metadata: MLflow / Weights & Biases / Neptune for runs, metrics, artifacts; MLflow also provides a Model Registry.
  2. Data pipelines & feature engineering
    • ETL/ELT pipelines (Airflow, Prefect, Dagster) to build offline features.
    • Feature store (e.g., Feast or commercial alternatives) to centralize feature definitions, versioning, online serving and consistency between train/inference. This is essential to avoid training-serving skew.
  3. Model training & hyperparam tuning
    • Reproducible training jobs (containerized; tracked by experiments). Use distributed frameworks (Ray, Horovod) for scale.
    • Automated HPO (Optuna, Ray Tune, Katib).
  4. Model registry & lineage
    • Centralized registry to version models, store metadata (which dataset, code commit, metrics), stage transitions (staging → prod). MLflow Model Registry is a common open option. 
  5. CI/CD for models
    • Automated pipelines that run tests (unit, data validation, model performance tests), create model artifacts, and promote to staging/production upon checks.
  6. Serving / inference
    • Options: batch inference, online microservice, or streaming.
    • Serving frameworks for K8s: KServe, Seldon Core, BentoML, etc. Choose based on ops maturity (KServe lighter; Seldon more feature rich but complex). 
  7. Monitoring & observability
    • Model performance (accuracy, latency), data drift, feature distributions, input-output schemas, and business KPIs.
    • Alerting + automated retrain triggers when drift/metric degradation detected.
  8. Governance & compliance
    • Auditable lineage, access control, model explainability reports, and retraining/rollback policies.

3) Typical reference architectures (3 patterns)

A — Small team / cloud-managed (fastest path)

    • Cloud-managed experiments & registry (SageMaker / Vertex AI / Databricks + MLflow), cloud feature store or Feast hosted, serverless endpoint for inference.
    • Pros: fast setup, less ops.
    • Cons: vendor lock, cost at scale; still need governance.

B — Mid-size / hybrid (most common)

    • Git + CI, Airflow/Dagster for pipelines, Feast for feature store, MLflow for tracking & registry, K8s cluster with KServe or Seldon for serving, Prometheus + Grafana for metrics.
    • Automations: CI → run training job (k8s pipeline) → evaluate → register in MLflow → orchestrated deploy to KServe.
    • Pros: flexible, reproducible, can optimize cost.
    • Cons: requires platform engineering.

C — High-scale real-time (latency-critical)

    • Online feature store (low-latency cache), model shards with autoscaling (scale-to-zero support), prediction caching, asynchronous batch fallback, A/B / canary rollout, and autoscaling with KEDA/HPA.
    • Serving frameworks: KServe / Seldon with advanced routing, or custom gRPC microservices with optimized inference runtimes (TorchScript/ONNX Runtime/TVM) for minimal latency. 

4) Key operational patterns & recipes

Versioning & reproducibility

    • Version: code (git commit), data (data versioning via DVC or hashed dataset URIs), model (registry), environment (container + pip/conda spec).
    • Save provenance in registry: dataset hash, experiment id, commit, feature versions. MLflow supports these metadata fields.
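The "hashed dataset URIs" idea above can be sketched with stdlib hashlib; the file name here is hypothetical:

```python
import hashlib
import tempfile
from pathlib import Path

def dataset_hash(path: Path) -> str:
    """SHA-256 fingerprint of a data file, streamed in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "train.csv"          # hypothetical dataset file
    p.write_text("x,y\n1,0\n2,1\n")
    digest = dataset_hash(p)
    repeat = dataset_hash(p)           # deterministic: same bytes, same digest

print("dataset sha256:", digest[:16], "...")
```

Logging this digest next to the git commit and MLflow run id is enough to answer "exactly which data trained this model?" later.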

CI/CD pipeline for models (example steps)

    1. On merge to main: run unit tests + static lint.
    2. Run data validation on recent sample(s) (schema checks).
    3. Launch training job (containerized), produce MLflow run with artifacts.
    4. Run model evaluation suite: holdout tests, fairness checks, explainability summary, and performance vs baseline.
    5. If checks pass register model into registry as Staging.
    6. Run integration test deployment in staging (smoke tests, canary traffic).
    7. Promote to Production via registry API (manual or gated).
      (Use GitHub Actions / GitLab CI / Tekton to implement; see snippet below.)

# Sample: minimal GitHub Actions job to run tests + register model

name: ml-pipeline

on: [push]

jobs:

  train:

    runs-on: ubuntu-latest

    steps:

      - uses: actions/checkout@v4

      - name: Set up Python

        uses: actions/setup-python@v4

        with:

          python-version: '3.10'

      - name: Install deps

        run: pip install -r requirements.txt mlflow

      - name: Run unit & smoke tests

        run: pytest -q

      - name: Start training job

        run: python train.py --output ./model_artifact

      - name: Register model to MLflow

        run: |

          mlflow models build-docker -m ./model_artifact -n mymodel:${{ github.sha }}

          # Alternatively, use MLflow Registry REST API to create a Model version and transition stages

NB:

  • Adjust to twtech infra: if training runs on k8s, the CI step would submit a k8s job rather than run locally.

Feature store recipe (train vs online)

    • Compute batch features for training (store offline features in warehouse).
    • Register feature definitions (names, transforms) in Feast.
    • Serve online features via Feast’s online store (Redis/Bigtable) to ensure same feature code at inference. 

Monitoring checklist (must-have metrics)

    • Data: input schema violations, feature value distributions, missingness.
    • Model: prediction distribution, top-k classes, confidence histograms.
    • Performance: latency p50/p95/p99, throughput, error rate.
    • Business: SLAs, uplift over baseline, revenue impact.
      Set thresholds and automations (alerts, gating retrain).
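The latency percentiles in the checklist are a one-liner with NumPy (the timings are simulated; the log-normal shape is just an assumption for the demo):

```python
import numpy as np

# Simulated request latencies in milliseconds (real systems would read
# these from serving logs or a metrics store)
rng = np.random.default_rng(1)
latencies_ms = rng.lognormal(mean=3.0, sigma=0.5, size=10_000)

p50, p95, p99 = np.percentile(latencies_ms, [50, 95, 99])
print(f"p50={p50:.1f}ms p95={p95:.1f}ms p99={p99:.1f}ms")
```

The tail percentiles (p95/p99) are what SLOs should gate on; averages hide the slow requests users actually notice.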

5) Scaling considerations & cost tradeoffs

  • Batch vs Online inference
    • Batch: cheaper, simpler; okay for non-latency tasks.
    • Online: adds complexity (low-latency storage for features, autoscaling). Use optimized runtimes (ONNX/TorchScript) and warm pools to reduce cold start latency.
  • Autoscaling & scale-to-zero
    • Evaluate whether your serving infra supports scale-to-zero (serverless) to save cost for sporadic traffic. KServe + KEDA can implement autoscaling; implementation complexity varies. 
  • Caching & precomputation
    • Cache popular predictions or precompute expensive features to reduce runtime compute.
  • Model size & sharding
    • For huge LLMs, use model parallelism, offload embedding caches to Redis/FAISS, and consider quantization. Managed LLM services can be more cost-effective than self-hosting at some scales.
  • Observability at scale
    • Sampling is necessary; full capture of every inference (payload) is often impractical. Store aggregated statistics plus sampled traces and explainability outputs.

6) Tooling recommendations (how to pick)

    • If twtech needs speed-to-market or has limited infra skills: use managed cloud MLOps (Vertex AI, SageMaker, Databricks + hosted MLflow). 
    • If twtech wants flexibility and open-source control: MLflow (tracking & registry) + Feast (feature store) + KServe/BentoML/Seldon for serving; orchestrate with Airflow/Dagster.
    • If twtech operates Kubernetes at scale: choose KServe or Seldon Core depending on required features; Seldon is feature-rich but heavier to operate. 

7) Concrete mini-playbook: 30-90 day roadmap to productionize a model

Days 0–7:

Build reproducible training artifact (script container), add tracking (MLflow), versioning (git), unit tests.

Days 8–21: 

Add data validation, small CI to run tests and trigger training. Add model evaluation tests (metrics, fairness).

Days 22–45:

Add model registry and promote workflow (staging). Implement feature store for consistent features. 

Days 46–75: 

Deploy to staging serving infra (KServe/Seldon), add monitoring (Prometheus/Grafana + custom metrics), build alerting.

Days 76–90:

Do Canary/A-B rollouts, validate with production traffic (shadow mode), then promote to prod and automate retrain triggers on drift.

8) Quick code snippets you’ll find useful

# Register a model with MLflow (Python)

import mlflow


mlflow.set_tracking_uri("http://mlflow-server:5000")

with mlflow.start_run() as run:

    mlflow.log_metric("val_acc", 0.92)

    mlflow.pytorch.log_model(model, "model")

    result = mlflow.register_model(f"runs:/{run.info.run_id}/model", "my-model")

    # transition to staging

    client = mlflow.tracking.MlflowClient()

    client.transition_model_version_stage("my-model", result.version, "Staging")

# NB:

# MLflow docs show model lifecycle APIs and GenAI-oriented data model extensions.

# Simple KServe inference YAML (serving a model from a container)

apiVersion: serving.kserve.io/v1beta1

kind: InferenceService

metadata:

  name: twtech-model

spec:

  predictor:

    containers:

      - image: docker.io/myorg/mymodel:latest

        name: kserve-container

        resources:

          limits:

            cpu: "1"

            memory: "2Gi"

NB:

  • KServe is lightweight and integrates with Knative/KEDA for autoscaling. 

9) Pitfalls & lessons from the field

    • Training-serving skew (features computed differently in production) is one of the most common production bugs — fix it with a feature store + identical transforms. 
    • Siloed teams cause models to die in staging — unify pipelines, treat models as artifacts and share ownership. 
    • Premature optimization: don’t self-host massive serving infra until you’ve validated load & cost; managed offerings are often cheaper/time-saving early on. 

10) Next steps

    • Build a concrete CI/CD pipeline (GitHub Actions + k8s job + MLflow register) with full YAML and scripts.
    • Create a minimal reproducible repo scaffold (training script, Dockerfile, MLflow logging, helm/KServe manifest).
    • Prototype an online feature store flow with Feast: sample code to register features and fetch online features during inference.
    • Design a monitoring dashboard (Prometheus + Grafana + sample metrics + alerting rules) and sample queries.
