AI / ML Ops Learning Path

Where DevOps meets Machine Learning — build, deploy, and operate ML systems at scale.

Why MLOps?

As AI becomes core to modern applications, a gap opens between building ML models and running them reliably in production. MLOps closes that gap by combining DevOps principles with ML-specific workflows such as experiment tracking, model serving, and monitoring.

Stage 1: Foundations

  • Python for ML (NumPy, Pandas, Scikit-learn)
  • Docker for ML workloads
  • Git for ML projects (DVC, Git-LFS)
  • Basic ML concepts (training, inference, evaluation)
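The last bullet's train/inference/evaluate loop can be sketched end to end without any ML library. This is a toy illustration, not a recommended model: the dataset, the nearest-centroid "classifier", and the 80/20 split are all stand-ins chosen so the three stages are visible in a few lines.

```python
import random

# Toy dataset: (feature, label) pairs — label is 1 when the feature > 0.5.
random.seed(0)
data = [(x, int(x > 0.5)) for x in [random.random() for _ in range(100)]]

# Split: hold out 20% of the data for evaluation.
split = int(len(data) * 0.8)
train, test = data[:split], data[split:]

# "Training": a nearest-centroid classifier — store the mean feature per class.
centroids = {}
for label in (0, 1):
    values = [x for x, y in train if y == label]
    centroids[label] = sum(values) / len(values)

# Inference: predict the class whose centroid is closest to the input.
def predict(x):
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Evaluation: accuracy on the held-out test set.
accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

With scikit-learn the shape is the same: `train_test_split`, `fit`, `predict`, then a metric from `sklearn.metrics`.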

Stage 2: ML Infrastructure

  • Model versioning and experiment tracking (MLflow, Weights & Biases)
  • Feature stores (Feast, Tecton)
  • Data pipelines (Apache Airflow, Kubeflow Pipelines)
  • GPU infrastructure and cloud ML services
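The core idea behind experiment tracking tools like MLflow or W&B is small: every run records its parameters and metric history somewhere durable so experiments stay comparable and reproducible. The sketch below is a stdlib-only stand-in for that pattern; the `Run` class and its method names are illustrative, not any real tool's API.

```python
import json
import time
import uuid
from pathlib import Path

class Run:
    """Minimal experiment-tracking sketch: params + metrics -> one JSON file."""

    def __init__(self, experiment: str, root: str = "runs"):
        self.record = {
            "experiment": experiment,
            "run_id": uuid.uuid4().hex,
            "started": time.time(),
            "params": {},
            "metrics": {},
        }
        self.path = Path(root) / experiment / f"{self.record['run_id']}.json"

    def log_param(self, key, value):
        # Params are set once per run (learning rate, batch size, ...).
        self.record["params"][key] = value

    def log_metric(self, key, value):
        # Metrics accumulate over time (loss per epoch, ...).
        self.record["metrics"].setdefault(key, []).append(value)

    def finish(self):
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.record, indent=2))

run = Run("baseline-model")
run.log_param("learning_rate", 0.01)
for loss in [0.9, 0.5, 0.3]:
    run.log_metric("loss", loss)
run.finish()
print(run.path)
```

Real trackers add artifact storage, a query UI, and concurrency handling on top, but the logging contract is essentially this.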

Stage 3: Model Serving & Deployment

  • Model serving frameworks (TensorFlow Serving, Triton, BentoML)
  • Kubernetes for ML (Kubeflow, Seldon Core)
  • A/B testing and canary deployments for models
  • Edge deployment and model optimization
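Canary deployment for models reduces to one decision per request: send a small fraction of traffic to the candidate version and the rest to the stable one, then compare their metrics before promoting. The sketch below shows that routing logic with stand-in model functions; in practice `stable_model` and `canary_model` would be calls to separately deployed serving endpoints, and the 5% split is an assumed starting point.

```python
import random

def stable_model(x):
    return x * 2          # stand-in for the current production model

def canary_model(x):
    return x * 2 + 0.01   # stand-in for the candidate under evaluation

def route(x, canary_fraction=0.05, rng=random):
    """Route one request; return (version, prediction) so metrics
    can be attributed to the version that served them."""
    if rng.random() < canary_fraction:
        return "canary", canary_model(x)
    return "stable", stable_model(x)

random.seed(42)
counts = {"stable": 0, "canary": 0}
for _ in range(10_000):
    version, _pred = route(1.0)
    counts[version] += 1
print(counts)
```

Tools like Seldon Core or a service mesh implement the same split at the infrastructure layer, which also lets you shift the fraction gradually without redeploying.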

Stage 4: Production ML

  • Monitoring model performance and data drift
  • Automated retraining pipelines
  • LLMOps — running large language models in production
  • Cost optimization for ML workloads
  • Responsible AI and governance
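Drift monitoring compares a feature's live distribution against its training-time baseline. One common metric, which tools like Evidently expose among others, is the Population Stability Index (PSI); below is a stdlib-only sketch of it. The bin count and the ~0.2 "investigate" threshold are widely used rules of thumb, not universal constants.

```python
import math
import random

def psi(baseline, current, bins=10):
    """Population Stability Index between two samples of one feature."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            i = int((v - lo) / width)
            counts[min(max(i, 0), bins - 1)] += 1  # clamp out-of-range values
        # Smooth empty bins so the log term stays defined.
        return [(c + 1e-6) / (len(values) + 1e-6 * bins) for c in counts]

    p, q = proportions(baseline), proportions(current)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

random.seed(1)
baseline = [random.gauss(0, 1) for _ in range(5000)]  # training-time data
same = [random.gauss(0, 1) for _ in range(5000)]      # live data, no drift
shifted = [random.gauss(1, 1) for _ in range(5000)]   # live data, mean drifted

print(f"no drift: PSI = {psi(baseline, same):.3f}")
print(f"drifted:  PSI = {psi(baseline, shifted):.3f}")
```

A monitoring job would compute this per feature on a schedule and trigger the automated retraining pipeline (previous bullet) when the score stays above threshold.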

Key Tools

Category              Tools
-------------------   ----------------------------------
Experiment Tracking   MLflow, W&B, Neptune
Pipelines             Kubeflow, Airflow, Argo Workflows
Serving               Seldon, BentoML, TF Serving
Monitoring            Evidently, WhyLabs, Fiddler
LLMOps                vLLM, Ollama, LangChain