# AI / ML Ops Learning Path
Where DevOps meets Machine Learning — build, deploy, and operate ML systems at scale.
## Why MLOps?

As AI becomes core to modern applications, teams need to bridge the gap between training ML models and operating them reliably in production. MLOps applies DevOps principles — automation, versioning, CI/CD, monitoring — to ML-specific workflows such as data management, experiment tracking, and model deployment.
## Stage 1: Foundations
- Python for ML (NumPy, Pandas, Scikit-learn)
- Docker for ML workloads
- Git for ML projects (DVC, Git-LFS)
- Basic ML concepts (training, inference, evaluation)
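To make the train/inference/evaluate loop from Stage 1 concrete, here is a minimal sketch in plain Python (no NumPy or Scikit-learn, so it stays self-contained): fitting a 1-D linear model by closed-form least squares, then evaluating it on held-out data. The function names are illustrative, not from any library.

```python
def train(xs, ys):
    """Fit y = w*x + b by ordinary least squares (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

def predict(model, xs):
    """Inference: apply the fitted parameters to new inputs."""
    w, b = model
    return [w * x + b for x in xs]

def evaluate(model, xs, ys):
    """Evaluation: mean squared error on held-out data."""
    preds = predict(model, xs)
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

# Train on points from y = 2x + 1, then evaluate on unseen inputs.
model = train([0, 1, 2, 3], [1, 3, 5, 7])
mse = evaluate(model, [4, 5], [9, 11])
```

Real workloads swap this for Scikit-learn estimators, but the split into train, predict, and evaluate is the same pattern MLOps pipelines automate.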
## Stage 2: ML Infrastructure
- Model versioning and experiment tracking (MLflow, Weights & Biases)
- Feature stores (Feast, Tecton)
- Data pipelines (Apache Airflow, Kubeflow Pipelines)
- GPU infrastructure and cloud ML services
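The core data model behind trackers like MLflow and W&B is simple: a *run* with immutable parameters and time-series metrics. This toy tracker (illustrative only; the class and method names are assumptions, not any real tracker's API) shows the shape — real tools add storage backends, artifacts, and a UI on top.

```python
import uuid

class Run:
    """Toy experiment run: parameters plus stepped metric histories."""

    def __init__(self, experiment: str):
        self.run_id = uuid.uuid4().hex
        self.experiment = experiment
        self.params = {}      # hyperparameters, logged once
        self.metrics = {}     # metric name -> list of (step, value)

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value, step=0):
        self.metrics.setdefault(key, []).append((step, value))

    def best(self, key):
        """Lowest logged value of a metric (e.g. validation loss)."""
        return min(v for _, v in self.metrics[key])

# One training run: log hyperparameters once, metrics per epoch.
run = Run("churn-model")
run.log_param("learning_rate", 0.01)
run.log_metric("val_loss", 0.42, step=1)
run.log_metric("val_loss", 0.31, step=2)
best_loss = run.best("val_loss")
```

Comparing `best("val_loss")` across many such runs is exactly what the tracker UIs visualize when you sweep hyperparameters.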
## Stage 3: Model Serving & Deployment
- Model serving frameworks (TensorFlow Serving, Triton, BentoML)
- Kubernetes for ML (Kubeflow, Seldon Core)
- A/B testing and canary deployments for models
- Edge deployment and model optimization
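A common building block for model canaries is deterministic, hash-based traffic splitting: route a fixed fraction of users to the new model version, keyed on a stable hash of the user ID so each user consistently sees the same version. A minimal sketch (the routing function is hypothetical; serving platforms like Seldon implement this for you):

```python
import hashlib

def route(user_id: str, canary_fraction: float) -> str:
    """Return 'canary' for ~canary_fraction of users, else 'stable'.

    Hashing the user ID (rather than random sampling) makes routing
    sticky: the same user always gets the same model version.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return "canary" if bucket < canary_fraction else "stable"

# Deterministic per user, and roughly proportional in aggregate.
canary_share = sum(
    route(f"user-{i}", 0.10) == "canary" for i in range(10_000)
) / 10_000
```

Ramping the canary is then just raising `canary_fraction` while you compare the two versions' live metrics.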
## Stage 4: Production ML
- Monitoring model performance and data drift
- Automated retraining pipelines
- LLMOps — running large language models in production
- Cost optimization for ML workloads
- Responsible AI and governance
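One standard drift metric behind tools like Evidently is the Population Stability Index (PSI): bin a feature on the training distribution's quantiles, then compare live bin frequencies against the reference. A self-contained sketch (the 0.2 threshold is a common rule of thumb, not a universal constant):

```python
import math
import random

def psi(reference, live, bins=10):
    """Population Stability Index between reference and live samples."""
    ref_sorted = sorted(reference)
    # Bin edges at the reference distribution's quantiles.
    edges = [ref_sorted[int(len(ref_sorted) * i / bins)] for i in range(1, bins)]

    def bucket_fracs(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v >= e for e in edges)] += 1
        return [c / len(values) for c in counts]

    eps = 1e-6  # avoid log(0) for empty buckets
    return sum(
        (max(p, eps) - max(q, eps)) * math.log(max(p, eps) / max(q, eps))
        for p, q in zip(bucket_fracs(reference), bucket_fracs(live))
    )

random.seed(0)
ref = [random.gauss(0, 1) for _ in range(5000)]       # training-time feature
same = [random.gauss(0, 1) for _ in range(5000)]      # production, no drift
shifted = [random.gauss(1.0, 1) for _ in range(5000)] # production, mean shift
```

In a monitoring pipeline you would compute this per feature on a schedule and alert (or trigger retraining) when PSI crosses your threshold.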
## Key Tools
| Category | Tools |
|---|---|
| Experiment Tracking | MLflow, W&B, Neptune |
| Pipelines | Kubeflow, Airflow, Argo Workflows |
| Serving | Seldon, BentoML, TF Serving |
| Monitoring | Evidently, WhyLabs, Fiddler |
| LLMOps | vLLM, Ollama, LangChain |