# AI / ML Ops Learning Path
Where DevOps meets Machine Learning — build, deploy, and operate ML systems at scale.
## Why MLOps?

As AI becomes core to modern applications, teams need to bridge the gap between training ML models and operating them reliably in production. MLOps applies DevOps principles — automation, versioning, CI/CD, monitoring — to ML-specific workflows such as data management, experiment tracking, and model deployment.
## Stage 1: Foundations
- Python for ML (NumPy, Pandas, Scikit-learn)
- Docker for ML workloads
- Git for ML projects (DVC, Git-LFS)
- Basic ML concepts (training, inference, evaluation)
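To make the train/inference/evaluate loop from Stage 1 concrete, here is a minimal sketch in plain Python (no NumPy or Scikit-learn, so it stays self-contained): fitting a 1-D linear model by closed-form least squares, then evaluating it on held-out data. The function names are illustrative, not from any library.

```python
def train(xs, ys):
    """Fit y = w*x + b by ordinary least squares (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

def predict(model, xs):
    """Inference: apply the fitted parameters to new inputs."""
    w, b = model
    return [w * x + b for x in xs]

def evaluate(model, xs, ys):
    """Evaluation: mean squared error on held-out data."""
    preds = predict(model, xs)
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

# Train on points from y = 2x + 1, then evaluate on unseen inputs.
model = train([0, 1, 2, 3], [1, 3, 5, 7])
mse = evaluate(model, [4, 5], [9, 11])
```

Real workloads swap this for Scikit-learn estimators, but the split into train, predict, and evaluate is the same pattern MLOps pipelines automate.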
## Stage 2: ML Infrastructure
- Model versioning and experiment tracking (MLflow, Weights & Biases)
- Feature stores (Feast, Tecton)
- Data pipelines (Apache Airflow, Kubeflow Pipelines)
- GPU infrastructure and cloud ML services
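The core data model behind trackers like MLflow and W&B is simple: a *run* with immutable parameters and time-series metrics. This toy tracker (illustrative only; the class and method names are assumptions, not any real tracker's API) shows the shape — real tools add storage backends, artifacts, and a UI on top.

```python
import uuid

class Run:
    """Toy experiment run: parameters plus stepped metric histories."""

    def __init__(self, experiment: str):
        self.run_id = uuid.uuid4().hex
        self.experiment = experiment
        self.params = {}      # hyperparameters, logged once
        self.metrics = {}     # metric name -> list of (step, value)

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value, step=0):
        self.metrics.setdefault(key, []).append((step, value))

    def best(self, key):
        """Lowest logged value of a metric (e.g. validation loss)."""
        return min(v for _, v in self.metrics[key])

# One training run: log hyperparameters once, metrics per epoch.
run = Run("churn-model")
run.log_param("learning_rate", 0.01)
run.log_metric("val_loss", 0.42, step=1)
run.log_metric("val_loss", 0.31, step=2)
best_loss = run.best("val_loss")
```

Comparing `best("val_loss")` across many such runs is exactly what the tracker UIs visualize when you sweep hyperparameters.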
## Stage 3: Model Serving & Deployment
- Model serving frameworks (TensorFlow Serving, Triton, BentoML)
- Kubernetes for ML (Kubeflow, Seldon Core)
- A/B testing and canary deployments for models
- Edge deployment and model optimization
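A common building block for model canaries is deterministic, hash-based traffic splitting: route a fixed fraction of users to the new model version, keyed on a stable hash of the user ID so each user consistently sees the same version. A minimal sketch (the routing function is hypothetical; serving platforms like Seldon implement this for you):

```python
import hashlib

def route(user_id: str, canary_fraction: float) -> str:
    """Return 'canary' for ~canary_fraction of users, else 'stable'.

    Hashing the user ID (rather than random sampling) makes routing
    sticky: the same user always gets the same model version.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return "canary" if bucket < canary_fraction else "stable"

# Deterministic per user, and roughly proportional in aggregate.
canary_share = sum(
    route(f"user-{i}", 0.10) == "canary" for i in range(10_000)
) / 10_000
```

Ramping the canary is then just raising `canary_fraction` while you compare the two versions' live metrics.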
## Stage 4: Production ML
- Monitoring model performance and data drift
- Automated retraining pipelines
- LLMOps — running large language models in production
- Cost optimization for ML workloads
- Responsible AI and governance
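One standard drift metric behind tools like Evidently is the Population Stability Index (PSI): bin a feature on the training distribution's quantiles, then compare live bin frequencies against the reference. A self-contained sketch (the 0.2 threshold is a common rule of thumb, not a universal constant):

```python
import math
import random

def psi(reference, live, bins=10):
    """Population Stability Index between reference and live samples."""
    ref_sorted = sorted(reference)
    # Bin edges at the reference distribution's quantiles.
    edges = [ref_sorted[int(len(ref_sorted) * i / bins)] for i in range(1, bins)]

    def bucket_fracs(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v >= e for e in edges)] += 1
        return [c / len(values) for c in counts]

    eps = 1e-6  # avoid log(0) for empty buckets
    return sum(
        (max(p, eps) - max(q, eps)) * math.log(max(p, eps) / max(q, eps))
        for p, q in zip(bucket_fracs(reference), bucket_fracs(live))
    )

random.seed(0)
ref = [random.gauss(0, 1) for _ in range(5000)]       # training-time feature
same = [random.gauss(0, 1) for _ in range(5000)]      # production, no drift
shifted = [random.gauss(1.0, 1) for _ in range(5000)] # production, mean shift
```

In a monitoring pipeline you would compute this per feature on a schedule and alert (or trigger retraining) when PSI crosses your threshold.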
## Key Tools
| Category | Tools |
|---|---|
| Experiment Tracking | MLflow, W&B, Neptune |
| Pipelines | Kubeflow, Airflow, Argo Workflows |
| Serving | Seldon, BentoML, TF Serving |
| Monitoring | Evidently, WhyLabs, Fiddler |
| LLMOps | vLLM, Ollama, LangChain |