Kubernetes Workloads & Scheduling
Learn to master Kubernetes workload management, from Deployments and StatefulSets to advanced scheduling, auto-scaling, and health checks. This guide covers everything needed to deploy, manage, and optimize workloads in production Kubernetes clusters.
Table of Contents
- Deployments
- StatefulSets
- DaemonSets
- Jobs and CronJobs
- Scheduling
- Auto-Scaling
- Resource Management
- Health Checks
- Exercises
Workload Types Overview
Kubernetes offers multiple workload types for different application patterns:
*(Diagram: overview of Kubernetes workload types)*
Deployments
Deployments are the primary way to run stateless applications in Kubernetes. They manage Pods and ReplicaSets with automatic rollouts, rollbacks, and scaling capabilities.
Creating Deployments
A Deployment declaratively describes the desired state of your application. Here's a complete example with all common fields:
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-app
namespace: default
labels:
app: nginx
version: v1
spec:
# Number of desired Pod replicas
replicas: 3
# Strategy for rolling out new versions
strategy:
type: RollingUpdate
rollingUpdate:
# Maximum number of Pods that can be created above the desired replicas
maxSurge: 1
# Maximum number of Pods that can be unavailable during the update
maxUnavailable: 0
# Selector to match Pods managed by this Deployment
selector:
matchLabels:
app: nginx
# Revision history limit
revisionHistoryLimit: 10
# Seconds the Deployment has to make progress before it is marked as failed
progressDeadlineSeconds: 300
# Pod template specification
template:
metadata:
labels:
app: nginx
version: v1
spec:
# Pod restart policy
restartPolicy: Always
containers:
- name: nginx
image: nginx:1.21
imagePullPolicy: IfNotPresent
ports:
- name: http
containerPort: 80
protocol: TCP
# Resource requests (guaranteed resources)
resources:
requests:
cpu: 100m
memory: 128Mi
# Resource limits (maximum resources)
limits:
cpu: 500m
memory: 512Mi
# Liveness probe (is the container alive?)
livenessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
# Readiness probe (is the container ready to accept traffic?)
readinessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 2
# Volume mounts
volumeMounts:
- name: config
mountPath: /etc/nginx/conf.d
readOnly: true
# Pod-level volumes
volumes:
- name: config
configMap:
name: nginx-config
Field Explanations
| Field | Purpose |
|---|---|
| `replicas` | Desired number of Pod copies |
| `maxSurge` | How many extra Pods allowed during an update |
| `maxUnavailable` | How many Pods can be down during an update |
| `revisionHistoryLimit` | Keep this many old ReplicaSets for rollbacks |
| `progressDeadlineSeconds` | Timeout for the Deployment to make progress |
| `imagePullPolicy` | `Always`, `IfNotPresent`, or `Never` |
Deployment Strategies
RollingUpdate (Default)
Gradually replace old Pods with new ones:
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1 # Create 1 extra Pod during update
maxUnavailable: 0 # Never kill Pods, always have replicas available
Use case: Zero-downtime deployments where gradual rollout is acceptable.
Recreate
Delete all Pods, then create new ones:
strategy:
type: Recreate
Use case: Applications that cannot run multiple versions simultaneously (rare).
Rolling Updates
Trigger a rolling update by changing the image:
# Method 1: Set image directly
kubectl set image deployment/nginx-app nginx=nginx:1.22
# (--record is deprecated; record the change cause via an annotation instead)
kubectl annotate deployment/nginx-app kubernetes.io/change-cause="update to nginx:1.22"
# Method 2: Edit the Deployment
kubectl edit deployment nginx-app
# Method 3: Apply updated YAML
kubectl apply -f deployment.yaml
# Monitor the rollout
kubectl rollout status deployment/nginx-app
# Watch progress in real-time
kubectl rollout status deployment/nginx-app --watch
Rollbacks
Kubernetes keeps a revision history of Deployments. Rollback to a previous version:
# View rollout history
kubectl rollout history deployment/nginx-app
# View details of a specific revision
kubectl rollout history deployment/nginx-app --revision=2
# Rollback to the previous revision
kubectl rollout undo deployment/nginx-app
# Rollback to a specific revision
kubectl rollout undo deployment/nginx-app --to-revision=2
# Pause rollout (useful for canary deployments)
kubectl rollout pause deployment/nginx-app
# Resume rollout
kubectl rollout resume deployment/nginx-app
Manual Scaling
Scale Deployments up or down:
# Scale to 5 replicas
kubectl scale deployment nginx-app --replicas=5
# Scale down to 2
kubectl scale deployment nginx-app --replicas=2
Blue/Green Deployment Pattern
Run two complete environments and switch traffic:
# Blue deployment (v1)
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-blue
spec:
replicas: 3
selector:
matchLabels:
app: myapp
version: blue
template:
metadata:
labels:
app: myapp
version: blue
spec:
containers:
- name: app
image: myapp:1.0
---
# Green deployment (v2)
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-green
spec:
replicas: 3
selector:
matchLabels:
app: myapp
version: green
template:
metadata:
labels:
app: myapp
version: green
spec:
containers:
- name: app
image: myapp:2.0
---
# Service routes to blue initially
apiVersion: v1
kind: Service
metadata:
name: app-service
spec:
selector:
app: myapp
version: blue # Points to blue
ports:
- port: 80
targetPort: 8080
To switch to green:
# Test green deployment
kubectl get pods -l version=green
# Switch traffic
kubectl patch service app-service -p '{"spec":{"selector":{"version":"green"}}}'
# If issues, switch back
kubectl patch service app-service -p '{"spec":{"selector":{"version":"blue"}}}'
Canary Deployment Pattern
Gradually shift traffic to a new version:
# Canary deployment (v2) with fewer replicas
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-canary
spec:
replicas: 1 # Start with 1 replica
selector:
matchLabels:
app: myapp
version: canary
template:
metadata:
labels:
app: myapp
version: canary
spec:
containers:
- name: app
image: myapp:2.0
---
# Main deployment (v1)
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-stable
spec:
replicas: 9 # More replicas
selector:
matchLabels:
app: myapp
version: stable
template:
metadata:
labels:
app: myapp
version: stable
spec:
containers:
- name: app
image: myapp:1.0
---
# Service sends traffic to both (10% to canary, 90% to stable)
apiVersion: v1
kind: Service
metadata:
name: app-service
spec:
selector:
app: myapp # Routes to both stable AND canary
ports:
- port: 80
targetPort: 8080
Monitor metrics, then gradually increase canary replicas:
kubectl scale deployment app-canary --replicas=3
kubectl scale deployment app-stable --replicas=7
# Continue until fully rolled out, then delete app-stable
StatefulSets
StatefulSets manage stateful applications that require stable, persistent identity. Use them for databases, message queues, and other workloads requiring ordered deployment and stable hostnames.
Key Characteristics
- Stable network identity: Pod names are predictable (e.g., `mysql-0`, `mysql-1`)
- Ordered deployment: Pods are created/deleted in order (0 → 1 → 2...)
- Persistent storage: VolumeClaimTemplates create separate PVCs per Pod
- Headless Service: Required to provide stable DNS names
When to Use StatefulSets
- Databases: MySQL, PostgreSQL, MongoDB (with replicas)
- Message queues: Kafka, RabbitMQ
- Distributed systems: ZooKeeper, etcd
- Any app requiring stable identity and persistent storage
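By default a StatefulSet starts Pods strictly one at a time. When stable identity and per-Pod storage matter but strict startup ordering does not, the ordering can be relaxed — a minimal sketch (the `web` name is hypothetical):

```yaml
# Sketch: relax ordered startup while keeping stable identity
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 3
  podManagementPolicy: Parallel  # default is OrderedReady (0 -> 1 -> 2)
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.21
```

Note that `Parallel` only affects scale-up/scale-down ordering; stable Pod names and per-Pod PVCs behave the same.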
Headless Services
StatefulSets require a Headless Service (clusterIP: None) for stable DNS:
apiVersion: v1
kind: Service
metadata:
name: mysql
spec:
clusterIP: None # Headless Service
selector:
app: mysql
ports:
- port: 3306
targetPort: 3306
DNS names resolve to individual Pods:
- `mysql-0.mysql.default.svc.cluster.local` → Pod `mysql-0`
- `mysql-1.mysql.default.svc.cluster.local` → Pod `mysql-1`
StatefulSet Example: MySQL Primary and Replica
---
# Headless Service
apiVersion: v1
kind: Service
metadata:
name: mysql
namespace: default
spec:
clusterIP: None
selector:
app: mysql
ports:
- port: 3306
name: mysql
---
# ConfigMap for MySQL replication config
apiVersion: v1
kind: ConfigMap
metadata:
name: mysql-config
data:
primary.cnf: |
[mysqld]
log_bin
server-id=1
replica.cnf: |
[mysqld]
server-id=2
---
# StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mysql
spec:
serviceName: mysql # Headless Service name
replicas: 2
selector:
matchLabels:
app: mysql
template:
metadata:
labels:
app: mysql
spec:
containers:
- name: mysql
image: mysql:8.0
env:
- name: MYSQL_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-secret
key: password
ports:
- containerPort: 3306
name: mysql
# Volume mounts
volumeMounts:
- name: mysql-data
mountPath: /var/lib/mysql
- name: mysql-config
mountPath: /etc/mysql/conf.d
# Readiness probe
readinessProbe:
exec:
command:
- sh
- -c
- mysql -uroot -p$MYSQL_ROOT_PASSWORD -e "SELECT 1"
initialDelaySeconds: 20
periodSeconds: 10
resources:
requests:
cpu: 250m
memory: 512Mi
limits:
cpu: 1000m
memory: 2Gi
# Pod-level volume for the MySQL config (referenced by the volumeMounts above)
volumes:
- name: mysql-config
  configMap:
    name: mysql-config
# Persistent volume claims for each Pod
volumeClaimTemplates:
- metadata:
name: mysql-data
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: standard
resources:
requests:
storage: 10Gi
Deploy MySQL and verify:
# Create secret for password
kubectl create secret generic mysql-secret \
--from-literal=password=secretpass
# Apply StatefulSet
kubectl apply -f mysql-statefulset.yaml
# Check Pods are created in order
kubectl get pods -l app=mysql
# Verify PVCs
kubectl get pvc
# Connect to primary (mysql-0)
kubectl exec -it mysql-0 -- mysql -uroot -psecretpass
# Connect to replica (mysql-1)
kubectl exec -it mysql-1 -- mysql -uroot -psecretpass
StatefulSet Example: Redis Cluster
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: redis
spec:
serviceName: redis
replicas: 3
selector:
matchLabels:
app: redis
template:
metadata:
labels:
app: redis
spec:
containers:
- name: redis
image: redis:7.0
command:
- redis-server
- --cluster-enabled
- "yes"
- --cluster-config-file
- /data/nodes.conf
- --cluster-node-timeout
- "5000"
ports:
- containerPort: 6379
name: client
- containerPort: 16379
name: gossip
volumeMounts:
- name: redis-data
mountPath: /data
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: 500m
memory: 1Gi
volumeClaimTemplates:
- metadata:
name: redis-data
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 5Gi
DaemonSets
DaemonSets ensure a Pod runs on every node (or a subset via selectors). Perfect for cluster-wide concerns like logging, monitoring, and networking.
Use Cases
- Log collectors: Fluentd, Logstash, Filebeat
- Monitoring agents: Prometheus node-exporter, Datadog agent
- Network plugins: Calico, Weave, Cilium
- Machine learning: GPU drivers, specialized runtime initialization
- Intrusion detection: Falco, osquery
DaemonSet Characteristics
- One Pod per node (by default)
- Automatically created on new nodes
- Automatically deleted when nodes are removed
- Respects node selectors, taints, and tolerations
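To run a DaemonSet on only a subset of nodes, add a `nodeSelector` to the Pod template — a sketch assuming nodes labeled `disktype=ssd` as in the Scheduling section below:

```yaml
# Sketch: DaemonSet Pods only on nodes labeled disktype=ssd
spec:
  template:
    spec:
      nodeSelector:
        disktype: ssd
```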
DaemonSet Example: Prometheus Node Exporter
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: node-exporter
namespace: monitoring
spec:
selector:
matchLabels:
app: node-exporter
template:
metadata:
labels:
app: node-exporter
spec:
# Run on all nodes, including control plane (if tolerated)
tolerations:
- key: node-role.kubernetes.io/control-plane
operator: Exists
effect: NoSchedule
hostNetwork: true # Access host network
hostPID: true # Access host PID namespace
containers:
- name: node-exporter
image: prom/node-exporter:latest
args:
- --path.procfs=/host/proc
- --path.sysfs=/host/sys
- --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
ports:
- name: metrics
containerPort: 9100
hostPort: 9100
volumeMounts:
- name: proc
mountPath: /host/proc
readOnly: true
- name: sys
mountPath: /host/sys
readOnly: true
resources:
requests:
cpu: 100m
memory: 64Mi
limits:
cpu: 200m
memory: 128Mi
volumes:
- name: proc
hostPath:
path: /proc
- name: sys
hostPath:
path: /sys
Deploy and verify:
# Deploy DaemonSet
kubectl apply -f node-exporter-daemonset.yaml
# Verify one Pod per node
kubectl get pods -n monitoring -o wide
# View DaemonSet status
kubectl describe daemonset node-exporter -n monitoring
DaemonSet Example: Fluentd Log Collector
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluentd
namespace: logging
spec:
selector:
matchLabels:
app: fluentd
template:
metadata:
labels:
app: fluentd
spec:
tolerations:
- key: node-role.kubernetes.io/control-plane
operator: Exists
effect: NoSchedule
serviceAccountName: fluentd
containers:
- name: fluentd
image: fluent/fluentd:v1.16
env:
- name: FLUENTD_UID
value: "0"
- name: FLUENT_ELASTICSEARCH_HOST
value: "elasticsearch.logging.svc.cluster.local"
- name: FLUENT_ELASTICSEARCH_PORT
value: "9200"
volumeMounts:
- name: varlog
mountPath: /var/log
readOnly: true
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
- name: config
mountPath: /fluentd/etc/fluent.conf
subPath: fluent.conf
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
volumes:
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
- name: config
configMap:
name: fluentd-config
Jobs and CronJobs
Jobs run one-off tasks to completion. CronJobs schedule Jobs to run periodically.
Jobs
A Job ensures one or more Pods successfully complete a task.
apiVersion: batch/v1
kind: Job
metadata:
name: data-backup
spec:
# Number of Pods that must successfully complete
completions: 1
# Number of Pods to run in parallel
parallelism: 1
# Number of retries before marking failed
backoffLimit: 3
# Time limit for the Job (seconds)
activeDeadlineSeconds: 600
# Automatically delete the Job (and its Pods) 1 hour after it finishes
ttlSecondsAfterFinished: 3600
template:
spec:
# Never restart containers; failed Pods are replaced, up to backoffLimit
restartPolicy: Never
containers:
- name: backup
image: backup-tool:latest
command:
- /bin/sh
- -c
- |
echo "Starting backup..."
mysqldump -h mysql-primary -u root -p$DB_PASSWORD \
--all-databases > /backup/dump.sql
echo "Backup complete!"
env:
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: db-secret
key: password
volumeMounts:
- name: backup-storage
mountPath: /backup
resources:
requests:
cpu: 500m
memory: 512Mi
limits:
cpu: 2000m
memory: 2Gi
volumes:
- name: backup-storage
persistentVolumeClaim:
claimName: backup-pvc
Job commands:
# Create Job from YAML
kubectl apply -f job.yaml
# List Jobs
kubectl get jobs
# View Job details and events
kubectl describe job data-backup
# View Pod logs
kubectl logs -l job-name=data-backup
# Delete Job and Pods
kubectl delete job data-backup
# Watch Job progress
kubectl get jobs --watch
Parallel Jobs
Run multiple Pods in parallel:
spec:
completions: 10 # Need 10 successful completions
parallelism: 3 # Run 3 Pods at a time
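For work items that need a stable shard number, Indexed completion mode (stable in Kubernetes 1.24+) gives each Pod its index via the `JOB_COMPLETION_INDEX` environment variable — a minimal sketch with a hypothetical Job name:

```yaml
# Sketch: Indexed Job; each Pod sees its own completion index
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-work
spec:
  completions: 5
  parallelism: 2
  completionMode: Indexed
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox:latest
        command: ["sh", "-c", "echo processing shard $JOB_COMPLETION_INDEX"]
```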
CronJobs
Schedule Jobs to run periodically using cron syntax:
apiVersion: batch/v1
kind: CronJob
metadata:
name: database-cleanup
spec:
# Cron schedule (minute hour day-of-month month day-of-week)
schedule: "0 2 * * *" # 2 AM daily
# How many completed Jobs to keep
successfulJobsHistoryLimit: 3
# How many failed Jobs to keep
failedJobsHistoryLimit: 1
# Concurrency policy
concurrencyPolicy: Forbid # Don't run if previous is still active
# Deadline (seconds) for starting a missed run before it counts as failed
startingDeadlineSeconds: 300
jobTemplate:
spec:
template:
spec:
restartPolicy: Never
containers:
- name: cleanup
image: database-tools:latest
command:
- /bin/sh
- -c
- |
echo "Running database cleanup..."
mysql -h mysql-primary -u root -p$DB_PASSWORD \
-e "DELETE FROM logs WHERE created_at < DATE_SUB(NOW(), INTERVAL 30 DAY);"
echo "Cleanup complete!"
env:
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: db-secret
key: password
resources:
requests:
cpu: 250m
memory: 256Mi
limits:
cpu: 1000m
memory: 1Gi
Cron Schedule Format
minute (0-59)
| hour (0-23)
| | day-of-month (1-31)
| | | month (1-12)
| | | | day-of-week (0-6, 0=Sunday)
| | | | |
* * * * *
Examples:
"0 0 * * *" # Midnight daily
"0 */6 * * *" # Every 6 hours
"*/5 * * * *" # Every 5 minutes
"0 9-17 * * 1-5" # 9 AM-5 PM, weekdays only
"0 0 1 * *" # First day of month at midnight
CronJob commands:
# List CronJobs
kubectl get cronjobs
# View next scheduled run
kubectl describe cronjob database-cleanup
# Manually trigger a CronJob
kubectl create job --from=cronjob/database-cleanup manual-cleanup
# Check completed Jobs
kubectl get jobs
Scheduling
Kubernetes offers powerful mechanisms to control which nodes Pods run on, ensuring optimal resource utilization and application requirements.
Node Selectors
The simplest way to constrain Pods to specific nodes:
spec:
nodeSelector:
disktype: ssd # Only run on nodes with this label
gpu: "true" # Requires GPU nodes
First, label nodes:
# Label a node
kubectl label nodes node-1 disktype=ssd
kubectl label nodes node-2 gpu=true
# Verify labels
kubectl get nodes --show-labels
Then schedule to them:
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
nodeSelector:
disktype: ssd
containers:
- name: nginx
image: nginx:latest
Node Affinity
More expressive scheduling with required (must match) and preferred (best effort) rules:
spec:
affinity:
nodeAffinity:
# Must match to be scheduled
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/os
operator: In
values:
- linux
- key: node-type
operator: In
values:
- compute
- gpu
# Prefer these nodes (soft constraint)
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
preference:
matchExpressions:
- key: zone
operator: In
values:
- us-west-2a
- weight: 50
preference:
matchExpressions:
- key: disktype
operator: In
values:
- ssd
Operators for matchExpressions:
- `In`: Value in the supplied list
- `NotIn`: Value not in the supplied list
- `Exists`: Key exists (values ignored)
- `DoesNotExist`: Key doesn't exist
- `Gt`: Value greater than (numeric)
- `Lt`: Value less than (numeric)
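A sketch combining several operators (the `gpu` and `cpu-generation` labels are hypothetical); note that `Gt`/`Lt` parse label values as integers:

```yaml
# Sketch: Exists and Gt operators in nodeAffinity matchExpressions
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: gpu                 # key must simply exist; values are ignored
          operator: Exists
        - key: cpu-generation      # label value compared numerically
          operator: Gt
          values: ["3"]
```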
Pod Affinity and Anti-Affinity
Schedule Pods relative to other Pods:
spec:
affinity:
# Pod affinity: co-locate with other Pods
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- database
# Co-locate in same topology (usually same node)
topologyKey: kubernetes.io/hostname
# Pod anti-affinity: spread across nodes
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- cache
topologyKey: kubernetes.io/hostname
Use cases:
- Pod affinity: App and database on same node for performance
- Pod anti-affinity: Replicas on different nodes for high availability
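A related, often simpler way to spread replicas evenly is topology spread constraints — a sketch spreading `app=cache` Pods across nodes:

```yaml
# Sketch: even spreading with topologySpreadConstraints
spec:
  topologySpreadConstraints:
  - maxSkew: 1                        # max Pod-count difference between nodes
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway # soft; use DoNotSchedule for a hard rule
    labelSelector:
      matchLabels:
        app: cache
```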
Taints and Tolerations
Taints prevent Pods from being scheduled on nodes; tolerations allow Pods to tolerate taints.
Taints: Applied to nodes to repel Pods
# Add a taint
kubectl taint nodes node-1 key=value:effect
# Effects: NoSchedule, NoExecute, PreferNoSchedule
# NoSchedule: Don't schedule new Pods
# NoExecute: Evict existing Pods
# PreferNoSchedule: Try not to schedule
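With the `NoExecute` effect, a toleration can also bound how long a Pod remains on a tainted node via `tolerationSeconds` — a sketch using the built-in node-unreachable taint:

```yaml
# Sketch: tolerate the unreachable NoExecute taint for 60s before eviction
tolerations:
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 60
```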
Example: GPU nodes
# Taint GPU nodes
kubectl taint nodes gpu-node-1 gpu=true:NoSchedule
Tolerations: Allow Pods to ignore taints
spec:
tolerations:
- key: gpu
operator: Equal
value: "true"
effect: NoSchedule
containers:
- name: gpu-app
image: cuda-app:latest
Example: Control plane nodes
# Control plane nodes have this taint by default
kubectl taint nodes <node-name> node-role.kubernetes.io/control-plane=:NoSchedule
To allow Pods on control plane:
tolerations:
- key: node-role.kubernetes.io/control-plane
operator: Exists
effect: NoSchedule
Priority and Preemption
Assign priority to Pods; high-priority Pods can evict low-priority ones:
# Define PriorityClass
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: high-priority
value: 1000
globalDefault: false
description: "High priority for critical workloads"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: low-priority
value: 100
globalDefault: false
description: "Low priority for batch jobs"
---
# Use in Pod/Deployment
spec:
priorityClassName: high-priority
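If you want scheduling priority without evicting running Pods, preemption can be disabled per class — a sketch (the class name is hypothetical):

```yaml
# Sketch: high scheduling priority that never preempts other Pods
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting
value: 1000
preemptionPolicy: Never   # default is PreemptLowerPriority
globalDefault: false
description: "High scheduling priority without preemption"
```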
Resource Requests and Limits
Declare resource needs and constraints:
resources:
# Guaranteed resources (reserved on node)
requests:
cpu: 250m # 0.25 CPU cores
memory: 512Mi # 512 megabytes
# Hard limits (Pod killed if exceeded)
limits:
cpu: 1000m # 1 CPU core
memory: 1Gi # 1 gigabyte
CPU units:
- `1000m` = 1 CPU core
- `100m` = 0.1 CPU cores
- `0.5` = `500m`
Memory units:
- `Ki`, `Mi`, `Gi` (powers of 1024)
- `K`, `M`, `G` (powers of 1000)
Scheduling decisions use requests; limits prevent resource hogging.
Auto-Scaling
Automatically scale workloads based on demand, cost, and events.
Horizontal Pod Autoscaler (HPA)
Scale the number of Pod replicas based on metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: app-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: nginx-app
minReplicas: 2
maxReplicas: 10
# Scaling policies
behavior:
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 100 # Double the replicas
periodSeconds: 60
- type: Pods
value: 4 # Or add 4 Pods
periodSeconds: 60
selectPolicy: Max # Choose whichever scales more
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 50 # Reduce by 50%
periodSeconds: 120
metrics:
# CPU utilization target
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70 # Scale up when >70%
# Memory utilization target
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80 # Scale up when >80%
Deploy HPA:
# Apply HPA
kubectl apply -f hpa.yaml
# View HPA status
kubectl get hpa
kubectl describe hpa app-hpa
# Watch scaling in action
kubectl get hpa --watch
# Monitor Pod scaling
kubectl get pods --watch
HPA with Custom Metrics
Scale based on custom application metrics (requires a metrics pipeline such as Prometheus plus a custom metrics adapter, e.g. prometheus-adapter):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: app-custom-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: app
minReplicas: 2
maxReplicas: 20
metrics:
# Custom metric from Prometheus
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "1000"
# External metric (e.g., queue length)
- type: External
external:
metric:
name: kafka_consumer_lag
selector:
matchLabels:
topic: events
target:
type: AverageValue
averageValue: "30"
Vertical Pod Autoscaler (VPA)
Automatically adjust resource requests/limits based on actual usage:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: app-vpa
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: nginx-app
updatePolicy:
# Modes: Auto (restart if needed), Recreate, Initial, Off
updateMode: Auto
resourcePolicy:
containerPolicies:
- containerName: nginx
minAllowed:
cpu: 100m
memory: 128Mi
maxAllowed:
cpu: 2000m
memory: 4Gi
controlledResources:
- cpu
- memory
VPA modes:
- `Off`: Provide recommendations only
- `Initial`: Set on Pod creation only
- `Recreate`: Update by restarting Pods
- `Auto`: Update with minimal disruption
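To evaluate VPA safely before letting it act, run it in recommendation-only mode and read the suggestions with `kubectl describe vpa` — a sketch (the VPA name is hypothetical):

```yaml
# Sketch: recommendation-only VPA; nothing is mutated in Off mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-vpa-recommend
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-app
  updatePolicy:
    updateMode: "Off"
```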
Cluster Autoscaler
Automatically add/remove nodes based on Pod scheduling needs:
# Install Cluster Autoscaler (example for AWS)
kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
# View logs
kubectl logs -n kube-system deployment/cluster-autoscaler | tail -20
KEDA (Kubernetes Event Driven Autoscaling)
Scale based on external events and metrics:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: kafka-scaler
spec:
scaleTargetRef:
name: consumer-app
minReplicaCount: 1
maxReplicaCount: 100
triggers:
- type: kafka
metadata:
brokers: kafka-broker-1:9092,kafka-broker-2:9092
topic: events
consumerGroup: my-consumer-group
lagThreshold: "100" # Scale when lag > 100
offsetResetPolicy: latest
Resource Management
Control how much CPU and memory Pods can use, and how resources are shared.
Resource Requests and Limits
We covered requests/limits in scheduling, but let's explore QoS classes:
# Guaranteed QoS (requests = limits)
spec:
containers:
- name: app
resources:
requests:
cpu: 500m
memory: 512Mi
limits:
cpu: 500m # Same as requests
memory: 512Mi # Same as requests
---
# Burstable QoS (requests < limits)
spec:
containers:
- name: app
resources:
requests:
cpu: 250m
memory: 256Mi
limits:
cpu: 1000m # More than requests
memory: 1Gi
---
# BestEffort QoS (no requests/limits)
spec:
containers:
- name: app
# No resource declarations
QoS Classes:
- Guaranteed: Evicted last (highest priority)
- Burstable: Evicted if node memory/CPU is scarce
- BestEffort: Evicted first (lowest priority)
LimitRanges
Enforce min/max resources per Pod or container:
apiVersion: v1
kind: LimitRange
metadata:
name: resource-limits
namespace: default
spec:
limits:
# Per container
- type: Container
min:
cpu: 50m
memory: 64Mi
max:
cpu: 2000m
memory: 2Gi
default:
cpu: 500m # Default limit if not specified
memory: 512Mi
defaultRequest:
cpu: 100m # Default request if not specified
memory: 128Mi
# Per Pod
- type: Pod
min:
cpu: 100m
memory: 128Mi
max:
cpu: 4000m
memory: 4Gi
ResourceQuotas
Limit total resource consumption per namespace:
apiVersion: v1
kind: ResourceQuota
metadata:
name: quota-team-a
namespace: team-a
spec:
hard:
requests.cpu: "10" # Total CPU requests
requests.memory: "20Gi" # Total memory requests
limits.cpu: "20" # Total CPU limits
limits.memory: "40Gi" # Total memory limits
pods: "100" # Max Pods in namespace
services.loadbalancers: "2" # Max LoadBalancer services
persistentvolumeclaims: "5" # Max PVCs
scopeSelector:
matchExpressions:
- operator: In
scopeName: PriorityClass
values: ["high-priority"] # Only applies to high-priority Pods
Check quota usage:
kubectl describe resourcequota -n team-a
kubectl get resourcequota -n team-a
Health Checks
Kubernetes uses probes to monitor Pod health and make scheduling decisions.
Liveness Probes
Determines if a container is alive. If failing, Kubernetes restarts it:
spec:
containers:
- name: app
# HTTP liveness probe
livenessProbe:
httpGet:
path: /health
port: 8080
scheme: HTTP
initialDelaySeconds: 30 # Wait 30s before first check
periodSeconds: 10 # Check every 10 seconds
timeoutSeconds: 5 # Timeout after 5 seconds
successThreshold: 1 # 1 success = healthy
failureThreshold: 3 # 3 failures = restart
# TCP liveness probe
livenessProbe:
tcpSocket:
port: 5432
initialDelaySeconds: 20
periodSeconds: 10
failureThreshold: 3
# Exec liveness probe
livenessProbe:
exec:
command:
- /bin/sh
- -c
- redis-cli ping | grep PONG
initialDelaySeconds: 10
periodSeconds: 5
failureThreshold: 2
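For gRPC services, Kubernetes 1.24+ can probe the standard gRPC Health Checking Protocol directly — a sketch assuming the app serves gRPC health checks on port 9090:

```yaml
# Sketch: native gRPC liveness probe (app must implement the health protocol)
livenessProbe:
  grpc:
    port: 9090
  initialDelaySeconds: 10
  periodSeconds: 10
```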
Readiness Probes
Determines if a Pod is ready to accept traffic. Failing readiness removes the Pod from the Service:
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 2
Liveness vs Readiness:
- Liveness: "Is the container alive?" → Restart if failing
- Readiness: "Is it ready for traffic?" → Remove from Service if failing
Startup Probes
Holds off liveness/readiness checks until the app has started (useful for slow-starting apps):
startupProbe:
httpGet:
path: /startup
port: 8080
failureThreshold: 30 # Allow 30 * 5s = 150s to start
periodSeconds: 5
Once startup probe succeeds, liveness/readiness begin.
Complete Health Check Example
apiVersion: apps/v1
kind: Deployment
metadata:
name: app
spec:
replicas: 3
selector:
matchLabels:
app: app
template:
metadata:
labels:
app: app
spec:
containers:
- name: app
image: myapp:latest
ports:
- containerPort: 8080
# Startup probe: wait for app to initialize
startupProbe:
httpGet:
path: /startup
port: 8080
initialDelaySeconds: 0
periodSeconds: 10
failureThreshold: 30
# Liveness probe: restart if hanging
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
# Readiness probe: remove from traffic if not ready
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 2
resources:
requests:
cpu: 250m
memory: 256Mi
limits:
cpu: 1000m
memory: 1Gi
Monitor health checks:
# View Pod events (includes probe failures)
kubectl describe pod app-xyz
# View logs
kubectl logs app-xyz
# Check if Pod is ready
kubectl get pods -o wide
# Watch probe failures
kubectl get events -n default --sort-by='.lastTimestamp'
Exercises
Master Kubernetes workloads with these hands-on exercises.
Exercise 1: Deploy Stateless App with Rolling Updates and Rollback
Objective: Deploy an app, perform a rolling update, then rollback to the previous version.
Steps:
- Create a Deployment with 3 replicas:
cat > app-deployment.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: echo-app
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app: echo
template:
metadata:
labels:
app: echo
spec:
containers:
- name: echo
image: hashicorp/http-echo:0.2.3
args:
- "-text=Version 1.0"
ports:
- containerPort: 5678
livenessProbe:
httpGet:
path: /
port: 5678
initialDelaySeconds: 5
periodSeconds: 10
readinessProbe:
httpGet:
path: /
port: 5678
initialDelaySeconds: 2
periodSeconds: 5
resources:
requests:
cpu: 100m
memory: 64Mi
limits:
cpu: 200m
memory: 128Mi
EOF
kubectl apply -f app-deployment.yaml
- Verify rollout:
kubectl get pods -l app=echo
kubectl get deployment echo-app
- Update to version 2.0 (the image tag stays the same; the version string lives in the container args, so patch those):
kubectl patch deployment echo-app -p '{"spec":{"template":{"spec":{"containers":[{"name":"echo","args":["-text=Version 2.0"]}]}}}}'
# Monitor rollout
kubectl rollout status deployment/echo-app
- Check rollout history:
kubectl rollout history deployment/echo-app
- Rollback to previous version:
kubectl rollout undo deployment/echo-app
kubectl rollout status deployment/echo-app
- Verify Pods are running the original version:
kubectl get pods -l app=echo
Exercise 2: Create a CronJob That Runs Every 5 Minutes
Objective: Create a CronJob that executes a task every 5 minutes and verify execution.
Steps:
- Create the CronJob:
cat > backup-cronjob.yaml <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
name: log-timestamp
spec:
schedule: "*/5 * * * *" # Every 5 minutes
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 1
jobTemplate:
spec:
template:
spec:
restartPolicy: Never
containers:
- name: timestamp
image: busybox:latest
command:
- /bin/sh
- -c
- echo "Timestamp: $(date)" > /tmp/log.txt && cat /tmp/log.txt
resources:
requests:
cpu: 100m
memory: 64Mi
limits:
cpu: 200m
memory: 128Mi
EOF
kubectl apply -f backup-cronjob.yaml
- Verify the CronJob is created:
kubectl get cronjobs
kubectl describe cronjob log-timestamp
- Wait for a Job to be created (within 5 minutes):
# Watch for Jobs
kubectl get jobs --watch
# Once a Job is created, check its status (Job names are prefixed with the CronJob name)
kubectl get jobs
- View Job logs:
# Pods created by a Job carry a job-name label; list the Jobs to find the name
kubectl get jobs
# Fetch logs via the job-name label (substitute the actual Job name)
kubectl logs -l job-name=<job-name> --tail=10
# Or get logs from a specific Pod
kubectl get pods
kubectl logs <pod-name>
- After 10+ minutes, verify multiple Jobs:
kubectl get jobs
- Suspend the CronJob:
kubectl patch cronjob log-timestamp -p '{"spec":{"suspend":true}}'
# Verify it's suspended
kubectl get cronjobs
- Resume:
kubectl patch cronjob log-timestamp -p '{"spec":{"suspend":false}}'
Exercise 3: Set Up HPA for CPU-Based Auto-Scaling
Objective: Create a Deployment with HPA that scales based on CPU utilization.
Steps:
- Ensure metrics-server is installed:
# Check if metrics-server is running
kubectl get deployment metrics-server -n kube-system
# If not, install it
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
- Create a Deployment with CPU requests:
cat > cpu-app-deployment.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: load-test-app
spec:
replicas: 2
selector:
matchLabels:
app: load-test
template:
metadata:
labels:
app: load-test
spec:
containers:
- name: app
image: polinux/stress
command:
- stress
args:
- "--cpu=1" # Stress 1 CPU
- "--verbose"
resources:
requests:
cpu: 100m # Request 100m CPU
memory: 64Mi
limits:
cpu: 500m
memory: 256Mi
EOF
kubectl apply -f cpu-app-deployment.yaml
- Wait for metrics to be available (30-60 seconds):
# Check metrics
kubectl top pods -l app=load-test
kubectl top nodes
- Create HPA:
cat > hpa.yaml <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: load-test-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: load-test-app
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 50 # Scale when CPU > 50%
EOF
kubectl apply -f hpa.yaml
- Monitor HPA status:
# Check HPA
kubectl get hpa -w
# View detailed status
kubectl describe hpa load-test-hpa
- Watch Pods scale up (within 1-2 minutes):
kubectl get pods -l app=load-test --watch
- View current metrics:
kubectl top pods -l app=load-test
- Stop the load by swapping the stress command for an idle one:
# Replace stress with a long sleep so CPU usage drops
kubectl patch deployment load-test-app -p '{"spec":{"template":{"spec":{"containers":[{"name":"app","command":["sleep"],"args":["3600"]}]}}}}'
- Watch Pods scale down (after ~5 minutes of low usage):
kubectl get pods -l app=load-test --watch
- Verify HPA returned to minReplicas:
kubectl get hpa
Summary
You've mastered Kubernetes workloads and scheduling:
- Deployments: Stateless apps with rolling updates, rollbacks, and scaling
- StatefulSets: Stateful apps with stable identity and persistent storage
- DaemonSets: Cluster-wide workloads like logging and monitoring
- Jobs/CronJobs: One-off tasks and scheduled jobs
- Scheduling: Node selectors, affinity, taints, tolerations, and priority
- Auto-Scaling: HPA, VPA, Cluster Autoscaler, and KEDA
- Resource Management: Requests, limits, QoS classes, and quotas
- Health Checks: Liveness, readiness, and startup probes
These concepts form the foundation for production Kubernetes deployments. Practice the exercises, then explore advanced patterns like GitOps, service meshes, and advanced observability.