Docker Security & Production

Running Docker securely in production requires a comprehensive approach to image security, runtime hardening, monitoring, and deployment best practices. This guide covers defensive strategies, monitoring solutions, and practical CI/CD patterns for production environments.

Security

Container Security Overview

Docker containers introduce a unique attack surface compared to traditional applications. While containerization provides process isolation, the shared kernel and dynamic nature of container orchestration require specific security controls.

Threat Model:

Image Supply Chain: Compromised or outdated base images
Runtime Exploits: Kernel vulnerabilities affecting the host
Container Escape: Privilege escalation allowing container-to-host access
Lateral Movement: One compromised container accessing others
Data Exposure: Secrets in images, unencrypted storage/transit

Defense in Depth: Security Layers

Defense in Depth Strategy:

Secure the image supply chain
Harden runtime configuration
Implement network isolation
Monitor container behavior
Apply principle of least privilege across all layers

Image Security

Image security is the foundation of container security. Compromised or vulnerable images create security debt that persists across every container instance.

Use Official and Verified Images:

Pull from the Docker Official Images library (distinguished by blue badges)
Verify digital signatures and image digests
Check image provenance and maintenance status
Review Dockerfile source code when possible

Example - Verifying Image Digest:

# Pull an image by digest (replace <digest> with the 64-character SHA-256 value you have verified)
docker pull ubuntu@sha256:<digest>

# List images with digests
docker images --digests
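
A digest is the SHA-256 hash of the content it names, which is why pulling by digest cannot silently return different bytes. The sketch below illustrates that check in Python. `matches_digest` is a hypothetical helper, and the byte string only stands in for a real image manifest.

```python
import hashlib


def matches_digest(content: bytes, digest: str) -> bool:
    """Return True if content hashes to the given 'sha256:<hex>' digest."""
    algorithm, _, expected_hex = digest.partition(":")
    if algorithm != "sha256" or len(expected_hex) != 64:
        return False
    return hashlib.sha256(content).hexdigest() == expected_hex


if __name__ == "__main__":
    manifest = b'{"schemaVersion": 2}'  # stand-in bytes, not a real manifest
    digest = "sha256:" + hashlib.sha256(manifest).hexdigest()
    print(matches_digest(manifest, digest))         # True
    print(matches_digest(manifest + b" ", digest))  # False
```

Any change to the content, even a single byte, produces a different digest, so a tampered image fails the comparison.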

Vulnerability Scanning:

Modern container registries and tools provide vulnerability scanning capabilities:

Docker Scout: Native Docker platform for scanning images and providing remediation guidance
```
docker scout cves myimage:latest
docker scout recommendations myimage:latest
```

Trivy: Open-source scanner detecting vulnerabilities and misconfigurations

trivy image myimage:latest
trivy image --severity HIGH,CRITICAL myimage:latest

Snyk: Developer-focused vulnerability scanning with fix recommendations
```
snyk container test myimage:latest
```

Avoid the Latest Tag:

# Bad - unpredictable base image version
FROM ubuntu:latest

# Good - specific stable version
FROM ubuntu:24.04

Pinning versions ensures consistency and allows testing before upgrades.
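
A check like this can be automated in CI. The following is a minimal sketch, assuming a simple Dockerfile without `ARG`-substituted image names. `unpinned_base_images` is a hypothetical helper, not part of any Docker tooling.

```python
def unpinned_base_images(dockerfile_text: str) -> list[str]:
    """Return base images that use ':latest' or carry no tag at all."""
    stage_names: set[str] = set()
    unpinned: list[str] = []
    for raw_line in dockerfile_text.splitlines():
        parts = raw_line.strip().split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Ignore flags such as --platform=linux/amd64
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        is_stage_reference = image.lower() in stage_names
        # Remember "FROM image AS name" so a later "FROM name" is not flagged
        if len(args) >= 3 and args[1].upper() == "AS":
            stage_names.add(args[2].lower())
        # scratch, earlier stages, and digest-pinned images need no tag
        if image == "scratch" or is_stage_reference or "@" in image:
            continue
        # Look for the tag after the last '/', so a registry port is not mistaken for one
        tag = image.rsplit("/", 1)[-1].partition(":")[2]
        if tag in ("", "latest"):
            unpinned.append(image)
    return unpinned


if __name__ == "__main__":
    sample = "FROM ubuntu:latest\nFROM ubuntu:24.04\n"
    print(unpinned_base_images(sample))  # ['ubuntu:latest']
```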

Minimal Base Images:

Smaller images reduce attack surface and improve deployment speed:

Alpine Linux (~5 MB): Minimal Linux distribution

FROM alpine:3.19
RUN apk add --no-cache python3

Distroless Images (Google): Contains only application + runtime

FROM gcr.io/distroless/python3-debian12
COPY app.py .
CMD ["app.py"]

Scratch Image (~0 MB): Empty base for statically compiled binaries
```
FROM scratch
COPY app /
ENTRYPOINT ["/app"]
```

Build Security

Secrets Management - Never Store Secrets in Images:

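# Bad - secret written into an image layer
FROM ubuntu:24.04
RUN echo "AWS_KEY=AKIAIOSFODNN7EXAMPLE" > /app/.env

# Also bad - build args are recorded in image metadata and visible via docker history
ARG DATABASE_PASSWORD
RUN echo "Password accepted during build"

# Better - BuildKit secret mount: readable only during this RUN step, never stored in a layer
# Build with: docker build --secret id=db_password,src=./db_password.txt .
FROM ubuntu:24.04
# ./configure-db.sh is a hypothetical script that reads the secret without persisting it
RUN --mount=type=secret,id=db_password \
    DB_PASSWORD="$(cat /run/secrets/db_password)" ./configure-db.sh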
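
Hardcoded secrets can often be caught before an image is built. The sketch below scans Dockerfile text with two illustrative regular expressions. The patterns and the `find_hardcoded_secrets` helper are assumptions made for this example, and real scanners ship far larger rule sets.

```python
import re

# Illustrative patterns only (assumed for this sketch)
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "inline_assignment": re.compile(
        r"(password|passwd|secret|token)\w*\s*=\s*\S+", re.IGNORECASE
    ),
}


def find_hardcoded_secrets(dockerfile_text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every suspicious line."""
    findings = []
    for number, line in enumerate(dockerfile_text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


if __name__ == "__main__":
    sample = 'FROM ubuntu:24.04\nRUN echo "AWS_KEY=AKIAIOSFODNN7EXAMPLE" > /app/.env\n'
    print(find_hardcoded_secrets(sample))  # [(2, 'aws_access_key_id')]
```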

Multi-Stage Builds:

Reduce final image size and exclude build tools:

# Build stage
FROM golang:1.21 AS builder
WORKDIR /src
COPY . .
# CGO_ENABLED=0 produces a static binary that runs on musl-based Alpine
RUN CGO_ENABLED=0 go build -o app .

# Runtime stage - only includes compiled binary
FROM alpine:3.19
RUN apk add --no-cache ca-certificates
COPY --from=builder /src/app /usr/local/bin/
ENTRYPOINT ["app"]

Multi-stage builds eliminate build dependencies from final images.

.dockerignore:

Exclude unnecessary files to reduce context size and prevent secrets from being included:

.git
.env
.env.local
*.log
node_modules
.npm
.cache
__pycache__
*.pyc
.DS_Store
.vscode
.idea
dist/
build/
coverage/
.dockerignore
Dockerfile
.gitignore

Pin Dependency Versions:

# Bad - unpredictable versions
RUN apt-get update && apt-get install -y curl

# Good - specific versions (the version string must exist in the base image's package repository)
RUN apt-get update && apt-get install -y curl=7.68.0-1ubuntu1.17

Runtime Security

Run as Non-Root User:

Running containers as root increases the blast radius if the container is compromised:

# Create dedicated user
RUN groupadd -r appuser && useradd -r -g appuser appuser
USER appuser
COPY --chown=appuser:appuser . .

# Verify user at runtime
docker run -u 1000:1000 myapp
docker exec <container> whoami

Read-Only Filesystems:

Prevent runtime modifications by mounting the root filesystem as read-only:

docker run --read-only myapp

For containers requiring temporary files, use tmpfs:

docker run --read-only --tmpfs /tmp:size=100m myapp

Drop Unnecessary Capabilities:

Linux capabilities provide granular privilege control. Drop what's not needed:

# Drop all capabilities, add only required ones
docker run --cap-drop ALL --cap-add NET_BIND_SERVICE myapp

# View default capabilities
docker run --rm alpine grep CapEff /proc/self/status

Common capability mappings:

NET_BIND_SERVICE: Bind to ports < 1024
NET_RAW: Raw socket creation
SYS_ADMIN: Administrative operations
SETFCAP: Set file capabilities
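
The `CapEff` value printed from `/proc/self/status` is a hexadecimal bitmask in which bit N corresponds to capability number N. A minimal sketch for decoding it follows. `decode_capabilities` is a hypothetical helper that covers only the first 32 capabilities, and newer kernels define more.

```python
# Capability names in kernel bit order (bit 0 = CHOWN ... bit 31 = SETFCAP)
CAPABILITY_NAMES = [
    "CHOWN", "DAC_OVERRIDE", "DAC_READ_SEARCH", "FOWNER", "FSETID", "KILL",
    "SETGID", "SETUID", "SETPCAP", "LINUX_IMMUTABLE", "NET_BIND_SERVICE",
    "NET_BROADCAST", "NET_ADMIN", "NET_RAW", "IPC_LOCK", "IPC_OWNER",
    "SYS_MODULE", "SYS_RAWIO", "SYS_CHROOT", "SYS_PTRACE", "SYS_PACCT",
    "SYS_ADMIN", "SYS_BOOT", "SYS_NICE", "SYS_RESOURCE", "SYS_TIME",
    "SYS_TTY_CONFIG", "MKNOD", "LEASE", "AUDIT_WRITE", "AUDIT_CONTROL",
    "SETFCAP",
]


def decode_capabilities(cap_hex: str) -> list[str]:
    """Translate a CapEff hex mask into capability names."""
    mask = int(cap_hex, 16)
    return [name for bit, name in enumerate(CAPABILITY_NAMES) if mask & (1 << bit)]


if __name__ == "__main__":
    # Mask with only bit 10 set
    print(decode_capabilities("0000000000000400"))  # ['NET_BIND_SERVICE']
```

Decoding the mask before and after adding `--cap-drop ALL` makes it easy to confirm which privileges a container actually holds.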

Limit Resources:

Limits keep a single container from exhausting host CPU and memory. They contain denial-of-service conditions; they do not prevent container escapes:

# CPU limits (2 CPUs)
docker run --cpus=2 myapp

# Memory limits (512 MB)
docker run --memory=512m myapp

# Memory + swap limits
docker run --memory=512m --memory-swap=1g myapp

# Combined resource limits
docker run --cpus=1.5 --memory=512m --oom-kill-disable=false myapp

Seccomp Profiles:

Seccomp restricts system calls a container can invoke:

# The default seccomp profile is applied automatically; no flag is needed
docker run myapp

# Use custom seccomp profile
docker run --security-opt seccomp=/path/to/profile.json myapp

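Example seccomp profile skeleton (allowlist). It is illustrative only: a real profile must also allow syscalls such as execve and mmap before a container can start: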

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "defaultErrnoRet": 1,
  "archMap": [{"architecture": "SCMP_ARCH_X86_64"}],
  "syscalls": [
    {
      "names": ["read", "write", "open", "close", "stat", "fstat"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
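
Allowlist profiles are easier to maintain when generated from a reviewed list of syscalls than when edited by hand. The sketch below builds the same JSON structure in Python. `build_seccomp_profile` is a hypothetical helper, and the syscall list is an assumption for illustration rather than a working profile.

```python
import json


def build_seccomp_profile(allowed_syscalls, errno=1):
    """Build an allowlist profile: every syscall not listed fails with errno."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "defaultErrnoRet": errno,
        "archMap": [{"architecture": "SCMP_ARCH_X86_64"}],
        "syscalls": [
            {
                # sorted and de-duplicated so diffs between versions stay small
                "names": sorted(set(allowed_syscalls)),
                "action": "SCMP_ACT_ALLOW",
            }
        ],
    }


if __name__ == "__main__":
    profile = build_seccomp_profile(["read", "write", "close", "read"])
    print(json.dumps(profile, indent=2))
```

The resulting JSON can be written to a file and passed to `--security-opt seccomp=/path/to/profile.json`.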

AppArmor:

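AppArmor provides mandatory access control for containers on hosts where it is enabled:

# Run with AppArmor profile
docker run --security-opt apparmor=docker-default myapp

# Load a custom profile into the kernel, then reference it by name (not by file path)
sudo apparmor_parser -r -W /path/to/profile
docker run --security-opt apparmor=your-profile-name myapp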

Network Security

Use User-Defined Networks:

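On the default bridge network, every container can reach every other container by IP address, and there is no automatic DNS resolution. User-defined networks isolate groups of containers from each other and let members resolve one another by name: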

# Create isolated network
docker network create --driver bridge isolated-net

# Run containers on isolated network
docker run --network isolated-net myapp
docker run --network isolated-net mydb

# Containers on other networks cannot reach myapp or mydb; only members of isolated-net can resolve each other by name

Don't Expose Unnecessary Ports:

# Bad - exposes all ports
EXPOSE 0-65535

# Good - only required ports
EXPOSE 8080

# Bad - publishes to all interfaces
docker run -p 5432:5432 postgres

# Good - bind to localhost only
docker run -p 127.0.0.1:5432:5432 postgres
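
The difference between the two `-p` forms is whether a host address precedes the port pair. A small sketch that flags publish specs without a restricted address follows. `binds_all_interfaces` is a hypothetical helper, and it assumes the daemon's default bind address has not been changed.

```python
def binds_all_interfaces(publish_spec: str) -> bool:
    """Return True if a -p value publishes the port on every host interface."""
    spec = publish_spec.split("/", 1)[0]  # drop a trailing /tcp or /udp
    if spec.startswith("["):
        # Bracketed IPv6 host address, for example [::1]:8080:80
        host = spec[1:spec.index("]")]
        return host == "::"
    parts = spec.split(":")
    if len(parts) == 3:
        # ip:hostPort:containerPort
        return parts[0] in ("", "0.0.0.0")
    # hostPort:containerPort or containerPort alone
    return True


if __name__ == "__main__":
    for value in ("5432:5432", "127.0.0.1:5432:5432"):
        print(value, binds_all_interfaces(value))
```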

Use TLS for Docker Daemon:

Secure communication between client and daemon:

# Generate a certificate authority (server and client certificates are then issued from this CA)
openssl genrsa -out ca-key.pem 2048
openssl req -new -x509 -days 365 -key ca-key.pem -out ca.pem

# Configure daemon to use TLS and require client certificates
# /etc/docker/daemon.json
{
  "tls": true,
  "tlsverify": true,
  "tlscert": "/etc/docker/server.pem",
  "tlskey": "/etc/docker/server-key.pem",
  "tlscacert": "/etc/docker/ca.pem"
}

# Connect with TLS
docker --tlsverify --tlscacert=ca.pem --tlscert=cert.pem --tlskey=key.pem -H=tcp://hostname:2376 version

Docker Security Checklist

Comprehensive checklist for hardening containers in production:

Monitoring

Container Monitoring Challenges

Container monitoring differs from traditional VM/server monitoring due to:

Ephemeral Nature: Containers are created and destroyed frequently, making historical correlation difficult
Scale: Orchestrated environments run hundreds or thousands of containers
Dynamic Environments: Container IPs, ports, and locations change constantly
Resource Sharing: Multiple containers share kernel and resources, requiring careful metric interpretation
Log Volume: Container logs grow rapidly at scale without proper rotation and centralization

Effective monitoring requires container-aware tooling that understands orchestration platforms and container lifecycle.

Monitoring Tools

Docker Stats (Native):

Built-in container statistics from the Docker daemon:

# Real-time container stats
docker stats

# Specific container
docker stats myapp

# No-stream mode (single snapshot)
docker stats --no-stream myapp

# Custom output format showing container names
docker stats --format "{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

Output includes CPU %, memory usage, network I/O, and block I/O.

cAdvisor (Container Advisor):

Google's container monitoring tool providing detailed resource metrics:

# Run cAdvisor (exposes metrics on port 8080)
docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --detach=true \
  --name=cadvisor \
  --publish=8080:8080 \
  gcr.io/cadvisor/cadvisor:latest

Access metrics at http://localhost:8080

Prometheus + Grafana:

Industry-standard monitoring stack for container environments:

# docker-compose.yml
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
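      # GRAFANA_ADMIN_PASSWORD is an assumed variable name; set it in the environment, never hardcode "admin"
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD:?set GRAFANA_ADMIN_PASSWORD}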
    volumes:
      - grafana_data:/var/lib/grafana

volumes:
  prometheus_data:
  grafana_data:

# prometheus.yml - Scrape Docker stats
global:
  scrape_interval: 15s

scrape_configs:
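  - job_name: 'cadvisor'
    static_configs:
      # Inside the Prometheus container, localhost is Prometheus itself.
      # Use the cAdvisor container name on a shared user-defined network.
      - targets: ['cadvisor:8080']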

Datadog:

SaaS monitoring platform with native Docker integration:

# Run Datadog agent
docker run -d --name datadog-agent \
  -e DD_API_KEY=<your-api-key> \
  -e DD_SITE=datadoghq.com \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  datadog/agent:latest

Sysdig:

Container security and monitoring focused on runtime behavior:

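# Run Sysdig in a container (needs privileged access and host mounts to load its kernel driver)
docker run -it --rm --privileged \
  --volume /var/run/docker.sock:/host/var/run/docker.sock \
  --volume /dev:/host/dev \
  --volume /proc:/host/proc:ro \
  --volume /boot:/host/boot:ro \
  --volume /lib/modules:/host/lib/modules:ro \
  --volume /usr:/host/usr:ro \
  sysdig/sysdig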

ELK Stack (Elasticsearch, Logstash, Kibana):

Distributed logging platform for container log aggregation:

# docker-compose.yml
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.0.0
    environment:
      - xpack.security.enabled=false  # development only; keep security enabled in production
      - discovery.type=single-node
    ports:
      - "9200:9200"

  kibana:
    image: docker.elastic.co/kibana/kibana:8.0.0
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

  logstash:
    image: docker.elastic.co/logstash/logstash:8.0.0
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    depends_on:
      - elasticsearch

Key Metrics to Monitor

Essential metrics for production container health:

CPU Metrics:

container_cpu_user_seconds_total: User-space CPU time
container_cpu_system_seconds_total: Kernel-space CPU time
container_cpu_usage_seconds_total: Total CPU time
CPU usage as a percentage of the allocated limit
Throttling events (container hitting CPU limit)

Memory Metrics:

container_memory_usage_bytes: Total memory used
container_memory_max_usage_bytes: Peak memory usage
Memory usage as a percentage of the limit
Out-of-Memory (OOM) events
Memory page faults (major and minor)

Network I/O:

container_network_receive_bytes_total: Bytes received
container_network_transmit_bytes_total: Bytes sent
Packets received/transmitted
Network errors and dropped packets

Disk I/O:

container_fs_read_bytes_total: Bytes read from storage
container_fs_write_bytes_total: Bytes written to storage
I/O operations per second
Disk usage and capacity percentage

Container Lifecycle:

Container count (running, stopped, total)
Container creation/deletion rate
Container restart count and frequency
Container uptime

Health Status:

HEALTHCHECK success/failure rate
Health check latency
Number of unhealthy containers

Alert thresholds should be adjusted based on application requirements and expected behavior.
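
Percentage-of-limit figures are derived rather than exported directly. Counters such as `container_cpu_usage_seconds_total` only ever increase, so usage is the difference between two samples divided by the time between them. A minimal sketch of that arithmetic follows. `cpu_percent_of_limit` is a hypothetical helper, and `cpu_limit` is assumed to be the value passed to `--cpus`.

```python
def cpu_percent_of_limit(usage_start: float, usage_end: float,
                         interval_seconds: float, cpu_limit: float) -> float:
    """Convert two samples of a cumulative CPU-seconds counter into % of limit."""
    if interval_seconds <= 0 or cpu_limit <= 0:
        raise ValueError("interval_seconds and cpu_limit must be positive")
    # CPU-seconds consumed per wall-clock second equals the number of cores in use
    cores_used = (usage_end - usage_start) / interval_seconds
    return 100.0 * cores_used / cpu_limit


if __name__ == "__main__":
    # 15 CPU-seconds consumed over a 15 s scrape interval with a 2-CPU limit
    print(cpu_percent_of_limit(100.0, 115.0, 15, 2))  # 50.0
```

An alert on this value fires relative to what the container is allowed to use, which is usually more meaningful than a percentage of the whole host.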

Logging Best Practices

Log to stdout/stderr:

Containers should output logs to standard streams for Docker to capture:

# Node.js application
FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm install
# Logs go to stdout
CMD ["node", "app.js"]

// Log to stdout (not files)
console.log('Application started on port 3000');
console.error('Error occurred:', error);

Docker Logging Drivers:

Configure how Docker captures and stores container logs:

# View default logging driver
docker info | grep "Logging Driver"

# Set logging driver in run command
docker run --log-driver json-file myapp
docker run --log-driver syslog myapp
docker run --log-driver splunk myapp

Common logging drivers:

| Driver | Use Case |
| --- | --- |
| `json-file` | Default, stores logs in JSON format on host |
| `syslog` | Sends logs to syslog server |
| `fluentd` | Forwards to Fluentd for processing |
| `awslogs` | AWS CloudWatch Logs |
| `splunk` | Splunk HTTP Event Collector |
| `gcplogs` | Google Cloud Logging |

Centralized Logging Configuration:

# daemon.json - Docker daemon default logging driver
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3",
    "labels": "service_name,service_version"
  }
}

# Per-container override
docker run \
  --log-driver fluentd \
  --log-opt fluentd-address=localhost:24224 \
  --log-opt tag=myapp \
  myapp

Log Rotation:

Prevent log files from consuming excessive disk space:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3",
    "compress": "true"
  }
}

This caps stored logs at roughly 30 MB per container (3 files × 10 MB). Rotated files are compressed, so actual disk usage is usually lower.
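
The 30 MB figure is `max-size` multiplied by `max-file`. The sketch below performs the same calculation for any `log-opts` block, which helps when sizing disks for many containers. `parse_size` and `max_log_bytes` are hypothetical helpers, and the sketch assumes the size suffixes are binary units (1m = 1024 × 1024 bytes).

```python
UNIT_MULTIPLIERS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size such as '10m' into bytes."""
    text = text.strip().lower()
    if text and text[-1] in UNIT_MULTIPLIERS:
        return int(text[:-1]) * UNIT_MULTIPLIERS[text[-1]]
    return int(text)


def max_log_bytes(log_opts: dict) -> int:
    """Upper bound on uncompressed log storage for one container."""
    files = int(log_opts.get("max-file", "1"))
    return parse_size(log_opts["max-size"]) * files


if __name__ == "__main__":
    opts = {"max-size": "10m", "max-file": "3"}
    print(max_log_bytes(opts))  # 31457280
```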

Production Best Practices

CI/CD with Docker

Build → Test → Push → Deploy Pipeline:

Typical CI/CD workflow for Docker:

# .github/workflows/docker-build.yml (GitHub Actions example)
name: Docker Build and Push

on:
  push:
    branches: [ main ]
    tags: [ 'v*' ]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      # Checkout code
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0  # For git SHA

      # Build image once and apply both tags
      - name: Build Docker image
        run: |
          docker build \
            -t myregistry.azurecr.io/myapp:${{ github.sha }} \
            -t myregistry.azurecr.io/myapp:latest .

      # Scan image for vulnerabilities
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myregistry.azurecr.io/myapp:latest
          format: 'sarif'
          output: 'trivy-results.sarif'

      # Test image
      - name: Test Docker image
        run: |
          docker run --rm myregistry.azurecr.io/myapp:${{ github.sha }} npm test

      # Login to registry
      - name: Login to registry
        run: |
          echo "${{ secrets.REGISTRY_PASSWORD }}" | docker login \
            -u ${{ secrets.REGISTRY_USERNAME }} \
            --password-stdin myregistry.azurecr.io

      # Push image
      - name: Push to registry
        run: |
          docker push myregistry.azurecr.io/myapp:${{ github.sha }}
          docker push myregistry.azurecr.io/myapp:latest

      # Deploy (trigger deployment system)
      - name: Deploy
        run: |
          curl -X POST https://deploy.example.com/trigger \
            -H "Authorization: Bearer ${{ secrets.DEPLOY_TOKEN }}" \
            -d "image_tag=${{ github.sha }}"

Image Tagging Strategy:

Use semantic versioning combined with git SHA for traceability:

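# Tag with version and git SHA
COMMIT_SHA=$(git rev-parse --short HEAD)
VERSION=$(git describe --tags --abbrev=0)  # most recent tag, for example v1.2.3

# Build once, then add the remaining tags to the same image
docker build -t myapp:${VERSION}-${COMMIT_SHA} .
docker tag myapp:${VERSION}-${COMMIT_SHA} myapp:${VERSION}
docker tag myapp:${VERSION}-${COMMIT_SHA} myapp:latest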

# Tag strategy: v1.2.3-abc1234 (version-commit)
# This allows rollback to specific build while tracking version history
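
The same tag set can be generated and validated in a build script. This is a minimal sketch. `image_tags` is a hypothetical helper, and the requirement that versions look like `v1.2.3` is an assumption made for the example.

```python
import re

VERSION_PATTERN = re.compile(r"^v?\d+\.\d+\.\d+$")


def image_tags(repository: str, version: str, commit_sha: str) -> list[str]:
    """Return the immutable build tag first, then the moving tags."""
    if not VERSION_PATTERN.match(version):
        raise ValueError(f"not a semantic version: {version!r}")
    short_sha = commit_sha[:7]
    return [
        f"{repository}:{version}-{short_sha}",  # unique per build, used for rollback
        f"{repository}:{version}",              # moves if the version is rebuilt
        f"{repository}:latest",                 # convenience only, never deploy from it
    ]


if __name__ == "__main__":
    for tag in image_tags("myapp", "v1.2.3", "abc1234def5678"):
        print(tag)
```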

Image Registry Management

Private Container Registries:

Protect proprietary images using private registries:

| Registry | Platform | Features |
| --- | --- | --- |
| Harbor | Self-hosted | Vulnerability scanning, RBAC, image signing |
| ECR | AWS | Integrated with IAM, lifecycle policies |
| ACR | Azure | RBAC, webhook integration, image security |
| GCR | Google Cloud | Integration with Cloud Build, Cloud Run |
| Quay | Red Hat/Self-hosted | Security scanning, RBAC, CDN |

Image Retention Policies:

Prevent registry bloat and reduce storage costs:

# Example: Harbor lifecycle policy
# Keep last 10 images, delete images older than 30 days

# AWS ECR lifecycle policy (a rule with tagStatus "any" must have the highest rulePriority number)
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Delete untagged images older than 30 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 30
      },
      "action": {
        "type": "expire"
      }
    },
    {
      "rulePriority": 2,
      "description": "Keep last 10 images",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 10
      },
      "action": {
        "type": "expire"
      }
    }
  ]
}
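
Before applying a retention policy, it helps to preview what it would delete. The sketch below simulates the two rules on an in-memory image list. It is a simplification for illustration, not ECR's exact evaluation logic, and `images_to_expire` and the dictionary keys are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def images_to_expire(images, keep_last=10, untagged_max_age_days=30, now=None):
    """Return digests that the two retention rules would remove.

    Each image is a dict with 'digest', 'tags' (list) and 'pushed_at'
    (timezone-aware datetime).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=untagged_max_age_days)
    # Rule 1: untagged images pushed before the cutoff
    expired = {
        image["digest"]
        for image in images
        if not image["tags"] and image["pushed_at"] < cutoff
    }
    # Rule 2: of what remains, keep only the newest keep_last images
    remaining = sorted(
        (image for image in images if image["digest"] not in expired),
        key=lambda image: image["pushed_at"],
        reverse=True,
    )
    expired.update(image["digest"] for image in remaining[keep_last:])
    return expired
```

Running this against a registry listing shows which digests would disappear, so a tag that is still deployed can be spotted before the policy is enabled.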

Resource Management

CPU Limits:

Prevent a single container from starving other containers:

# Limit to 2 CPUs
docker run --cpus=2 myapp

# Limit to 0.5 CPUs (50% of single CPU)
docker run --cpus=0.5 myapp

# CPU shares (relative priority, default 1024)
docker run --cpu-shares=512 myapp  # Half priority

Memory Limits:

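Memory limits keep one container from exhausting host memory. A container that exceeds its hard limit is terminated by the OOM (Out-Of-Memory) killer, so set the limit above the application's expected peak usage: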

# Hard limit: 512 MB
docker run --memory=512m myapp

# Soft limit + hard limit
docker run --memory=512m --memory-reservation=256m myapp

# Disable OOM killer (not recommended)
docker run --memory=512m --oom-kill-disable myapp

# Monitor memory usage
docker stats --no-stream myapp

OOM Killer Behavior:

# Default: kill container if it exceeds memory limit
docker run --memory=512m myapp

# Adjust OOM killer priority (-1000 to 1000, higher = more likely to kill)
docker run --memory=512m --oom-score-adj=500 myapp

Health Checks

HEALTHCHECK Instruction:

Define how Docker determines whether a container is healthy:

# Basic health check
FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm install
# Alpine-based images do not ship curl, so use BusyBox wget
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget -q --spider http://localhost:3000/health || exit 1
CMD ["npm", "start"]

Health check options:

--interval=30s: Check every 30 seconds
--timeout=3s: Fail if check takes > 3 seconds
--start-period=5s: Startup grace period during which failed checks do not count toward retries
--retries=3: Mark unhealthy after 3 consecutive failures
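
How these options interact is easiest to see as a small state machine. The sketch below is a simplified model, not Docker's implementation: it expresses the start period as a number of checks rather than seconds, and `health_states` is a hypothetical helper.

```python
def health_states(check_results, retries=3, start_period_checks=0):
    """Return the health status after each check in check_results.

    check_results is a list of booleans (True = check passed).
    """
    state = "starting"
    consecutive_failures = 0
    history = []
    for index, passed in enumerate(check_results):
        if passed:
            consecutive_failures = 0
            state = "healthy"
        elif index < start_period_checks and state == "starting":
            # Failures during the start period are ignored until the first success
            pass
        else:
            consecutive_failures += 1
            if consecutive_failures >= retries:
                state = "unhealthy"
        history.append(state)
    return history


if __name__ == "__main__":
    results = [False, False, True, False, False, False]
    print(health_states(results, retries=3, start_period_checks=2))
    # ['starting', 'starting', 'healthy', 'healthy', 'healthy', 'unhealthy']
```

A single failed check never flips a healthy container to unhealthy; only `retries` failures in a row do.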

Health Check in Docker Compose:

services:
  web:
    image: myapp:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 3s
      retries: 3
      start_period: 5s
    ports:
      - "3000:3000"

  db:
    image: postgres:15
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

Monitor Health Status:

# View container health status
docker ps --format "table {{.Names}}\t{{.Status}}"

# Output example:
# NAMES                 STATUS
# web                   Up 5 minutes (healthy)
# db                    Up 5 minutes (unhealthy)

# View health check logs (JSON output includes the result of recent probes)
docker inspect --format='{{json .State.Health}}' myapp

Graceful Shutdown

SIGTERM Handling:

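docker stop sends SIGTERM to the container's main process, waits for the grace period (10 seconds by default), then sends SIGKILL. Applications should handle SIGTERM and exit cleanly within that window: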

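# Python example - graceful shutdown
import signal
import sys
import time

def close_database_connections():
    """Placeholder: close connection pools and clients here."""

def flush_logs():
    """Flush buffered output so the final log lines are not lost."""
    sys.stdout.flush()
    sys.stderr.flush()

def signal_handler(sig, frame):
    print('Shutting down gracefully...')
    # Close connections, flush buffers
    close_database_connections()
    flush_logs()
    sys.exit(0)

signal.signal(signal.SIGTERM, signal_handler)
signal.signal(signal.SIGINT, signal_handler)

# Application logic
while True:
    time.sleep(1)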

// Node.js example (assumes `server` is an http.Server and `db` exposes an async close())
process.on('SIGTERM', () => {
  console.log('SIGTERM received, shutting down gracefully');

  // Force exit if cleanup exceeds 25s (shorter than the 30s stop grace period)
  const forceExit = setTimeout(() => {
    console.error('Forced shutdown');
    process.exit(1);
  }, 25000);
  forceExit.unref();

  // Stop accepting new requests; the callback runs once in-flight requests finish
  server.close(async () => {
    console.log('HTTP server closed');
    // Close database connections
    await db.close();
    process.exit(0);
  });
});

server.listen(3000);

Docker Compose Stop Grace Period:

services:
  web:
    image: myapp:latest
    stop_grace_period: 30s  # Allow 30s for graceful shutdown
    stop_signal: SIGTERM    # Send SIGTERM (default)

# CLI equivalent
docker stop -t 30 myapp

Production Deployment Checklist

Comprehensive checklist for deploying containers to production:

Security:

All images scanned for vulnerabilities
No hardcoded secrets in Dockerfile or images
Running as non-root user
Linux capabilities dropped to minimum required
Read-only root filesystem enabled
Seccomp and AppArmor profiles applied
Network policies restrict container communication
TLS enabled for inter-service communication

Resource Management:

CPU limits configured for all containers
Memory limits configured with OOM prevention
Request/limit ratios tested under load
Disk space monitoring and cleanup policies in place

Monitoring and Logging:

Application logs sent to stdout/stderr
Centralized logging configured and tested
Prometheus/monitoring agent running
Alerting rules configured for key metrics
Health checks configured and tested

Networking:

Containers use user-defined networks
Only required ports exposed
Network policies restrict traffic
Secrets not transmitted over unencrypted channels
Service discovery configured properly

High Availability:

Health checks properly implemented
Container restart policies configured
Multiple replicas running (if applicable)
Graceful shutdown handling verified
Database migrations handled properly

Data and Storage:

Persistent data uses volumes/bind mounts
Temporary data uses tmpfs where appropriate
Backups configured and tested
Data encryption at rest enabled

Deployment and Rollback:

Blue-green or canary deployment strategy defined
Rollback procedure documented and tested
Database backward compatibility verified
Load balancer/service mesh configuration correct
DNS propagation time considered

Documentation:

Dockerfile documented and optimized
Deployment procedures documented
Runbook for common issues created
Disaster recovery plan established
Team trained on deployment process

Quick Reference

Security Commands

# Run as non-root user
docker run --user 1000:1000 myapp

# Run as specific user and group
docker run --user appuser:appgroup myapp

# Read-only root filesystem
docker run --read-only myapp

# Read-only with tmpfs for temp files
docker run --read-only --tmpfs /tmp:size=100m myapp

# Drop all capabilities
docker run --cap-drop ALL myapp

# Drop all, add only NET_BIND_SERVICE
docker run --cap-drop ALL --cap-add NET_BIND_SERVICE myapp

# Apply a custom seccomp profile (the default profile is applied automatically)
docker run --security-opt seccomp=/path/to/profile.json myapp

# Apply AppArmor profile
docker run --security-opt apparmor=docker-default myapp

# Limit resources
docker run --cpus=2 --memory=512m myapp

# Scan image for vulnerabilities
docker scout cves myimage:latest
trivy image myimage:latest
snyk container test myimage:latest

# Pull by digest (replace <digest> with the verified 64-character SHA-256 value)
docker pull ubuntu@sha256:<digest>

# Check container user
docker exec myapp whoami

# View health status
docker inspect --format='{{.State.Health.Status}}' myapp

# View container stats
docker stats myapp

# View security options
docker inspect --format='{{.HostConfig.SecurityOpt}}' myapp

Exercises

Exercise 1: Audit a Running Container for Security Issues

Select a production container and perform a comprehensive security audit:

Identify base image and check for vulnerabilities

docker inspect myapp --format='{{.Config.Image}}'
trivy image <base-image>

Check running user

docker exec myapp whoami
docker exec myapp id

Verify Linux capabilities

docker inspect --format='{{.HostConfig.CapAdd}}' myapp
docker inspect --format='{{.HostConfig.CapDrop}}' myapp

Check resource limits

docker inspect myapp --format='{{.HostConfig.Memory}}'
docker inspect myapp --format='{{.HostConfig.NanoCpus}}'   # set by --cpus
docker inspect myapp --format='{{.HostConfig.CpuQuota}}'   # set by --cpu-quota

Verify seccomp/AppArmor

docker inspect --format='{{.HostConfig.SecurityOpt}}' myapp

Check open ports
```
docker exec myapp netstat -tlnp
```

Document findings and create remediation plan.
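
Several of these checks can be scripted against the JSON that `docker inspect` prints. The sketch below is a starting point rather than a complete audit. `audit_container` is a hypothetical helper, and it treats an empty `Config.User` as root.

```python
def audit_container(inspect: dict) -> list[str]:
    """Return findings for one container's `docker inspect` document."""
    config = inspect.get("Config") or {}
    host = inspect.get("HostConfig") or {}
    findings = []

    user = config.get("User") or ""
    if user in ("", "0", "root") or user.startswith(("0:", "root:")):
        findings.append("runs as root")
    if host.get("Privileged"):
        findings.append("privileged mode enabled")
    if not host.get("ReadonlyRootfs"):
        findings.append("root filesystem is writable")
    # CapDrop is null in the JSON when nothing was dropped
    dropped = [cap.upper() for cap in (host.get("CapDrop") or [])]
    if "ALL" not in dropped:
        findings.append("capabilities not dropped with --cap-drop ALL")
    if not host.get("Memory"):
        findings.append("no memory limit")
    if not host.get("NanoCpus") and not host.get("CpuQuota"):
        findings.append("no CPU limit")
    return findings
```

`docker inspect myapp` prints a JSON array, so load it with `json.loads` and pass the first element to `audit_container`.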

Exercise 2: Set Up Prometheus + Grafana to Monitor Docker Containers

Create a monitoring stack for container metrics:

Create docker-compose.yml with Prometheus, Grafana, and cAdvisor
Configure Prometheus to scrape cAdvisor metrics
Deploy stack with docker-compose up
Access Grafana (port 3000) and create dashboard
Add data source pointing to Prometheus (http://prometheus:9090 from inside the Compose network; localhost:9090 only works from the host)
Create graphs for:
- CPU usage per container
- Memory usage per container
- Network I/O
- Container count
Set up alerting rules for:
- CPU > 80%
- Memory > 90% of limit
- Container restart count > 5
Generate load and verify metrics are collected
Document dashboard for team

Exercise 3: Create a Secure Dockerfile Following Best Practices

Build a hardened Node.js application image:

Use specific Node.js version, not latest
Use multi-stage build to exclude development dependencies
Create non-root user for running application
Implement .dockerignore file
Pin all npm dependency versions
Add HEALTHCHECK instruction
Use read-only filesystem with tmpfs for temporary files
Drop unnecessary capabilities
Scan final image with Trivy
Write documentation explaining security decisions

# Reference template
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

# The runtime stage needs the Node.js runtime, so it also starts from node:20-alpine
FROM node:20-alpine
RUN apk add --no-cache tini
RUN addgroup -g 1001 -S appuser && adduser -u 1001 -S appuser -G appuser
WORKDIR /app
COPY --from=builder --chown=appuser:appuser /app/node_modules ./node_modules
COPY --chown=appuser:appuser . .
USER appuser
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD node healthcheck.js
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "server.js"]

Document all security decisions and test thoroughly before deployment.
