DevOps Fundamentals

Master the foundational principles, culture, and frameworks that underpin modern DevOps practices.

Learning Focus

DevOps is first a culture and mindset, then a set of practices and tools. Understand the "why" before the "how."

The DevOps Lifecycle

DevOps follows an infinite loop that optimizes the entire software lifecycle:

Plan → Code → Build → Test → Release → Deploy → Operate → Monitor → (back to Plan)

Each stage feeds into the next, creating continuous feedback loops that enable rapid iteration and improvement.

Key Lifecycle Principles

Continuous Feedback: Each stage provides insights for improvement
Automation: Minimize manual steps and human error
Collaboration: Break silos between development and operations
Measurement: Track metrics at every stage

The Three Ways of DevOps

The foundational principles that guide DevOps culture and practice:

1. The First Way: Systems Thinking (Flow)

Focus on the entire value stream from concept to customer.

Optimize for end-to-end lead time
Identify and eliminate bottlenecks
Understand dependencies between teams
Visualize the workflow (value stream mapping)
Work towards fast and reliable flow

Key Metric: Lead time from code check-in to feature release

2. The Second Way: Amplify Feedback Loops

Build rapid feedback mechanisms into every process.

Fast feedback from production (monitoring, logging)
Shift quality left (test early and often)
Share knowledge across teams
Conduct blameless postmortems
Implement continuous monitoring and alerting

Key Principle: The faster you detect problems, the faster you can fix them

3. The Third Way: Culture of Continual Experimentation

Foster a culture of continuous learning and improvement.

Encourage risk-taking and experimentation
Learn from failures through blameless postmortems
Allocate time for innovation and improvement
Practice chaos engineering (controlled failure testing)
Celebrate learnings, not just successes

Cultural Shift: From "blame culture" to "learning culture"

CALMS Framework

A holistic view of DevOps spanning five dimensions:

Culture

Collaboration: Break down silos between dev and ops
Shared Responsibility: Everyone owns quality and reliability
Psychological Safety: Teams feel safe experimenting and reporting issues
Transparency: Open communication about wins and failures

Automation

CI/CD Pipelines: Automate build, test, and deploy
Infrastructure Automation: IaC (Terraform, Ansible, CloudFormation)
Self-Service: Developers can provision and manage resources
Compliance Automation: Policy enforcement as code
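
As a rough illustration of policy enforcement as code, the sketch below expresses two rules as plain functions and runs them over resource definitions. The resource fields (`name`, `type`, `encrypted`, `tags`) and both rules are hypothetical, chosen only for this example; real setups usually rely on a dedicated policy tool rather than hand-written checks.

```python
from typing import Any, Callable, Optional

# Hypothetical resource shape: a plain dict describing one piece of infrastructure.
Resource = dict[str, Any]
# A policy returns a violation message, or None when the resource complies.
Policy = Callable[[Resource], Optional[str]]


def storage_must_be_encrypted(resource: Resource) -> Optional[str]:
    if resource.get("type") == "storage" and not resource.get("encrypted", False):
        return f"{resource['name']}: storage must be encrypted"
    return None


def must_have_owner_tag(resource: Resource) -> Optional[str]:
    if "owner" not in resource.get("tags", {}):
        return f"{resource['name']}: missing 'owner' tag"
    return None


def evaluate(resources: list[Resource], policies: list[Policy]) -> list[str]:
    """Run every policy against every resource and collect violations."""
    violations = []
    for resource in resources:
        for policy in policies:
            message = policy(resource)
            if message is not None:
                violations.append(message)
    return violations


# Illustrative data only.
resources = [
    {"name": "logs", "type": "storage", "encrypted": True, "tags": {"owner": "platform"}},
    {"name": "uploads", "type": "storage", "encrypted": False, "tags": {}},
]
violations = evaluate(resources, [storage_must_be_encrypted, must_have_owner_tag])
for line in violations:
    print(line)
```

In a pipeline, a non-empty `violations` list would typically fail the build before anything is provisioned.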

Lean

Eliminate Waste: Remove unnecessary processes and handoffs
Flow: Optimize for speed from idea to production
Small Batches: Deploy frequently in small increments
Just-in-Time: Provision resources when needed

Measurement

Observability: Comprehensive visibility into systems
Metrics-Driven: Make decisions based on data
Continuous Monitoring: Track system health and business metrics
Feedback Loops: Use data to guide improvements

Sharing

Knowledge Sharing: Document learnings and best practices
Communities of Practice: Cross-functional learning groups
Blameless Postmortems: Psychological safety in incident reviews
Open Source: Contribute to and leverage community tools

Value Stream Mapping

A lean technique for visualizing and optimizing the software delivery process.

Steps to Create a Value Stream Map

Identify the Process: Map the current state end-to-end
Collect Data: Measure time at each step (processing time + wait time)
Identify Waste: Handoffs, approval delays, rework, waiting
Visualize: Create a visual representation
Optimize: Target the biggest bottlenecks
Implement: Make changes incrementally
Measure: Track improvements over time
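
The data-collection and bottleneck steps above can be sketched in a few lines. This is a minimal example, assuming each step is recorded as processing time plus wait time in hours; the `Step` class, the step names, and the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    processing_hours: float  # time spent actively working on the step
    wait_hours: float        # time the work sat idle at this step

    @property
    def total_hours(self) -> float:
        return self.processing_hours + self.wait_hours


def total_lead_time(steps: list[Step]) -> float:
    """Sum processing and wait time across every step."""
    return sum(step.total_hours for step in steps)


def biggest_bottlenecks(steps: list[Step], top: int = 3) -> list[Step]:
    """Return the steps with the most wait time, largest first."""
    return sorted(steps, key=lambda s: s.wait_hours, reverse=True)[:top]


# Hypothetical measurements, for illustration only.
steps = [
    Step("Code review", processing_hours=1.0, wait_hours=6.0),
    Step("Manual QA", processing_hours=4.0, wait_hours=24.0),
    Step("Change approval", processing_hours=0.5, wait_hours=48.0),
    Step("Deploy", processing_hours=0.5, wait_hours=2.0),
]

print(f"Total lead time: {total_lead_time(steps):.1f} h")
for step in biggest_bottlenecks(steps, top=2):
    print(f"Bottleneck: {step.name} ({step.wait_hours:.1f} h waiting)")
```

Ranking by wait time rather than total time is a deliberate choice here: in most delivery pipelines the idle time between steps, not the work itself, dominates lead time.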

Common Bottlenecks

Manual testing and approval processes
Handoffs between teams (dev → QA → ops)
Long change approval windows
Infrastructure provisioning delays
Siloed knowledge and documentation

Team Topologies

Structure teams for effective DevOps and platform engineering.

Four Fundamental Team Types

1. Stream-Aligned Teams

Organized around a value stream or business capability
Own the full lifecycle of their service
Minimal cross-team dependencies
Goal: Fast, independent delivery

2. Enabling Teams

Support and coach stream-aligned teams
Share expertise (security, performance, infrastructure)
Help teams adopt new practices and tools
Goal: Build internal capability

3. Platform Teams

Provide self-service capabilities and tools
Own shared infrastructure and golden paths
Reduce cognitive load on stream-aligned teams
Goal: Enable fast, safe delivery at scale

4. Complicated-Subsystem Teams

Manage complex systems requiring deep expertise
Examples: real-time processing, machine learning models
Goal: Reduce cognitive load for dependent teams

Team Interaction Patterns

Collaboration: Teams work closely for a defined period
X-as-a-Service: One team provides a service to others
Facilitating: Enabling teams help others grow capability

DORA Metrics (DevOps Research and Assessment)

The four key metrics that predict software delivery performance:

1. Deployment Frequency

How often do you successfully release to production?

Elite: On demand (multiple deployments per day)
High: Between once per day and once per week
Medium: Between once per week and once per month
Low: Less often than once per month

Why It Matters: Frequent deployments reduce risk per change and improve time-to-value
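
One way to measure this is to count successful production deployments inside a fixed window and normalize to a weekly rate. The sketch below assumes you can export deployment dates from your CI/CD system; the function name and the sample dates are illustrative.

```python
from datetime import date


def deployments_per_week(deploy_dates: list[date], start: date, end: date) -> float:
    """Average successful production deployments per week between start and end (inclusive)."""
    days = (end - start).days + 1
    if days <= 0:
        raise ValueError("end must not be before start")
    in_window = [d for d in deploy_dates if start <= d <= end]
    return len(in_window) / (days / 7)


# Hypothetical deployment dates, for illustration only.
deploys = [
    date(2024, 1, 2), date(2024, 1, 4), date(2024, 1, 9),
    date(2024, 1, 11), date(2024, 1, 12), date(2024, 1, 20),
]
rate = deployments_per_week(deploys, date(2024, 1, 1), date(2024, 1, 14))
print(f"{rate:.1f} deployments per week")  # prints "2.5 deployments per week"
```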

2. Lead Time for Changes

How long from code commit to production release?

Elite: Less than 1 day
High: 1 day to 1 week
Medium: 1 week to 1 month
Low: More than 1 month

Why It Matters: Short lead times enable rapid feedback and iteration
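
A minimal sketch of the calculation, assuming each change is recorded as a (commit time, production deploy time) pair. It reports the median rather than the mean so that one unusually slow change does not distort the result, and it maps the result onto the tiers listed above, treating one month as 30 days. All names and data are illustrative.

```python
from datetime import datetime, timedelta
from statistics import median


def median_lead_time(changes: list[tuple[datetime, datetime]]) -> timedelta:
    """Median time from commit to production deploy."""
    if not changes:
        raise ValueError("need at least one change")
    return median(deployed - committed for committed, deployed in changes)


def lead_time_tier(lead_time: timedelta) -> str:
    """Map a lead time onto the tiers above (1 month taken as 30 days)."""
    if lead_time < timedelta(days=1):
        return "Elite"
    if lead_time <= timedelta(weeks=1):
        return "High"
    if lead_time <= timedelta(days=30):
        return "Medium"
    return "Low"


# Hypothetical (commit, deploy) timestamps, for illustration only.
changes = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 15, 0)),  # 6 hours
    (datetime(2024, 1, 2, 10, 0), datetime(2024, 1, 3, 10, 0)),  # 1 day
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 8, 9, 0)),    # 5 days
]
typical = median_lead_time(changes)
print(typical, lead_time_tier(typical))
```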

3. Mean Time to Recovery (MTTR)

How quickly can you restore service after a production failure?

Elite: Less than 1 hour
High: 1 hour to 1 day
Medium: 1 day to 1 week
Low: More than 1 week

Why It Matters: Fast recovery minimizes impact and customer pain
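
As a sketch, MTTR is the average of (service restored minus failure started) across incidents. The example assumes incidents are available as timestamp pairs; the function name and the sample incidents are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average duration between failure start and service restoration."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((restored - started for started, restored in incidents), timedelta())
    return total / len(incidents)


# Hypothetical (started, restored) timestamps, for illustration only.
incidents = [
    (datetime(2024, 2, 1, 10, 0), datetime(2024, 2, 1, 10, 30)),  # 30 minutes
    (datetime(2024, 2, 9, 14, 0), datetime(2024, 2, 9, 15, 30)),  # 90 minutes
    (datetime(2024, 2, 20, 8, 0), datetime(2024, 2, 20, 10, 0)),  # 120 minutes
]
mttr = mean_time_to_recovery(incidents)
print(f"MTTR: {mttr}")  # prints "MTTR: 1:20:00"
```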

4. Change Failure Rate

What percentage of deployments result in production incidents?

Elite: 0-15%
High: 15-45%
Medium: 45-60%
Low: 60%+

Why It Matters: Low failure rates indicate quality practices and proper testing
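
The calculation itself is a simple ratio. The sketch below assumes you already know how many deployments in a period led to a production incident; the function name and the numbers are illustrative.

```python
def change_failure_rate(total_deployments: int, failed_deployments: int) -> float:
    """Percentage of deployments that resulted in a production incident."""
    if total_deployments <= 0:
        raise ValueError("total_deployments must be positive")
    if not 0 <= failed_deployments <= total_deployments:
        raise ValueError("failed_deployments must be between 0 and total_deployments")
    return 100 * failed_deployments / total_deployments


# Hypothetical quarter: 40 deployments, 3 of which caused an incident.
cfr = change_failure_rate(total_deployments=40, failed_deployments=3)
print(f"Change failure rate: {cfr:.1f}%")  # prints "Change failure rate: 7.5%"
```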

Using DORA Metrics

Benchmark: Compare your metrics to the benchmarks in the annual DORA State of DevOps reports (thresholds shift from year to year)
Track Trends: Monitor improvements over time
Identify Bottlenecks: Low deployment frequency or long lead time indicates process issues
Correlate Changes: Measure impact of new tools or practices

Research-Backed

DORA metrics come from the DevOps Research and Assessment (DORA) research program, now part of Google Cloud, and were popularized by the book Accelerate. Years of industry research behind them show that these metrics predict organizational performance.

Blameless Postmortems

A structured process for learning from production incidents without blame.

Principles

No Blame: Focus on systems and processes, not individuals
Psychological Safety: Encourage honest discussion of what happened
Root Cause Analysis: Understand the underlying conditions that led to the incident
Continuous Improvement: Extract learnings to prevent future incidents

Postmortem Format

Incident Summary
- What happened?
- When did it occur?
- How long did it last?
- What was the impact?
Timeline
- When was the issue detected?
- What actions were taken?
- When was service restored?
- Key decision points
Contributing Factors
- Technical failures
- Process gaps
- Tooling limitations
- Environmental conditions
Root Causes
- Why did the technical failure occur?
- What system or process conditions enabled it?
- Multiple causes are common
Action Items
- Process improvements
- Tooling enhancements
- Documentation updates
- Training or capability building

Key Questions

What was supposed to happen?
What actually happened?
Why was there a gap between the two?
What system or process changes would prevent this?

Culture

Hold postmortems regardless of severity or duration
Include participants from all levels (engineers, managers, leadership)
Focus on "How do we improve?" not "Who caused this?"
Act on recommendations to show postmortems matter
Share learnings across teams

DevOps Metrics & Key Performance Indicators

Beyond DORA, track these additional metrics:

Process Metrics

Code Review Time: How long before feedback on a pull request?
Deploy Failure Rate: Percentage of deployments with rollbacks
Defect Escape Rate: Bugs found in production vs. in development
Test Coverage: Percentage of code covered by automated tests
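
Defect escape rate is often left undefined in practice. One common reading, assumed here, is the share of all known bugs that were first found in production. The function name and counts below are illustrative.

```python
def defect_escape_rate(found_in_production: int, found_before_release: int) -> float:
    """Percentage of all known bugs that escaped to production (assumed definition)."""
    if found_in_production < 0 or found_before_release < 0:
        raise ValueError("bug counts must not be negative")
    total = found_in_production + found_before_release
    if total == 0:
        return 0.0  # no bugs recorded, so nothing escaped
    return 100 * found_in_production / total


# Hypothetical release: 5 bugs found in production, 45 caught before release.
escape_rate = defect_escape_rate(found_in_production=5, found_before_release=45)
print(f"Defect escape rate: {escape_rate:.1f}%")  # prints "Defect escape rate: 10.0%"
```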

Reliability Metrics

Service Availability: Uptime percentage (e.g., 99.99%)
Error Rate: Percentage of requests resulting in errors
System Latency: Response time at p50, p95, p99
CPU/Memory Utilization: Resource consumption patterns
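
The first three metrics reduce to short calculations. The sketch below uses the nearest-rank method for percentiles (monitoring tools may interpolate differently) and converts an availability target into a downtime budget over an assumed 30-day period. Function names and sample data are illustrative.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def allowed_downtime_minutes(availability_percent: float, period_days: int = 30) -> float:
    """Downtime budget implied by an availability target over the period."""
    return (1 - availability_percent / 100) * period_days * 24 * 60


def error_rate(total_requests: int, failed_requests: int) -> float:
    """Percentage of requests that resulted in an error."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return 100 * failed_requests / total_requests


# Hypothetical latency samples in milliseconds, for illustration only.
latencies_ms = [120, 80, 95, 300, 110, 105, 90, 1500, 100, 115]
print("p50:", percentile(latencies_ms, 50))
print("p95:", percentile(latencies_ms, 95))
print(f"99.99% over 30 days allows {allowed_downtime_minutes(99.99):.2f} minutes of downtime")
print(f"Error rate: {error_rate(20000, 50):.2f}%")
```

With only ten samples, p95 and p99 both land on the slowest request, which is a reminder that tail percentiles need a large sample to be meaningful.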

Culture Metrics

Employee Satisfaction: Survey scores on collaboration and tools
Team Velocity: Story points completed per sprint
Knowledge Sharing: Documentation updates, internal talks
On-Call Burden: Pages per engineer per week

Business Metrics

Time-to-Market: Days from idea to feature in production
Mean Time to Value: When does a feature start generating ROI?
Customer Satisfaction: NPS, bug severity, feature requests
Cost per Feature: Infrastructure cost divided by features released

Exercises & Practices

Exercise 1: Value Stream Mapping

Objective: Visualize and analyze your deployment process

Gather your team and map the current deployment process
Identify each step: development, testing, approval, deployment, monitoring
Measure time at each step (processing time + waiting time)
Calculate total lead time
Identify top three bottlenecks
Propose improvements for each bottleneck
Estimate potential lead time reduction

Exercise 2: DORA Metrics Baseline

Objective: Establish current performance metrics

Collect data for the past 3 months:
- How many deployments per week? (Deployment Frequency)
- Average time from commit to production? (Lead Time)
- Average time to restore service after a production incident? (MTTR)
- What percentage of deployments caused a production incident or rollback? (Change Failure Rate)
Compare your metrics to DORA benchmarks
Identify which metric needs most improvement
Plan initiatives to improve the lowest-performing metric

Exercise 3: Postmortem Practice

Objective: Build blameless postmortem culture

Schedule a postmortem for a recent incident (even a low-severity one)
Follow the postmortem format above
Focus questions on "What system conditions led to this?"
Generate at least 3 action items
Assign owners and due dates
Track completion of action items
Share learnings with broader teams
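
Tracking owners, due dates, and completion can be as simple as the sketch below. It is a minimal example, not a replacement for a ticketing system; the `ActionItem` fields, the owners, and the dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    description: str
    owner: str
    due: date
    done: bool = False


def overdue(items: list[ActionItem], today: date) -> list[ActionItem]:
    """Open action items whose due date has passed."""
    return [item for item in items if not item.done and item.due < today]


def completion_rate(items: list[ActionItem]) -> float:
    """Percentage of action items marked done."""
    if not items:
        return 0.0
    return 100 * sum(item.done for item in items) / len(items)


# Hypothetical action items from one postmortem, for illustration only.
items = [
    ActionItem("Add alert for queue depth", "alice", date(2024, 3, 1), done=True),
    ActionItem("Document rollback procedure", "bob", date(2024, 3, 8), done=True),
    ActionItem("Automate certificate renewal", "carol", date(2024, 3, 10)),
    ActionItem("Run failover drill", "dave", date(2024, 4, 1)),
]
today = date(2024, 3, 15)
print(f"Completed: {completion_rate(items):.0f}%")
for item in overdue(items, today):
    print(f"Overdue: {item.description} (owner: {item.owner}, due {item.due})")
```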

Exercise 4: Team Topology Assessment

Objective: Evaluate whether your team structure supports fast flow

Map your current teams
For each stream-aligned team, identify:
- Value stream they own
- Key dependencies on other teams
- Manual handoffs in their process
Assess: Is each team able to deliver independently?
Identify: Where would a platform team help?
Plan: How could you restructure for better flow?

Key Takeaways

DevOps is Culture First: Tools amplify good practices, but culture change comes first
Optimize the Whole System: Focus on end-to-end flow, not individual component optimization
Measure What Matters: DORA metrics predict organizational performance
Learn from Failures: Blameless postmortems drive continuous improvement
Structure for Flow: Team topologies should enable independent, rapid delivery

Next Steps

Measure your current DORA metrics
Conduct a value stream mapping exercise
Establish blameless postmortem culture
Read: The DevOps Handbook (Gene Kim, Jez Humble, Patrick Debois, and John Willis) and Accelerate (Nicole Forsgren, Jez Humble, and Gene Kim)
