
Cloud Architecture & Design Patterns

Cloud Design Principles

Before diving into specific patterns, understand the core principles of effective cloud architecture:

1. Scalability

Systems should grow efficiently with demand.

Types:

  • Vertical Scaling (Scale Up): Increase resources (CPU, memory) on existing servers
  • Horizontal Scaling (Scale Out): Add more servers to handle load

Cloud-native approach: Horizontal scaling through load balancing and auto-scaling

2. Reliability & Resilience

Systems should continue operating despite failures.

Key concepts:

  • Fault Tolerance: System continues despite component failures
  • Self-Healing: Automatic recovery from failures
  • Redundancy: Multiple copies of critical components
  • Graceful Degradation: Reduced functionality rather than complete failure
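Self-healing often starts with simple retry logic for transient failures. A minimal sketch of retry with exponential backoff and jitter (the function name, attempt counts, and delays are all illustrative, not a specific library's API):

```python
import random
import time

def call_with_retries(func, attempts=4, base_delay=0.5):
    """Retry with exponential backoff and jitter: a basic self-healing
    pattern for transient failures. Timings are illustrative."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller degrade gracefully
            # Back off 0.5s, 1s, 2s, ... plus jitter so many clients
            # retrying at once don't all hit the service simultaneously.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random() * 0.1))
```

Jitter matters at scale: without it, a fleet of clients that failed together retries together, re-creating the spike that caused the failure.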

3. Elasticity

Resources automatically adjust to match demand.

Without elasticity: fixed capacity; over-provision for peaks, waste during valleys.
With elasticity: right-sized capacity at all times; costs track actual usage.

4. Security

Security is shared between the cloud provider and the customer (the shared responsibility model): the provider secures the underlying infrastructure; you secure what you build and run on it.

Key practices:

  • Least privilege access
  • Defense in depth (multiple security layers)
  • Encryption at rest and in transit
  • Regular security audits

5. Cost Optimization

Design for efficiency without compromising functionality.

Strategies:

  • Use managed services to avoid infrastructure management
  • Right-size resources based on actual needs
  • Leverage auto-scaling for variable workloads
  • Monitor and optimize continuously

Common Cloud Architecture Patterns

3-Tier Architecture

The most common web application architecture.

Characteristics:

  • Clear separation of concerns
  • Each tier can scale independently
  • Easier to secure (network rules per tier)
  • Straightforward to implement and test

When to use: Traditional web applications, REST APIs, content management systems

Implementation example:

Tier 1: CloudFront CDN → ALB (Load Balancer)
Tier 2: EC2 Auto Scaling Group → Stateless app servers
Tier 3: RDS Multi-AZ → Read Replicas

Microservices Architecture

Unlike a monolith, where the entire application ships as one deployable unit, functionality is split into small, independently deployable services.

Characteristics:

  • Small, focused services (single responsibility)
  • Independent deployment
  • Technology flexibility (different tech stacks)
  • Distributed system complexity

Advantages:

  • Easy to scale individual services
  • Faster development and deployment
  • Fault isolation (one service failure doesn't crash all)
  • Technology flexibility

Challenges:

  • Complexity: more services to manage
  • Network latency between services
  • Distributed debugging
  • Data consistency across services

Best practices:

  • Keep services loosely coupled
  • Use API gateways for routing
  • Implement circuit breakers for fault tolerance
  • Use correlation IDs for request tracing
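The circuit-breaker practice above can be sketched in a few lines of Python. This is a minimal in-process illustration with made-up thresholds; production systems typically use a library (e.g. resilience4j) or a service-mesh policy instead:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: open after N consecutive failures,
    then fail fast until a cooldown elapses (half-open probe)."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        # While open, refuse immediately instead of waiting on a dead service.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # cooldown over: allow one probe call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

Failing fast protects the caller's thread pool and gives the downstream service time to recover, instead of piling retries onto an already-struggling dependency.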

Serverless Architecture

Event-driven architecture in which the cloud provider manages the servers.

API Gateway / event sources → Lambda functions → Database

Characteristics:

  • No server management
  • Pay per execution (truly pay-as-you-go)
  • Automatic scaling
  • Built-in high availability

When to use:

  • Asynchronous processing (image processing, notifications)
  • Scheduled jobs (backups, reports)
  • Webhooks and integrations
  • APIs with variable traffic
  • Real-time data processing
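A serverless function is just a handler invoked once per event. A sketch of an AWS Lambda handler in Python for the image-processing case above (the event shape assumes an S3 "object created" notification; the actual processing step is omitted and the return shape is illustrative):

```python
import json

def handler(event, context):
    """AWS Lambda entry point: no server to manage, invoked per event.
    This sketch assumes an S3 object-created notification event."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Real code would fetch the object and, e.g., generate a thumbnail.
        results.append({"bucket": bucket, "key": key, "status": "processed"})
    return {"statusCode": 200, "body": json.dumps(results)}
```

The platform handles scaling: a thousand uploads simply mean up to a thousand concurrent invocations, with no capacity planning on your side.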

Advantages:

  • Lowest operational overhead
  • Fastest scaling (milliseconds)
  • Extreme cost optimization for variable workloads
  • Easy deployment

Limitations:

  • Cold starts (initial invocation slower)
  • Execution time limits (typically 15 minutes max)
  • State management complexity
  • Vendor lock-in

Event-Driven Architecture

Services communicate asynchronously via events.

┌─────────────────────────────────┐
│ Event Producer                  │
│ (user signup, order placed)     │
└────────────┬────────────────────┘
             ▼
┌─────────────────────────────────┐
│ Event Bus / Message Queue       │
│ (SNS, Kafka, RabbitMQ)          │
└────────────┬────────────────────┘
     ┌───────┼─────────────┐
     ▼       ▼             ▼
┌────────┐ ┌──────────┐ ┌────────┐
│Email   │ │Analytics │ │Payment │
│Service │ │Service   │ │Service │
└────────┘ └──────────┘ └────────┘

Characteristics:

  • Loose coupling between services
  • Asynchronous communication
  • Improved scalability and resilience

When to use:

  • Services need real-time notifications
  • Multiple systems need same data
  • Building event-sourcing patterns
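The core mechanics can be shown with a tiny in-process event bus. This is a teaching stand-in for SNS/Kafka/RabbitMQ (event names and payloads are illustrative; real buses deliver asynchronously and durably, which this sketch does not):

```python
from collections import defaultdict

class EventBus:
    """Tiny in-process stand-in for a message broker: producers publish
    events by name without knowing who, if anyone, consumes them."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self.subscribers[event_name].append(handler)

    def publish(self, event_name, payload):
        # Loose coupling: adding a new consumer never touches the producer.
        for handler in self.subscribers[event_name]:
            handler(payload)

bus = EventBus()
bus.subscribe("order.placed", lambda e: print(f"email: confirm order {e['id']}"))
bus.subscribe("order.placed", lambda e: print(f"analytics: record order {e['id']}"))
bus.publish("order.placed", {"id": 42})
```

Note what the producer does not do: it never calls the email or analytics service directly, so either can be added, removed, or fail independently.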

Cloud-Native Principles

Cloud-native applications are designed for cloud deployment from the start.

12-Factor Application

A methodology for building scalable cloud applications:

Factor               Principle
1. Codebase          One codebase per app, version controlled
2. Dependencies      Explicitly declared, no implicit globals
3. Configuration     Environment variables, not hardcoded
4. Backing Services  Treat databases/queues as attached resources
5. Build/Run         Strictly separate build and run stages
6. Processes         Stateless, share-nothing processes
7. Port Binding      Export HTTP via port binding (no external server)
8. Concurrency       Scale horizontally via the process model
9. Disposability     Fast startup/shutdown for elasticity
10. Dev/Prod Parity  Same tools and services locally and in production
11. Logs             Write logs to stdout; let the environment handle storage
12. Admin Tasks      Run as one-off processes, not background jobs
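Factors 3 and 11 are the easiest to show concretely. A sketch of environment-driven configuration in Python (the variable names and defaults here are illustrative, not a standard):

```python
import os

# Factor 3 (Configuration): read settings from the environment, never
# hardcode them. The same image then runs unchanged in dev, staging, prod.
DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///dev.db")
MAX_WORKERS = int(os.environ.get("MAX_WORKERS", "4"))

# Factor 11 (Logs): write to stdout and let the platform collect it.
print(f"starting with {MAX_WORKERS} workers, db={DATABASE_URL}")
```

Because configuration lives outside the artifact, promoting a build from staging to production is a redeploy with different environment variables, not a rebuild.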

Key Cloud-Native Practices

Containerization:

  • Package applications in containers (Docker)
  • Enables consistent deployment across environments
  • Simplifies scaling and orchestration

Infrastructure as Code (IaC):

  • Define infrastructure via code (Terraform, CloudFormation)
  • Version control for infrastructure
  • Reproducible, automated deployments

Observability:

  • Structured logging
  • Metrics and monitoring
  • Distributed tracing
  • Understand system behavior in production
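Structured logging means emitting machine-parseable records rather than free-form strings. A minimal sketch writing one JSON object per line to stdout (the field names, including the correlation ID, are illustrative conventions, not a fixed schema):

```python
import json
import sys
import time

def log(level, message, **fields):
    """Structured logging sketch: one JSON object per line to stdout,
    ready for a log aggregator to parse and index."""
    record = {"ts": time.time(), "level": level, "msg": message, **fields}
    sys.stdout.write(json.dumps(record) + "\n")
    return record

# A correlation ID carried across services is what makes distributed
# tracing of a single request possible.
log("info", "request handled", route="/orders", status=200,
    latency_ms=12.3, correlation_id="req-abc123")
```

Because every record is JSON, queries like "all errors for correlation_id req-abc123 across all services" become trivial in the aggregator.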

CI/CD:

  • Continuous integration: automated testing on every commit
  • Continuous deployment: automated release to production
  • Fast feedback loops

Load Balancing

Distribute traffic across multiple servers to prevent overload.

Types of Load Balancers

1. Network Load Balancer (Layer 4)

  • Works at transport layer (TCP/UDP)
  • Ultra-high performance, lowest latency
  • Best for: Non-HTTP protocols, extreme performance

2. Application Load Balancer (Layer 7)

  • Works at application layer (HTTP/HTTPS)
  • Intelligent routing based on URL paths, hostnames, headers
  • Best for: Web applications, microservices

3. Classic Load Balancer

  • Legacy option, simpler
  • Works across layers 4 and 7
  • Avoid for new applications

Load Balancing Algorithms

Algorithm            Use Case
Round Robin          Distribute requests evenly across all servers
Weighted             Servers with more capacity receive more traffic
Least Connections    Send to the server with the fewest active connections
IP Hash              Same client always routes to the same server (session persistence)
Random               Randomized distribution
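Three of these algorithms are simple enough to sketch directly (an in-process illustration; real load balancers also track health checks, weights, and connection draining):

```python
import itertools

class RoundRobinBalancer:
    """Round robin: cycle through the servers in fixed order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

def least_connections(active):
    """Least connections: pick the server with the fewest active
    connections. `active` maps server name -> current connection count."""
    return min(active, key=active.get)

def ip_hash(client_ip, servers):
    """IP hash: the same client always lands on the same server,
    giving session persistence without shared session storage."""
    return servers[hash(client_ip) % len(servers)]
```

Round robin assumes requests cost roughly the same; least connections adapts when they don't (e.g. some requests are long-lived); IP hash trades even distribution for stickiness.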

Auto-Scaling

Automatically adjust capacity based on demand.

Scaling Policies

1. Reactive Scaling

  • Scale when metrics exceed thresholds
  • Slower response (lag between demand and scaling)
  • Example: "Add instance if CPU > 70% for 2 minutes"

2. Predictive Scaling

  • Machine learning predicts future demand
  • Faster scaling, better user experience
  • Requires historical data

3. Scheduled Scaling

  • Scale on a fixed schedule
  • Example: "Double capacity at 9 AM on weekdays"
  • Best for: Predictable patterns
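The reactive rule above ("Add instance if CPU > 70% for 2 minutes") can be sketched as a decision function. Parameter names, sample counts, and thresholds are illustrative, not a real autoscaler API; note the guard rails against flapping:

```python
def desired_instances(cpu_samples, current, threshold=70.0, breach_needed=4,
                      min_size=2, max_size=10):
    """Reactive scaling sketch: scale out when the last `breach_needed`
    CPU samples (e.g. 4 x 30s = 2 minutes) all exceed the threshold;
    scale in only when load is well below it."""
    recent = cpu_samples[-breach_needed:]
    if len(recent) == breach_needed and all(s > threshold for s in recent):
        return min(current + 1, max_size)   # scale out, capped at max
    if recent and all(s < threshold / 2 for s in recent):
        return max(current - 1, min_size)   # scale in, floored at min
    return current                          # otherwise hold: avoids flapping
```

The wide gap between the scale-out threshold (70%) and the scale-in threshold (35%) is deliberate: a single threshold would make the fleet oscillate as each scaling action moves the metric back across it.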

Metrics for Auto-Scaling

Common metrics:

  • CPU utilization
  • Memory usage
  • Request count
  • Network bandwidth
  • Custom application metrics

Best practices:

  • Scale on application metrics, not just CPU
  • Set appropriate min/max capacity limits
  • Avoid rapid scaling fluctuations ("flapping")
  • Test scaling behavior before production

Disaster Recovery & Business Continuity

Recovery Metrics

RTO (Recovery Time Objective)

  • How quickly must the system recover?
  • Measured in hours, minutes, or seconds
  • Business impact: downtime cost

RPO (Recovery Point Objective)

  • How much data loss is acceptable?
  • Measured in hours, minutes, or point-in-time
  • Business impact: data loss consequences

Disaster Recovery Strategies

1. Backup & Restore

  • Take regular backups
  • Restore from backups when disaster occurs
  • RTO: Hours to days
  • RPO: Hours to days
  • Cost: Lowest

2. Pilot Light

  • Minimal hot standby in another region
  • Scale up when needed
  • RTO: 10s of minutes
  • RPO: Minutes to hours
  • Cost: Low to moderate

3. Warm Standby

  • Scaled-down copy running in another region
  • Scale up quickly
  • RTO: Minutes
  • RPO: Seconds to minutes
  • Cost: Moderate

4. Hot Standby (Active-Active)

  • Fully operational copy in another region
  • Instant failover
  • RTO: Seconds
  • RPO: Seconds or real-time replication
  • Cost: Highest

Implementing Disaster Recovery

Database replication:

  • Primary-replica setup for continuous sync
  • Read replicas for load distribution

Multi-region deployment:

  • Applications running in multiple regions
  • Traffic routing between regions

Data backup strategy:

  • Regular snapshots of databases
  • Geographic distribution of backups
  • Test recovery procedures regularly

Hands-On Exercises

Exercise 1: Architecture Pattern Selection

For each scenario, recommend an appropriate architecture pattern:

  1. A social media platform handling billions of reads and millions of writes
  2. A mobile app that processes photos
  3. A real-time analytics dashboard
  4. A traditional enterprise business application

Suggested answers:

  1. Microservices (read-optimized + write-optimized services)
  2. Serverless (event-driven image processing)
  3. Microservices + event-driven (real-time data)
  4. 3-tier architecture

Exercise 2: Design a Scalable E-commerce Platform

Design architecture for handling peak traffic (10x normal):

  • How would you handle traffic spikes?
  • What services would you use?
  • Where would you add load balancing?
  • How would you scale the database?

Exercise 3: Disaster Recovery Plan

Create a DR plan for a critical payment processing system:

  • What is an acceptable RTO and RPO?
  • Which DR strategy would you choose and why?
  • What services would you use?
  • How would you test the plan?

Exercise 4: Cloud-Native Evaluation

Evaluate an existing application against 12-factor principles:

  • Which factors does it follow?
  • Which factors need improvement?
  • Create an action plan to improve cloud-native readiness

Next Steps