Skip to main content

AWS Databases & Storage

AWS provides a comprehensive suite of database and storage services designed for different use cases, from high-performance relational databases to scalable NoSQL solutions and petabyte-scale data warehouses. This guide covers the most commonly used services and best practices for selecting and managing them.

S3 (Simple Storage Service)

S3 is AWS's object storage service, designed for durability, availability, and scalability. It's the foundation for many AWS architectures.

Core Concepts

  • Buckets: Top-level containers for objects (like directories)
  • Objects: Files stored in S3, consisting of data and metadata
  • Keys: Unique identifiers for objects within a bucket (full path)
  • Versioning: Enable to maintain multiple versions of objects for protection against accidental deletion

Storage Classes Comparison

Storage ClassMin StorageAccess TimeCost per GBBest For
StandardNoneImmediateHighestFrequent access, active data
Standard-IA30 daysImmediateLowerInfrequent access (1 month+)
One Zone-IA30 daysImmediateLowest warmInfrequent, non-critical data
Intelligent-TieringNoneImmediateVariableUnknown access patterns
Glacier Instant90 daysMinutesLowArchive with quick retrieval
Glacier Flexible90 days3-5 hoursLowerArchive, predictable retrieval
Glacier Deep Archive180 days12 hoursLowestLong-term archive

Lifecycle Policies

Automatically transition objects between storage classes or delete them based on age:

{
"Rules": [
{
"Id": "ArchiveAfter90Days",
"Status": "Enabled",
"Prefix": "logs/",
"Transitions": [
{
"Days": 30,
"StorageClass": "STANDARD_IA"
},
{
"Days": 90,
"StorageClass": "GLACIER_IR"
}
],
"Expiration": {
"Days": 365
},
"NoncurrentVersionTransitions": [
{
"NoncurrentDays": 7,
"StorageClass": "GLACIER_FLEXIBLE"
}
]
}
]
}

Encryption

  • SSE-S3: Server-side encryption with S3-managed keys (default)
  • SSE-KMS: Server-side encryption with AWS KMS (customer control, auditable)
  • SSE-C: Server-side encryption with customer-provided keys
  • Client-side: Encrypt before uploading (CloudFront, SDK)

Access Control

  • Bucket Policies: JSON policies attached to buckets (recommended)
  • ACLs: Object and bucket-level access (legacy, not recommended)
  • Access Points: Simplified access management for applications
  • Block Public Access: Four settings to prevent accidental public exposure

S3 Static Website Hosting

# Enable static website hosting
aws s3 website s3://mybucket --index-document index.html --error-document error.html

# Upload files
aws s3 cp . s3://mybucket --recursive --exclude ".git/*"

Advanced Features

  • Cross-Region Replication (CRR): Automatically replicate objects to another region
  • S3 Transfer Acceleration: Use CloudFront edge locations for faster uploads
  • Event Notifications: Trigger Lambda, SNS, or SQS on object changes
  • S3 Batch Operations: Apply actions to millions of objects at scale

AWS CLI Examples

# Create bucket
aws s3 mb s3://my-unique-bucket --region us-east-1

# Upload file
aws s3 cp myfile.txt s3://my-bucket/

# Upload directory
aws s3 sync ./local-dir s3://my-bucket/remote-dir --delete

# List objects
aws s3 ls s3://my-bucket/ --recursive

# Set lifecycle policy
aws s3api put-bucket-lifecycle-configuration \
--bucket my-bucket \
--lifecycle-configuration file://lifecycle.json

# Enable versioning
aws s3api put-bucket-versioning \
--bucket my-bucket \
--versioning-configuration Status=Enabled

# Block public access
aws s3api put-public-access-block \
--bucket my-bucket \
--public-access-block-configuration \
"BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

EBS (Elastic Block Store)

EBS provides block-level storage volumes for EC2 instances. Choose the right volume type based on performance and cost requirements.

Volume Types Comparison

TypeIOPSThroughputUse Case
gp33,000-16,000125-1,000 MB/sGeneral purpose, web apps, databases
gp2100-16,00040-250 MB/sLegacy general purpose (use gp3)
io264,000-256,0001,000-4,000 MB/sHigh-performance databases, NoSQL
io1100-64,000500-1,000 MB/sLegacy high-IOPS (use io2)
st1500500 MB/sThroughput-optimized (Hadoop, logs)
sc1250250 MB/sCold storage, infrequent access

Key Features

  • Snapshots: Point-in-time backups (incremental, stored in S3)
  • Encryption: At-rest encryption with AWS KMS
  • Multi-Attach: Attach single io1/io2 volume to multiple instances (shared storage)
  • Fast Snapshot Restore: Create volumes from snapshots with full performance immediately
  • EBS-Optimized: Instances with dedicated EBS bandwidth

EBS vs Instance Store

FeatureEBSInstance Store
PersistenceSurvives instance terminationLost on termination
PerformanceNetwork-attachedLocal, higher throughput
CostPay per GB storedNo additional cost
Use CaseDatabases, file systemsTemporary, cache

AWS CLI Examples

# Create volume (100 GB gp3)
aws ec2 create-volume \
--size 100 \
--volume-type gp3 \
--iops 3000 \
--throughput 125 \
--availability-zone us-east-1a

# Create snapshot
aws ec2 create-snapshot --volume-id vol-12345 --description "DB backup"

# Create volume from snapshot
aws ec2 create-volume \
--snapshot-id snap-12345 \
--availability-zone us-east-1a

# Attach volume to instance
aws ec2 attach-volume \
--volume-id vol-12345 \
--instance-id i-12345 \
--device /dev/sdf

EFS (Elastic File System)

EFS provides managed NFS (Network File System) storage for EC2 instances, supporting thousands of concurrent connections.

Key Features

  • Shared Access: Multiple instances mount the same file system
  • Performance Modes:
    • General Purpose (default): Lowest latency, suitable for most workloads
    • Max I/O: Higher levels of throughput and operations per second
  • Storage Classes:
    • Standard: Frequently accessed files
    • Infrequent Access (IA): Automatic transition to lower cost for rarely used files
  • Encryption: At-rest with KMS, in-transit with TLS
  • Lifecycle Policies: Automatic transition to IA after 30 days (configurable)

Comparison: EFS vs EBS vs S3

FeatureEFSEBSS3
AccessMultiple instancesSingle instanceUnlimited
LatencySub-millisecondMicrosecondMilliseconds
ProtocolNFSBlockHTTP/S
CostPer GB storedPer GB storedPer GB stored
Use CaseShared file systemsInstance volumesObject storage

RDS (Relational Database Service)

AWS-managed relational database service eliminating administrative overhead while providing high availability and scalability.

Supported Engines

  • MySQL: Open-source RDBMS
  • PostgreSQL: Advanced open-source RDBMS with JSON, UUID support
  • MariaDB: MySQL fork with improved performance
  • Oracle Database: Enterprise RDBMS
  • SQL Server: Microsoft enterprise database
  • Aurora: AWS-native MySQL and PostgreSQL compatible (covered separately)

High Availability & Reliability

  • Multi-AZ: Synchronous standby replica in different AZ with automatic failover (RTO: 1-2 minutes)
  • Read Replicas: Asynchronous replicas for read scaling (same region or cross-region)
  • Automated Backups: Daily snapshots retained for 7-35 days
  • Manual Snapshots: User-initiated backups retained indefinitely
  • Backup Window: Specify preferred time for automated backups

Performance & Scaling

  • Instance Classes: db.t3.micro to db.x1e.32xlarge (compute-optimized, memory-optimized)
  • Storage Auto-Scaling: Automatically expand storage as needed
  • RDS Proxy: Connection pooling and IAM authentication
    • Reduces database connections by 90%
    • Improves application resilience
    • Enables seamless failover

Parameter Groups & Option Groups

  • Parameter Group: Database engine configuration (max connections, charset, timeout)
  • Option Group: Add-ons like backup, replication, encryption

AWS CLI Examples

# Create MySQL instance with Multi-AZ
aws rds create-db-instance \
--db-instance-identifier mydb \
--db-instance-class db.t3.micro \
--engine mysql \
--master-username admin \
--master-user-password MyPassword123 \
--allocated-storage 100 \
--multi-az \
--backup-retention-period 7

# Create read replica
aws rds create-db-instance-read-replica \
--db-instance-identifier mydb-replica \
--source-db-instance-identifier mydb

# Create snapshot
aws rds create-db-snapshot \
--db-snapshot-identifier mydb-snapshot \
--db-instance-identifier mydb

# Restore from snapshot
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier mydb-restored \
--db-snapshot-identifier mydb-snapshot

# Describe instances
aws rds describe-db-instances --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceStatus]'

Amazon Aurora

AWS-native relational database designed for cloud with 5x MySQL and 3x PostgreSQL performance, combined with high availability and scalability.

Architecture

  • Shared Storage: Cluster volume across multiple AZs (auto-replicating, self-healing)
  • Compute: Multiple instance types for different performance tiers
  • Replicas: Up to 15 read replicas with automatic failover
  • Performance: Optimized for throughput and low latency

Aurora Serverless v2

  • Auto-scaling: Automatically adjust capacity based on demand
  • Per-second Billing: Pay for what you use, not reserved capacity
  • Cold Start: Starts in seconds with warm pools
  • Ideal For: Unpredictable workloads, development/testing, variable traffic

Advanced Features

  • Aurora Global Database: Read-only replicas in other regions for DR and low-latency reads
  • Aurora Backtrack: Rewind database to any point in last 72 hours (no restore needed)
  • Parallel Query: Distributed queries across cluster for fast scans
  • Smart Driver: Connection awareness and read/write routing

Aurora vs RDS Comparison

FeatureAuroraRDS (MySQL/PostgreSQL)
Performance5x/3x fasterStandard
AvailabilityBuilt-in with 15 replicasMulti-AZ standby only
CostHigher computeLower compute
ScalingAutomatic cluster scalingManual or Auto-Scaling
RecoveryBacktrackSnapshots only

AWS CLI Examples

# Create Aurora MySQL cluster (serverless v2)
aws rds create-db-cluster \
--db-cluster-identifier aurora-cluster \
--engine aurora-mysql \
--engine-version 8.0.mysql_aurora.3.04.0 \
--master-username admin \
--master-user-password MyPassword123

# Add instance to cluster
aws rds create-db-instance \
--db-instance-identifier aurora-primary \
--db-instance-class db.serverless \
--engine aurora-mysql \
--db-cluster-identifier aurora-cluster

# Create read replica
aws rds create-db-instance \
--db-instance-identifier aurora-replica \
--db-instance-class db.t3.small \
--engine aurora-mysql \
--db-cluster-identifier aurora-cluster

DynamoDB

Fully managed NoSQL database supporting key-value and document data models with automatic scaling and millisecond latency.

Core Concepts

  • Table: Collection of items
  • Item: Similar to a row (max 400 KB)
  • Attribute: Similar to a column (no fixed schema)
  • Primary Key: Uniquely identifies an item
    • Partition Key: Hash key for distribution across partitions
    • Sort Key: Optional, enables range queries (partition key + sort key = composite key)

Capacity Modes

  • Provisioned: Reserve read/write capacity units (RCU/WCU)
    • RCU: 4 KB read per second
    • WCU: 1 KB write per second
    • Auto-scaling available
  • On-Demand: Pay per request, suitable for unpredictable workloads
    • No capacity planning
    • Higher per-request cost

Secondary Indexes

  • Global Secondary Index (GSI):
    • Different partition/sort key from base table
    • Eventual consistency
    • Separate provisioned capacity
    • Can be added/removed after creation
  • Local Secondary Index (LSI):
    • Same partition key, different sort key
    • Strong consistency
    • 10 GB size limit
    • Must be created at table creation

Advanced Features

  • DynamoDB Streams: Capture item-level modifications (INSERT, UPDATE, DELETE)
    • Useful for Lambda triggers, CDC, replication
    • 24-hour retention
  • DAX (DynamoDB Accelerator): In-memory cache for microsecond latency
  • Global Tables: Multi-region, fully replicated tables with eventual consistency
  • TTL: Automatic expiration of items based on timestamp attribute

Design Patterns

  • Sparse Indexes: Some items have sort key, others don't (flexible schema)
  • Composite Sort Keys: JSON-like sorting for complex queries
  • Batch Operations: Use batch-get-item, batch-write-item for efficiency

AWS CLI Examples

# Create table
aws dynamodb create-table \
--table-name Users \
--attribute-definitions \
AttributeName=UserId,AttributeType=S \
AttributeName=CreatedAt,AttributeType=N \
--key-schema \
AttributeName=UserId,KeyType=HASH \
AttributeName=CreatedAt,KeyType=RANGE \
--billing-mode PAY_PER_REQUEST

# Put item
aws dynamodb put-item \
--table-name Users \
--item '{"UserId": {"S": "user123"}, "Name": {"S": "John Doe"}, "Email": {"S": "john@example.com"}}'

# Get item
aws dynamodb get-item \
--table-name Users \
--key '{"UserId": {"S": "user123"}}'

# Query items by partition key
aws dynamodb query \
--table-name Users \
--key-condition-expression "UserId = :uid" \
--expression-attribute-values '{":uid": {"S": "user123"}}'

# Scan table (full table scan - use with caution)
aws dynamodb scan --table-name Users --limit 10

# Update item
aws dynamodb update-item \
--table-name Users \
--key '{"UserId": {"S": "user123"}}' \
--update-expression "SET #n = :val" \
--expression-attribute-names '{"#n": "Name"}' \
--expression-attribute-values '{":val": {"S": "Jane Doe"}}'

# Create Global Secondary Index
aws dynamodb update-table \
--table-name Users \
--attribute-definitions AttributeName=Email,AttributeType=S \
--global-secondary-index-updates '[{"Create": {"IndexName": "EmailIndex", "Keys": [{"AttributeName": "Email", "KeyType": "HASH"}], "Projection": {"ProjectionType": "ALL"}, "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}}}]'

# Enable DynamoDB Streams
aws dynamodb update-table \
--table-name Users \
--stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES

# Set TTL
aws dynamodb update-time-to-live \
--table-name Users \
--time-to-live-specification AttributeName=ExpiresAt,Enabled=true

Amazon Redshift

Fully managed data warehouse service using columnar storage and massively parallel processing (MPP) for analytics at scale.

Architecture

  • Columnar Storage: Compress similar data types for faster queries
  • MPP: Distribute queries across multiple nodes in parallel
  • Leader Node: Coordinates query execution (smallest cluster size: 2 nodes)
  • Compute Nodes: Perform calculations (dc2.large to ra3.4xlplus)

Key Features

  • Redshift Spectrum: Query data directly in S3 without loading into Redshift
  • Redshift Serverless: Auto-scaling data warehouse without cluster management
  • Query Editor: Web-based SQL editor with sample datasets
  • Concurrency Scaling: Automatically add capacity for peak loads
  • Automated Backups: Snapshots to S3 with 1-35 day retention

Use Cases

  • Historical analysis
  • Trend detection
  • Large-scale reporting
  • Ad hoc queries on structured data

AWS CLI Examples

# Create cluster
aws redshift create-cluster \
--cluster-identifier my-cluster \
--node-type dc2.large \
--number-of-nodes 2 \
--master-username awsuser \
--master-user-password MyPassword123 \
--db-name mydb \
--publicly-accessible

# Create snapshot
aws redshift create-cluster-snapshot \
--snapshot-identifier my-snapshot \
--cluster-identifier my-cluster

# Restore from snapshot
aws redshift restore-from-cluster-snapshot \
--cluster-identifier my-cluster-restored \
--snapshot-identifier my-snapshot

# Query Redshift Spectrum (S3 data)
# First, create external schema and table pointing to S3
# Then query as normal SQL

# Example query (executed via Query Editor or psql)
# SELECT * FROM s3_data.my_table WHERE year = 2024;

ElastiCache

Fully managed in-memory data store (Redis or Memcached) for caching and session management with microsecond latency.

Redis vs Memcached

FeatureRedisMemcached
Data TypesStrings, lists, sets, hashes, sorted sets, streamsStrings only
PersistenceOptional (RDB snapshots, AOF logs)None (volatile)
ReplicationPrimary/replica clustersNone
TransactionsACID transactionsNone
Pub/SubSupportedNot supported
Lua ScriptingSupportedNot supported
ClusterShard-basedMemcached Cluster Mode
Use CaseComplex caching, sessions, leaderboards, queuesSimple high-throughput caching

Caching Strategies

  • Lazy Loading (Cache-Aside):

    • Application checks cache first
    • On miss, fetch from database and populate cache
    • Simple, but slower on miss
    GET key
    IF result is NULL
    result = DB.query(key)
    CACHE.set(key, result, TTL=3600)
    RETURN result
  • Write-Through:

    • Write to cache AND database simultaneously
    • Ensures cache is always up-to-date
    • Higher write latency
    CACHE.set(key, value)
    DB.write(key, value)

Cluster Modes

  • Disabled: Single primary node with optional replicas (no sharding)
  • Enabled: Multiple shards, each with primary and replicas (horizontal scaling)

AWS CLI Examples

# Create Redis cluster
aws elasticache create-cache-cluster \
--cache-cluster-id my-redis \
--cache-node-type cache.t3.micro \
--engine redis \
--num-cache-nodes 1

# Create Redis cluster with replicas and Multi-AZ
aws elasticache create-replication-group \
--replication-group-description "Multi-AZ Redis" \
--replication-group-id my-redis-group \
--engine redis \
--cache-node-type cache.t3.small \
--num-cache-clusters 2 \
--automatic-failover-enabled

# Create Memcached cluster
aws elasticache create-cache-cluster \
--cache-cluster-id my-memcached \
--cache-node-type cache.t3.micro \
--engine memcached \
--num-cache-nodes 3

# Describe cluster
aws elasticache describe-cache-clusters --cache-cluster-id my-redis

# Create parameter group (for custom Redis settings)
aws elasticache create-cache-parameter-group \
--cache-parameter-group-name my-params \
--cache-parameter-group-family redis7 \
--description "Custom Redis parameters"

# Modify parameter
aws elasticache modify-cache-parameter-group \
--cache-parameter-group-name my-params \
--parameter-name maxmemory-policy \
--parameter-value allkeys-lru

Database Decision Guide

Choosing the right database is critical for application performance and cost. Use this guide to select the appropriate AWS service:

START: What is your primary data structure?
├─ Relational data (tables, joins, SQL)
│ ├─ Large-scale analytics queries? → REDSHIFT
│ ├─ Need 5x performance & auto-scaling? → AURORA
│ └─ Standard RDBMS? → RDS (MySQL/PostgreSQL)

├─ Key-value or document data
│ ├─ Unpredictable traffic? → DYNAMODB (On-Demand)
│ ├─ Predictable, high-volume? → DYNAMODB (Provisioned)
│ ├─ Simple caching? → ELASTICACHE (Memcached)
│ └─ Complex caching & sessions? → ELASTICACHE (Redis)

├─ Time-series data
│ ├─ Metrics & monitoring → TIMESTREAM
│ └─ Application logs → CLOUDWATCH LOGS

├─ Graph data (social networks, recommendations)
│ └─ NEPTUNE

├─ Immutable audit trail
│ └─ QLDB (Quantum Ledger Database)

├─ Search and analytics on documents
│ └─ OPENSEARCH

└─ Unstructured data (files, media)
└─ S3

Quick Selection Matrix

Use CaseBest ServiceAlternative
Web application dataRDS or AuroraDynamoDB
Mobile app backendDynamoDBFirebase Realtime DB
Session storageElastiCache RedisDynamoDB with TTL
Leaderboards/rankingsElastiCache Redis (sorted sets)DynamoDB with GSI
Full-text searchOpenSearchDynamoDB with Elasticsearch
Analytics queriesRedshiftAthena (S3)
Cache layerElastiCacheCloudFront
Data lakeS3 + Athena + GlueRedshift Spectrum
Real-time streamingKinesisLambda + DynamoDB
Graph relationshipsNeptuneDynamoDB with adjacency lists

Data Management Best Practices

Backup & Disaster Recovery

  • Regular Snapshots: Automate backups for all databases
  • Cross-Region Replicas: Protect against regional failures
  • Test Restores: Periodically restore from backups to verify integrity
  • Retention Policy: Balance storage costs with compliance requirements

Performance Optimization

  • Connection Pooling: Use RDS Proxy for databases, DAX for DynamoDB
  • Caching Strategy: Implement multi-layer caching (ElastiCache → Database)
  • Indexing: Create indexes on frequently queried columns
  • Query Optimization: Use EXPLAIN to analyze slow queries
  • Sharding/Partitioning: Distribute data for horizontal scaling

Security

  • Encryption: Enable at-rest and in-transit encryption
  • IAM Policies: Use least-privilege access
  • Secrets Management: Store credentials in AWS Secrets Manager
  • Network Isolation: Use VPC security groups and subnets
  • Audit Logging: Enable CloudTrail and database audit logs

Cost Optimization

  • Right-Sizing: Monitor utilization and adjust instance types
  • Storage Classes: Use S3 lifecycle policies for cost reduction
  • Reserved Instances: Commit to long-term capacity for discounts
  • On-Demand for Variables: Use DynamoDB On-Demand for unpredictable workloads
  • Monitoring: Set up CloudWatch alarms for anomalies

Hands-On Exercises

Exercise 1: Create S3 Bucket with Lifecycle Policy and Versioning

# Create bucket
aws s3 mb s3://my-lifecycle-bucket-$(date +%s)

# Enable versioning
aws s3api put-bucket-versioning \
--bucket my-lifecycle-bucket-1234567890 \
--versioning-configuration Status=Enabled

# Create lifecycle policy file (lifecycle.json)
cat > lifecycle.json << 'EOF'
{
"Rules": [
{
"Id": "TransitionRule",
"Status": "Enabled",
"Prefix": "logs/",
"Transitions": [
{
"Days": 30,
"StorageClass": "STANDARD_IA"
},
{
"Days": 90,
"StorageClass": "GLACIER_IR"
}
],
"Expiration": {
"Days": 365
}
}
]
}
EOF

# Apply lifecycle policy
aws s3api put-bucket-lifecycle-configuration \
--bucket my-lifecycle-bucket-1234567890 \
--lifecycle-configuration file://lifecycle.json

# Upload test files
echo "Test log 1" > logs/app-2024-01.log
echo "Test log 2" > logs/app-2024-02.log
aws s3 cp logs/ s3://my-lifecycle-bucket-1234567890/logs/ --recursive

# Verify versioning
aws s3api list-object-versions --bucket my-lifecycle-bucket-1234567890

Exercise 2: Launch RDS MySQL with Multi-AZ and Read Replica

# Create MySQL instance with Multi-AZ
aws rds create-db-instance \
--db-instance-identifier myapp-db \
--db-instance-class db.t3.micro \
--engine mysql \
--engine-version 8.0.35 \
--master-username admin \
--master-user-password MySecurePassword123! \
--allocated-storage 20 \
--multi-az \
--backup-retention-period 7 \
--storage-encrypted \
--enable-cloudwatch-logs-exports error,general,slowquery

# Wait for DB to be available (check status)
aws rds describe-db-instances \
--db-instance-identifier myapp-db \
--query 'DBInstances[0].DBInstanceStatus'

# Create read replica in same region
aws rds create-db-instance-read-replica \
--db-instance-identifier myapp-db-replica \
--source-db-instance-identifier myapp-db

# Verify replication lag
aws rds describe-db-instances \
--db-instance-identifier myapp-db-replica \
--query 'DBInstances[0].[DBInstanceStatus,ReplicationLag]'

# Connect to primary (get endpoint)
ENDPOINT=$(aws rds describe-db-instances \
--db-instance-identifier myapp-db \
--query 'DBInstances[0].Endpoint.Address' \
--output text)

echo "Connect to primary: mysql -h $ENDPOINT -u admin -p"
echo "Connect to replica: mysql -h $(aws rds describe-db-instances --db-instance-identifier myapp-db-replica --query 'DBInstances[0].Endpoint.Address' --output text) -u admin -p"

Exercise 3: Create DynamoDB Table and Query with CLI

# Create Users table
aws dynamodb create-table \
--table-name Users \
--attribute-definitions \
AttributeName=UserId,AttributeType=S \
AttributeName=CreatedAt,AttributeType=N \
--key-schema \
AttributeName=UserId,KeyType=HASH \
AttributeName=CreatedAt,KeyType=RANGE \
--billing-mode PAY_PER_REQUEST

# Wait for table creation
aws dynamodb wait table-exists --table-name Users

# Add sample items
aws dynamodb put-item \
--table-name Users \
--item '{
"UserId": {"S": "user001"},
"CreatedAt": {"N": "1694894400"},
"Name": {"S": "Alice"},
"Email": {"S": "alice@example.com"}
}'

aws dynamodb put-item \
--table-name Users \
--item '{
"UserId": {"S": "user002"},
"CreatedAt": {"N": "1694894500"},
"Name": {"S": "Bob"},
"Email": {"S": "bob@example.com"}
}'

# Query user by ID
aws dynamodb query \
--table-name Users \
--key-condition-expression "UserId = :uid" \
--expression-attribute-values '{":uid": {"S": "user001"}}'

# Scan table (get all items)
aws dynamodb scan --table-name Users

# Create Global Secondary Index on Email
aws dynamodb update-table \
--table-name Users \
--attribute-definitions AttributeName=Email,AttributeType=S \
--global-secondary-index-updates '[{
"Create": {
"IndexName": "EmailIndex",
"Keys": [{"AttributeName": "Email", "KeyType": "HASH"}],
"Projection": {"ProjectionType": "ALL"},
"ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}
}
}]'

# Query by email using GSI
aws dynamodb query \
--table-name Users \
--index-name EmailIndex \
--key-condition-expression "Email = :email" \
--expression-attribute-values '{":email": {"S": "alice@example.com"}}'

Exercise 4: Set Up Redshift Cluster and Run Analytics Query

# Create Redshift cluster (note: this takes 10-15 minutes)
aws redshift create-cluster \
--cluster-identifier analytics-cluster \
--node-type dc2.large \
--number-of-nodes 2 \
--master-username awsuser \
--master-user-password MyPassword123 \
--db-name analytics \
--publicly-accessible

# Wait for cluster availability
aws redshift describe-clusters \
--cluster-identifier analytics-cluster \
--query 'Clusters[0].ClusterStatus'

# Get cluster endpoint
ENDPOINT=$(aws redshift describe-clusters \
--cluster-identifier analytics-cluster \
--query 'Clusters[0].Endpoint.Address' \
--output text)

echo "Cluster endpoint: $ENDPOINT"

# Connect using psql (install if needed: brew install libpq)
# psql -h $ENDPOINT -U awsuser -d analytics

# Create sample table and load data
cat > redshift-setup.sql << 'EOF'
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
date DATE,
product_name VARCHAR(100),
quantity INT,
price DECIMAL(10,2)
);

INSERT INTO sales VALUES
(1, '2024-01-01', 'Widget A', 100, 29.99),
(2, '2024-01-02', 'Widget B', 50, 49.99),
(3, '2024-01-03', 'Widget A', 75, 29.99);

SELECT product_name, SUM(quantity) as total_sold, SUM(quantity * price) as total_revenue
FROM sales
GROUP BY product_name
ORDER BY total_revenue DESC;
EOF

# Execute script (requires psql installed and network access)
# psql -h $ENDPOINT -U awsuser -d analytics -f redshift-setup.sql

Resources