AWS Architecture Guide 2026: Build Scalable Cloud Apps

AWS holds roughly 31% of the global cloud infrastructure market and runs everything from two-person startups to Netflix, Airbnb, and NASA. But with 200+ services and a near-infinite number of ways to combine them, getting your architecture right from the start matters enormously.

A poorly designed architecture leads to runaway costs, security breaches, and systems that can't scale. A well-designed one gives you predictable performance, automatic scaling, and infrastructure costs that scale proportionally with your business.

At Codazz, we've architected AWS systems for startups scaling from 0 to millions of users. This guide distills everything into actionable best practices for 2026.

The AWS Well-Architected Framework: 5 Pillars

The Well-Architected Framework is AWS's blueprint for building production systems. It now defines six pillars; this guide covers the five below and leaves the sixth, Sustainability, aside. Evaluate every architecture decision against them:

Operational Excellence

Automate everything: deployments, scaling, recovery, alerting
Infrastructure as Code (Terraform, CDK, CloudFormation) for all resources
Runbooks and playbooks for common operational tasks
Continuous improvement through post-incident reviews

Security

Implement a strong identity foundation with least-privilege IAM
Enable traceability: CloudTrail, Config, GuardDuty, Security Hub
Apply security at all layers: network, compute, data, application
Automate security best practices using AWS Config Rules and SCPs

Reliability

Automatically recover from failure using health checks and auto-scaling
Test recovery procedures: chaos engineering, game days
Scale horizontally to increase aggregate system availability
Stop guessing capacity: use auto-scaling groups and serverless where possible

Performance Efficiency

Choose the right resource type: Graviton4 for general compute, GPU for ML
Use managed services to reduce undifferentiated heavy lifting
Use serverless architectures to remove operational burden
Benchmark regularly and review performance metrics quarterly

Cost Optimization

Adopt a consumption model: pay only for what you use
Measure overall efficiency with AWS Cost Explorer and Trusted Advisor
Stop spending money on undifferentiated heavy lifting
Analyze and attribute expenditure with cost allocation tags

Serverless Architecture: Lambda, API Gateway & DynamoDB

Serverless is our default recommendation for new APIs and event-driven workloads in 2026. With Lambda SnapStart cutting cold-start latency on supported runtimes (Java, Python, .NET) and DynamoDB on-demand pricing, the cost and operational benefits are compelling.

Service	Role	Pricing	Best For
Lambda	Compute	$0.20/1M reqs + compute	Event-driven functions, APIs
API Gateway HTTP API	Request routing	$1.00/1M requests	HTTP/REST-style APIs (WebSocket is a separate API type)
DynamoDB On-Demand	Database	~$0.625/1M writes, ~$0.125/1M reads (us-east-1, after the late-2024 price cut)	Key-value, variable traffic
SQS	Message queue	$0.40/1M requests	Async processing, decoupling
EventBridge	Event bus	$1.00/1M events	Service-to-service events
Step Functions	Orchestration	$25/1M state transitions	Complex workflows

Lambda + API Gateway: Production Pattern

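# SAM template: Lambda API with SnapStart enabled
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

# Stage is referenced below, so it must be declared (the default is an assumption)
Parameters:
  Stage:
    Type: String
    Default: prod

Globals: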
  Function:
    Timeout: 30
    MemorySize: 512
    Environment:
      Variables:
        TABLE_NAME: !Ref AppTable
        STAGE: !Ref Stage

Resources:
  ApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
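      Handler: index.handler
      # SnapStart covers the Java, Python, and .NET managed runtimes, not Node.js
      Runtime: python3.12
      SnapStart:
        ApplyOn: PublishedVersions  # Restore from a snapshot to cut cold-start time
      AutoPublishAlias: live
      Events:
        Api:
          Type: HttpApi
          Properties:
            Path: /api/{proxy+}
            Method: ANY
            Auth:
              # JwtAuthorizer must be declared in the HTTP API's Auth settings (not shown)
              Authorizer: JwtAuthorizer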

  # DynamoDB with on-demand pricing (no capacity planning)
  AppTable:
    Type: AWS::DynamoDB::Table
    Properties:
      BillingMode: PAY_PER_REQUEST
      TableClass: STANDARD
      PointInTimeRecoverySpecification:
        PointInTimeRecoveryEnabled: true
      SSESpecification:
        SSEEnabled: true
      AttributeDefinitions:
        - AttributeName: PK
          AttributeType: S
        - AttributeName: SK
          AttributeType: S
      KeySchema:
        - AttributeName: PK
          KeyType: HASH
        - AttributeName: SK
          KeyType: RANGE

Serverless Cost Reality Check

A serverless API handling 10 million requests/month costs approximately $12/month (Lambda) + $10/month (API Gateway) + DynamoDB usage. Compare that to $150-300/month for equivalent EC2/RDS infrastructure. At low-to-medium traffic, serverless wins on cost. At very high, sustained traffic (>100M req/month), committed EC2 with Savings Plans may be cheaper.
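The arithmetic behind those figures can be sketched as a small cost model. This is a rough estimate, not a billing calculator: the per-request prices come from the table above, while the Lambda compute rate, the 120 ms average duration, and the 512 MB memory size are assumptions chosen for illustration. The free tier is ignored.

```python
# Rough monthly cost model for a Lambda + HTTP API workload.
# Check current regional pricing before relying on any of these rates.

LAMBDA_PER_MILLION_REQUESTS = 0.20    # USD, from the pricing table above
LAMBDA_PER_GB_SECOND = 0.0000166667   # USD, assumed x86 compute rate
HTTP_API_PER_MILLION_REQUESTS = 1.00  # USD, from the pricing table above


def lambda_monthly_cost(requests: int, avg_duration_ms: float, memory_mb: int) -> float:
    """Request charge plus compute charge (GB-seconds), ignoring the free tier."""
    request_cost = requests / 1_000_000 * LAMBDA_PER_MILLION_REQUESTS
    gb_seconds = requests * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return request_cost + gb_seconds * LAMBDA_PER_GB_SECOND


def http_api_monthly_cost(requests: int) -> float:
    """API Gateway HTTP API request charge."""
    return requests / 1_000_000 * HTTP_API_PER_MILLION_REQUESTS


if __name__ == "__main__":
    monthly_requests = 10_000_000
    compute = lambda_monthly_cost(monthly_requests, avg_duration_ms=120, memory_mb=512)
    gateway = http_api_monthly_cost(monthly_requests)
    print(f"Lambda: ${compute:.2f}  API Gateway: ${gateway:.2f}")
```

Under these assumptions the Lambda line lands near the $12 quoted above; doubling the average duration roughly doubles the compute share, which is why duration and memory size matter more than request count.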

Containerized Apps: ECS vs EKS

When serverless doesn't fit (long-running processes, WebSockets, CPU-intensive workloads), containers on ECS or EKS are the answer. Here's when to choose each:

Factor	ECS (Elastic Container Service)	EKS (Elastic Kubernetes)
Complexity	Low — AWS-native, simpler API	High — Kubernetes expertise required
Control plane cost	Free	$0.10/hr per cluster (~$73/mo)
Ecosystem	AWS services only	Huge CNCF/Kubernetes ecosystem
Scaling	Service Auto Scaling (target tracking, step scaling)	HPA, KEDA, Karpenter
Best for	Startups, AWS-only shops	Multi-cloud, large teams, k8s expertise
Launch type	EC2 or Fargate (serverless)	EC2 or Fargate
Migration effort	Lower from Docker Compose	Lower from existing k8s

ECS Fargate: Auto-Scaling Configuration

# Terraform: ECS Fargate with Application Load Balancer
resource "aws_ecs_service" "api" {
  name            = "api-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.api.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = aws_subnet.private_app[*].id
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.api.arn
    container_name   = "api"
    container_port   = 3000
  }

  deployment_controller { type = "ECS" }
  deployment_minimum_healthy_percent = 100
  deployment_maximum_percent         = 200
}

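# Register the service as a scalable target (min/max values are assumptions)
resource "aws_appautoscaling_target" "ecs" {
  service_namespace  = "ecs"
  scalable_dimension = "ecs:service:DesiredCount"
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.api.name}"
  min_capacity       = 2
  max_capacity       = 10
}

# Target tracking: scale on CPU utilization
resource "aws_appautoscaling_policy" "cpu_scaling" {
  name               = "cpu-auto-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value       = 65.0  # Hold average CPU near 65%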
    scale_in_cooldown  = 300   # 5 min to scale in
    scale_out_cooldown = 60    # 1 min to scale out

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
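To build intuition for what a 65% CPU target means in practice, here is a simplified proportional model of target tracking. It is an approximation for reasoning about capacity, not the actual Application Auto Scaling algorithm, which works through CloudWatch alarms and cooldowns and scales in more conservatively. The function name and the min/max defaults are hypothetical.

```python
import math


def desired_task_count(
    current_tasks: int,
    observed_cpu: float,
    target_cpu: float = 65.0,
    min_tasks: int = 2,     # assumed floor
    max_tasks: int = 10,    # assumed ceiling
) -> int:
    """Estimate the task count that brings average CPU back to the target."""
    if current_tasks <= 0:
        raise ValueError("current_tasks must be positive")
    # Total load is roughly current_tasks * observed_cpu; spread it so that
    # each task sits at or below target_cpu.
    raw = math.ceil(current_tasks * observed_cpu / target_cpu)
    return max(min_tasks, min(max_tasks, raw))


if __name__ == "__main__":
    for cpu in (30.0, 65.0, 90.0):
        print(f"4 tasks at {cpu:.0f}% CPU -> {desired_task_count(4, cpu)} tasks")
```

The model makes one design point visible: a lower target leaves more headroom for spikes but runs more tasks at steady state.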

RDS vs DynamoDB vs Aurora: Database Selection

Database selection is one of the most consequential architecture decisions. Choose based on access patterns, not familiarity. Here's a complete comparison:

Factor	RDS PostgreSQL	DynamoDB	Aurora Serverless v2
Type	Relational SQL	NoSQL key-value	Relational SQL (serverless)
Latency	1-10ms	Single-digit ms	1-10ms (resume from pause: several seconds)
Scale to zero	No (min $15/mo)	Yes (on-demand)	Yes (min ACU = 0, auto-pause)
Max throughput	Vertical scaling	Virtually unlimited (horizontal)	Auto-scales to 256 ACUs
Complex queries	Full SQL, JOINs, ACID	Limited; key-based access, no JOINs	Full SQL, JOINs, ACID
Starting cost	~$15/mo (t4g.micro)	$0 (pay per request)	Near $0 while paused (storage still billed)
Best for	Complex relational data	High-throughput, key-value	Variable traffic, SQL

Our Database Recommendation at Codazz

New projects with variable traffic: Aurora Serverless v2. SQL + scales to zero = best of both worlds.
High-throughput key-value (gaming, sessions, carts): DynamoDB on-demand.
Existing PostgreSQL teams with predictable load: RDS with Multi-AZ and read replicas.
Add for read-heavy workloads: ElastiCache Redis as a caching layer, which can cut database read load by 80-90% when the cache hit rate is high.
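The caching recommendation is usually implemented as the cache-aside pattern: read from the cache first, fall back to the database on a miss, then store the result with a TTL. The sketch below uses an in-memory dict as a stand-in for Redis so it runs anywhere; the class name, the `load_from_db` callback, and the 60-second TTL are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside reads with a TTL; a dict stands in for ElastiCache Redis."""

    def __init__(
        self,
        load_from_db: Callable[[str], str],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: query the database and repopulate the cache.
        self.misses += 1
        value = self._load(key)
        self._store[key] = (now + self._ttl, value)
        return value

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    db_calls = []

    def fake_db(key: str) -> str:
        db_calls.append(key)
        return f"row:{key}"

    cache = CacheAside(fake_db)
    for _ in range(10):
        cache.get("user#1")
    print(f"database queries: {len(db_calls)}, hit rate: {cache.hit_rate():.0%}")
```

The database load reduction equals the hit rate, so the 80-90% figure only holds when most reads repeat within the TTL window.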

S3 Best Practices & CloudFront CDN

S3 stores virtually unlimited data at $0.023/GB/month. CloudFront is AWS's CDN with 450+ Points of Presence globally. Together they handle static assets, user uploads, and media delivery at any scale.

S3 Storage Classes (Choose Wisely)

Standard ($0.023/GB): frequently accessed data.
Intelligent-Tiering: unpredictable access; objects move between tiers automatically.
Standard-IA ($0.0125/GB): infrequent access, 30-day minimum storage charge.
Glacier Instant Retrieval ($0.004/GB): archives with millisecond retrieval.
Glacier Deep Archive ($0.00099/GB): long-term compliance retention (7-10 years).

CloudFront: Cache Configuration

Use Origin Access Control (OAC) to restrict S3 bucket access to CloudFront only. Enable Brotli and Gzip compression (Brotli output is typically 15-25% smaller than Gzip). Cache TTL strategy: hashed JS/CSS assets = 1 year, index.html = 60 seconds, API responses = no-cache. Use Cache Policies and Origin Request Policies for fine-grained control.
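The TTL strategy above can be expressed as a small mapping from request path to `Cache-Control` header, for example in a build script that sets object metadata at upload time. This is a sketch under assumptions: the hashed-filename pattern (at least 8 hex characters before the extension), the `/api/` prefix, and the one-hour fallback are illustrative choices, not values from this guide.

```python
import re

# Matches fingerprinted bundles such as app.3f9a1c2b.js (assumed naming scheme).
HASHED_ASSET = re.compile(r"\.[0-9a-f]{8,}\.(js|css)$")

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31,536,000


def cache_control_for(path: str) -> str:
    """Return the Cache-Control header value for a given request path."""
    if path.startswith("/api/"):
        return "no-cache"
    if HASHED_ASSET.search(path):
        # Content-hashed files never change, so they can be cached for a year.
        return f"public, max-age={ONE_YEAR_SECONDS}, immutable"
    if path == "/" or path.endswith("/index.html"):
        # The entry point must refresh quickly so new deploys are picked up.
        return "public, max-age=60"
    return "public, max-age=3600"  # assumed fallback for other static files


if __name__ == "__main__":
    for p in ("/assets/app.3f9a1c2b.js", "/index.html", "/api/users", "/logo.png"):
        print(f"{p:28} {cache_control_for(p)}")
```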

Pre-Signed URLs for User Uploads

Generate short-lived (15 min) pre-signed PUT URLs server-side. The client uploads directly to S3, bypassing your servers entirely, so uploads consume none of your server bandwidth or memory. Validate file type and size with an S3 event trigger that invokes Lambda before moving the object to its final location.
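The validation step might look like the following Lambda-style handler, which inspects the S3 event notification and decides whether each uploaded object passes. It is a minimal sketch: the size limit and extension allow-list are assumptions, and it checks only the key and reported size. A production validator would also inspect the object's actual bytes, since an extension proves nothing about content.

```python
from typing import List, Tuple
from urllib.parse import unquote_plus

MAX_BYTES = 10 * 1024 * 1024                            # assumed 10 MiB limit
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}  # assumed allow-list


def validate_record(record: dict) -> Tuple[bool, str]:
    """Check one S3 event record against the size and extension rules."""
    obj = record["s3"]["object"]
    key = unquote_plus(obj["key"])  # object keys arrive URL-encoded in S3 events
    size = obj.get("size", 0)
    dot = key.rfind(".")
    extension = key[dot:].lower() if dot != -1 else ""
    if extension not in ALLOWED_EXTENSIONS:
        return False, f"{key}: extension not allowed"
    if not 0 < size <= MAX_BYTES:
        return False, f"{key}: size {size} bytes outside allowed range"
    return True, f"{key}: ok"


def handler(event: dict, context=None) -> List[Tuple[bool, str]]:
    """Lambda entry point: validate every record in the notification."""
    return [validate_record(r) for r in event.get("Records", [])]


if __name__ == "__main__":
    sample = {"Records": [{"s3": {"object": {"key": "uploads/my+photo.PNG", "size": 2048}}}]}
    print(handler(sample))
```

Objects that pass would then be copied to the final bucket or prefix; rejected ones can be deleted or quarantined.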

S3 Lifecycle Policies

Auto-transition objects: Standard → Standard-IA after 30 days, → Glacier after 90 days. Delete incomplete multipart uploads after 7 days (surprisingly common source of waste). Expire non-current versions after 30 days. These policies alone typically save 40-70% on storage costs.
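Those rules translate into a single lifecycle configuration document. The sketch below only builds and prints the document; its shape follows what boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`. The rule ID and the empty prefix (apply to the whole bucket) are assumptions.

```python
import json


def build_lifecycle_configuration() -> dict:
    """Lifecycle rules matching the transitions and clean-up described above."""
    return {
        "Rules": [
            {
                "ID": "tier-and-clean-up",   # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},    # assumed: apply to every object
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_lifecycle_configuration(), indent=2))
```

Keeping the document in code makes it easy to review and to apply the same policy to every bucket.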

IAM Security Best Practices

Security misconfigurations are among the leading causes of cloud data breaches. These IAM practices are non-negotiable for production AWS accounts:

1. Least Privilege IAM Roles

Every service, ECS task, and Lambda function gets its own IAM role with minimum required permissions. Never use AdministratorAccess in production. Use IAM Access Analyzer to generate least-privilege policies from actual access patterns. Review and tighten policies quarterly.
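As an illustration of what a least-privilege policy looks like, the sketch below generates a policy document scoped to one DynamoDB table and a handful of actions, plus a small check that flags wildcard actions. The action list, the example ARN, and both function names are hypothetical; the right action set depends on what the function actually calls.

```python
import json


def table_access_policy(table_arn: str) -> dict:
    """Allow only the item-level calls the app needs, on one table and its indexes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AppTableReadWrite",
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:Query",
                ],
                "Resource": [table_arn, f"{table_arn}/index/*"],
            }
        ],
    }


def has_wildcard_action(policy: dict) -> bool:
    """Flag statements such as "dynamodb:*" or "*" for manual review."""
    for statement in policy["Statement"]:
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if any("*" in action for action in actions):
            return True
    return False


if __name__ == "__main__":
    arn = "arn:aws:dynamodb:us-east-1:123456789012:table/AppTable"  # example ARN
    policy = table_access_policy(arn)
    print(json.dumps(policy, indent=2))
    print("wildcard actions:", has_wildcard_action(policy))
```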

2. Secrets Manager — Never Hardcode

Store all secrets (database passwords, API keys, OAuth credentials) in AWS Secrets Manager. Enable automatic rotation for database credentials (RDS, Aurora natively supported). Lambda and ECS tasks retrieve secrets at runtime. Cost: $0.40/secret/month — the cheapest insurance you can buy.

3. Encryption Everywhere

S3: default encryption with SSE-S3 or SSE-KMS. RDS/Aurora: enable encryption at rest (must be set at creation). DynamoDB: encryption at rest enabled by default. EBS volumes: encrypt all volumes. Use AWS Certificate Manager (free) for TLS 1.3 on all public endpoints.

4. Multi-Account Organization Structure

Use AWS Organizations with separate accounts for: production, staging, development, shared services (DNS, monitoring), and security audit. Apply Service Control Policies (SCPs) to prevent dangerous actions (disabling CloudTrail, removing MFA). Use AWS Control Tower for automated governance.
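A Service Control Policy that blocks CloudTrail tampering could be shaped like this. It is a minimal sketch that only builds the JSON document; the `Sid` and function name are hypothetical, and a real SCP often adds a condition exempting a break-glass role.

```python
import json


def deny_cloudtrail_tampering_scp() -> dict:
    """SCP document denying the calls that stop, delete, or alter a trail."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyCloudTrailTampering",
                "Effect": "Deny",
                "Action": [
                    "cloudtrail:StopLogging",
                    "cloudtrail:DeleteTrail",
                    "cloudtrail:UpdateTrail",
                ],
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_cloudtrail_tampering_scp(), indent=2))
```

Because an SCP deny overrides any allow inside member accounts, even an account administrator cannot switch logging off once this is attached.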

5. Threat Detection & Compliance

Enable GuardDuty in all regions ($3-30/month depending on usage) for ML-powered threat detection. Use Security Hub to aggregate findings. Enable AWS Config with managed rules for continuous compliance checks. Set up CloudWatch Alarms for root account usage, unauthorized API calls, and MFA failures.

Cost Optimization: Reserved vs Spot Instances

Most teams overspend on AWS by 30-50%. These are the highest-ROI cost reduction levers, in order of impact:

Pricing Model	Discount vs On-Demand	Commitment	Best For
On-Demand	0%	None	Dev, testing, variable workloads
Savings Plans (1yr)	~40%	1 year spend	Predictable compute (EC2, Fargate, Lambda)
Savings Plans (3yr)	~60%	3 year spend	Stable long-term workloads
Reserved Instances (1yr)	~40%	1 year + instance family	Specific instance types, RDS
Reserved Instances (3yr)	~72%	3 year + instance family	Locked-in, stable databases
Spot Instances	60-90%	None; can be interrupted with 2-minute notice	Batch, CI/CD, fault-tolerant workers
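To see how these discounts combine, the sketch below computes a blended monthly bill for a workload split across pricing models. The discount figures are the approximate ones from the table (Spot uses the low end of its range); the usage split in the example is an assumption.

```python
from typing import Dict

# Approximate discounts vs On-Demand, taken from the table above.
DISCOUNT: Dict[str, float] = {
    "on_demand": 0.00,
    "savings_plan_1yr": 0.40,
    "savings_plan_3yr": 0.60,
    "reserved_1yr": 0.40,
    "reserved_3yr": 0.72,
    "spot": 0.60,  # low end of the 60-90% range
}


def blended_monthly_cost(on_demand_equivalent: float, mix: Dict[str, float]) -> float:
    """Cost of a workload whose usage is split across pricing models.

    mix maps a pricing model to its share of usage; shares must sum to 1.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("usage shares must sum to 1")
    return sum(
        on_demand_equivalent * share * (1.0 - DISCOUNT[model])
        for model, share in mix.items()
    )


if __name__ == "__main__":
    mix = {"savings_plan_1yr": 0.6, "spot": 0.2, "on_demand": 0.2}  # assumed split
    print(f"${blended_monthly_cost(1000.0, mix):.2f} vs $1000.00 on-demand")
```

Covering only the steady baseline with commitments and leaving bursts on On-Demand or Spot avoids paying for committed capacity that sits unused.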

Right-Sizing (Do This First)

AWS Compute Optimizer analyzes 14 days of CloudWatch metrics and recommends right-sized instances. Most teams run 2-4x more compute than needed. Downsizing is the highest-impact, lowest-risk cost action. Typical savings: 30-50% of compute costs.

Spot for CI/CD and Dev Environments

Your GitHub Actions runners, Jenkins agents, and dev environments don't need guaranteed availability. Use Spot instances with a mixed strategy (On-Demand + Spot) to maintain availability while cutting costs 60-90%. ECS capacity providers make this straightforward.

NAT Gateway Elimination

NAT Gateways cost $0.045/GB processed — often the #1 surprise on AWS bills. Add S3 and DynamoDB gateway endpoints (free) to route traffic directly. Add interface endpoints for ECR, Secrets Manager, and CloudWatch to reduce NAT data. Typical savings: $100-2,000+/month.
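A quick calculation shows when each endpoint type pays off. Gateway endpoints are free, so every gigabyte moved off the NAT Gateway is a direct saving. Interface endpoints carry their own charges, so they only win above a break-even volume. The NAT rate is the one quoted above; the interface endpoint rates ($0.01 per hour per AZ and $0.01 per GB) are assumptions to verify against current pricing.

```python
NAT_PER_GB = 0.045                   # USD, from the paragraph above
ENDPOINT_PER_HOUR_PER_AZ = 0.01      # USD, assumed interface endpoint rate
ENDPOINT_PER_GB = 0.01               # USD, assumed interface endpoint rate
HOURS_PER_MONTH = 730


def nat_processing_cost(gb_per_month: float) -> float:
    """Data processing charge for traffic routed through a NAT Gateway."""
    return gb_per_month * NAT_PER_GB


def interface_endpoint_cost(gb_per_month: float, azs: int = 2) -> float:
    """Fixed hourly charge per AZ plus per-GB charge for one interface endpoint."""
    return azs * HOURS_PER_MONTH * ENDPOINT_PER_HOUR_PER_AZ + gb_per_month * ENDPOINT_PER_GB


def interface_break_even_gb(azs: int = 2) -> float:
    """Monthly volume above which an interface endpoint beats NAT processing."""
    fixed = azs * HOURS_PER_MONTH * ENDPOINT_PER_HOUR_PER_AZ
    return fixed / (NAT_PER_GB - ENDPOINT_PER_GB)


if __name__ == "__main__":
    print(f"5 TB of S3 traffic via NAT: ${nat_processing_cost(5000):.2f}/month")
    print(f"Interface endpoint break-even: {interface_break_even_gb():.0f} GB/month")
```

Under these assumed rates, an interface endpoint in two AZs needs a few hundred gigabytes a month before it is cheaper than NAT, so add them per service based on measured traffic.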

S3 Intelligent-Tiering

Enable S3 Intelligent-Tiering for buckets whose objects are accessed unpredictably. AWS automatically moves objects that have not been accessed for 30 consecutive days to the Infrequent Access tier, and back to Frequent Access on the next read. No retrieval fees. Monitoring charge: $0.0025 per 1,000 objects per month, so it pays off mainly for objects that stay cold beyond 30 days.

Multi-Region Architecture & Disaster Recovery

Multi-region architecture protects against regional AWS outages (rare but catastrophic). It also reduces latency for globally distributed users. Here are the four DR strategies, ranked by cost and recovery capability:

Backup & Restore

Periodic backups copied to a second region with S3 Cross-Region Replication. Restore from backups on disaster. Lowest cost, longest recovery time. Good for non-critical systems.

RTO: Hours

RPO: Hours

$

Pilot Light

Minimal secondary region footprint: database replicas, no compute. Scale up compute on failover. Moderate cost with reasonable recovery. Best for most applications.

RTO: ~15 min

RPO: Minutes

$$

Warm Standby

Scaled-down but running secondary environment. Route 53 health-check failover. Fast recovery with moderate cost. Good for business-critical applications.

RTO: <5 min

RPO: <1 min

$$$

Active-Active

Full capacity in multiple regions simultaneously. Route 53 latency-based routing + health checks. DynamoDB Global Tables for active-active database. 2x cost but best resilience.

RTO: <1 min

RPO: Near zero

$$$$
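Choosing among the four strategies comes down to picking the cheapest one that still meets your recovery targets. The sketch below encodes that rule. The numeric RTO/RPO values are rough stand-ins for the ranges listed above (for example, "Hours" is taken as 240 minutes), so treat them as assumptions and substitute figures from your own failover tests.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    rto_minutes: float  # assumed typical recovery time
    rpo_minutes: float  # assumed typical data-loss window
    cost_rank: int      # 1 = cheapest, 4 = most expensive


STRATEGIES: List[Strategy] = [
    Strategy("Backup & Restore", rto_minutes=240, rpo_minutes=240, cost_rank=1),
    Strategy("Pilot Light", rto_minutes=15, rpo_minutes=5, cost_rank=2),
    Strategy("Warm Standby", rto_minutes=5, rpo_minutes=1, cost_rank=3),
    Strategy("Active-Active", rto_minutes=1, rpo_minutes=0.1, cost_rank=4),
]


def cheapest_strategy(max_rto_minutes: float, max_rpo_minutes: float) -> Optional[Strategy]:
    """Return the lowest-cost strategy that satisfies both targets, if any."""
    candidates = [
        s for s in STRATEGIES
        if s.rto_minutes <= max_rto_minutes and s.rpo_minutes <= max_rpo_minutes
    ]
    return min(candidates, key=lambda s: s.cost_rank) if candidates else None


if __name__ == "__main__":
    choice = cheapest_strategy(max_rto_minutes=60, max_rpo_minutes=10)
    print(choice.name if choice else "No strategy meets these targets")
```

Starting from the business's tolerated downtime and data loss, rather than from the architecture, keeps DR spend proportional to actual requirements.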

Aurora Global Database: Cross-Region Replication

# Terraform: Aurora Global Database (primary + replica)
resource "aws_rds_global_cluster" "main" {
  global_cluster_identifier = "app-global-cluster"
  engine                    = "aurora-postgresql"
  engine_version            = "16.2"
  database_name             = "app"
}

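# Primary cluster (us-east-1)
# Note: each cluster also needs aws_rds_cluster_instance resources
# (instance_class = "db.serverless"), omitted here for brevity.
resource "aws_rds_cluster" "primary" {
  provider                    = aws.us_east_1
  engine                      = "aurora-postgresql"
  engine_version              = aws_rds_global_cluster.main.engine_version
  engine_mode                 = "provisioned" # Serverless v2 uses the provisioned mode
  global_cluster_identifier   = aws_rds_global_cluster.main.id
  cluster_identifier          = "app-primary"
  master_username             = var.db_username
  manage_master_user_password = true

  serverlessv2_scaling_configuration {
    min_capacity = 0.5
    max_capacity = 32
  }
}

# Secondary read-only cluster (eu-west-1); replication lag is typically under 1s
resource "aws_rds_cluster" "secondary" {
  provider                  = aws.eu_west_1
  engine                    = "aurora-postgresql"
  engine_version            = aws_rds_global_cluster.main.engine_version
  global_cluster_identifier = aws_rds_global_cluster.main.id
  cluster_identifier        = "app-secondary"

  # Create the primary first so this cluster joins as a secondary
  depends_on = [aws_rds_cluster.primary]
  # Read-only until promoted; failover typically completes in about a minute
}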

Frequently Asked Questions

How much does a production AWS architecture cost per month?

A typical early-stage startup (ECS Fargate 2 tasks, Aurora Serverless, S3 + CloudFront, ALB) runs $200-500/month. A mid-scale SaaS (auto-scaling ECS, RDS Multi-AZ, ElastiCache, WAF) costs $1,500-5,000/month. An enterprise multi-region architecture starts at $5,000-20,000+/month. Serverless (Lambda + DynamoDB) can start as low as $10/month for low traffic.

Should I choose ECS or EKS for containerized applications?

Choose ECS if: you're a startup or small team, you're AWS-only, and you want simplicity. Choose EKS if: you have existing Kubernetes expertise, you need multi-cloud portability, or you rely heavily on the CNCF/Kubernetes ecosystem (Istio, Karpenter, ArgoCD). For most startups and mid-size companies, ECS + Fargate is the right default — less complexity, no control plane cost, and full AWS integration.

What is the AWS Well-Architected Framework and why does it matter?

The Well-Architected Framework is AWS's set of guidelines across six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability. It matters because it gives you a systematic way to evaluate your architecture against proven best practices. AWS offers free Well-Architected Reviews through the console or AWS Partner Network. At Codazz, every architecture we design is reviewed against the pillars before launch.

What is the best disaster recovery strategy for a startup?

Start with Pilot Light: Aurora Global Database replica in a second region, S3 Cross-Region Replication, and Route 53 health-check failover. This gives you a ~15-minute RTO with an RPO of minutes or less, at a fraction of the cost of active-active. Test your failover quarterly. As you grow and SLAs tighten, evolve to Warm Standby, then Active-Active.

How do I reduce my AWS bill without impacting production?

Follow this sequence: (1) Use Compute Optimizer to identify over-provisioned instances and right-size. (2) Purchase 1-year Compute Savings Plans for predictable workloads: roughly 40% savings in exchange for a one-year spend commitment. (3) Add S3/DynamoDB VPC endpoints to cut NAT Gateway data charges. (4) Enable S3 Intelligent-Tiering. (5) Move CI/CD and dev environments to Spot instances. These five steps typically reduce AWS bills by 35-50% with little or no production impact.

Need Help Designing Your AWS Architecture?

We'll review your current setup (or design from scratch), identify cost savings, and deliver a production-ready architecture with full Terraform/CDK code.
