Step 4: Scale Refinement

Step 4 of 6: R - Refinement for Scale

Optimize the architecture to handle billions of users with database sharding, global distribution, and advanced caching

🔥 Scale Challenges at 2 Billion Users

⚠️ Where Our Core Architecture Breaks

🔥 Current Bottlenecks
  • Single DynamoDB table: 40K WCU/RCU limits hit
  • Celebrity posts: 100M followers = 100M writes
  • Hot partitions: Popular users overload specific shards
  • Global latency: 500ms+ for cross-continent requests
  • Cache misses: Redis memory limits at 100TB scale
📊 Scale Requirements
  • Read QPS: 7 million/sec (peak: 10M)
  • Write QPS: 65K/sec (peak: 100K)
  • Storage: 456 PB/year growth
  • Bandwidth: 950 GB/sec egress
  • P95 Latency: Must stay under 100ms
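
A rough sanity check against these numbers (and the 40K WCU/RCU per-table limit noted above) shows the shape of the problem: peak writes of 100K/sec already exceed a single table's 40K WCU, while 32 shards bring that down to roughly 3K writes/sec per shard. On the read side, even 32 feed shards at 100K RCU each cover only about 3.2M reads/sec, so caches must absorb most of the 10M/sec peak, implying a combined hit rate of roughly 70% or better.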

🗄️ DynamoDB Sharding Strategy

📊 Horizontal Sharding Architecture

DynamoDB Sharding Strategy (32 Shards)

[Diagram: sharding router in front of 32-way sharded tables]

  • Sharding Router (Feed Service): shard = hash(userId) % 32
  • Posts tables (sharded by userId): Posts_0 … Posts_31, each ~62.5M users, ~15M posts/day, 40K WCU
  • Follow tables (sharded by followerId): Follow_0 … Follow_31, each with a GSI, ~62.5M users, 20K WCU
  • PrecomputedFeed tables (sharded by userId): Feed_0 … Feed_31, TTL 30 days, 100K RCU each
  • Cross-Shard Query Service: Elasticsearch + aggregation layer for analytics and admin queries

✅ Sharding Implementation

Shard Routing Logic:
// Deterministic shard routing. Any stable, non-cryptographic hash works;
// murmurHash3 is assumed to come from a hashing library.
const SHARD_COUNT = 32;

function getShardNumber(userId: string): number {
  const hash = murmurHash3(userId);
  return hash % SHARD_COUNT;
}

// Service implementation (dynamodb is a promise-based DynamoDB client)
class FeedService {
  async createPost(userId: string, post: Post): Promise<void> {
    const shard = getShardNumber(userId);
    const tableName = `Posts_${shard}`;

    await dynamodb.putItem({
      TableName: tableName,
      Item: {
        postId: generateId(),
        userId: userId,
        content: post.content,
        createdAt: Date.now()
      }
    });
  }

  async getFollowers(userId: string) {
    // Follow tables are sharded by followerId, so a single-shard GSI query
    // only returns all of a user's followers if each follow edge is also
    // written to the followee's shard; this routing assumes that double-write.
    const shard = getShardNumber(userId);
    const tableName = `Follow_${shard}`;

    // Query the GSI on the sharded table
    return await dynamodb.query({
      TableName: tableName,
      IndexName: 'FollowingId-FollowerId-Index',
      KeyConditionExpression: 'followingId = :userId',
      ExpressionAttributeValues: {
        ':userId': userId
      }
    });
  }
}

Benefits:

  • Each shard stays under DynamoDB limits (40K WCU/RCU)
  • Linear scaling: add more shards as needed
  • Fault isolation: a shard failure affects only 1/32 of users
  • Predictable performance per shard

⚠️ Sharding Challenges & Solutions

Hot Shard Problem:

Celebrity users can overload their shard. Solution: virtual sharding, where a popular user's data is spread across several physical shards (sketch below).
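
A minimal sketch of virtual sharding, building on the getShardNumber routing above; the hot-user threshold and sub-shard count are illustrative assumptions:

const SHARD_COUNT = 32;
const HOT_USER_THRESHOLD = 1_000_000; // illustrative cutoff
const SUB_SHARDS = 4;                 // hot users span 4 physical shards

// Writes: salt the shard choice with the post id, so a hot user's posts
// spread across SUB_SHARDS physical shards instead of saturating one.
function getWriteShard(userId: string, postId: string, followerCount: number): number {
  if (followerCount < HOT_USER_THRESHOLD) {
    return getShardNumber(userId); // regular users: one shard
  }
  const salt = murmurHash3(postId) % SUB_SHARDS;
  return (getShardNumber(userId) + salt) % SHARD_COUNT;
}

// Reads: a hot user's posts must be gathered from all sub-shards and merged.
function getReadShards(userId: string, followerCount: number): number[] {
  if (followerCount < HOT_USER_THRESHOLD) {
    return [getShardNumber(userId)];
  }
  const base = getShardNumber(userId);
  return Array.from({ length: SUB_SHARDS }, (_, i) => (base + i) % SHARD_COUNT);
}

The cost is a small fan-out read for hot users; since such users are few, that is far cheaper than a saturated shard.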

Cross-Shard Queries:

Analytics need data from all shards. Solution: Async ETL to data warehouse (ClickHouse) for aggregated queries.

Resharding:

Growing from 32 to 64 shards requires migrating data. Solution: consistent hashing with virtual nodes, which moves only a small, predictable fraction of keys per added shard (sketch below).
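
A minimal consistent-hash ring with virtual nodes, again assuming the murmurHash3 helper from the routing code; the virtual-node count is an illustrative assumption:

class ConsistentHashRing {
  // Sorted ring of (position, shard) entries; each shard appears vnodes times
  private ring: { pos: number; shard: string }[] = [];

  constructor(shards: string[], vnodes = 128) {
    for (const shard of shards) {
      for (let i = 0; i < vnodes; i++) {
        this.ring.push({ pos: murmurHash3(`${shard}#vn${i}`), shard });
      }
    }
    this.ring.sort((a, b) => a.pos - b.pos);
  }

  // The first virtual node clockwise from the key's hash owns the key
  getShard(userId: string): string {
    const h = murmurHash3(userId);
    let lo = 0, hi = this.ring.length;
    while (lo < hi) { // binary search for the first pos >= h
      const mid = (lo + hi) >> 1;
      if (this.ring[mid].pos < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo % this.ring.length].shard;
  }
}

// Usage: adding one shard to a 32-shard ring remaps only ~1/33 of keys
// (those whose nearest virtual node changed), instead of reshuffling
// nearly every key the way hash(userId) % N does when N changes.
const ring = new ConsistentHashRing(
  Array.from({ length: 32 }, (_, i) => `Posts_${i}`)
);
const table = ring.getShard('user-123'); // e.g. 'Posts_17'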

Shard Key Selection:

userId ensures all of a user's data is co-located, but creates hotspots for very active users. Alternative: composite keys (userId + timestamp) for time-series optimization (example below).
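
A hypothetical Posts item keyed both ways, to make the trade-off concrete; the day-bucket scheme is an illustrative assumption, not part of the design above:

// Option A: co-located. Partition key = userId, sort key = createdAt.
// A user's timeline is a single-partition range query, but a very hot
// user concentrates all load on that one partition.
const colocatedItem = {
  pk: 'user-123',
  sk: 1700000000000, // createdAt (epoch ms) as sort key
  content: '...'
};

// Option B: composite key. Partition key = userId + day bucket.
// One user's writes now spread over one partition per day; reading a
// timeline means querying the few most recent buckets and merging.
const dayBucket = Math.floor(Date.now() / 86_400_000);
const bucketedItem = {
  pk: `user-123#${dayBucket}`,
  sk: Date.now(), // createdAt still orders posts within the bucket
  content: '...'
};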

🌍 Global Multi-Region Architecture

Global Distribution Strategy - 6 Regions

[Diagram: global distribution across 6 regions]

  • US East (Primary): full stack (services, Redis, DynamoDB, S3 primary, Kafka, ML models); shards 0-7; 500M users; read/write
  • US West (Secondary): services, Redis, DynamoDB; shards 8-15; 400M users; read/write
  • EU (GDPR Primary): services, Redis, DynamoDB; shards 16-23; 600M users; read/write
  • Asia Pacific (Singapore + Mumbai): shards 24-27; 300M users; read/write
  • South America (São Paulo): shards 28-29; 150M users; read-heavy
  • Africa/Middle East (Cape Town + Dubai): shards 30-31; 50M users; read-heavy
  • Global Load Balancer (GeoDNS + Anycast) in front; Global CDN with 200+ edge locations; async replication between regions

🌐 Regional Strategy

Data Residency:

  • EU users' data stays in the EU (GDPR compliance)
  • China users are served from separate infrastructure
  • Cross-region replication for public content only

Request Routing:

  • GeoDNS routes each request to the nearest region
  • Anycast IPs for ultra-low latency
  • Sticky sessions for consistency

Failover Strategy:

  • US East → US West failover within 10 seconds
  • EU → US East for non-GDPR users
  • Health checks every 5 seconds

⚡ Cross-Region Optimization

DynamoDB Global Tables:
{
  "GlobalTableName": "Posts",
  "Regions": [
    {
      "RegionName": "us-east-1",
      "ReplicaStatus": "ACTIVE",
      "Role": "PRIMARY"
    },
    {
      "RegionName": "eu-west-1",
      "ReplicaStatus": "ACTIVE",
      "Role": "SECONDARY",
      "ReplicationLag": "< 1 second"
    }
  ],
  "ConsistencyModel": "EVENTUAL",
  "ConflictResolution": "LAST_WRITER_WINS"
}
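
Last-writer-wins is worth making concrete. A minimal sketch of reconciling two replica versions of the same item; the version shape here is an illustrative assumption:

interface ReplicaVersion<T> {
  value: T;
  updatedAt: number; // wall-clock timestamp of the write
  region: string;    // deterministic tiebreaker for equal timestamps
}

// Last-writer-wins: the newer write survives, the older one is discarded.
// The tiebreaker guarantees every region converges on the same value.
function resolveLastWriterWins<T>(
  a: ReplicaVersion<T>,
  b: ReplicaVersion<T>
): ReplicaVersion<T> {
  if (a.updatedAt !== b.updatedAt) {
    return a.updatedAt > b.updatedAt ? a : b;
  }
  return a.region > b.region ? a : b;
}

The cost of last-writer-wins is that the losing write is silently dropped, which is acceptable for posts and profile edits but not for counters or anything transactional.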

Performance Targets:

  • Intra-region latency: <50ms
  • Cross-region replication: <1 second
  • Global cache hit rate: >85%

🚀 Multi-Tier Caching Strategy

🏗️ Cache Architecture

L1: Application Cache
  • Location: In-memory on app servers
  • Size: 16GB per instance
  • TTL: 60 seconds
  • Content: Hot user sessions, API responses
  • Hit Rate: 30-40%
L2: Redis Cluster
  • Location: Dedicated cache tier
  • Size: 10TB per region
  • TTL: 15 minutes
  • Content: Feeds, user data, social graph
  • Hit Rate: 70-80%
L3: CDN Edge Cache
  • Location: 200+ edge locations
  • Size: 100TB globally
  • TTL: 24 hours
  • Content: Media files, static content
  • Hit Rate: 85-95%

⚡ Caching Strategies

Feed Caching Logic:
class FeedCache {
  async getFeed(userId: string): Promise<Feed> {
    const key = `feed:${userId}`; // same logical key in both tiers

    // L1: check the in-process application cache
    let feed = this.appCache.get(key);
    if (feed) return feed;

    // L2: check Redis
    feed = await this.redis.get(key);
    if (feed) {
      this.appCache.set(key, feed, 60); // backfill L1
      return feed;
    }

    // Cache miss: generate the feed
    feed = await this.generateFeed(userId);

    // Write-through to both tiers
    await Promise.all([
      this.redis.setex(key, 900, feed), // 15 min
      this.appCache.set(key, feed, 60)  // 1 min
    ]);

    return feed;
  }

  async invalidateFeed(userId: string): Promise<void> {
    const key = `feed:${userId}`;

    // Cascade invalidation through both tiers
    this.appCache.delete(key);
    await this.redis.del(key);

    // Pre-warm the cache for active users
    if (await this.isActiveUser(userId)) {
      const feed = await this.generateFeed(userId);
      await this.redis.setex(key, 900, feed);
    }
  }
}

Cache Warming:

  • Pre-compute feeds for users active in the last hour (sketch below)
  • Celebrity posts cached immediately
  • Trending content pre-cached globally
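
A minimal warming job built on the FeedCache above; the ActivityService interface and batch size are illustrative assumptions:

interface ActivityService {
  getUsersActiveSince(sinceMs: number): Promise<string[]>;
}

class FeedWarmer {
  constructor(private cache: FeedCache, private activity: ActivityService) {}

  // Runs on a schedule and pre-computes feeds for recently active users,
  // so their next request lands as a cache hit instead of a feed rebuild.
  async warmActiveUsers(): Promise<void> {
    const oneHourAgo = Date.now() - 60 * 60 * 1000;
    const userIds = await this.activity.getUsersActiveSince(oneHourAgo);

    // Warm in small batches so the warmer itself doesn't stampede the DB
    const BATCH = 100;
    for (let i = 0; i < userIds.length; i += BATCH) {
      await Promise.all(
        userIds.slice(i, i + BATCH).map(id => this.cache.getFeed(id))
      );
    }
  }
}

Since getFeed populates both cache tiers on a miss, warming is just a read.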

Smart Eviction:

  • LRU with a frequency bias (sketch below)
  • Celebrity content is never evicted
  • Predictive eviction based on user access patterns
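
A toy version of LRU with a frequency bias, plus a pin flag for celebrity content; the scoring weight is an illustrative assumption:

interface CacheEntry<V> {
  value: V;
  lastAccess: number;
  hits: number;
  pinned: boolean; // celebrity content: never evicted
}

class BiasedLruCache<V> {
  private entries = new Map<string, CacheEntry<V>>();

  // hitBonusMs: each hit "ages" an entry forward, so frequently read
  // entries outlive merely recent ones (pure LRU would use recency alone)
  constructor(private maxSize: number, private hitBonusMs = 10_000) {}

  get(key: string): V | undefined {
    const e = this.entries.get(key);
    if (!e) return undefined;
    e.lastAccess = Date.now();
    e.hits++;
    return e.value;
  }

  set(key: string, value: V, pinned = false): void {
    if (!this.entries.has(key) && this.entries.size >= this.maxSize) {
      this.evictOne();
    }
    this.entries.set(key, { value, lastAccess: Date.now(), hits: 1, pinned });
  }

  // Evict the unpinned entry with the lowest recency+frequency score
  private evictOne(): void {
    let victim: string | null = null;
    let worstScore = Infinity;
    for (const [key, e] of this.entries) {
      if (e.pinned) continue;
      const score = e.lastAccess + e.hits * this.hitBonusMs;
      if (score < worstScore) {
        worstScore = score;
        victim = key;
      }
    }
    if (victim !== null) this.entries.delete(victim);
  }
}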

📊 Cache Performance Impact

  • Response Time: 45ms P95 with caching (vs ~200ms without)
  • Database Load: 85% reduction in DB queries (7M → 1M QPS)
  • Cost Savings: $2.5M/month from reduced DB capacity

🌟 Celebrity User Optimization

🎯 The Celebrity Problem at Scale

Scenario: Cristiano Ronaldo (600M followers) posts a photo

Without Optimization:

  • 600M write operations to PrecomputedFeed
  • 600M notifications to send
  • ~10 minutes of 100% CPU across the cluster
  • $500 in immediate compute costs
  • A system-wide latency spike

Solution: Hybrid Pull + Heavy Caching

With Optimization:

  • 0 immediate writes (pull model)
  • Post cached in all regions
  • Push only to verified/VIP followers
  • Notifications batched over 30 minutes
  • <$5 in compute costs

Celebrity Detection & Routing:
class CelebrityOptimizer {
  // Celebrity threshold configuration
  private readonly CELEBRITY_THRESHOLD = 100_000;
  private readonly MEGA_CELEBRITY_THRESHOLD = 10_000_000;

  async handlePost(userId: string, post: Post) {
    const user = await this.userService.getUser(userId);

    if (user.followerCount >= this.MEGA_CELEBRITY_THRESHOLD) {
      return this.handleMegaCelebrityPost(user, post);
    } else if (user.followerCount >= this.CELEBRITY_THRESHOLD) {
      return this.handleCelebrityPost(user, post);
    } else {
      return this.handleRegularPost(user, post);
    }
  }

  async handleMegaCelebrityPost(user: User, post: Post) {
    // 1. Store the post (no fanout)
    await this.postService.create(post);

    // 2. Cache aggressively
    await Promise.all([
      this.cacheGlobally(post, 86400), // 24-hour TTL
      this.indexForSearch(post),
      this.addToTrendingPosts(post)
    ]);

    // 3. Push only to VIP followers (< 1000 feeds)
    const vipFollowers = await this.getVipFollowers(user.id);
    await this.pushToFeeds(vipFollowers, post);

    // 4. Batch-notify regular followers asynchronously
    await this.queueBatchNotifications(user.id, post.id);
  }

  async handleCelebrityPost(user: User, post: Post) {
    // Hybrid approach: push to followers active in the last 24h
    // (typically 10-20% of the total); everyone else pulls on demand
    const activeFollowers = await this.getActiveFollowers(user.id, 24);
    await this.pushToFeeds(activeFollowers, post);

    // Cache for the pull model
    await this.cacheRegionally(post, 3600); // 1-hour TTL
  }

  private async cacheGlobally(post: Post, ttl: number) {
    const regions = ['us-east', 'us-west', 'eu', 'asia', 'sa', 'africa'];

    await Promise.all(regions.map(region =>
      this.regionalCache[region].setex(
        `celebrity:post:${post.id}`,
        ttl,
        post
      )
    ));
  }
}

✅ Optimization Techniques

  • Dedicated Infrastructure: Celebrity content on separate shards
  • Pre-caching: Celebrity posts cached before viral
  • Rate Limiting: Throttle fanout to prevent overload (sketch after this list)
  • Prioritization: VIP followers get updates first
  • Lazy Loading: Regular users pull on-demand
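
A minimal token-bucket throttle around fanout, assuming a pushToFeeds writer like the one in CelebrityOptimizer; the rate and burst numbers are illustrative:

class FanoutThrottler {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(
    private pushToFeeds: (ids: string[], post: Post) => Promise<void>,
    private ratePerSec = 10_000, // sustained feed writes/sec
    private burst = 50_000       // short-term burst allowance
  ) {
    this.tokens = burst;
  }

  private refill(): void {
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.burst, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefill = now;
  }

  // Fan out in chunks, pausing whenever the bucket runs dry, so a viral
  // post drains over minutes instead of spiking the whole cluster.
  async throttledFanout(followerIds: string[], post: Post): Promise<void> {
    const CHUNK = 1_000;
    for (let i = 0; i < followerIds.length; i += CHUNK) {
      this.refill();
      while (this.tokens < CHUNK) {
        await new Promise(resolve => setTimeout(resolve, 100));
        this.refill();
      }
      this.tokens -= CHUNK;
      await this.pushToFeeds(followerIds.slice(i, i + CHUNK), post);
    }
  }
}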

📊 Performance Impact

  • Write Reduction: 99.9% fewer writes for celebrities
  • Latency: No impact on regular users
  • Cost Savings: $5M/month in compute
  • Scalability: Can handle users with 1B+ followers
  • Reliability: No system-wide failures from viral posts

📊 Performance Monitoring & SLAs

  • Feed Load Time: 45ms P95 latency (target: <100ms)
  • Availability: 99.99% uptime SLA (≤4.38 min downtime/month)
  • Cache Hit Rate: 87% global average (target: >85%)
  • Error Rate: 0.001% request failures (target: <0.01%)

🔍 Monitoring Stack

Metrics Collection
  • Prometheus: Time-series metrics
  • Grafana: Real-time dashboards
  • CloudWatch: AWS resource monitoring
  • Custom Metrics: Business KPIs
Distributed Tracing
  • Jaeger: End-to-end request tracing
  • X-Ray: AWS service maps
  • OpenTelemetry: Standard instrumentation
  • Correlation IDs: Cross-service tracking
Alerting & Response
  • PagerDuty: On-call management
  • Severity Levels: P0-P3 classification
  • Auto-remediation: Self-healing systems
  • Runbooks: Automated response

💰 Cost Optimization at Scale

📊 Monthly Cost Breakdown (2B Users)

  • DynamoDB (32 shards): $1.2M
  • Redis cache clusters: $800K
  • S3 storage (456 PB): $9.5M
  • CDN (CloudFront): $2.5M
  • Compute (EC2/Lambda): $3.2M
  • Data transfer: $1.8M
  • ML/Analytics: $1.0M
  • Total monthly cost: $20M

Cost per DAU: $0.01/month ($20M ÷ 2B users)

💡 Optimization Strategies

Storage Optimization:

  • S3 Intelligent-Tiering: save 40% on cold data
  • Media compression: 30% size reduction
  • Deduplication: 15% storage savings

Compute Optimization:

  • Spot instances for batch jobs: 70% savings
  • Reserved instances: 40% discount
  • Auto-scaling: right-size capacity

Database Optimization:

  • On-demand → provisioned capacity: 50% savings
  • TTL for old data: reduce storage 30%
  • Query optimization: reduce read costs 25%

Potential Savings: $6M/month (30%)

🎯 Key Scale Refinements

✅ Database Sharding

32 shards handle 2B users with room for growth to 64 shards

✅ Global Distribution

6 regions provide <50ms latency for 95% of users worldwide

✅ Multi-Tier Caching

87% cache hit rate reduces database load by 85%

✅ Celebrity Optimization

Pull model for celebrities prevents system overload

✅ Performance Monitoring

Real-time dashboards ensure 99.99% availability SLA

✅ Cost Optimization

$0.01 per DAU/month through intelligent resource management

🔮 Coming Up Next

With our architecture optimized for scale, we'll address edge cases and failure scenarios:

  • Network partitions - CAP theorem trade-offs and eventual consistency
  • Cascading failures - Circuit breakers and graceful degradation
  • Data corruption - Recovery strategies and data integrity checks
  • Security breaches - Privacy controls and data protection
  • Disaster recovery - Multi-region failover and backup strategies