Step 6 of 6: D - Deep Dives & Implementation Details
Detailed implementation, technology choices, and optimization techniques
🛠️ DynamoDB Schema & Optimization
Table Schemas & Access Patterns
Posts Table
// Primary Table: Posts
PK: postId (String)
SK: userId (String)
Attributes:
  - content: String (max 4KB)
  - mediaUrls: List<String>
  - timestamp: Number (Unix epoch)
  - likeCount: Number
  - commentCount: Number
  - visibility: String (public/friends/private)

// GSI-1: User Posts Timeline
GSI1PK: userId
GSI1SK: timestamp
  - Query: Get all posts by user
  - Sort: Most recent first (query with ScanIndexForward = false)

// GSI-2: Post Engagement
// An LSI must share the base table's partition key (postId), so it cannot
// answer a per-user query. A second GSI keyed on userId can.
GSI2PK: userId
GSI2SK: likeCount
  - Query: Top posts by user
  - Sort: Most liked first (query with ScanIndexForward = false)
Follow Relationship Table
// Primary Table: Follow
PK: followerId (String)
SK: followingId (String)
Attributes:
  - followedAt: Number (timestamp)
  - notificationsEnabled: Boolean
  - relationshipType: String (friend/follow)

// GSI-1: Reverse Lookup (Core Feature!)
GSI1PK: followingId
GSI1SK: followerId
  - Query: Who follows this user?
  - Essential for: Fanout operations

// Query Patterns:
// 1. Who does Alice follow?
Query(PK = "alice") // the partition key alone returns every followingId; no sort-key condition is needed
// 2. Who follows Alice? (THE KEY INSIGHT!)
Query(IndexName = "GSI1", GSI1PK = "alice")
💡 Key Insight: The GSI on Follow table is what makes bidirectional queries efficient. Without it, finding "who follows user X" would require scanning the entire table - impossible at Facebook scale!
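The two lookups can be sketched as plain Query parameter objects. This is a minimal sketch, assuming the DocumentClient-style parameter format (plain JavaScript values) and the table, index, and attribute names from the Follow schema; the helper function names are hypothetical.

```javascript
// Forward lookup: "who does this user follow?" uses the base table,
// where followerId is the partition key.
function followingQuery(userId) {
  return {
    TableName: "Follow",
    KeyConditionExpression: "followerId = :u",
    ExpressionAttributeValues: { ":u": userId },
  };
}

// Reverse lookup: "who follows this user?" uses GSI1,
// where followingId is the partition key.
function followersQuery(userId) {
  return {
    TableName: "Follow",
    IndexName: "GSI1",
    KeyConditionExpression: "followingId = :u",
    ExpressionAttributeValues: { ":u": userId },
  };
}
```

Both requests are key lookups, so neither depends on the total size of the table.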
Capacity Planning & Cost Optimization
Read/Write Capacity Units
// Posts Table Calculation
Write Load:
  - 500M posts/day = 5,787 WPS
  - Each post = 4KB avg (1 WCU covers one write of up to 1KB)
  - WCU needed = 5,787 × 4 = 23,148 WCU
Read Load:
  - 7M QPS peak (post fetching)
  - Each query = 4KB avg (1 RCU covers one strongly consistent read of up to 4KB)
  - RCU needed = 7M × 4 / 4 = 7M RCU
  - With eventual consistency: 3.5M RCU
Monthly Cost (provisioned capacity, 730 hours/month):
  - WCU: 23,148 × $0.00065 × 730 = $10,984
  - RCU: 3.5M × $0.00013 × 730 = $332,150
  - Total: ~$343K/month for Posts table
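The arithmetic above can be reproduced with a small helper. This is a sketch under the same assumptions as the calculation (provisioned capacity, the hourly unit prices quoted there, 730 hours per month, all reads eventually consistent); the function and field names are hypothetical.

```javascript
// Estimate provisioned DynamoDB capacity and monthly cost for one table.
function estimateCapacity({ writesPerSec, readsPerSec, itemKB, wcuHourly, rcuHourly, hoursPerMonth = 730 }) {
  // One WCU covers a write of up to 1KB.
  const wcu = writesPerSec * Math.ceil(itemKB / 1);
  // One RCU covers a strongly consistent read of up to 4KB;
  // eventually consistent reads cost half as much.
  const strongRcu = readsPerSec * Math.ceil(itemKB / 4);
  const rcu = strongRcu / 2;
  const writeCost = wcu * wcuHourly * hoursPerMonth;
  const readCost = rcu * rcuHourly * hoursPerMonth;
  return { wcu, rcu, writeCost, readCost, total: writeCost + readCost };
}

const posts = estimateCapacity({
  writesPerSec: 5787,
  readsPerSec: 7e6,
  itemKB: 4,
  wcuHourly: 0.00065,
  rcuHourly: 0.00013,
});
console.log(posts.wcu, posts.rcu, Math.round(posts.total));
```

Reads dominate the bill by a wide margin, which is why the caching layers later in this step matter so much.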
Hot Partition Management
// Partition Key Design
// ❌ Bad: Celebrity posts create hot partition
PK = userId // Taylor Swift gets all traffic!
// ✅ Good: Distribute celebrity load across date buckets
PK = userId + "#" + timestamp.slice(0, 8) // timestamp as a "YYYYMMDD..." string
// taylorswift#20231201, taylorswift#20231202
// Write Sharding for Celebrities
const SHARD_COUNT = 32;
if (user.followerCount > 10_000_000) { // more than 10M followers
  const shard = hash(postId) % SHARD_COUNT;
  PK = `${userId}#${shard}`;
}
// Query Pattern: Fan-out read (query every shard in parallel, then merge)
const shardResults = await Promise.all(
  Array.from({ length: SHARD_COUNT }, (_, shard) =>
    query({ PK: `taylorswift#${shard}` })
  )
);
const posts = shardResults.flat();
⚡ Feed Ranking Algorithm
Machine Learning Ranking Pipeline
Relevance Score Calculation
class FeedRankingModel {
calculateRelevanceScore(post, user) {
const features = this.extractFeatures(post, user);
// Multi-factor scoring model
const score =
0.3 * features.authorRelationshipScore + // Close friends weight more
0.2 * features.engagementVelocity + // Viral content boost
0.15 * features.contentQualityScore + // ML content analysis
0.15 * features.recencyScore + // Time decay function
0.1 * features.personalInterestMatch + // User interest alignment
0.05 * features.diversityBoost + // Prevent echo chambers
0.05 * features.authorCredibilityScore; // Verified accounts, etc.
return Math.min(100, Math.max(0, score * 100));
}
extractFeatures(post, user) {
return {
// Relationship strength (0-1)
authorRelationshipScore: this.calculateRelationshipStrength(
post.authorId, user.id
),
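// Engagement velocity: likes per minute since posting, capped to 0-1 so it
// cannot swamp the other features. The 100 likes/minute cap is an assumed
// value, and post.timestamp is assumed to be in milliseconds like Date.now().
engagementVelocity: Math.min(1, (post.likeCount /
Math.max(1, (Date.now() - post.timestamp) / 60000)) / 100),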
// ML-based content quality (0-1)
contentQualityScore: this.mlContentAnalyzer.score(post.content),
// Recency with exponential decay
recencyScore: Math.exp(-0.1 * this.getAgeInHours(post)),
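// Interest matching based on user behavior
personalInterestMatch: this.interestMatcher.score(
post.topics, user.interests
),
// The two remaining weighted features. Without them the score is NaN.
// Both helper methods are hypothetical and assumed to return a value in 0-1.
diversityBoost: this.calculateDiversityBoost(post, user),
authorCredibilityScore: this.calculateAuthorCredibility(post.authorId)
};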
}
}
🧠 Deep Dive: ML Model Training Pipeline
Training Data Pipeline
// Feature Engineering Pipeline
1. User Behavior Events
   - Post views (dwell time > 2 seconds)
   - Likes, comments, shares
   - Profile visits after seeing post
   - Time spent reading
2. Content Features
   - Text analysis (sentiment, topics, readability)
   - Image recognition (objects, faces, quality)
   - Video metrics (completion rate, replay rate)
   - URL analysis (domain reputation, content type)
3. Context Features
   - Time of day, day of week
   - Device type, connection speed
   - Location, weather (if permitted)
   - Previous session behavior
4. Training Labels
   - Positive: Engagement within 5 minutes
   - Negative: Impression without engagement
   - Strong Positive: Share or comment
   - Strong Negative: Hide post or unfollow
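The training-label rules could be turned into a labeling function along these lines. This is a rough sketch: the numeric label values, the event shape (`shownAt`, `actions`), and the rule that negative signals take precedence over positive ones are all assumptions, not details from the pipeline description.

```javascript
// Numeric label values are illustrative assumptions.
const LABELS = { STRONG_POSITIVE: 2, POSITIVE: 1, NEGATIVE: 0, STRONG_NEGATIVE: -1 };
const ENGAGEMENT_WINDOW_MS = 5 * 60 * 1000; // "engagement within 5 minutes"

// impression: { shownAt: ms, actions: [{ type: string, at: ms }] }
function labelImpression(impression) {
  const { shownAt, actions } = impression;
  const types = new Set(actions.map((a) => a.type));
  // Assumed precedence: an explicit rejection outweighs any engagement.
  if (types.has("hide") || types.has("unfollow")) return LABELS.STRONG_NEGATIVE;
  if (types.has("share") || types.has("comment")) return LABELS.STRONG_POSITIVE;
  // Any other engagement counts only if it happened inside the window.
  const engagedInWindow = actions.some(
    (a) => a.at >= shownAt && a.at - shownAt <= ENGAGEMENT_WINDOW_MS
  );
  return engagedInWindow ? LABELS.POSITIVE : LABELS.NEGATIVE;
}
```

An impression with no actions falls through to the negative label, matching "impression without engagement".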
A/B Testing Framework
// Continuous Model Improvement
class FeedExperimentManager {
async runRankingExperiment() {
const experiment = {
name: "engagement_boost_v2.1",
hypothesis: "Increasing engagement velocity weight improves CTR",
variants: [
{ name: "control", engagementWeight: 0.2 },
{ name: "treatment", engagementWeight: 0.3 }
],
metrics: [
"click_through_rate",
"session_duration",
"posts_per_session",
"user_satisfaction_score"
],
duration_days: 14,
traffic_split: 0.1 // 10% of users
};
return await this.experimentPlatform.deploy(experiment);
}
}
🔧 Technology Stack & Service Architecture
🏗️ Backend Services
Post Service (Java/Spring Boot)
@RestController
@RequestMapping("/api/posts")
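public class PostController {
// Collaborators are assumed to be Spring-managed beans injected through the
// constructor. The type names are inferred from the calls below.
private final AuthService authService;
private final ModerationService moderationService;
private final PostService postService;
private final FanoutService fanoutService;
private final FeedService feedService;
public PostController(AuthService authService, ModerationService moderationService,
PostService postService, FanoutService fanoutService, FeedService feedService) {
this.authService = authService;
this.moderationService = moderationService;
this.postService = postService;
this.fanoutService = fanoutService;
this.feedService = feedService;
}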
@PostMapping("/create")
public ResponseEntity<Post> createPost(
@RequestBody CreatePostRequest request,
@RequestHeader("Authorization") String token
) {
// 1. Validate user authentication
User user = authService.validateToken(token);
// 2. Content moderation
moderationService.scanContent(request.getContent());
// 3. Create post
Post post = postService.createPost(user.getId(), request);
// 4. Async fanout (don't wait)
fanoutService.publishPostCreatedEvent(post);
return ResponseEntity.ok(post);
}
@GetMapping("/feed/{userId}")
public ResponseEntity<FeedResponse> getFeed(
@PathVariable String userId,
@RequestParam(defaultValue = "20") int limit
) {
return ResponseEntity.ok(
feedService.getPersonalizedFeed(userId, limit)
);
}
}
Feed Generation Service (Node.js)
// High-performance async processing
class FeedGenerationService {
async processFanout(postId, authorId) {
const followers = await this.getFollowers(authorId);
// Batch processing for efficiency
const batches = this.chunk(followers, 1000);
await Promise.all(batches.map(batch =>
this.processBatch(batch, postId)
));
}
async processBatch(followers, postId) {
const feedUpdates = followers.map(followerId => ({
userId: followerId,
postId: postId,
timestamp: Date.now(),
score: this.calculateRelevanceScore(postId, followerId)
}));
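// Bulk insert to DynamoDB. BatchWriteItem accepts at most 25 put/delete
// requests per call, so this batchWrite wrapper is assumed to split the
// updates into 25-item requests and retry any UnprocessedItems.
await this.dynamodb.batchWrite('PrecomputedFeed', feedUpdates);
}
}
🌐 Infrastructure & Deployment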
Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: feed-service
spec:
  replicas: 50
  # apps/v1 requires a selector that matches the pod template labels.
  # The "app: feed-service" label is an assumed name.
  selector:
    matchLabels:
      app: feed-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  template:
    metadata:
      labels:
        app: feed-service
    spec:
      containers:
        - name: feed-service
          image: facebook/feed-service:v2.1.0
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2000m"
              memory: "4Gi"
          env:
            - name: DYNAMODB_REGION
              value: "us-east-1"
            - name: REDIS_CLUSTER_ENDPOINT
              valueFrom:
                secretKeyRef:
                  name: redis-config
                  key: endpoint
Auto-scaling Configuration
# HPA (Horizontal Pod Autoscaler)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: feed-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: feed-service
  minReplicas: 10
  maxReplicas: 200
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: pending_feed_requests
        target:
          type: AverageValue
          averageValue: "100"
🚀 Multi-Layer Caching Strategy
L1: Application Cache
// In-memory cache for hot data
class AppCache {
private cache = new LRUCache<string, any>({
max: 10000, // entry cap; if this is the lru-cache package, maxSize would also need a size calculation
ttl: 300000 // 5 minutes
});
async getUserFeed(userId: string) {
const key = `feed:${userId}`;
let feed = this.cache.get(key);
if (!feed) {
feed = await this.redisCache.getFeed(userId);
this.cache.set(key, feed);
}
return feed;
}
}
// Hit Rate: 60-70%
// Latency: ~0.1ms
L2: Redis Distributed Cache
// Redis Cluster (6 nodes, 3 masters, 3 replicas)
class RedisCache {
async getFeed(userId: string) {
const key = `feed:${userId}`;
// Try cache first
let feed = await this.redis.get(key);
if (feed) return JSON.parse(feed);
// Cache miss - get from DB
feed = await this.dynamodb.getFeed(userId);
// Store in cache with smart TTL
const ttl = this.calculateTTL(userId);
await this.redis.setex(key, ttl, JSON.stringify(feed));
return feed;
}
calculateTTL(userId: string): number {
// VIP users: shorter TTL for fresher feeds
return this.isVipUser(userId) ? 300 : 3600;
}
}
// Hit Rate: 85-90%
// Latency: ~1-5ms
L3: CDN Edge Cache
// CloudFront for static content
// Media URLs, profile pictures, etc.
// Edge locations: 200+ globally
// Cache Rules:
{
"media/*": {
"ttl": 86400, // 24 hours
"gzip": true,
"webp_conversion": true
},
"api/posts/*/thumbnail": {
"ttl": 3600, // 1 hour
"vary": ["Accept-Encoding", "User-Agent"]
}
}
// Benefits:
// - 95% cache hit rate for media
// - <50ms latency globally
// - 70% bandwidth savings
🎯 Combined Cache Performance
🧠 Smart TTL & Hot Partition Solution
Intelligent TTL Based on User Behavior
User Activity-Based TTL Strategy
class SmartTTLManager {
calculateFeedItemTTL(userId, postId, relevanceScore) {
const userActivity = this.getUserActivityLevel(userId);
const postAge = Date.now() - this.getPostTimestamp(postId);
// Base TTL on user engagement patterns
let baseTTL;
switch(userActivity) {
case 'DAILY_ACTIVE':
baseTTL = 14 * 24 * 60 * 60; // 14 days
break;
case 'WEEKLY_ACTIVE':
baseTTL = 5 * 24 * 60 * 60; // 5 days
break;
case 'MONTHLY_ACTIVE':
baseTTL = 2 * 24 * 60 * 60; // 2 days
break;
case 'DORMANT':
default: // unknown activity levels fall back to the shortest TTL
baseTTL = 1 * 24 * 60 * 60; // 1 day
break;
}
// Adjust based on relevance
const relevanceMultiplier = this.getRelevanceMultiplier(relevanceScore);
return Math.floor(baseTTL * relevanceMultiplier);
}
getRelevanceMultiplier(score) {
if (score > 90) return 2.0; // Double TTL for highly relevant
if (score > 70) return 1.5; // 50% longer for relevant
if (score > 50) return 1.0; // Standard TTL
if (score > 30) return 0.5; // Half TTL for low relevance
return 0.25; // Minimal TTL for irrelevant
}
}
Cost Impact of Smart TTL
// Storage Cost Analysis
Without Smart TTL:
- Average feed size: 10,000 items/user
- Storage per user: 40 MB
- Cost per user: $0.01/month
- Total (2B users): $20M/month
With Smart TTL:
- Average feed size: 500 items/user (95% reduction!)
- Storage per user: 2 MB
- Cost per user: $0.0005/month
- Total (2B users): $1M/month
Monthly Savings: $19M 💰
// DynamoDB TTL Implementation
{
userId: "alice",
postId: "post_123",
score: 87.5,
timestamp: 1703001234,
ttl: 1704815634, // timestamp + 21 days (1,814,400 s), calculated dynamically
// TTL calculation:
// Alice = DAILY_ACTIVE
// Score = 87.5 (high relevance)
// TTL = 14 days * 1.5 = 21 days
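}
💡 Key Insight: DynamoDB TTL deletion is not immediate. Expired items are typically removed within a few days of expiry (often quoted as 48 hours), so always filter them out in queries. Because ttl is a DynamoDB reserved word, alias it: FilterExpression: '#ttl > :now' with ExpressionAttributeNames: { '#ttl': 'ttl' }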
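A minimal sketch of both halves of this pattern: computing the expiry epoch at write time and filtering not-yet-deleted items at read time. It assumes DocumentClient-style parameters and a PrecomputedFeed table keyed on userId; the helper names are hypothetical.

```javascript
const DAY_SECONDS = 24 * 60 * 60;

// Expiry epoch (seconds) for a feed item: creation time plus the base TTL
// scaled by the relevance multiplier.
function computeExpiry(createdAtSec, baseTtlSec, relevanceMultiplier) {
  return createdAtSec + Math.floor(baseTtlSec * relevanceMultiplier);
}

// Query parameters that skip items whose TTL has passed but which DynamoDB
// has not deleted yet. "ttl" is a reserved word, so it is aliased as #ttl.
function liveFeedQuery(userId, nowSec) {
  return {
    TableName: "PrecomputedFeed",
    KeyConditionExpression: "userId = :u",
    FilterExpression: "#ttl > :now",
    ExpressionAttributeNames: { "#ttl": "ttl" },
    ExpressionAttributeValues: { ":u": userId, ":now": nowSec },
  };
}

// Daily-active user (14-day base TTL) with a 1.5x relevance multiplier.
const expiry = computeExpiry(1703001234, 14 * DAY_SECONDS, 1.5);
const params = liveFeedQuery("alice", Math.floor(Date.now() / 1000));
```

A filter expression is applied after items are read, so expired items still consume read capacity until DynamoDB removes them.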
Complete Hot Partition Prevention Strategy
Multi-Layer Defense Against Hot Partitions
class HotPartitionDefense {
async getPost(postId, userId) {
const postMetadata = await this.cache.get(`meta:${postId}`);
// Layer 1: Request Coalescing
if (this.isRequestInFlight(postId)) {
return await this.waitForInflightRequest(postId);
}
// Layer 2: Stale-While-Revalidate
const cachedPost = await this.cache.get(postId);
if (cachedPost && !this.isTooStale(cachedPost)) {
// Serve stale content immediately
this.refreshInBackground(postId);
return cachedPost;
}
// Layer 3: Celebrity Detection & Special Handling
if (postMetadata?.isCelebrity) {
return await this.handleCelebrityPost(postId);
}
// Regular post - can hit database
return await this.fetchFromDatabase(postId);
}
async handleCelebrityPost(postId) {
// Celebrity posts NEVER hit primary database
const layers = [
() => this.l1Cache.get(postId), // 0.1ms
() => this.l2RedisCache.get(postId), // 1-5ms
() => this.l3CDNCache.get(postId), // 10-50ms
() => this.readReplica.get(postId), // 100ms (last resort)
];
for (const getLayer of layers) {
const post = await getLayer();
if (post) {
// Proactively warm upper layers
this.warmUpperCaches(post);
return post;
}
}
throw new Error('Celebrity post not found in any cache');
}
}
Request Coalescing Pattern
// Prevent duplicate requests
class RequestCoalescer {
private inflightRequests = new Map();
async get(key, fetchFn) {
// Check if request already in flight
if (this.inflightRequests.has(key)) {
return await this.inflightRequests.get(key);
}
// Create new request promise
const promise = fetchFn();
this.inflightRequests.set(key, promise);
try {
const result = await promise;
return result;
} finally {
// Clean up after completion
this.inflightRequests.delete(key);
}
}
}
// Result: 10,000 concurrent requests for the same post
// on one server instance become just 1 database query!
Cache Warming Strategy
// Proactive cache warming for celebrities
class CelebrityCacheWarmer {
async onCelebrityPost(post) {
// Immediately distribute to all layers
await Promise.all([
this.warmL1Cache(post),
this.warmL2Redis(post),
this.warmL3CDN(post),
this.warmGlobalEdges(post)
]);
// Set aggressive TTLs
const ttls = {
l1: 3600, // 1 hour
l2: 86400, // 24 hours
l3: 604800, // 7 days
cdn: 2592000 // 30 days
};
// Schedule refresh before expiry
this.scheduleRefresh(post.id, ttls.l1 * 0.8);
}
async warmGlobalEdges(post) {
const regions = ['us-east', 'us-west',
'eu-west', 'ap-south'];
await Promise.all(regions.map(region =>
this.pushToRegion(post, region)
));
}
}
🎯 Hot Partition Defense Results
Relevance Score Deep Dive
🎯 Visual Relevance Scoring Flow
💡 Key Insights from the Flow
Facebook prioritizes content from people you actually interact with. A post from your mom beats a celebrity post every time.
Breaking news loses 50% relevance per hour (becomes irrelevant fast), while evergreen content like recipes only loses 5% per hour.
The neural network considers 3000+ feature combinations, finding patterns humans can't see (like "photos posted at 3pm on Sundays get more engagement").
Birthday posts get 2x boost, hidden content gets 0.3x penalty. These hard rules ensure important moments aren't missed.
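The decay and hard-rule insights above can be sketched in a few lines. This assumes "loses X% per hour" means compounding hourly decay, which is one reasonable reading; the function and flag names are hypothetical.

```javascript
// Relevance remaining after `hours`, losing a fixed fraction every hour.
function decayedRelevance(initial, lossPerHour, hours) {
  return initial * Math.pow(1 - lossPerHour, hours);
}

// Hard-rule multipliers from the insights above: 2x for birthdays,
// 0.3x for content the user has hidden.
function applyHardRules(score, { isBirthday = false, isHidden = false } = {}) {
  let adjusted = score;
  if (isBirthday) adjusted *= 2;
  if (isHidden) adjusted *= 0.3;
  return adjusted;
}

// Breaking news (50%/hour) vs. evergreen content (5%/hour) after six hours.
const news = decayedRelevance(100, 0.5, 6);
const recipe = decayedRelevance(100, 0.05, 6);
console.log(news.toFixed(1), recipe.toFixed(1));
```

Under this reading, a breaking-news post keeps under 2% of its relevance after six hours, while an evergreen post still keeps roughly three quarters.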
💰 Cost Analysis & Business Trade-offs
💸 Monthly Infrastructure Cost
~$0.46 per DAU per month
⚖️ Design Trade-offs
✅ What We Optimized For:
- Read Performance: <100ms P95 latency
- Global Reach: DynamoDB Global Tables (multi-Region replication, eventually consistent)
- Scalability: Handles 2B users seamlessly
- Availability: 99.95% uptime target
⚠️ What We Sacrificed:
- Write Latency: 500ms for celebrity fanout
- Storage Cost: Redundant feed storage
- Complexity: Multi-layer caching logic
- Eventual Consistency: Some stale reads
💡 Key Insight: We chose availability and partition tolerance (AP in CAP theorem) over strong consistency, optimizing for user experience over perfect data correctness.
🔄 Alternative Design Approaches
Pure Pull Model (Twitter-like)
// No precomputed feeds
GET /api/feed/{userId} {
1. Get user's following list (1 query)
2. Query recent posts from each followee (N queries)
3. Merge and rank in memory
4. Return top 20 posts
}
Pros: ✅ No fanout storms, ✅ Always fresh data
Cons: ❌ Slow reads, ❌ Doesn't scale to 1000+ follows
Pure Push Model (Instagram-like)
// Full fanout for everyone
POST /api/posts {
1. Create post
2. Get ALL followers (even 200M for celebrities)
3. Insert post into each follower's feed
4. Return success
}
Pros: ✅ Lightning fast reads, ✅ Simple caching
Cons: ❌ Celebrity fanout kills system, ❌ Massive storage
🎯 Why Hybrid Won
Our hybrid push/pull approach gives us the best of both worlds: fast reads for regular users (push) and sustainable writes for celebrities (pull). This handles the 80/20 rule perfectly - most users have <1000 followers (push works great), while celebrities need special handling.
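The hybrid decision can be sketched as two small functions: one picks the write-time strategy per author, the other merges pushed and pulled posts at read time. The follower threshold is an assumed tuning value rather than a figure from this design, and the function names and post shape (`postId`, `score`) are hypothetical.

```javascript
// Assumed cutoff between "push to every follower" and "pull at read time".
const PUSH_FOLLOWER_LIMIT = 10000;

// Write path: regular authors fan out on write, celebrities do not.
function fanoutStrategy(author) {
  return author.followerCount <= PUSH_FOLLOWER_LIMIT ? "push" : "pull";
}

// Read path: merge the precomputed (pushed) feed with posts pulled from the
// celebrities a user follows, drop duplicates, and keep the top-scored posts.
function buildFeed(precomputed, celebrityPosts, limit = 20) {
  const seen = new Set();
  return [...precomputed, ...celebrityPosts]
    .filter((p) => !seen.has(p.postId) && seen.add(p.postId))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Only the small set of followed celebrities is queried at read time, so the pull side stays cheap while regular posts are already waiting in the precomputed feed.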
🎉 Facebook Feed Design Complete!
You've mastered the SACRED framework for system design
Scope: 2B users, global scale
API: DynamoDB + GSI magic
Core: Microservices architecture
Refinement: 32-shard scaling
Edge Cases: Failure resilience
Deep Dive: Implementation details
🏆 Key Takeaways
- ✅ DynamoDB GSI enables bidirectional queries
- ✅ Hybrid Push/Pull solves celebrity problem
- ✅ Multi-tier caching achieves <100ms latency
- ✅ Horizontal sharding scales to billions
- ✅ Circuit breakers prevent cascading failures
- ✅ SACRED framework structures complex designs