Step 2: API Design & Entities

Step 2 of 6: A - API Design & Core Entities

Define core data models, relationships, and REST API contracts

🎯 Core Entities & Relationships

Facebook Feed - Entity Relationship Diagram

  • User: userId (PK), username, email, displayName, profileImageUrl, createdAt
  • Post: postId (PK), userId (FK), content, mediaUrls[], postType, privacy, createdAt, updatedAt
  • Follow: followerId (PK), followingId (SK), status, createdAt
  • Like: postId (PK), userId (SK), likeType, createdAt
  • Comment: commentId (PK), postId (FK), userId (FK), content, parentCommentId, createdAt
  • Feed: userId (PK), postId (SK), createdAt, score, isRead
  • PostStats: postId (PK), likeCount, commentCount, shareCount

Relationship labels in the diagram: creates, follows, has, receives, has personalized

👤 User Entity

Core user profile information

  • Partition: userId
  • Indexes: username, email
  • Growth: 3B users

📝 Post Entity

Content created by users

  • Partition: postId
  • GSI: userId-createdAt
  • Growth: 500M posts/day

🔗 Follow Entity

User relationships graph

  • Partition: followerId
  • Sort Key: followingId
  • GSI: followingId-followerId

❤️ Like Entity

Post engagement tracking

  • Partition: postId
  • Sort Key: userId
  • Support: Different reaction types

💬 Comment Entity

Threaded discussions

  • Partition: commentId
  • GSI: postId-createdAt
  • Support: Nested replies

📰 Feed Entity

Pre-computed user timelines

  • Partition: userId
  • Sort Key: postId
  • TTL: 30 days for cleanup

🎯 The Core Challenge: Follower/Followee Relationships

🤔 Why is this the Heart of Social Media?

The follower/followee relationship is what makes social media "social". Every major feature depends on it:

📰 Feed Generation Needs:
  • "Show me posts from people I follow"
  • "When I post, notify people who follow me"
  • "Calculate engagement for posts based on follower count"
🔍 Query Patterns We Must Support:
  • Following List: Who does Alice follow?
  • Followers List: Who follows Alice?
  • Relationship Check: Does Alice follow Bob?

❌ Why Traditional SQL Approaches Fall Short

Single Table Approach:
CREATE TABLE follows (
follower_id VARCHAR(64),
following_id VARCHAR(64),
PRIMARY KEY (follower_id, following_id)
);

-- ✅ Fast: Who does Alice follow? (uses the leading primary-key column)
SELECT following_id FROM follows
WHERE follower_id = 'alice';

-- ❌ SLOW: Who follows Alice?
SELECT follower_id FROM follows
WHERE following_id = 'alice';  -- following_id is not the leading key column: full scan!
Dual Table Approach:
-- Need TWO tables and dual writes
CREATE TABLE following (
follower_id VARCHAR(64) PRIMARY KEY,
following_ids JSON  -- ["bob", "carol", "dave"]
);

CREATE TABLE followers (
user_id VARCHAR(64) PRIMARY KEY,
follower_ids JSON   -- ["alice", "eve", "frank"]
);

-- Problems:
-- • 2x storage cost
-- • Consistency issues between tables
-- • JSON array size limits

🚀 DynamoDB GSI: The Elegant Solution

✅ Single Table, Bidirectional Queries

DynamoDB's Global Secondary Index (GSI) lets one table serve both query patterns efficiently. DynamoDB maintains the index itself, so the application never performs dual writes or reconciles two tables.

📊 Primary Table Structure
Table Name: Follow
PK: followerId (who is following)
SK: followingId (who is being followed)
Attributes: status, createdAt, updatedAt
✅ Optimized Query:

"Who does Alice follow?"
Query PK = 'alice' → Get all items

🔄 Global Secondary Index
GSI Name: FollowingId-FollowerId-Index
GSI PK: followingId (who is being followed)
GSI SK: followerId (who is following)
Projection: Keys + status, createdAt
✅ Reverse Query:

"Who follows Alice?"
Query GSI PK = 'alice' → Get all items

📋 Sample Data & Query Examples

💾 Sample Records in Follow Table
Primary Table View:
PK (followerId) | SK (followingId) | status
alice           | bob              | active
alice           | carol            | active
bob             | alice            | active
bob             | dave             | active
carol           | alice            | active
dave            | alice            | active
GSI View (FollowingId-FollowerId-Index):
GSI PK (followingId) | GSI SK (followerId) | status
alice                | bob                 | active
alice                | carol               | active
alice                | dave                | active
bob                  | alice               | active
carol                | alice               | active
dave                 | bob                 | active
🔍 Query Examples
Query 1: Who does Alice follow?
// Query Primary Table
Query(
TableName: "Follow",
KeyConditionExpression: "followerId = :alice",
ExpressionAttributeValues: {
  ":alice": "alice"
}
)

// Result: [bob, carol]
Query 2: Who follows Alice?
// Query GSI
Query(
TableName: "Follow",
IndexName: "FollowingId-FollowerId-Index",
KeyConditionExpression: "followingId = :alice",
ExpressionAttributeValues: {
  ":alice": "alice"
}
)

// Result: [bob, carol, dave]
Query 3: Does Alice follow Bob?
// Point Query
GetItem(
TableName: "Follow",
Key: {
  "followerId": "alice",
  "followingId": "bob"
}
)

// Result: Found → Yes, Not Found → No
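
To make the two access paths concrete, here is a minimal in-memory sketch in Python. It is not DynamoDB code: the `FollowTable` class and its method names are hypothetical stand-ins for the base table and the GSI, and the index is updated inline, whereas a real GSI is maintained asynchronously.

```python
class FollowTable:
    """In-memory stand-in for the Follow table and its GSI (illustrative only)."""

    def __init__(self):
        self._by_follower = {}   # base table: followerId -> {followingId: item}
        self._by_following = {}  # GSI view:   followingId -> {followerId: item}

    def put(self, follower_id, following_id, status="active"):
        item = {"followerId": follower_id, "followingId": following_id, "status": status}
        self._by_follower.setdefault(follower_id, {})[following_id] = item
        # A real GSI is updated asynchronously by DynamoDB; here it is inline.
        self._by_following.setdefault(following_id, {})[follower_id] = item

    def following(self, user_id):
        """Query 1: base-table query on PK = user_id, ordered by sort key."""
        return sorted(self._by_follower.get(user_id, {}))

    def followers(self, user_id):
        """Query 2: index query on GSI PK = user_id, ordered by GSI sort key."""
        return sorted(self._by_following.get(user_id, {}))

    def follows(self, follower_id, following_id):
        """Query 3: point lookup on the full primary key."""
        return following_id in self._by_follower.get(follower_id, {})


table = FollowTable()
sample = [("alice", "bob"), ("alice", "carol"), ("bob", "alice"),
          ("bob", "dave"), ("carol", "alice"), ("dave", "alice")]
for follower_id, following_id in sample:
    table.put(follower_id, following_id)

print(table.following("alice"))       # who Alice follows
print(table.followers("alice"))       # who follows Alice
print(table.follows("alice", "bob"))  # relationship check
```

Both directions are dictionary lookups keyed by the queried user, which mirrors why neither query has to scan the whole table.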

✅ GSI Advantages

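  • Single Source of Truth: The application writes one table; no hand-maintained duplicate tables
  • Automatic Index Maintenance: DynamoDB propagates every base-table write to the GSI
  • Both Queries Fast: Each direction is a key-based query whose cost scales with the number of results returned, not with table size
  • Cost Effective: One table + one GSI
  • Atomic Writes: A follow or unfollow is a single-item write, applied atomically to the base table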

⚠️ Important Considerations

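  • GSI Costs: The index stores its own copy of the projected attributes (up to ~2x storage), and base-table writes that change them also consume index write capacity
  • Eventually Consistent: GSI reads are always eventually consistent; updates typically propagate within a fraction of a second, so a brand-new follow may briefly be missing from the followers list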
  • Projection Strategy: Include only needed attributes
  • Hot Partitions: Celebrity users may create hot spots
  • Pagination: Large follower lists need cursor-based paging

🌍 Real-World Impact: How This Powers Feed Generation

📰 Push Model (Write-Heavy)

When Alice posts:

  1. Query GSI: "Who follows Alice?" → [bob, carol, dave]
  2. For each follower, write to their PrecomputedFeed table
  3. Send real-time notifications

Celebrity Problem:

If Alice has 10M followers, this creates 10M writes! We need hybrid approaches for influencers.
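
The write amplification is easy to see in a small sketch. The Python below is illustrative only: `fan_out_post` is a hypothetical helper, the feed store is a plain dictionary rather than the PrecomputedFeed table, and notifications are left out.

```python
from collections import defaultdict


def fan_out_post(post, follower_ids, feeds):
    """Write one feed entry per follower and return the number of writes."""
    writes = 0
    for follower_id in follower_ids:
        feeds[follower_id].append({
            "postId": post["postId"],
            "creatorId": post["userId"],
            "createdAt": post["createdAt"],
        })
        writes += 1
    return writes


feeds = defaultdict(list)  # stand-in for the PrecomputedFeed table
post = {"postId": "post_abc123", "userId": "alice", "createdAt": "2024-01-15T10:30:00Z"}
writes = fan_out_post(post, ["bob", "carol", "dave"], feeds)
print(writes)  # one write per follower
```

The write count equals the follower count, so a single post from an account with 10M followers costs 10M feed writes.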

🔍 Pull Model (Read-Heavy)

When Bob opens his feed:

  1. Query Primary Table: "Who does Bob follow?" → [alice, carol]
  2. For each following, get their recent posts
  3. Merge and rank posts by algorithm

Trade-off:

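Higher read latency but no write amplification. Good for accounts with very large follower counts, where pushing to every follower is too expensive; the read cost grows with the number of accounts the reader follows.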
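
The pull steps can be sketched as a k-way merge. This Python example assumes each author's posts are already available newest first and ranks purely by `createdAt`; a real ranking algorithm would replace that ordering. The function name `build_feed` and the sample data are hypothetical.

```python
import heapq
from itertools import islice


def build_feed(following_ids, posts_by_user, limit=20):
    """Merge per-author post lists (each sorted newest first) into one feed."""
    streams = [posts_by_user.get(user_id, []) for user_id in following_ids]
    # ISO 8601 UTC timestamps in one format sort correctly as plain strings.
    merged = heapq.merge(*streams, key=lambda post: post["createdAt"], reverse=True)
    return list(islice(merged, limit))


posts_by_user = {
    "alice": [
        {"postId": "a2", "createdAt": "2024-01-15T18:30:00Z"},
        {"postId": "a1", "createdAt": "2024-01-14T09:00:00Z"},
    ],
    "carol": [
        {"postId": "c1", "createdAt": "2024-01-15T12:00:00Z"},
    ],
}
feed = build_feed(["alice", "carol"], posts_by_user, limit=2)
print([post["postId"] for post in feed])
```

Because the merge is lazy, only enough posts to fill the page are consumed, but one read per followed account is still needed to obtain the streams.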

🎯 Hybrid Strategy (Best of Both)

📊 User Classification:
  • Regular users (<1K followers): Push model
  • Influencers (1K-100K): Push to active followers only
  • Celebrities (>100K): Pull model with heavy caching
⚡ Smart Optimizations:
  • Cache follower lists for frequent reads
  • Batch process follow/unfollow operations
  • Use bloom filters to check relationships
  • Pre-compute feeds for active users
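
One way to express the classification is a small routing function. The thresholds come from the list above; the strategy names, the `push_targets` helper, and the idea of passing in a set of recently active follower IDs are assumptions made for illustration.

```python
def fan_out_strategy(follower_count):
    """Pick a delivery strategy from the author's follower count."""
    if follower_count < 1_000:
        return "push"              # regular users
    if follower_count <= 100_000:
        return "push_active_only"  # influencers
    return "pull"                  # celebrities


def push_targets(follower_count, follower_ids, active_ids):
    """Return the followers whose precomputed feeds should be written now."""
    strategy = fan_out_strategy(follower_count)
    if strategy == "push":
        return list(follower_ids)
    if strategy == "push_active_only":
        return [f for f in follower_ids if f in active_ids]
    return []  # pull: followers fetch this author's posts at read time


print(fan_out_strategy(250))
print(push_targets(50_000, ["bob", "carol", "dave"], {"carol"}))
```

How "active" is defined (for example, seen in the last few days) is a tuning decision the thresholds alone do not settle.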

🗄️ Complete DynamoDB Schema Design

📄 Posts Table

Primary Key Structure
PK: postId (UUID)
Attributes:
• userId, content, mediaUrls[]
• postType, privacy, createdAt
• likeCount, commentCount, shareCount
Global Secondary Index
GSI Name: UserPosts-Index
PK: userId
SK: createdAt
• Query user's posts chronologically

Query Pattern: Get all posts by a user, sorted by creation time.

👥 Follow Table

Primary Key Structure
PK: followerId
SK: followingId
Attributes:
• status (active, blocked, pending)
• createdAt, updatedAt
Global Secondary Index
GSI Name: FollowingId-FollowerId-Index
PK: followingId
SK: followerId
• Query who follows a specific user

Query Patterns: Get following list, get followers list, check if user A follows user B.

📰 PrecomputedFeed Table

Primary Key Structure
PK: userId
SK: createdAt#postId (timestamp first so entries sort chronologically; postId breaks ties)
Attributes:
• postId, creatorId
• score, isRead, addedAt
TTL Configuration
TTL Attribute: expiresAt
• Auto-delete entries after 30 days
• Reduces storage costs

Query Pattern: Fast feed retrieval with pagination support.
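
DynamoDB's TTL feature expects the TTL attribute to hold a Unix epoch timestamp in seconds. The sketch below builds a feed item with `expiresAt` set 30 days after `addedAt`. The helper names are hypothetical, the sort key is omitted, and the read-side `is_expired` check is an assumption based on TTL deletion being a background process that can lag behind the expiry time.

```python
from datetime import datetime, timedelta, timezone

FEED_TTL = timedelta(days=30)


def make_feed_item(user_id, post_id, creator_id, added_at, score=0.0):
    """Build a PrecomputedFeed item (sort key omitted) with a TTL attribute."""
    return {
        "userId": user_id,
        "postId": post_id,
        "creatorId": creator_id,
        "score": score,
        "isRead": False,
        "addedAt": added_at.isoformat(),
        # TTL attribute: Unix epoch seconds.
        "expiresAt": int((added_at + FEED_TTL).timestamp()),
    }


def is_expired(item, now):
    """Filter on read, since expired items may linger before being deleted."""
    return item["expiresAt"] <= int(now.timestamp())


added_at = datetime(2024, 1, 15, 19, 0, tzinfo=timezone.utc)
item = make_feed_item("bob", "post_abc123", "alice", added_at, score=0.95)
print(is_expired(item, added_at + timedelta(days=31)))
```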

❤️ Likes Table

Primary Key Structure
PK: postId
SK: userId
Attributes:
• likeType (like, love, angry, etc.)
• createdAt
Usage Pattern
• High write volume (millions/sec)
• Sparse reads (check if user liked)
• Aggregate counts cached separately

Query Pattern: Check if user liked post, get like counts per post.

🌐 REST API Design

📝 Post Management APIs

POST /api/posts
Request Body:
{
"content": "Hello world! 🌍",
"mediaUrls": [
  "https://cdn.example.com/img1.jpg",
  "https://cdn.example.com/video1.mp4"
],
"postType": "media",
"privacy": "public"
}
Response (201):
{
"postId": "post_abc123",
"userId": "user_123",
"content": "Hello world! 🌍",
"mediaUrls": [...],
"postType": "media",
"privacy": "public",
"createdAt": "2024-01-15T10:30:00Z",
"likeCount": 0,
"commentCount": 0
}
GET /api/posts/{postId}
Query Parameters:
  • includeStats - Include like/comment counts
  • userId - Check if requesting user liked
Response (200):
{
"postId": "post_abc123",
"userId": "user_123",
"username": "johndoe",
"displayName": "John Doe",
"content": "Hello world! 🌍",
"mediaUrls": [...],
"createdAt": "2024-01-15T10:30:00Z",
"likeCount": 42,
"commentCount": 5,
"userLiked": true
}
GET /api/users/{userId}/posts
Query Parameters:
  • limit - Number of posts (default: 20)
  • cursor - Pagination cursor
  • privacy - Filter by privacy level
Response (200):
{
"posts": [
  {
    "postId": "post_abc123",
    "content": "Hello world!",
    "createdAt": "2024-01-15T10:30:00Z",
    "likeCount": 42
  }
],
"pagination": {
  "nextCursor": "cursor_xyz",
  "hasMore": true
}
}

📰 Feed APIs

GET /api/feed/timeline
Query Parameters:
  • limit - Posts per page (default: 20)
  • cursor - Pagination cursor
  • algorithm - chronological or ranked
Response (200):
{
"posts": [
  {
    "postId": "post_abc123",
    "userId": "user_456",
    "username": "janedoe",
    "content": "Amazing sunset today!",
    "mediaUrls": ["sunset.jpg"],
    "createdAt": "2024-01-15T18:30:00Z",
    "likeCount": 128,
    "commentCount": 15,
    "userLiked": false,
    "score": 0.95
  }
],
"pagination": {
  "nextCursor": "feed_cursor_xyz",
  "hasMore": true,
  "generatedAt": "2024-01-15T19:00:00Z"
}
}
POST /api/feed/refresh
Request Body:
{
"lastSeen": "2024-01-15T18:30:00Z",
"algorithm": "ranked"
}
Response (200):
{
"newPostsCount": 5,
"posts": [...],
"refreshedAt": "2024-01-15T19:00:00Z"
}

❤️ Social Interaction APIs

POST /api/posts/{postId}/like
Request Body:
{
"likeType": "like"
}

likeType: like, love, laugh, angry, sad, etc.

Response (201):
{
"postId": "post_abc123",
"userId": "user_123",
"likeType": "like",
"createdAt": "2024-01-15T19:00:00Z",
"newTotalLikes": 129
}
POST /api/posts/{postId}/comments
Request Body:
{
"content": "Great post! 👍",
"parentCommentId": null
}
Response (201):
{
"commentId": "comment_xyz789",
"postId": "post_abc123",
"userId": "user_123",
"username": "johndoe",
"content": "Great post! 👍",
"createdAt": "2024-01-15T19:00:00Z",
"likeCount": 0
}
GET /api/posts/{postId}/comments
Query Parameters:
  • limit - Comments per page (default: 20)
  • sort - newest, oldest, popular
  • cursor - Pagination cursor
Response (200):
{
"comments": [
  {
    "commentId": "comment_xyz789",
    "userId": "user_123",
    "username": "johndoe",
    "content": "Great post! 👍",
    "createdAt": "2024-01-15T19:00:00Z",
    "likeCount": 5,
    "replyCount": 2
  }
],
"pagination": {
  "nextCursor": "comment_cursor_abc",
  "hasMore": true
}
}

👥 Follow Management APIs

POST /api/users/{userId}/follow
Request Body:
{
"action": "follow"
}

action: follow, unfollow

Response (200):
{
"followerId": "user_123",
"followingId": "user_456",
"status": "active",
"createdAt": "2024-01-15T19:00:00Z"
}
GET /api/users/{userId}/following
Query Parameters:
  • limit - Users per page (default: 50)
  • cursor - Pagination cursor
Response (200):
{
"following": [
  {
    "userId": "user_456",
    "username": "janedoe",
    "displayName": "Jane Doe",
    "profileImageUrl": "avatar.jpg",
    "followedAt": "2024-01-10T15:30:00Z"
  }
],
"pagination": {
  "nextCursor": "follow_cursor_def",
  "hasMore": true
},
"totalCount": 287
}

🎯 Key API Design Decisions

✅ RESTful Design

Standard HTTP methods and status codes make the API intuitive for developers. Resource-based URLs with clear hierarchies.

🎯 Cursor-based Pagination

Avoids offset-based pagination issues with real-time data. Ensures consistent results even when new posts are added.
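
A minimal sketch of how such a cursor might work, assuming the cursor is a base64-encoded JSON object holding the last post's `createdAt` and `postId`. The encoding and the helper names are illustrative choices, not something the API contract above prescribes.

```python
import base64
import json


def encode_cursor(post):
    """Pack the last-seen position into an opaque, URL-safe token."""
    position = {"createdAt": post["createdAt"], "postId": post["postId"]}
    return base64.urlsafe_b64encode(json.dumps(position).encode()).decode()


def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))


def get_page(posts, limit=20, cursor=None):
    """Return one page from `posts`, which must be sorted newest first."""
    if cursor is not None:
        last = decode_cursor(cursor)
        boundary = (last["createdAt"], last["postId"])
        # Keep only posts strictly older than the last one the client saw.
        posts = [p for p in posts if (p["createdAt"], p["postId"]) < boundary]
    page = posts[:limit]
    has_more = len(posts) > limit
    next_cursor = encode_cursor(page[-1]) if has_more else None
    return {"posts": page, "pagination": {"nextCursor": next_cursor, "hasMore": has_more}}


posts = [
    {"postId": "p3", "createdAt": "2024-01-15T12:00:00Z"},
    {"postId": "p2", "createdAt": "2024-01-15T11:00:00Z"},
    {"postId": "p1", "createdAt": "2024-01-15T10:00:00Z"},
]
first = get_page(posts, limit=2)
# A new post arrives between the two requests.
posts.insert(0, {"postId": "p4", "createdAt": "2024-01-15T13:00:00Z"})
second = get_page(posts, limit=2, cursor=first["pagination"]["nextCursor"])
print([p["postId"] for p in second["posts"]])
```

With an offset of 2, the second request would have returned p2 again after p4 was inserted; anchoring on the last-seen position avoids that duplicate.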

⚡ Efficient Data Loading

Selective field inclusion, batch operations, and optimized queries reduce bandwidth and improve mobile experience.

🔄 Idempotent Operations

Like/unlike operations are idempotent. Multiple identical requests produce the same result, handling network retry scenarios.
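
The sketch below shows why keying likes by (postId, userId), as the Likes table does, makes retries harmless. `LikeStore` is a hypothetical in-memory stand-in, and counting by iteration is for illustration only; the design above caches aggregate counts separately.

```python
class LikeStore:
    """In-memory stand-in for the Likes table, keyed by (postId, userId)."""

    def __init__(self):
        self._likes = {}  # (postId, userId) -> likeType

    def like(self, post_id, user_id, like_type="like"):
        # Writing the same key again overwrites instead of adding a second like.
        self._likes[(post_id, user_id)] = like_type
        return self.count(post_id)

    def unlike(self, post_id, user_id):
        # Removing a like that is already gone is a no-op, not an error.
        self._likes.pop((post_id, user_id), None)
        return self.count(post_id)

    def count(self, post_id):
        return sum(1 for (pid, _uid) in self._likes if pid == post_id)


store = LikeStore()
store.like("post_abc123", "user_123")
retried = store.like("post_abc123", "user_123")  # client retry after a timeout
print(retried)
```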

📱 Mobile-First

Lightweight payloads, optional fields, and efficient caching headers optimize for mobile networks and battery life.

🔐 Security by Design

Authentication required for all write operations. Privacy controls enforced at API level before data access.

🔮 Coming Up Next

With our entities and APIs defined, we'll design the core system architecture:

  • Microservices design - Post Service, Feed Service, Follow Service
  • Push vs Pull models - Feed generation strategies
  • Real-time systems - WebSocket connections and notifications
  • Caching layers - Redis for hot data, CDN for media
  • Message queues - Async processing with Kafka