Step 2: API Design & Entities

Step 2 of 6: A - API Design & Core Entities

Define core data models, relationships, and REST API contracts

🎯 Core Entities & Relationships

Facebook Feed - Entity Relationship Diagram

• User: userId (PK), username, email, displayName, profileImageUrl, createdAt
• Post: postId (PK), userId (FK), content, mediaUrls[], postType, privacy, createdAt, updatedAt
• Follow: followerId (PK), followingId (SK), status, createdAt
• Like: postId (PK), userId (SK), likeType, createdAt
• Comment: commentId (PK), postId (FK), userId (FK), content, parentCommentId, createdAt
• Feed: userId (PK), postId (SK), createdAt, score, isRead
• PostStats: postId (PK), likeCount, commentCount, shareCount

Relationship labels in the diagram: creates, follows, has, receives, has personalized

👤 User Entity

Core user profile information

• Partition: userId
• Indexes: username, email
• Growth: 3B users

📝 Post Entity

Content created by users

• Partition: postId
• GSI: userId-createdAt
• Growth: 500M posts/day

🔗 Follow Entity

User relationships graph

• Partition: followerId
• Sort Key: followingId
• GSI: followingId-followerId

❤️ Like Entity

Post engagement tracking

• Partition: postId
• Sort Key: userId
• Support: Different reaction types

💬 Comment Entity

Threaded discussions

• Partition: commentId
• GSI: postId-createdAt
• Support: Nested replies

📰 Feed Entity

Pre-computed user timelines

• Partition: userId
• Sort Key: postId
• TTL: 30 days for cleanup

🎯 The Core Challenge: Follower/Followee Relationships

🤔 Why is this the Heart of Social Media?

The follower/followee relationship is what makes social media "social". Every major feature depends on it:

📰 Feed Generation Needs:

• "Show me posts from people I follow"
• "When I post, notify people who follow me"
• "Calculate engagement for posts based on follower count"

🔍 Query Patterns We Must Support:

• Following List: Who does Alice follow?
• Followers List: Who follows Alice?
• Relationship Check: Does Alice follow Bob?

❌ Why Traditional SQL Approaches Fall Short

Single Table Approach:

CREATE TABLE follows (
follower_id VARCHAR(64),
following_id VARCHAR(64),
PRIMARY KEY (follower_id, following_id)
);

-- ✅ Fast: Who does Alice follow? (uses the primary key prefix)
SELECT following_id FROM follows
WHERE follower_id = 'alice';

-- ❌ SLOW: Who follows Alice?
SELECT follower_id FROM follows
WHERE following_id = 'alice';  -- No index leads with following_id, so this scans the table
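
The scan in the second query comes from the schema as written: nothing is indexed with following_id as the leading column. The sketch below uses Python's built-in sqlite3 module to observe this, and then adds a reverse index, which is the relational counterpart of the GSI introduced in the next section. The index name idx_follows_reverse and the sample rows are assumptions for illustration, and other database engines may plan these queries differently.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the plan detail strings SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE follows ("
    " follower_id TEXT NOT NULL,"
    " following_id TEXT NOT NULL,"
    " PRIMARY KEY (follower_id, following_id))"
)
conn.executemany(
    "INSERT INTO follows VALUES (?, ?)",
    [("alice", "bob"), ("alice", "carol"), ("bob", "alice"), ("carol", "alice")],
)

reverse_sql = "SELECT follower_id FROM follows WHERE following_id = ?"

# No index leads with following_id yet, so the planner has to scan.
plan_before = query_plan(conn, reverse_sql, ("alice",))

# Hypothetical reverse index: following_id first, follower_id second.
conn.execute(
    "CREATE INDEX idx_follows_reverse ON follows (following_id, follower_id)"
)
plan_after = query_plan(conn, reverse_sql, ("alice",))

followers = sorted(row[0] for row in conn.execute(reverse_sql, ("alice",)))
```

Comparing plan_before with plan_after shows the lookup changing from a scan to an index search. The trade-off is the same one the GSI makes: every write now maintains a second structure.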

Dual Table Approach:

-- Need TWO tables and dual writes
CREATE TABLE following (
follower_id VARCHAR(64) PRIMARY KEY,
following_ids JSON  -- ["bob", "carol", "dave"]
);

CREATE TABLE followers (
user_id VARCHAR(64) PRIMARY KEY,
follower_ids JSON   -- ["alice", "eve", "frank"]
);

-- Problems:
-- • 2x storage cost
-- • Consistency issues between tables
-- • JSON array size limits

🚀 DynamoDB GSI: The Elegant Solution

✅ Single Table, Bidirectional Queries

DynamoDB's Global Secondary Index (GSI) lets one table serve both query patterns efficiently. The application writes each relationship once and DynamoDB maintains the index copy, so there are no application-level dual writes to keep in sync.

📊 Primary Table Structure

Table Name: Follow

PK: followerId (who is following)

SK: followingId (who is being followed)

Attributes: status, createdAt, updatedAt

✅ Optimized Query:

"Who does Alice follow?"
Query PK = 'alice' → Get all items

🔄 Global Secondary Index

GSI Name: FollowingId-FollowerId-Index

GSI PK: followingId (who is being followed)

GSI SK: followerId (who is following)

Projection: Keys + status, createdAt

✅ Reverse Query:

"Who follows Alice?"
Query GSI PK = 'alice' → Get all items

📋 Sample Data & Query Examples

💾 Sample Records in Follow Table

Primary Table View:

PK (followerId)	SK (followingId)	status
alice	bob	active
alice	carol	active
bob	alice	active
bob	dave	active
carol	alice	active
dave	alice	active

GSI View (FollowingId-FollowerId-Index):

GSI PK (followingId)	GSI SK (followerId)	status
alice	bob	active
alice	carol	active
alice	dave	active
bob	alice	active
carol	alice	active
dave	bob	active

🔍 Query Examples

Query 1: Who does Alice follow?

// Query Primary Table
Query(
TableName: "Follow",
KeyConditionExpression: "followerId = :alice",
ExpressionAttributeValues: {
  ":alice": "alice"
}
)

// Result: [bob, carol]

Query 2: Who follows Alice?

// Query GSI
Query(
TableName: "Follow",
IndexName: "FollowingId-FollowerId-Index",
KeyConditionExpression: "followingId = :alice",
ExpressionAttributeValues: {
  ":alice": "alice"
}
)

// Result: [bob, carol, dave]

Query 3: Does Alice follow Bob?

// Point Query
GetItem(
TableName: "Follow",
Key: {
  "followerId": "alice",
  "followingId": "bob"
}
)

// Result: Found → Yes, Not Found → No
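
To make the three access patterns concrete, here is a minimal in-memory sketch of the Follow table and its reverse index, loaded with the sample records above. The FollowTable class is a hypothetical stand-in, not a DynamoDB client, and it updates the index inline, whereas DynamoDB propagates GSI updates asynchronously.

```python
from collections import defaultdict

class FollowTable:
    """Toy stand-in for the Follow table plus its reverse index."""

    def __init__(self):
        # Base table view: followerId -> {followingId: item}
        self._by_follower = defaultdict(dict)
        # Index view: followingId -> {followerId: item}
        self._by_following = defaultdict(dict)

    def put(self, follower_id, following_id, status="active"):
        item = {
            "followerId": follower_id,
            "followingId": following_id,
            "status": status,
        }
        self._by_follower[follower_id][following_id] = item
        # Updated inline here; a real GSI catches up shortly after the write.
        self._by_following[following_id][follower_id] = item

    def following(self, user_id):
        """Query 1: who does this user follow?"""
        return sorted(self._by_follower.get(user_id, {}))

    def followers(self, user_id):
        """Query 2: who follows this user? (the index lookup)"""
        return sorted(self._by_following.get(user_id, {}))

    def is_following(self, follower_id, following_id):
        """Query 3: point lookup on the full key."""
        return following_id in self._by_follower.get(follower_id, {})

table = FollowTable()
for follower, followed in [
    ("alice", "bob"), ("alice", "carol"), ("bob", "alice"),
    ("bob", "dave"), ("carol", "alice"), ("dave", "alice"),
]:
    table.put(follower, followed)

alice_following = table.following("alice")
alice_followers = table.followers("alice")
alice_follows_bob = table.is_following("alice", "bob")
```

The two dictionaries mirror the Primary Table View and GSI View: the same items, keyed in opposite directions.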

✅ GSI Advantages

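• Single Source of Truth: The application writes one item; DynamoDB maintains the index copy
• Automatic Propagation: The GSI is updated automatically after each base-table write
• Both Queries Fast: Both directions are key-based queries whose cost scales with the items returned, not with table size
• Cost Effective: One table + one GSI
• Atomic Writes: A follow/unfollow is a single-item write, which is atomic on the base table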

⚠️ Important Considerations

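• GSI Costs: Extra storage for the projected attributes, plus extra write capacity for every base-table write that touches the index
• Eventually Consistent: GSI updates typically lag the base table by a fraction of a second, and GSI reads cannot be strongly consistent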
• Projection Strategy: Include only needed attributes
• Hot Partitions: Celebrity users may create hot spots
• Pagination: Large follower lists need cursor-based paging
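
The pagination point can be sketched as a loop. A DynamoDB Query response includes LastEvaluatedKey when more results remain, and passing that value back as ExclusiveStartKey fetches the next page. The helper name iter_query_items and the fake_query function are assumptions for illustration. The fake returns an offset where a real response would return key attributes.

```python
from typing import Any, Callable, Dict, Iterator, Optional

def iter_query_items(
    query_fn: Callable[..., Dict[str, Any]], **params: Any
) -> Iterator[Dict[str, Any]]:
    """Yield items across all pages of a DynamoDB-style Query."""
    start_key: Optional[Dict[str, Any]] = None
    while True:
        page_params = dict(params)
        if start_key is not None:
            page_params["ExclusiveStartKey"] = start_key
        page = query_fn(**page_params)
        yield from page.get("Items", [])
        # An absent LastEvaluatedKey means this was the final page.
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            break

def fake_query(**params: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for a client's query call, two items per page."""
    followers = ["bob", "carol", "dave", "eve", "frank"]
    start = params.get("ExclusiveStartKey", {}).get("offset", 0)
    response: Dict[str, Any] = {
        "Items": [{"followerId": f} for f in followers[start:start + 2]]
    }
    if start + 2 < len(followers):
        response["LastEvaluatedKey"] = {"offset": start + 2}
    return response

all_followers = [
    item["followerId"]
    for item in iter_query_items(
        fake_query, IndexName="FollowingId-FollowerId-Index"
    )
]
```

Because the helper is a generator, a caller that only needs the first page can stop iterating early without fetching the rest.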

🌍 Real-World Impact: How This Powers Feed Generation

📰 Push Model (Write-Heavy)

When Alice posts:

1. Query GSI: "Who follows Alice?" → [bob, carol, dave]
2. For each follower, write to their PrecomputedFeed table
3. Send real-time notifications

Celebrity Problem:

If Alice has 10M followers, this creates 10M writes! We need hybrid approaches for influencers.
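
A minimal sketch of the push model, assuming an in-memory dictionary in place of the PrecomputedFeed table and a followers_of callback in place of the GSI query. Both are hypothetical stand-ins. The return value is the number of writes, which makes the write amplification visible: it always equals the follower count.

```python
from collections import defaultdict

def fan_out_post(post_id, author_id, created_at, followers_of, feeds):
    """Write one feed entry per follower and return the number of writes."""
    writes = 0
    for follower_id in followers_of(author_id):
        feeds[follower_id].append(
            {"postId": post_id, "creatorId": author_id, "createdAt": created_at}
        )
        writes += 1
    return writes

# Stand-in for the reverse index: followingId -> list of followerIds.
reverse_index = {"alice": ["bob", "carol", "dave"]}
feeds = defaultdict(list)

writes = fan_out_post(
    "post_abc123",
    "alice",
    "2024-01-15T10:30:00Z",
    lambda user_id: reverse_index.get(user_id, []),
    feeds,
)
```

With three followers this is three writes. With 10M followers the same loop is 10M writes for a single post, which is why the work usually moves to an asynchronous queue and, for the largest accounts, is skipped in favor of the pull model.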

🔍 Pull Model (Read-Heavy)

When Bob opens his feed:

1. Query Primary Table: "Who does Bob follow?" → [alice, carol]
2. For each following, get their recent posts
3. Merge and rank posts by algorithm

Trade-off:

Higher read latency but no write amplification. Good for posts from accounts with very large follower counts, where fan-out on write is too expensive.
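
The pull model's read path can be sketched as a k-way merge. This assumes each author's recent posts arrive newest-first and that createdAt is an ISO-8601 UTC string, so string order matches time order. The callbacks following_of and recent_posts_of are hypothetical stand-ins for the Follow table query and the UserPosts-Index query, and chronological order stands in for the ranking algorithm.

```python
import heapq
from itertools import islice

def pull_feed(user_id, following_of, recent_posts_of, limit=20):
    """Merge each followed user's newest-first posts into one newest-first feed."""
    per_author = (recent_posts_of(author) for author in following_of(user_id))
    # heapq.merge with reverse=True expects every input sorted largest-first.
    merged = heapq.merge(
        *per_author, key=lambda post: post["createdAt"], reverse=True
    )
    return list(islice(merged, limit))

following = {"bob": ["alice", "carol"]}
posts = {
    "alice": [
        {"postId": "a2", "createdAt": "2024-01-15T18:30:00Z"},
        {"postId": "a1", "createdAt": "2024-01-14T09:00:00Z"},
    ],
    "carol": [
        {"postId": "c1", "createdAt": "2024-01-15T12:00:00Z"},
    ],
}

feed = pull_feed(
    "bob",
    lambda user_id: following.get(user_id, []),
    lambda author_id: posts.get(author_id, []),
    limit=2,
)
feed_ids = [post["postId"] for post in feed]
```

The read cost grows with the number of accounts followed, since each one contributes a query before the merge can start.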

🎯 Hybrid Strategy (Best of Both)

📊 User Classification:

• Regular users (<1K followers): Push model
• Influencers (1K-100K): Push to active followers only
• Celebrities (>100K): Pull model with heavy caching
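
The classification above reduces to a threshold check. The sketch below uses the follower-count boundaries from the list. The strategy labels are hypothetical names, and treating exactly 100K followers as an influencer is an assumption, since the list does not say which side that boundary falls on.

```python
def fan_out_strategy(follower_count: int) -> str:
    """Pick a feed delivery strategy from an account's follower count."""
    if follower_count < 1_000:
        return "push"              # regular users: write to every follower's feed
    if follower_count <= 100_000:
        return "push_active_only"  # influencers: write to active followers only
    return "pull_cached"           # celebrities: merge at read time, with caching

strategies = {n: fan_out_strategy(n) for n in (250, 1_000, 100_000, 10_000_000)}
```

In practice the thresholds would be tuned from measured write cost and read latency rather than fixed in code.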

⚡ Smart Optimizations:

• Cache follower lists for frequent reads
• Batch process follow/unfollow operations
• Use bloom filters to check relationships
• Pre-compute feeds for active users
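
As an illustration of the bloom filter optimization, here is a minimal sketch built on hashlib. A bloom filter can answer "definitely not following" without touching the database, and "possibly following" falls through to the real GetItem check. The class, its sizing defaults, and the "follower->following" key format are all assumptions for illustration.

```python
import hashlib

class BloomFilter:
    """Small bloom filter: no false negatives, a tunable false-positive rate."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )

edges = BloomFilter()
for follower, followed in [("alice", "bob"), ("alice", "carol")]:
    edges.add(f"{follower}->{followed}")

# True means "maybe, confirm with a point read"; False means "definitely not".
maybe_follows = edges.might_contain("alice->bob")
```

A plain bloom filter cannot remove entries, so unfollows need either periodic rebuilds or a counting variant.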

🗄️ Complete DynamoDB Schema Design

📄 Posts Table

Primary Key Structure

PK: postId (UUID)

Attributes:

• userId, content, mediaUrls[]

• postType, privacy, createdAt

• likeCount, commentCount, shareCount

Global Secondary Index

GSI Name: UserPosts-Index

PK: userId

SK: createdAt

• Query user's posts chronologically

Query Pattern: Get all posts by a user, sorted by creation time.

👥 Follow Table

Primary Key Structure

PK: followerId

SK: followingId

Attributes:

• status (active, blocked, pending)

• createdAt, updatedAt

Global Secondary Index

GSI Name: FollowingId-FollowerId-Index

PK: followingId

SK: followerId

• Query who follows a specific user

Query Patterns: Get following list, get followers list, check if user A follows user B.

📰 PrecomputedFeed Table

Primary Key Structure

PK: userId

SK: createdAt#postId (timestamp first, so entries sort chronologically within a user's feed)

Attributes:

• postId, creatorId

• score, isRead, addedAt

TTL Configuration

TTL Attribute: expiresAt

• Auto-delete entries after 30 days

• Reduces storage costs

Query Pattern: Fast feed retrieval with pagination support.
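
A sketch of how one feed item might be assembled, under two assumptions: the sort key leads with the timestamp so that items sort chronologically within a user's partition, and expiresAt holds Unix epoch seconds, which is the format DynamoDB TTL expects. The function name and the sortKey attribute name are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FEED_TTL = timedelta(days=30)

def build_feed_item(user_id, post_id, creator_id, created_at, added_at, score):
    """Build one PrecomputedFeed item. Both datetimes must be timezone-aware."""
    created_iso = created_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    added_utc = added_at.astimezone(timezone.utc)
    return {
        "userId": user_id,
        # Timestamp first: fixed-width ISO-8601 strings sort in time order.
        "sortKey": f"{created_iso}#{post_id}",
        "postId": post_id,
        "creatorId": creator_id,
        "score": score,
        "isRead": False,
        "addedAt": added_utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
        # TTL attribute: epoch seconds, 30 days after the entry was added.
        "expiresAt": int((added_utc + FEED_TTL).timestamp()),
    }

item = build_feed_item(
    "bob",
    "post_abc123",
    "alice",
    created_at=datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc),
    added_at=datetime(2024, 1, 15, 10, 31, tzinfo=timezone.utc),
    score=0.95,
)
```

TTL deletion runs in the background rather than at the exact expiry time, so readers that care about the cutoff should also filter on expiresAt.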

❤️ Likes Table

Primary Key Structure

PK: postId

SK: userId

Attributes:

• likeType (like, love, angry, etc.)

• createdAt

Usage Pattern

• High write volume (millions/sec)

• Sparse reads (check if user liked)

• Aggregate counts cached separately

Query Pattern: Check if user liked post, get like counts per post.

🌐 REST API Design

📝 Post Management APIs

POST /api/posts

Request Body:

{
"content": "Hello world! 🌍",
"mediaUrls": [
  "https://cdn.example.com/img1.jpg",
  "https://cdn.example.com/video1.mp4"
],
"postType": "media",
"privacy": "public"
}

Response (201):

{
"postId": "post_abc123",
"userId": "user_123",
"content": "Hello world! 🌍",
"mediaUrls": [...],
"postType": "media",
"privacy": "public",
"createdAt": "2024-01-15T10:30:00Z",
"likeCount": 0,
"commentCount": 0
}

GET /api/posts/{postId}

Query Parameters:

• includeStats - Include like/comment counts
• userId - Check if requesting user liked

Response (200):

{
"postId": "post_abc123",
"userId": "user_123",
"username": "johndoe",
"displayName": "John Doe",
"content": "Hello world! 🌍",
"mediaUrls": [...],
"createdAt": "2024-01-15T10:30:00Z",
"likeCount": 42,
"commentCount": 5,
"userLiked": true
}

GET /api/users/{userId}/posts

Query Parameters:

• limit - Number of posts (default: 20)
• cursor - Pagination cursor
• privacy - Filter by privacy level

Response (200):

{
"posts": [
  {
    "postId": "post_abc123",
    "content": "Hello world!",
    "createdAt": "2024-01-15T10:30:00Z",
    "likeCount": 42
  }
],
"pagination": {
  "nextCursor": "cursor_xyz",
  "hasMore": true
}
}

📰 Feed APIs

GET /api/feed/timeline

Query Parameters:

• limit - Posts per page (default: 20)
• cursor - Pagination cursor
• algorithm - chronological or ranked

Response (200):

{
"posts": [
  {
    "postId": "post_abc123",
    "userId": "user_456",
    "username": "janedoe",
    "content": "Amazing sunset today!",
    "mediaUrls": ["sunset.jpg"],
    "createdAt": "2024-01-15T18:30:00Z",
    "likeCount": 128,
    "commentCount": 15,
    "userLiked": false,
    "score": 0.95
  }
],
"pagination": {
  "nextCursor": "feed_cursor_xyz",
  "hasMore": true,
  "generatedAt": "2024-01-15T19:00:00Z"
}
}

POST /api/feed/refresh

Request Body:

{
"lastSeen": "2024-01-15T18:30:00Z",
"algorithm": "ranked"
}

Response (200):

{
"newPostsCount": 5,
"posts": [...],
"refreshedAt": "2024-01-15T19:00:00Z"
}

❤️ Social Interaction APIs

POST /api/posts/{postId}/like

Request Body:

{
"likeType": "like"
}

likeType: like, love, laugh, angry, sad, etc.

Response (201):

{
"postId": "post_abc123",
"userId": "user_123",
"likeType": "like",
"createdAt": "2024-01-15T19:00:00Z",
"newTotalLikes": 129
}

POST /api/posts/{postId}/comments

Request Body:

{
"content": "Great post! 👍",
"parentCommentId": null
}

Response (201):

{
"commentId": "comment_xyz789",
"postId": "post_abc123",
"userId": "user_123",
"username": "johndoe",
"content": "Great post! 👍",
"createdAt": "2024-01-15T19:00:00Z",
"likeCount": 0
}

GET /api/posts/{postId}/comments

Query Parameters:

• limit - Comments per page (default: 20)
• sort - newest, oldest, popular
• cursor - Pagination cursor

Response (200):

{
"comments": [
  {
    "commentId": "comment_xyz789",
    "userId": "user_123",
    "username": "johndoe",
    "content": "Great post! 👍",
    "createdAt": "2024-01-15T19:00:00Z",
    "likeCount": 5,
    "replyCount": 2
  }
],
"pagination": {
  "nextCursor": "comment_cursor_abc",
  "hasMore": true
}
}
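
The parentCommentId field is what turns a flat comment list into a thread. A minimal sketch of that assembly follows, assuming top-level comments carry a null parent and that parent links never form a cycle. The function name and the replies key are hypothetical.

```python
from collections import defaultdict

def build_comment_tree(comments):
    """Nest a flat comment list into threads using parentCommentId."""
    children = defaultdict(list)
    for comment in comments:
        children[comment.get("parentCommentId")].append(comment)

    def attach(parent_id):
        # Copy each comment and hang its replies underneath it.
        return [
            {**comment, "replies": attach(comment["commentId"])}
            for comment in children.get(parent_id, [])
        ]

    return attach(None)

flat = [
    {"commentId": "c1", "parentCommentId": None, "content": "Great post!"},
    {"commentId": "c2", "parentCommentId": "c1", "content": "Agreed"},
    {"commentId": "c3", "parentCommentId": "c2", "content": "Same here"},
    {"commentId": "c4", "parentCommentId": None, "content": "Nice"},
]
tree = build_comment_tree(flat)
```

A paginated API would more likely return top-level comments with a replyCount, as in the response above, and load replies on demand rather than nesting the whole thread.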

👥 Follow Management APIs

POST /api/users/{userId}/follow

Request Body:

{
"action": "follow"
}

action: follow, unfollow

Response (200):

{
"followerId": "user_123",
"followingId": "user_456",
"status": "active",
"createdAt": "2024-01-15T19:00:00Z"
}

GET /api/users/{userId}/following

Query Parameters:

• limit - Users per page (default: 50)
• cursor - Pagination cursor

Response (200):

{
"following": [
  {
    "userId": "user_456",
    "username": "janedoe",
    "displayName": "Jane Doe",
    "profileImageUrl": "avatar.jpg",
    "followedAt": "2024-01-10T15:30:00Z"
  }
],
"pagination": {
  "nextCursor": "follow_cursor_def",
  "hasMore": true
},
"totalCount": 287
}

🎯 Key API Design Decisions

✅ RESTful Design

Standard HTTP methods and status codes make the API intuitive for developers. Resource-based URLs with clear hierarchies.

🎯 Cursor-based Pagination

Avoids offset-based pagination issues with real-time data. Ensures consistent results even when new posts are added.
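
One way to get this behavior is to make the cursor an opaque encoding of the last item's sort position instead of an offset. The sketch below assumes posts are ordered newest-first by (createdAt, postId). The cursor format (URL-safe base64 over JSON) and all function names are illustrative assumptions.

```python
import base64
import json

def encode_cursor(last_key: dict) -> str:
    raw = json.dumps(last_key, separators=(",", ":"), sort_keys=True)
    return base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii")

def decode_cursor(cursor: str) -> dict:
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))

def page_after(posts, cursor, limit):
    """Return one page from a newest-first list, starting after the cursor."""
    if cursor:
        key = decode_cursor(cursor)
        marker = (key["createdAt"], key["postId"])
        posts = [p for p in posts if (p["createdAt"], p["postId"]) < marker]
    page = posts[:limit]
    has_more = len(posts) > limit
    next_cursor = (
        encode_cursor(
            {"createdAt": page[-1]["createdAt"], "postId": page[-1]["postId"]}
        )
        if has_more
        else None
    )
    return {"posts": page, "pagination": {"nextCursor": next_cursor, "hasMore": has_more}}

posts = [
    {"postId": "p3", "createdAt": "2024-01-15T12:00:00Z"},
    {"postId": "p2", "createdAt": "2024-01-15T11:00:00Z"},
    {"postId": "p1", "createdAt": "2024-01-15T10:00:00Z"},
]
first = page_after(posts, None, limit=2)

# A new post arrives between requests; the cursor still resumes after p2.
posts_now = [{"postId": "p4", "createdAt": "2024-01-15T13:00:00Z"}] + posts
second = page_after(posts_now, first["pagination"]["nextCursor"], limit=2)
```

With offset-based paging, the same second request (offset 2) would have returned p2 again, because the new post shifted every item down by one.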

⚡ Efficient Data Loading

Selective field inclusion, batch operations, and optimized queries reduce bandwidth and improve mobile experience.

🔄 Idempotent Operations

Like/unlike operations are idempotent. Multiple identical requests produce the same result, handling network retry scenarios.
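
Idempotency falls out naturally when a like is keyed by (postId, userId), as in the Likes table: repeating the write overwrites the same item instead of adding a second one. The LikeStore class below is a hypothetical in-memory sketch of that behavior, not the storage layer itself.

```python
class LikeStore:
    """In-memory likes keyed by (postId, userId), so repeats are harmless."""

    def __init__(self):
        self._likes = {}  # (postId, userId) -> likeType

    def like(self, post_id, user_id, like_type="like"):
        # Same key on retry: the entry is overwritten, never duplicated.
        self._likes[(post_id, user_id)] = like_type
        return self.count(post_id)

    def unlike(self, post_id, user_id):
        # Removing a like that is already gone is a no-op.
        self._likes.pop((post_id, user_id), None)
        return self.count(post_id)

    def count(self, post_id):
        return sum(1 for (pid, _uid) in self._likes if pid == post_id)

store = LikeStore()
after_first_like = store.like("post_abc123", "user_123")
after_retry = store.like("post_abc123", "user_123")   # network retry
after_unlike = store.unlike("post_abc123", "user_123")
after_unlike_retry = store.unlike("post_abc123", "user_123")
```

If like counts are kept as a separate cached counter, the counter should only change when the underlying item actually changed, otherwise retries would inflate it.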

📱 Mobile-First

Lightweight payloads, optional fields, and efficient caching headers optimize for mobile networks and battery life.

🔐 Security by Design

Authentication required for all write operations. Privacy controls enforced at API level before data access.

🔮 Coming Up Next

With our entities and APIs defined, we'll design the core system architecture:

• Microservices design - Post Service, Feed Service, Follow Service
• Push vs Pull models - Feed generation strategies
• Real-time systems - WebSocket connections and notifications
• Caching layers - Redis for hot data, CDN for media
• Message queues - Async processing with Kafka
