Step 2 of 6: A - API Design & Core Entities
Define core data models, relationships, and REST API contracts
🎯 Core Entities & Relationships
👤 User Entity
Core user profile information
- • Partition: userId
- • Indexes: username, email
- • Growth: 3B users
📝 Post Entity
Content created by users
- • Partition: postId
- • GSI: userId-createdAt
- • Growth: 500M posts/day
🔗 Follow Entity
User relationships graph
- • Partition: followerId
- • Sort Key: followingId
- • GSI: followingId-followerId
❤️ Like Entity
Post engagement tracking
- • Partition: postId
- • Sort Key: userId
- • Support: Different reaction types
💬 Comment Entity
Threaded discussions
- • Partition: commentId
- • GSI: postId-createdAt
- • Support: Nested replies
📰 Feed Entity
Pre-computed user timelines
- • Partition: userId
- • Sort Key: postId
- • TTL: 30 days for cleanup
🎯 The Core Challenge: Follower/Followee Relationships
🤔 Why is this the Heart of Social Media?
The follower/followee relationship is what makes social media "social". Every major feature depends on it:
📰 Feed Generation Needs:
- • "Show me posts from people I follow"
- • "When I post, notify people who follow me"
- • "Calculate engagement for posts based on follower count"
🔍 Query Patterns We Must Support:
- • Following List: Who does Alice follow?
- • Followers List: Who follows Alice?
- • Relationship Check: Does Alice follow Bob?
❌ Why Traditional SQL Approaches Fall Short
Single Table Approach:
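CREATE TABLE follows (
    follower_id  VARCHAR(64) NOT NULL,
    following_id VARCHAR(64) NOT NULL,
    PRIMARY KEY (follower_id, following_id)
);

-- ✅ Fast: Who does Alice follow? (uses the leading column of the primary key)
SELECT following_id FROM follows WHERE follower_id = 'alice';

-- ❌ SLOW: Who follows Alice?
-- following_id is not the leading key column, so without a separate index this is a full table scan!
SELECT follower_id FROM follows WHERE following_id = 'alice';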
Dual Table Approach:
-- Need TWO tables and dual writes
CREATE TABLE following (
    follower_id   VARCHAR(64) PRIMARY KEY,
    following_ids JSON  -- ["bob", "carol", "dave"]
);

CREATE TABLE followers (
    user_id      VARCHAR(64) PRIMARY KEY,
    follower_ids JSON  -- ["alice", "eve", "frank"]
);

-- Problems:
-- • 2x storage cost
-- • Consistency issues between tables
-- • JSON array size limits
🚀 DynamoDB GSI: The Elegant Solution
✅ Single Table, Bidirectional Queries
DynamoDB's Global Secondary Index (GSI) lets one table serve both query patterns efficiently. The application writes each relationship once, and DynamoDB maintains the reverse-lookup index, so there is no second table to keep in sync.
📊 Primary Table Structure
Table: Follow
- • Partition key: followerId (who is following)
- • Sort key: followingId (who is being followed)
✅ Optimized Query:
"Who does Alice follow?"
Query PK = 'alice' → Get all items
🔄 Global Secondary Index
Index: FollowingId-FollowerId-Index
- • Partition key: followingId (who is being followed)
- • Sort key: followerId (who is following)
✅ Reverse Query:
"Who follows Alice?"
Query GSI PK = 'alice' → Get all items
📋 Sample Data & Query Examples
💾 Sample Records in Follow Table
| PK (followerId) | SK (followingId) | status |
|---|---|---|
| alice | bob | active |
| alice | carol | active |
| bob | alice | active |
| bob | dave | active |
| carol | alice | active |
| dave | alice | active |

The same records as seen through FollowingId-FollowerId-Index:

| GSI PK (followingId) | GSI SK (followerId) | status |
|---|---|---|
| alice | bob | active |
| alice | carol | active |
| alice | dave | active |
| bob | alice | active |
| carol | alice | active |
| dave | bob | active |
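
To make the two views concrete, here is a minimal in-memory sketch of the sample records above. It is only a toy model, not DynamoDB itself: the `FollowStore` class and its method names are hypothetical, and the real GSI is updated asynchronously rather than in the same step as the base write.

```python
from collections import defaultdict

# Sample records from the tables above: (followerId, followingId).
FOLLOWS = [
    ("alice", "bob"), ("alice", "carol"),
    ("bob", "alice"), ("bob", "dave"),
    ("carol", "alice"), ("dave", "alice"),
]


class FollowStore:
    """Toy stand-in for the Follow table plus its reverse index."""

    def __init__(self):
        self.base = defaultdict(set)  # followerId -> {followingId}
        self.gsi = defaultdict(set)   # followingId -> {followerId}

    def follow(self, follower_id, following_id):
        # One logical write; the reverse view is derived from it.
        self.base[follower_id].add(following_id)
        self.gsi[following_id].add(follower_id)

    def following(self, user_id):
        """Who does user_id follow? (base-table query)"""
        return sorted(self.base.get(user_id, ()))

    def followers(self, user_id):
        """Who follows user_id? (index query)"""
        return sorted(self.gsi.get(user_id, ()))

    def is_following(self, follower_id, following_id):
        """Point lookup on the full (followerId, followingId) key."""
        return following_id in self.base.get(follower_id, ())


store = FollowStore()
for follower, followee in FOLLOWS:
    store.follow(follower, followee)
```

Each of the three query patterns maps to a single dictionary lookup here, which mirrors why a key-based Query or GetItem is enough in the real table.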
🔍 Query Examples
Query 1: Who does Alice follow?
// Query Primary Table
Query(
  TableName: "Follow",
  KeyConditionExpression: "followerId = :alice",
  ExpressionAttributeValues: {
    ":alice": "alice"
  }
)
// Result: [bob, carol]

Query 2: Who follows Alice?
// Query GSI
Query(
  TableName: "Follow",
  IndexName: "FollowingId-FollowerId-Index",
  KeyConditionExpression: "followingId = :alice",
  ExpressionAttributeValues: {
    ":alice": "alice"
  }
)
// Result: [bob, carol, dave]

Query 3: Does Alice follow Bob?
// Point Query
GetItem(
  TableName: "Follow",
  Key: {
    "followerId": "alice",
    "followingId": "bob"
  }
)
// Result: Found → Yes, Not Found → No

✅ GSI Advantages
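- • Single Source of Truth: The application writes one item per relationship; DynamoDB maintains the index copy
- • Automatic Index Maintenance: The GSI is updated by DynamoDB whenever the base table changes
- • Both Queries Fast: Each direction is a key-based Query whose cost scales with the items returned, not with table size
- • Cost Effective: One table + one GSI
- • Atomic Writes: A follow or unfollow is a single-item write, which is atomic on the base table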
⚠️ Important Considerations
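- • GSI Costs: Extra storage for the projected attributes, plus extra write capacity for every write that touches the index
- • Eventually Consistent: GSI reads are always eventually consistent; index updates usually trail base-table writes by a fraction of a second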
- • Projection Strategy: Include only needed attributes
- • Hot Partitions: Celebrity users may create hot spots
- • Pagination: Large follower lists need cursor-based paging
🌍 Real-World Impact: How This Powers Feed Generation
📰 Push Model (Write-Heavy)
When Alice posts:
- Query GSI: "Who follows Alice?" → [bob, carol, dave]
- For each follower, write to their PrecomputedFeed table
- Send real-time notifications
Celebrity Problem:
If Alice has 10M followers, this creates 10M writes! We need hybrid approaches for influencers.
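
The push steps above can be sketched as a small fan-out function. This is an illustration under stated assumptions: `fan_out_post`, `followers_of`, and the in-memory `feeds` dictionary are hypothetical stand-ins for the GSI query and the PrecomputedFeed table, and notifications are left out.

```python
from collections import defaultdict


def fan_out_post(post_id, author_id, followers_of, feeds):
    """Append post_id to the precomputed feed of every follower of author_id.

    Returns the number of feed writes performed, i.e. the write amplification.
    """
    writes = 0
    for follower_id in followers_of(author_id):
        feeds[follower_id].append(post_id)
        writes += 1
    return writes


# Hypothetical follower lookup standing in for the "Who follows Alice?" query.
followers = {"alice": ["bob", "carol", "dave"]}
feeds = defaultdict(list)
writes = fan_out_post("post_1", "alice", lambda u: followers.get(u, []), feeds)
```

The return value makes the celebrity problem visible: the number of writes equals the follower count, so one post by an account with 10M followers means 10M feed writes.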
🔍 Pull Model (Read-Heavy)
When Bob opens his feed:
- Query Primary Table: "Who does Bob follow?" → [alice, carol]
- For each following, get their recent posts
- Merge and rank posts by algorithm
Trade-off:
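Higher read latency but no write amplification. This suits posts from accounts with very many followers, where push fan-out would be too expensive. Read cost grows with the number of accounts the reader follows.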
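
A minimal sketch of the pull steps, assuming "rank" simply means newest first and that each author's recent posts are already sorted newest first. The names `pull_feed`, `following_of`, and `recent_posts_of` are hypothetical; a real ranking algorithm would replace the timestamp ordering.

```python
import heapq
from itertools import islice


def pull_feed(user_id, following_of, recent_posts_of, limit=20):
    """Merge each followed account's newest-first posts into one newest-first feed."""
    per_author = [recent_posts_of(author) for author in following_of(user_id)]
    # Every input list is already sorted newest-first, so a k-way merge suffices.
    # ISO 8601 UTC timestamps in the same format sort correctly as strings.
    merged = heapq.merge(*per_author, key=lambda post: post["createdAt"], reverse=True)
    return list(islice(merged, limit))


following = {"bob": ["alice", "carol"]}
posts = {
    "alice": [
        {"postId": "a2", "createdAt": "2024-01-15T18:30:00Z"},
        {"postId": "a1", "createdAt": "2024-01-15T10:30:00Z"},
    ],
    "carol": [{"postId": "c1", "createdAt": "2024-01-15T12:00:00Z"}],
}
feed = pull_feed("bob", lambda u: following.get(u, []), lambda a: posts.get(a, []))
```

The work happens at read time: one lookup per followed account plus the merge, which is where the higher read latency comes from.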
🎯 Hybrid Strategy (Best of Both)
📊 User Classification:
- • Regular users (<1K followers): Push model
- • Influencers (1K-100K): Push to active followers only
- • Celebrities (>100K): Pull model with heavy caching
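
The classification above can be expressed as a small routing function. The thresholds come from the list; the function name and the strategy labels are hypothetical, and treating exactly 100K followers as an influencer is an assumption, since the list leaves that boundary open.

```python
def fanout_strategy(follower_count):
    """Pick a feed-delivery strategy from an account's follower count."""
    if follower_count < 1_000:
        return "push"              # regular users
    if follower_count <= 100_000:
        return "push_active_only"  # influencers: push to active followers only
    return "pull_cached"           # celebrities: pull model with heavy caching
```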
⚡ Smart Optimizations:
- • Cache follower lists for frequent reads
- • Batch process follow/unfollow operations
- • Use Bloom filters to rule out relationships that do not exist before querying the table
- • Pre-compute feeds for active users
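
As an illustration of the Bloom filter idea, here is a small self-contained sketch. The sizes, the SHA-256 based hashing, and the `edge_key` format are all assumptions chosen for readability. A Bloom filter can return false positives but never false negatives, so a "maybe" still needs a GetItem, and a plain Bloom filter cannot remove an entry on unfollow.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def edge_key(follower_id, following_id):
    """Hypothetical key format for a follow relationship."""
    return f"{follower_id}->{following_id}"


relationships = BloomFilter()
relationships.add(edge_key("alice", "bob"))
```

If `might_contain` returns `False`, the relationship definitely does not exist and the table read can be skipped.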
🗄️ Complete DynamoDB Schema Design
📄 Posts Table
Primary Key Structure: partition key postId
Global Secondary Index: userId-createdAt (partition key userId, sort key createdAt)
Query Pattern: Get all posts by a user, sorted by creation time.
👥 Follow Table
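Primary Key Structure: partition key followerId, sort key followingId
Global Secondary Index: FollowingId-FollowerId-Index (partition key followingId, sort key followerId)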
Query Patterns: Get following list, get followers list, check if user A follows user B.
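
For reference, the Follow table could be declared with a definition along these lines. This is a sketch, not the author's exact schema: string-typed IDs, the `KEYS_ONLY` projection, and on-demand billing are assumptions. With `KEYS_ONLY`, an attribute such as `status` would not be readable from the index.

```python
def follow_table_definition():
    """Parameters in the shape accepted by DynamoDB's CreateTable operation."""
    return {
        "TableName": "Follow",
        "AttributeDefinitions": [
            {"AttributeName": "followerId", "AttributeType": "S"},
            {"AttributeName": "followingId", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "followerId", "KeyType": "HASH"},    # partition key
            {"AttributeName": "followingId", "KeyType": "RANGE"},  # sort key
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "FollowingId-FollowerId-Index",
                "KeySchema": [
                    {"AttributeName": "followingId", "KeyType": "HASH"},
                    {"AttributeName": "followerId", "KeyType": "RANGE"},
                ],
                # Assumption: project keys only to keep index storage small.
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
        # Assumption: on-demand capacity, so no ProvisionedThroughput blocks.
        "BillingMode": "PAY_PER_REQUEST",
    }
```

With the AWS SDK for Python installed and credentials configured, this dictionary could be passed as `boto3.client("dynamodb").create_table(**follow_table_definition())`.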
📰 PrecomputedFeed Table
Primary Key Structure: partition key userId, sort key postId
TTL Configuration: items expire after 30 days
Query Pattern: Fast feed retrieval with pagination support.
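
DynamoDB TTL works off a numeric attribute holding a Unix epoch timestamp in seconds. A minimal sketch of building a feed item with a 30-day expiry follows. The attribute name `expiresAt` and the `feed_item` helper are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FEED_TTL = timedelta(days=30)


def feed_item(user_id, post_id, created_at):
    """Build a PrecomputedFeed item that expires 30 days after creation."""
    expires_at = created_at + FEED_TTL
    return {
        "userId": user_id,   # partition key
        "postId": post_id,   # sort key
        "createdAt": created_at.isoformat(),
        # TTL attribute: epoch seconds stored as a number (name is an assumption).
        "expiresAt": int(expires_at.timestamp()),
    }


created = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
item = feed_item("user_123", "post_abc123", created)
```

Expired items are removed in the background some time after the timestamp passes, so reads may still see them briefly and can filter on `expiresAt` if that matters.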
❤️ Likes Table
Primary Key Structure: partition key postId, sort key userId
Usage Pattern: one item per (postId, userId) pair, which can also carry the reaction type
Query Pattern: Check if user liked post, get like counts per post.
🌐 REST API Design
📝 Post Management APIs
Request Body:
{
  "content": "Hello world! 🌍",
  "mediaUrls": [
    "https://cdn.example.com/img1.jpg",
    "https://cdn.example.com/video1.mp4"
  ],
  "postType": "media",
  "privacy": "public"
}
Response (201):
{
  "postId": "post_abc123",
  "userId": "user_123",
  "content": "Hello world! 🌍",
  "mediaUrls": [...],
  "postType": "media",
  "privacy": "public",
  "createdAt": "2024-01-15T10:30:00Z",
  "likeCount": 0,
  "commentCount": 0
}

Query Parameters:
- • includeStats: Include like/comment counts
- • userId: Check if requesting user liked
Response (200):
{
  "postId": "post_abc123",
  "userId": "user_123",
  "username": "johndoe",
  "displayName": "John Doe",
  "content": "Hello world! 🌍",
  "mediaUrls": [...],
  "createdAt": "2024-01-15T10:30:00Z",
  "likeCount": 42,
  "commentCount": 5,
  "userLiked": true
}

Query Parameters:
- • limit: Number of posts (default: 20)
- • cursor: Pagination cursor
- • privacy: Filter by privacy level
Response (200):
{
  "posts": [
    {
      "postId": "post_abc123",
      "content": "Hello world!",
      "createdAt": "2024-01-15T10:30:00Z",
      "likeCount": 42
    }
  ],
  "pagination": {
    "nextCursor": "cursor_xyz",
    "hasMore": true
  }
}

📰 Feed APIs
Query Parameters:
- • limit: Posts per page (default: 20)
- • cursor: Pagination cursor
- • algorithm: chronological or ranked
Response (200):
{
  "posts": [
    {
      "postId": "post_abc123",
      "userId": "user_456",
      "username": "janedoe",
      "content": "Amazing sunset today!",
      "mediaUrls": ["sunset.jpg"],
      "createdAt": "2024-01-15T18:30:00Z",
      "likeCount": 128,
      "commentCount": 15,
      "userLiked": false,
      "score": 0.95
    }
  ],
  "pagination": {
    "nextCursor": "feed_cursor_xyz",
    "hasMore": true,
    "generatedAt": "2024-01-15T19:00:00Z"
  }
}

Request Body:
{
  "lastSeen": "2024-01-15T18:30:00Z",
  "algorithm": "ranked"
}
Response (200):
{
  "newPostsCount": 5,
  "posts": [...],
  "refreshedAt": "2024-01-15T19:00:00Z"
}

❤️ Social Interaction APIs
Request Body:
{
  "likeType": "like"
}
likeType: like, love, laugh, angry, sad, etc.
Response (201):
{
  "postId": "post_abc123",
  "userId": "user_123",
  "likeType": "like",
  "createdAt": "2024-01-15T19:00:00Z",
  "newTotalLikes": 129
}

Request Body:
{
  "content": "Great post! 👍",
  "parentCommentId": null
}
Response (201):
{
  "commentId": "comment_xyz789",
  "postId": "post_abc123",
  "userId": "user_123",
  "username": "johndoe",
  "content": "Great post! 👍",
  "createdAt": "2024-01-15T19:00:00Z",
  "likeCount": 0
}

Query Parameters:
- • limit: Comments per page (default: 20)
- • sort: newest, oldest, popular
- • cursor: Pagination cursor
Response (200):
{
  "comments": [
    {
      "commentId": "comment_xyz789",
      "userId": "user_123",
      "username": "johndoe",
      "content": "Great post! 👍",
      "createdAt": "2024-01-15T19:00:00Z",
      "likeCount": 5,
      "replyCount": 2
    }
  ],
  "pagination": {
    "nextCursor": "comment_cursor_abc",
    "hasMore": true
  }
}

👥 Follow Management APIs
Request Body:
{
  "action": "follow"
}
action: follow, unfollow
Response (200):
{
  "followerId": "user_123",
  "followingId": "user_456",
  "status": "active",
  "createdAt": "2024-01-15T19:00:00Z"
}

Query Parameters:
- • limit: Users per page (default: 50)
- • cursor: Pagination cursor
Response (200):
{
  "following": [
    {
      "userId": "user_456",
      "username": "janedoe",
      "displayName": "Jane Doe",
      "profileImageUrl": "avatar.jpg",
      "followedAt": "2024-01-10T15:30:00Z"
    }
  ],
  "pagination": {
    "nextCursor": "follow_cursor_def",
    "hasMore": true
  },
  "totalCount": 287
}

🎯 Key API Design Decisions
✅ RESTful Design
Standard HTTP methods and status codes make the API intuitive for developers. Resource-based URLs with clear hierarchies.
🎯 Cursor-based Pagination
Avoids offset-based pagination issues with real-time data. Ensures consistent results even when new posts are added.
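
A minimal sketch of how such a cursor might work, assuming posts are held newest first and ordered by `(createdAt, postId)`. The cursor is an opaque base64 string encoding the last item the client saw. The helper names and the cursor encoding are assumptions, not the API's actual format.

```python
import base64
import json


def encode_cursor(last_key):
    raw = json.dumps(last_key, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))


def paginate(posts, limit=20, cursor=None):
    """Return one page of posts; posts must be sorted newest-first."""
    start = 0
    if cursor is not None:
        last = decode_cursor(cursor)
        last_key = (last["createdAt"], last["postId"])
        # Resume strictly after the last item the client saw, wherever it now sits.
        start = next(
            (i for i, p in enumerate(posts) if (p["createdAt"], p["postId"]) < last_key),
            len(posts),
        )
    page = posts[start:start + limit]
    has_more = start + limit < len(posts)
    next_cursor = None
    if has_more:
        tail = page[-1]
        next_cursor = encode_cursor({"createdAt": tail["createdAt"], "postId": tail["postId"]})
    return {"posts": page, "pagination": {"nextCursor": next_cursor, "hasMore": has_more}}
```

Because the cursor identifies a position by key rather than by offset, a post inserted at the top between two requests does not shift the next page or cause a duplicate.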
⚡ Efficient Data Loading
Selective field inclusion, batch operations, and optimized queries reduce bandwidth and improve mobile experience.
🔄 Idempotent Operations
Like/unlike operations are idempotent. Multiple identical requests produce the same result, handling network retry scenarios.
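
One way to get this property is to key each like by `(postId, userId)`, matching the Likes table key, so a repeated request overwrites rather than adds. The `LikeStore` class below is a hypothetical in-memory sketch of that idea.

```python
class LikeStore:
    """In-memory likes keyed by (postId, userId), so retries cannot double-count."""

    def __init__(self):
        self._likes = {}  # (postId, userId) -> likeType

    def like(self, post_id, user_id, like_type="like"):
        # Writing the same key again overwrites; the count does not change.
        self._likes[(post_id, user_id)] = like_type
        return self.count(post_id)

    def unlike(self, post_id, user_id):
        # Removing a missing key is a no-op rather than an error.
        self._likes.pop((post_id, user_id), None)
        return self.count(post_id)

    def count(self, post_id):
        return sum(1 for (pid, _uid) in self._likes if pid == post_id)


likes = LikeStore()
first = likes.like("post_abc123", "user_123")
retry = likes.like("post_abc123", "user_123")  # e.g. a network retry
```

A retried like leaves the total unchanged, and a repeated unlike is harmless, which is what lets clients retry safely after a timeout.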
📱 Mobile-First
Lightweight payloads, optional fields, and efficient caching headers optimize for mobile networks and battery life.
🔐 Security by Design
Authentication required for all write operations. Privacy controls enforced at API level before data access.
🔮 Coming Up Next
With our entities and APIs defined, we'll design the core system architecture:
- • Microservices design - Post Service, Feed Service, Follow Service
- • Push vs Pull models - Feed generation strategies
- • Real-time systems - WebSocket connections and notifications
- • Caching layers - Redis for hot data, CDN for media
- • Message queues - Async processing with Kafka