System Design Framework

A structured methodology for approaching system design problems. This framework ensures you cover all critical aspects while maintaining a logical flow that interviewers can easily follow.

The SACRED Framework

πŸ›οΈ SACRED: Systematic Approach to Complex Requirements & Efficient Design

S - Scope & Requirements

Define functional & non-functional requirements

A - API Design & Core Entities

Define entities, relationships, and API contracts

C - Core High-Level Design

Basic architecture to satisfy functional requirements

R - Refinement for Scale

Address non-functional requirements (scale, performance)

E - Edge Cases

Failure handling and special scenarios

D - Deep Dives

Detailed component implementation

Step 1: Scope & Requirements (5-10 mins)

🎯 Requirements Gathering

Functional Requirements

What the system should DO

Ask about:
β€’ Core features (must-have vs nice-to-have)
β€’ User actions and workflows
β€’ Data operations (CRUD)
β€’ Business logic rules
β€’ User types and permissions

Non-Functional Requirements

HOW WELL the system should perform

Clarify:
β€’ Scale (users, requests, data volume)
β€’ Performance (latency, throughput)
β€’ Availability (uptime targets)
β€’ Consistency requirements
β€’ Geographic distribution

πŸ’‘ Pro Tip: Write down specific numbers! "Handle 1B requests/day" is better than "handle high traffic". If not given, make reasonable assumptions and state them clearly.

Example: URL Shortener Requirements

πŸ”— Real Interview Dialogue

πŸ‘¨β€πŸ’Ό Interviewer:

"Design a URL shortening service like bit.ly."

πŸ§‘β€πŸ’» You:

"Great! Let me clarify the requirements. For functional requirements - should users be able to shorten any URL, and do we need custom aliases?"

πŸ‘¨β€πŸ’Ό Interviewer:

"Yes to shortening URLs. Custom aliases would be nice but not required initially."

πŸ§‘β€πŸ’» You:

"Got it. Do we need analytics - like click counts, geographic data? Also, should URLs expire or last forever?"

πŸ‘¨β€πŸ’Ό Interviewer:

"Basic click counts are important. For expiration, let's say URLs last forever unless deleted."

πŸ§‘β€πŸ’» You:

"Perfect. Now for scale - how many URL shortenings per day should we handle? And what about read vs write ratio?"

πŸ‘¨β€πŸ’Ό Interviewer:

"Let's say 100M URLs created per day. Reads are much higher - assume 100:1 read/write ratio."

πŸ§‘β€πŸ’» You:

"Excellent. So 100M writes/day and 10B reads/day. For availability - is 99.9% uptime acceptable? Any latency requirements?"

πŸ‘¨β€πŸ’Ό Interviewer:

"99.9% is fine. URL redirection should be under 100ms."

πŸ“ Summary:

Functional Requirements:
β€’ Shorten long URLs to short URLs
β€’ Redirect short URLs to original URLs
β€’ Basic click analytics
β€’ URLs persist forever (no expiration)
Non-Functional Requirements:
β€’ 100M URL shortenings/day
β€’ 10B redirections/day (100:1 read/write)
β€’ 99.9% availability
β€’ <100ms redirect latency

Step 2: API Design & Core Entities (10-15 mins)

πŸ”— Entities & API Contracts

2.1 Core Entities

Identify main objects:

β€’ Primary entities (User, Post, Order)
β€’ Relationships (1:1, 1:N, N:M)
β€’ Key attributes per entity
β€’ Data size estimations
Example - URL Shortener:
URL: {id, long_url, short_code, created_at}
User: {id, email, api_key}
Analytics: {url_id, clicks, timestamp}
User β†’ URLs (1:N)
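The entities above can be sketched as plain data classes. Field names follow the example; `AnalyticsEvent` is an illustrative name, and the 1:N User→URL link would live as a `user_id` column in the actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class User:
    id: int
    email: str
    api_key: str

@dataclass
class Url:
    id: int
    long_url: str
    short_code: str
    # Default to the creation time; stored as UTC to avoid timezone bugs.
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class AnalyticsEvent:
    url_id: int
    clicks: int
    timestamp: datetime
```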

2.2 API Design

Define endpoints:

β€’ REST vs GraphQL decision
β€’ Request/response format
β€’ Authentication method
β€’ Rate limiting approach
URL Shortener APIs:
POST /shorten
{long_url} β†’ {short_url}
GET /{short_code}
β†’ 301 Redirect to long_url
GET /stats/{short_code}
β†’ {clicks, referrers, timeline}
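A minimal in-memory sketch of these three endpoints. The class name and random 7-character codes are illustrative; a real service would persist to a database, handle code collisions, and track referrers and a timeline rather than only a click count:

```python
import secrets
import string

_ALPHABET = string.ascii_letters + string.digits  # Base62 character set

class UrlShortener:
    def __init__(self):
        self._by_code = {}   # short_code -> long_url
        self._clicks = {}    # short_code -> click count

    def shorten(self, long_url: str) -> str:
        """POST /shorten"""
        code = "".join(secrets.choice(_ALPHABET) for _ in range(7))
        self._by_code[code] = long_url
        self._clicks[code] = 0
        return code

    def redirect(self, short_code: str) -> str:
        """GET /{short_code} -> target of the 301 redirect"""
        self._clicks[short_code] += 1
        return self._by_code[short_code]

    def stats(self, short_code: str) -> dict:
        """GET /stats/{short_code}"""
        return {"clicks": self._clicks[short_code]}
```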

2.3 Back-of-Envelope Calculations

Based on requirements & entities:
β€’ QPS = Daily requests / 86,400 seconds
β€’ Storage = Entity size Γ— Number Γ— Retention
β€’ Bandwidth = QPS Γ— Average payload size
β€’ Cache = 20% of frequently accessed data
Example (100M URLs/day):
β€’ Write QPS: 100M / 86400 β‰ˆ 1200 QPS
β€’ Read QPS: 1200 Γ— 100 = 120K QPS
β€’ Storage: 500 bytes Γ— 100M Γ— 365 β‰ˆ 18 TB/year

Step 3: Core High-Level Design (10-15 mins)

πŸ›οΈ Basic Architecture (Functional Requirements)

3.1 Simple Architecture

Draw components to satisfy APIs:
β€’ Client β†’ Load Balancer
β€’ Load Balancer β†’ Application Servers
β€’ Application β†’ Database (single master initially)
β€’ Application β†’ Object Storage (for media)
Focus: Make it work functionally!

3.2 Database Design

Choose database type:

β€’ SQL for ACID compliance
β€’ NoSQL for scale & flexibility
β€’ Schema design for entities
β€’ Primary/foreign key relationships
URL Shortener Tables:
urls: id, long_url, short_code, user_id
users: id, email, api_key
analytics: url_id, timestamp, ip
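The three tables above, sketched as SQLite DDL via Python's `sqlite3` (column types and constraints are illustrative, not a production schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE NOT NULL,
    api_key TEXT UNIQUE NOT NULL
);
CREATE TABLE urls (
    id INTEGER PRIMARY KEY,
    long_url TEXT NOT NULL,
    -- UNIQUE gives short_code an index, which the 100:1 read path needs.
    short_code TEXT UNIQUE NOT NULL,
    user_id INTEGER REFERENCES users(id)
);
CREATE TABLE analytics (
    url_id INTEGER REFERENCES urls(id),
    timestamp TEXT NOT NULL,
    ip TEXT
);
""")
```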

3.3 Basic Component Flow

Draw simple architecture that satisfies functional requirements:

URL Shortener Basic Flow:
Client β†’ Load Balancer β†’ App Server
App Server β†’ Database (read/write)
App Server β†’ Cache (for hot URLs)
Focus: Make functional requirements work!

3.4 Database Selection

SQL
β€’ ACID compliance
β€’ Complex queries
β€’ Strong consistency
NoSQL
β€’ Horizontal scaling
β€’ Flexible schema
β€’ Eventual consistency

Step 4: Refinement for Scale (10-15 mins)

⚑ Scale & Performance Optimization

4.1 Scale Architecture

Now address non-functional requirements:
β€’ Multiple Load Balancers (availability)
β€’ Auto-scaling Application Servers
β€’ Multi-level Caching (CDN + Redis)
β€’ Database Replication (Master-Slave)
β€’ Database Sharding (for 100M+ URLs)
β€’ Message Queue (analytics processing)

4.2 Performance Optimizations

Read Path (10B/day):

1. CDN cache (popular URLs)
2. Application cache (Redis)
3. Database read replicas
4. Geographic distribution
5. <100ms latency target

Write Path (100M/day):

1. Rate limiting (per user)
2. Database sharding (by short_code)
3. Async analytics processing
4. Write-through cache
5. Batch writes for analytics
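The per-user rate limiting in the write path can be sketched as a token bucket (in production the bucket state would live in Redis, keyed per user, rather than in process memory; the rate and capacity here are illustrative):

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to the time elapsed since the last call.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```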

4.3 Scaling Strategies

Database Scaling:
β€’ Read replicas (handle 100:1 ratio)
β€’ Sharding by short_code hash
β€’ Connection pooling
β€’ Indexing on short_code
Caching Strategy:
β€’ CDN for popular URLs (80/20 rule)
β€’ Redis for recent URLs
β€’ Application-level caching
β€’ Write-through for new URLs
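The read side of this strategy can be sketched as cache-aside over a small LRU (`db` stands in for the real datastore; a production deployment would use Redis with a TTL instead of an in-process dict):

```python
from collections import OrderedDict

class LruCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

def resolve(short_code, cache, db):
    url = cache.get(short_code)
    if url is None:          # cache miss: fall back to DB, then backfill
        url = db[short_code]
        cache.put(short_code, url)
    return url
```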

Step 5: Edge Cases & Failures (10 mins)

πŸ›‘οΈ Resilience & Failure Handling

Edge Cases & Failure Scenarios

β€’ Hotspots: Celebrity problem, viral content
β€’ Race conditions: Distributed locks
β€’ Server failure: Health checks, auto-restart
β€’ Database failure: Replicas, failover
β€’ Network partition: CAP theorem trade-offs
β€’ Cascading failures: Circuit breakers
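The circuit-breaker idea from the last bullet, as a minimal sketch (thresholds and timeouts are illustrative): after a run of consecutive failures the circuit opens and calls fail fast, protecting the struggling downstream service.

```python
import time

class CircuitBreaker:
    def __init__(self, threshold: int = 3, reset_after: float = 30.0):
        self.threshold = threshold      # consecutive failures before opening
        self.reset_after = reset_after  # seconds before a half-open retry
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None       # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0               # success resets the failure count
        return result
```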

Performance Refinements

β€’ Caching: Multi-level (CDN, Redis, application)
β€’ Database: Indexing, query optimization
β€’ Load balancing: Geographic distribution
β€’ Async processing: Queue for heavy operations
β€’ Data partitioning: Sharding strategies
β€’ Connection pooling: Resource management

Caching Strategy

CDN

Static content, edge caching

Application Cache

Redis/Memcached for hot data

Database Cache

Query result caching

Database Scaling

Vertical Scaling
β€’ Upgrade hardware (CPU, RAM)
β€’ Limited by single machine
β€’ Simple but expensive
Horizontal Scaling
β€’ Replication (Master-Slave)
β€’ Sharding (partition data)
β€’ Federation (split databases)

Load Distribution

β€’ Load Balancer: Round-robin, least connections, IP hash
β€’ Message Queue: Decouple components, handle spikes
β€’ Service Mesh: Microservices communication

Step 6: Deep Dives (Time Permitting)

πŸ” Component Deep Dives

Based on interviewer interest or remaining time, dive deep into specific components:

Algorithm Details

β€’ Consistent hashing implementation
β€’ Rate limiting algorithms
β€’ Ranking/recommendation logic
β€’ Consensus protocols (Raft, Paxos)
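A consistent-hashing sketch with virtual nodes, the usual way to assign keys to shards so that adding or removing a shard only remaps a small fraction of keys (node names and the vnode count are illustrative):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        # Each physical node gets `vnodes` points on the ring,
        # smoothing the key distribution across shards.
        self.ring = []   # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}:{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first ring point at or after the key's hash.
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```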

Data Structures

β€’ Bloom filters for deduplication
β€’ LSM trees for write-heavy loads
β€’ B-trees for indexing
β€’ Tries for autocomplete
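A Bloom filter sketch for the deduplication use case: membership tests may report false positives but never false negatives, so a "no" answer safely skips a database lookup. Sizes here are illustrative; real deployments derive bit and hash counts from the expected item count and target false-positive rate.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1 << 16, hashes: int = 4):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # Derive k positions by salting the item with the hash index.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```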

Specific Optimizations

β€’ Database query optimization
β€’ Connection pooling
β€’ Batch processing strategies
β€’ Compression techniques

Trade-off Analysis

β€’ SQL vs NoSQL for specific use case
β€’ Synchronous vs asynchronous processing
β€’ Push vs pull architecture
β€’ Monolith vs microservices

Time Management Guide

⏱️ 45-Minute Interview Timeline

0-5 min
S - Scope: Functional & non-functional requirements
5-15 min
A - API & Entities: Data model, API design, calculations
15-25 min
C - Core Design: Basic architecture for functional requirements
25-35 min
R - Refinement: Scale optimization, performance tuning
35-40 min
E - Edge Cases: Failures, monitoring, security
40-45 min
D - Deep Dives: Specific components, algorithms, Q&A

Common Pitfalls to Avoid

❌ What NOT to Do

Design Mistakes

❌ Jumping to implementation details too early
❌ Over-engineering for unlikely scenarios
❌ Ignoring data consistency requirements
❌ Not considering failure modes
❌ Forgetting about data growth over time

Communication Mistakes

❌ Not asking clarifying questions
❌ Making assumptions without stating them
❌ Getting stuck on one approach
❌ Not explaining trade-offs
❌ Avoiding areas you're unsure about

Framework Application Examples

πŸ“š Applied to Common Problems

URL Shortener

S: 100M URLs/day, <100ms latency
A: URL entity, POST/GET APIs
C: LB→App→DB+Cache basic flow
R: CDN, sharding, read replicas
E: Rate limiting, failover
D: Base62 encoding, Snowflake ID
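The Base62 deep dive in a few lines: encode a numeric ID (auto-increment or Snowflake) into a short code. Six Base62 characters cover 62^6 β‰ˆ 56.8B IDs, comfortably more than a year of URLs at 100M/day.

```python
import string

# 0-9, a-z, A-Z: the order is a convention; pick one and keep it stable,
# since decoding depends on it.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def base62_encode(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))
```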

Chat Application

S: Real-time, 1M users
A: User, Message entities, WebSocket
C: WebSocket servers + message queue
R: Connection pooling, partitioning
E: Offline sync, delivery guarantees
D: Message ordering, encryption

Video Streaming

S: Netflix-like, global scale
A: Video, User entities, streaming APIs
C: Upload→Encode→CDN→Stream
R: Global CDN, adaptive bitrates
E: Buffering, bandwidth limits
D: Video encoding, caching strategy

Quick Reference Card

πŸ“‹ Interview Checklist

Before You Start Drawing

☐ Clarified functional requirements
☐ Clarified scale and performance needs
☐ Stated your assumptions
☐ Did back-of-envelope math

Core Design Must-Haves

☐ Data model defined
☐ API endpoints listed
☐ High-level architecture drawn
☐ Database choice justified

Scaling Considerations

☐ Identified bottlenecks
☐ Added caching layers
☐ Discussed database scaling
☐ Considered geographic distribution

Production Readiness

☐ Handled failure scenarios
☐ Added monitoring/alerting
☐ Discussed security concerns
☐ Considered edge cases

πŸ’‘ Remember: System design is about trade-offs. There's no perfect solution. Always explain your reasoning, acknowledge alternatives, and be prepared to adjust based on new requirements or constraints.