URL Shortener - Step 3: Core High-Level Design

Step 3 of 6: C - Core High-Level Design

Design the algorithm and basic architecture to satisfy functional requirements

URL Shortening Algorithm: Counter vs Hash✅ Counter-Based (Our Choice)Snowflake ID GeneratorDistributed, time-ordered, uniqueGenerate 64-bit IDe.g., 1234567890123456789Convert to Base620-9, a-z, A-Z (62 chars)Result: 7-char codee.g., "8KpQ5Wx"Advantages:✓ No collisions (guaranteed unique)✓ No DB check needed✓ Predictable length (always 7 chars)✓ Time-sortable (useful for analytics)✓ Fast generation (< 1ms)✗ Slightly predictable patterns✗ Need distributed ID generator❌ Hash-Based (Alternative)Take Long URLhttps://example.com/very/long/urlGenerate MD5/SHA Hash128-bit or 256-bit hashTake First 43 bits → Base62Truncate for 7 chars⚠️ Check for CollisionDB lookup required!Disadvantages:✗ Collision possible (Birthday paradox)✗ Must check DB for every creation✗ Retry logic needed✗ Performance degrades over time✓ Same URL → Same short code✓ No external dependencies✓ Simpler to implement initially

✅ Decision: Counter

Snowflake ID → Base62

No collisions, fast, scalable

📊 Why Base62?

62^7 = 3.5 trillion URLs

Enough for 100+ years

⚡ Performance

< 1ms generation time

No DB check needed

🔢Alternative: Redis Distributed Counter

🎯 Third Approach: Use Redis INCR command for atomic counter operations across multiple instances

✅ Snowflake ID (Our Choice)

  • Coordination: None needed
  • Dependencies: None
  • Performance: Ultra-fast (<1ms)
  • Availability: No SPOF
  • Complexity: Medium
  • Collision Risk: Zero

🔄 Redis Counter

  • Coordination: Redis cluster
  • Dependencies: Redis availability
  • Performance: Fast (~2-5ms)
  • Availability: Redis SPOF
  • Complexity: Low
  • Collision Risk: Zero

❌ Hash-Based

  • Coordination: None
  • Dependencies: None
  • Performance: Degrades over time
  • Availability: No SPOF
  • Complexity: Low initially
  • Collision Risk: Increases with scale

🔧 Redis Counter Implementation

# Redis Cluster Setup
INCR url_counter # Atomic increment
GET url_counter # Returns: 1234567
base62_encode(1234567) # → "8KpQ"
# Multiple Redis masters for HA
counter_1 = INCR shard_1:counter
counter_2 = INCR shard_2:counter
✅ Redis Pros
  • • Simple implementation (just INCR)
  • • Guaranteed atomic operations
  • • Sequential IDs (good for analytics)
  • • Easy to understand and debug
  • • Can shard counters for scale
❌ Redis Cons
  • • Redis becomes critical dependency
  • • Network latency for each generation
  • • Need Redis HA setup complexity
  • • Counter state persistence concerns
  • • Potential bottleneck at massive scale

🤔 When to Use Redis Counter: Great for simpler systems where you already have Redis infrastructure. For massive scale (our 100M URLs/day), Snowflake's independence wins, but Redis counter is perfectly viable for millions of URLs/day with proper HA setup.

🏗️ Basic System Architecture

ClientBrowser/AppAPI Gateway/ Load BalancerRoute by path:/api/* → Write/{short_code} → ReadWrite ServiceCreate URLsSnowflake IDGenerator(In-process)Read ServiceRedirect URLs10x more instancesRedis CacheLRUHot URLsNoSQL Database(DynamoDB/Cassandra)PK: short_codeSimple key-valueAuto-shardedPOSTWriteGETCheckMiss

🔀 Why Separate Read/Write Services?

Write Service (1,200 QPS)

  • CPU-light: Just ID generation
  • Few instances needed (3-5)
  • Stateless, horizontal scaling
  • Can optimize for writes

Read Service (11,600 QPS)

  • Cache-heavy: Redis first
  • More instances (20-30)
  • Can scale independently
  • Optimize for low latency

💡 Core Design Complete

Algorithm Chosen

Counter-based with Snowflake ID for guaranteed uniqueness

Architecture Pattern

Read/Write separation for independent scaling

Storage Strategy

NoSQL with Redis cache for hot URLs