URL Shortener - Step 6: Deep Dives

Step 6 of 6: D - Deep Dives

Detailed component implementation and technology choices

❄️ Snowflake ID Generation Deep Dive

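64-bit Snowflake ID Structure

Field        Bits   Notes
Sign         1      Always 0
Timestamp    41     Milliseconds since a chosen epoch (~69 years of range)
DC ID        5      32 datacenters
Machine      5      32 machines per datacenter
Sequence     12     4,096 IDs per millisecond per machine

Example ID Generation (illustrative values, not bit-exact):

Binary: 0|10011010110111001101111010111110010|00001|00010|000000000001
Decimal: 420894213832982529 → Base62: "8KpQ5Wx"

A full 64-bit ID encodes to at most 11 Base62 characters, so the 7-character code shown here only illustrates the format.

Key Properties:

  • Time-sortable: IDs are naturally ordered by creation time
  • Globally unique: No runtime coordination needed between generators
  • High throughput: 4,096,000 IDs per second per machine
  • Compact: 64 bits fit in a single long integer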
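The bit layout above can be sketched with plain shifts and masks. This is a minimal illustration, not the tutorial's own implementation: the function names and the Base62 alphabet order (digits, then uppercase, then lowercase) are assumptions.

```python
# Assumed alphabet order; other orderings are equally valid.
BASE62_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

TIMESTAMP_BITS, DC_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 5, 5, 12


def pack_id(timestamp_ms, dc_id, machine_id, sequence):
    """Combine the four fields into one 64-bit integer (sign bit stays 0)."""
    fields = [(timestamp_ms, TIMESTAMP_BITS), (dc_id, DC_BITS),
              (machine_id, MACHINE_BITS), (sequence, SEQUENCE_BITS)]
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
    return ((timestamp_ms << (DC_BITS + MACHINE_BITS + SEQUENCE_BITS))
            | (dc_id << (MACHINE_BITS + SEQUENCE_BITS))
            | (machine_id << SEQUENCE_BITS)
            | sequence)


def unpack_id(snowflake_id):
    """Split an ID back into (timestamp_ms, dc_id, machine_id, sequence)."""
    sequence = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    machine_id = (snowflake_id >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    dc_id = (snowflake_id >> (SEQUENCE_BITS + MACHINE_BITS)) & ((1 << DC_BITS) - 1)
    timestamp_ms = snowflake_id >> (SEQUENCE_BITS + MACHINE_BITS + DC_BITS)
    return timestamp_ms, dc_id, machine_id, sequence


def base62_encode(n):
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return BASE62_ALPHABET[0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, 62)
        digits.append(BASE62_ALPHABET[remainder])
    return "".join(reversed(digits))


def base62_decode(text):
    """Decode a Base62 string produced by base62_encode."""
    n = 0
    for ch in text:
        n = n * 62 + BASE62_ALPHABET.index(ch)
    return n
```

Because the timestamp occupies the highest bits, comparing two IDs as integers compares their creation times first, which is what makes the IDs time-sortable.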

Implementation Details

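class SnowflakeGenerator:
    def __init__(self, dc_id, machine_id):
        # Both IDs occupy 5 bits, so valid values are 0-31
        if not (0 <= dc_id < 32 and 0 <= machine_id < 32):
            raise ValueError("dc_id and machine_id must be in 0..31")
        self.dc_id = dc_id
        self.machine_id = machine_id
        self.sequence = 0          # per-millisecond counter (12 bits)
        self.last_timestamp = -1   # last millisecond an ID was issued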

Clock Skew Handling

  • NTP sync every 30 seconds
  • Reject requests if the clock moves backward
  • Wait for the next millisecond if the sequence is exhausted within the current one
  • Alert on > 5ms drift
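The reject and wait rules above could look roughly like the following. This is a sketch under stated assumptions rather than a production generator: the class name `SnowflakeSketch`, the `EPOCH_MS` value, the `ClockMovedBackwardError` exception, and the injectable `clock` parameter are all hypothetical and exist only to make the example self-contained and testable.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # assumed custom epoch: 2020-01-01T00:00:00Z


class ClockMovedBackwardError(Exception):
    """Raised when the clock reports a time earlier than the last issued ID."""


class SnowflakeSketch:
    def __init__(self, dc_id, machine_id, clock=None):
        if not (0 <= dc_id < 32 and 0 <= machine_id < 32):
            raise ValueError("dc_id and machine_id must be in 0..31")
        self.dc_id = dc_id
        self.machine_id = machine_id
        self.sequence = 0
        self.last_timestamp = -1
        # clock returns the current time in milliseconds (hypothetical hook)
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self.last_timestamp:
                # Reject: issuing an ID now could duplicate an earlier one
                raise ClockMovedBackwardError(
                    f"clock moved back {self.last_timestamp - now} ms")
            if now == self.last_timestamp:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # All 4,096 sequence values used: wait for the next ms
                    while now <= self.last_timestamp:
                        now = self._clock()
            else:
                self.sequence = 0
            self.last_timestamp = now
            return (((now - EPOCH_MS) << 22) | (self.dc_id << 17)
                    | (self.machine_id << 12) | self.sequence)
```

The lock keeps the sequence counter consistent when several threads share one generator. Raising on a backward clock is the simplest policy; waiting until the clock catches up is a common alternative when the drift is small.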

🗄️ Database Technology Comparison

Aspect          DynamoDB ✅        Cassandra      MongoDB
Consistency     Eventual/Strong    Eventual       Strong
Auto-scaling    ✓ Built-in         Manual         Manual
Maintenance     Fully managed      Self-managed   Atlas managed
Cost at scale   Higher             Lower          Medium
Latency         <10ms              <10ms          10-20ms

🎯 Decision: DynamoDB for managed simplicity, Cassandra if cost becomes a concern at massive scale

💾 Cache Strategy Implementation

Cache-Aside Pattern

def get_url(short_code):
    # Try cache first
    url = redis.get(short_code)
    if url:
        return url
    # Cache miss, get from DB
    url = db.get(short_code)
    if url:
        # Update cache with a 24-hour TTL; redis-py expects setex(name, time, value)
        redis.setex(short_code, 86400, url)
    return url

Cache Warming Strategy

  1. Pre-load top 10K URLs daily
  2. Warm cache on URL creation
  3. Predictive warming based on trends
  4. Gradual warming to avoid thundering herd
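Gradual warming can be sketched as batched loading with a pause between batches, plus a small random spread on each TTL so the warmed keys do not all expire at the same moment. Everything here is an assumption for illustration: `InMemoryCache` is a dict-backed stand-in for a Redis client, and the batch size, pause, and jitter values are placeholders rather than tuned numbers.

```python
import random
import time


class InMemoryCache:
    """Dict-backed stand-in for a Redis client (hypothetical, for illustration)."""

    def __init__(self):
        self.store = {}

    def setex(self, name, ttl_seconds, value):
        # Mirrors the redis-py argument order: name, time, value
        self.store[name] = (value, ttl_seconds)

    def get(self, name):
        entry = self.store.get(name)
        return entry[0] if entry else None


def jittered_ttl(base_ttl=86400, spread=0.1, rng=random):
    """Vary base_ttl by up to +/- spread so keys do not expire together."""
    return int(base_ttl * (1 + rng.uniform(-spread, spread)))


def warm_cache(cache, top_urls, batch_size=500, pause_seconds=0.05,
               sleep=time.sleep):
    """Load (short_code, url) pairs in batches, pausing between batches.

    Returns the number of batches written.
    """
    items = list(top_urls)
    batches = 0
    for start in range(0, len(items), batch_size):
        for short_code, url in items[start:start + batch_size]:
            cache.setex(short_code, jittered_ttl(), url)
        batches += 1
        if start + batch_size < len(items):
            sleep(pause_seconds)  # spread the write load over time
    return batches
```

Warming on URL creation is the same `setex` call made once, right after the database write succeeds.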

📊 Analytics Pipeline Architecture

User Click → Kafka → Spark → ClickHouse → API

  • User Click (Event): Async, non-blocking
  • Kafka (Buffer): Partitioned by short_code
  • Spark (Stream Process): Aggregate in 1-min windows
  • ClickHouse (Analytics DB): Time-series, columnar
  • API (Query)
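The 1-minute aggregation step can be illustrated without a streaming engine. The sketch below uses plain Python as a stand-in for the Spark job; the `(short_code, timestamp_ms)` event shape and the function names are assumptions, and a real stream processor would also have to handle late and out-of-order events.

```python
from collections import Counter

WINDOW_MS = 60_000  # 1-minute tumbling windows


def window_start(timestamp_ms, window_ms=WINDOW_MS):
    """Round a timestamp down to the start of its window."""
    return timestamp_ms - (timestamp_ms % window_ms)


def aggregate_clicks(events, window_ms=WINDOW_MS):
    """Count clicks per (short_code, window start) pair.

    events is an iterable of (short_code, timestamp_ms) tuples.
    """
    counts = Counter()
    for short_code, timestamp_ms in events:
        counts[(short_code, window_start(timestamp_ms, window_ms))] += 1
    return dict(counts)
```

Each resulting row maps naturally onto a time-series table keyed by short code and window start, which is the shape a columnar store handles well.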

⚖️ Key Trade-offs & Decisions

Counter vs Hash

We chose counter-based (Snowflake) over hash

✓ Pro: No collisions, predictable performance


✗ Con: Sequential IDs are somewhat predictable; each generator needs a unique datacenter/machine ID and a synchronized clock

NoSQL vs SQL

We chose NoSQL (DynamoDB) over PostgreSQL

✓ Pro: Easy scaling, managed service


✗ Con: Higher cost, vendor lock-in

Sync vs Async Analytics

We chose async analytics with Kafka

✓ Pro: Non-blocking redirects, better performance


✗ Con: Eventual consistency, complex pipeline
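The non-blocking part of this choice can be sketched with a bounded in-process queue and a background thread. This is only a stand-in for a Kafka producer: the `ClickPublisher` class, its `sink` callback, and the drop-when-full policy are all assumptions made for illustration.

```python
import queue
import threading


class ClickPublisher:
    """Hands click events to a background worker so redirects never wait."""

    def __init__(self, sink, max_pending=10_000):
        self._queue = queue.Queue(maxsize=max_pending)
        self._sink = sink  # callable that ships one event downstream
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def publish(self, event):
        """Enqueue without blocking; return False if the buffer is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            return False  # assumed policy: drop analytics rather than slow redirects

    def _drain(self):
        while True:
            event = self._queue.get()
            try:
                if event is None:  # sentinel pushed by close()
                    return
                self._sink(event)
            except Exception:
                pass  # a real pipeline would log and retry here
            finally:
                self._queue.task_done()

    def close(self):
        """Flush pending events and stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

The redirect handler calls `publish` and returns immediately. The cost is the con listed above: click counts lag behind reality, and events can be lost if the buffer overflows.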

302 vs 301 Redirects

We chose 302 (temporary) over 301 (permanent)

✓ Pro: Analytics tracking, flexibility


✗ Con: No browser caching, more server load
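As a small illustration of the difference, the helper below builds the status code and headers for either redirect type. The function name `build_redirect` is hypothetical, and sending `Cache-Control: no-store` with the 302 is an assumption added to make the no-caching behavior explicit; it is not something the design above prescribes.

```python
from http import HTTPStatus


def build_redirect(long_url, permanent=False):
    """Return (status_code, headers) for a redirect to long_url."""
    status = HTTPStatus.MOVED_PERMANENTLY if permanent else HTTPStatus.FOUND
    headers = {"Location": long_url}
    if not permanent:
        # Assumed header: ask clients not to cache, so every click reaches
        # the server and can be counted
        headers["Cache-Control"] = "no-store"
    return int(status), headers
```

With a 301, browsers may remember the target and skip the shortener on later visits, which reduces load but hides those clicks from analytics.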

🚀 Future Improvements

Performance

  • Edge computing with Workers
  • QUIC protocol support
  • Predictive prefetching
  • WebAssembly validators

Features

  • QR code generation
  • Bulk URL import/export
  • A/B testing support
  • API rate limit tiers

Analytics

  • ML-based fraud detection
  • Real-time dashboards
  • Predictive analytics
  • Geographic heatmaps

🎉 Congratulations!

You've completed the full URL shortener system design using the SACRED framework!

  • 📋 Requirements: Clarified & scoped
  • 🔌 APIs: Designed & documented
  • 🏗️ Architecture: Built & explained
  • 📈 Scale: Calculated & addressed
  • ⚠️ Edge Cases: Identified & handled
  • 🔍 Deep Dives: Explored & justified