Distributed Unique ID Generator
Designing a distributed ID generator that can produce unique, ordered identifiers at massive scale. Think Twitter's Snowflake, Instagram's ID generation, or Discord's unique message IDs.
System Requirements
Functional Requirements
- •Uniqueness: IDs must be globally unique across all servers
- •Numerical: IDs are numerical values only (primary requirement)
- •64-bit constraint: Must fit in a 64-bit integer (long/bigint)
- •Time-ordered: IDs generated later have larger values
- •High throughput: Generate 10,000+ unique IDs per second
- •Low latency: ID generation should be fast (~1ms)
Why Not Use Traditional Approaches?
❌ Auto-increment Database ID
- • Single point of failure
- • Doesn't scale horizontally
- • Database bottleneck at high throughput
- • Hard to shard
❌ UUID/GUID
- • 128-bit (doesn't fit 64-bit requirement)
- • Not time-ordered (UUID v4)
- • Poor index performance
- • Not sequential
❌ Multi-Master Replication
- • ID collision risk across masters
- • Complex conflict resolution
- • Network partition issues
- • Synchronization overhead
🔍 Why Multi-Master Fails at Scale
Scenario: Two masters generating auto-increment IDs independently:
Master A: 1001, 1002, 1003...
Master B: 1001, 1002, 1003...
Result: Duplicate IDs! 💥
Common "Solutions" and Their Limitations:
Master A: 1, 4, 7, 10... (start=1, increment=3)
Master B: 2, 5, 8, 11... (start=2, increment=3)
Master C: 3, 6, 9, 12... (start=3, increment=3)
✓ No collisions, but ✗ Adding/removing servers is complex
✗ IDs reveal infrastructure (3→6→9 shows 3 servers)
✗ Gaps grow with more servers (100 servers = increment by 100)
The Real Problem: While increment-by-K prevents collisions, it doesn't meet our requirement of time-ordered IDs. IDs from different servers interleave randomly:
Time order: A₁(t=1), B₁(t=2), A₂(t=3), C₁(t=4), B₂(t=5)
Actual IDs: 1, 2, 4, 3, 5
Result: IDs are NOT time-ordered! 🔀
Approach 1: Twitter Snowflake
❄️ Snowflake Algorithm (Most Popular)
Twitter's Snowflake packs multiple components into a 64-bit integer to guarantee uniqueness and ordering.
Snowflake Bit Layout
64-bit ID Structure:
┌─────────────────────────────────────────────────────────────────┐
│ 0 │ 41 bits timestamp │ 10 bits machine │ 12 bits sequence │
└─────────────────────────────────────────────────────────────────┘
↑ ↑ ↑ ↑
Sign Milliseconds since Data center + Sequence counter
bit custom epoch Worker ID (per millisecond)
Example ID: 412376492842516481
Binary: 0|10110110010110100000101111000100001|01001|00101|000000000001
↑ └────────────41 bits──────────────┘└─5──┘└─5──┘└───12────┘Component Breakdown
| Component | Bits | Range | Purpose |
|---|---|---|---|
| Sign bit | 1 bit | Always 0 | Keep number positive |
| Timestamp | 41 bits | ~69 years | Milliseconds since epoch (ensures ordering) |
| Machine ID | 10 bits | 1024 machines | Unique server identifier |
| Sequence | 12 bits | 4096/ms | Counter for IDs in same millisecond |
Implementation
import time
import threading
class SnowflakeIDGenerator:
def __init__(self, datacenter_id, worker_id):
# Bit allocation
self.TIMESTAMP_BITS = 41
self.DATACENTER_BITS = 5
self.WORKER_BITS = 5
self.SEQUENCE_BITS = 12
# Max values
self.MAX_DATACENTER_ID = (1 << self.DATACENTER_BITS) - 1 # 31
self.MAX_WORKER_ID = (1 << self.WORKER_BITS) - 1 # 31
self.MAX_SEQUENCE = (1 << self.SEQUENCE_BITS) - 1 # 4095
# Bit shift amounts
self.WORKER_SHIFT = self.SEQUENCE_BITS # 12
self.DATACENTER_SHIFT = self.SEQUENCE_BITS + self.WORKER_BITS # 17
self.TIMESTAMP_SHIFT = self.DATACENTER_SHIFT + self.DATACENTER_BITS # 22
# Custom epoch (January 1, 2020)
self.EPOCH = 1577836800000
# Instance configuration
if datacenter_id > self.MAX_DATACENTER_ID or datacenter_id < 0:
raise ValueError(f"Datacenter ID must be between 0 and {self.MAX_DATACENTER_ID}")
if worker_id > self.MAX_WORKER_ID or worker_id < 0:
raise ValueError(f"Worker ID must be between 0 and {self.MAX_WORKER_ID}")
self.datacenter_id = datacenter_id
self.worker_id = worker_id
self.sequence = 0
self.last_timestamp = -1
self.lock = threading.Lock()
def _current_millis(self):
return int(time.time() * 1000)
def _wait_next_millis(self, last_timestamp):
timestamp = self._current_millis()
while timestamp <= last_timestamp:
timestamp = self._current_millis()
return timestamp
def generate(self):
with self.lock:
timestamp = self._current_millis()
# Clock moved backwards - wait
if timestamp < self.last_timestamp:
raise Exception(f"Clock moved backwards. Refusing to generate ID for {self.last_timestamp - timestamp} ms")
# Same millisecond - increment sequence
if timestamp == self.last_timestamp:
self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
# Sequence overflow - wait for next millisecond
if self.sequence == 0:
timestamp = self._wait_next_millis(self.last_timestamp)
else:
# New millisecond - reset sequence
self.sequence = 0
self.last_timestamp = timestamp
# Generate ID by combining all components
id_value = ((timestamp - self.EPOCH) << self.TIMESTAMP_SHIFT) | \
(self.datacenter_id << self.DATACENTER_SHIFT) | \
(self.worker_id << self.WORKER_SHIFT) | \
self.sequence
return id_value
# Usage
generator = SnowflakeIDGenerator(datacenter_id=1, worker_id=1)
for _ in range(5):
print(generator.generate())
# Output:
# 412376492842516481
# 412376492842516482
# 412376492842516483Snowflake Analysis
⚠️ The 69-Year Limit: What Happens After?
Question: What happens when the 41-bit timestamp overflows after ~69 years?
The Problem: Simply resetting to a new epoch would cause ID collisions - the same IDs would be generated again!
2020-2089: [timestamp=1][machine=1][seq=0] = 4096
2089-2158: [timestamp=1][machine=1][seq=0] = 4096 💥 Collision!
Real-World Solutions:
- • Most common: Migrate to new system before 69 years (average system lifespan: 10-15 years)
- • Version prefix: Add epoch version outside 64-bit (v1_id, v2_id)
- • Reserve bits: Use 2 bits for epoch version (4 epochs × 17 years each)
Reality: Twitter (2010 epoch) has until 2079. They'll likely migrate to a completely different system decades before then.
✅ Advantages
- • Time-ordered (sortable)
- • Highly scalable (1024 machines)
- • No network calls needed
- • 4096 IDs per millisecond per machine
- • Works for 69 years from epoch
⚠️ Considerations
- • Requires machine ID coordination
- • Clock synchronization critical
- • Clock rollback handling needed
- • ID reveals timestamp (privacy)
Approach 2: Database Ticket Server
🎫 Centralized Ticket Server
Flickr's approach: Use dedicated MySQL servers to generate IDs with auto-increment.
Architecture
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ App Server 1 │ │ App Server 2 │ │ App Server 3 │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
└────────────────────┼────────────────────┘
│
┌──────▼──────┐
│Load Balancer│
└──────┬──────┘
│
┌──────────────┼──────────────┐
│ │
┌───────▼────────┐ ┌────────▼───────┐
│ Ticket Server 1│ │ Ticket Server 2│
│ (Odd IDs) │ │ (Even IDs) │
│ AUTO_INCREMENT│ │ AUTO_INCREMENT│
│ increment = 2 │ │ increment = 2 │
└────────────────┘ └────────────────┘MySQL Configuration
-- Ticket Server 1 (generates odd IDs)
CREATE TABLE tickets64 (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
stub CHAR(1) NOT NULL DEFAULT '',
PRIMARY KEY (id),
UNIQUE KEY (stub)
);
ALTER TABLE tickets64 AUTO_INCREMENT = 1;
SET @@auto_increment_increment = 2;
SET @@auto_increment_offset = 1;
-- Ticket Server 2 (generates even IDs)
ALTER TABLE tickets64 AUTO_INCREMENT = 2;
SET @@auto_increment_increment = 2;
SET @@auto_increment_offset = 2;
-- To get a new ID:
REPLACE INTO tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();✅ Advantages
- • Simple implementation
- • IDs are roughly time-ordered
- • Works with any MySQL setup
- • No clock sync issues
❌ Disadvantages
- • Network call for every ID
- • Single point of failure risk
- • Database becomes bottleneck
- • Not suitable for 10K+ IDs/sec
Approach 3: UUID with Timestamp (UUIDv7)
🔑 Time-Ordered UUID (128-bit Alternative)
📝 Note: This doesn't fit the 64-bit requirement but is included for completeness as it's widely used when 128-bit is acceptable.
UUIDv7 Structure (Time-ordered):
┌────────────────────────────────────────────────────────────────┐
│ 48-bit timestamp │ 4-bit │ 12-bit random │ 2-bit │ 62-bit │
│ (milliseconds) │version│ or counter │variant│ random │
└────────────────────────────────────────────────────────────────┘
Example: 018e3f7a-5ec9-7a2b-9e4c-4f2b6c3d8e9f
└─timestamp─┘└ver┘└─────random──────┘Approach 4: Alphanumeric IDs
🔤 Alphanumeric ID Generation
When alphanumeric IDs are acceptable, we gain more density and flexibility.
Approach 4a: Encoded Snowflake
def generate_alphanumeric_id():
# Generate numeric Snowflake ID
numeric_id = snowflake_generator.generate() # e.g., 412376492842516481
# Encode to Base62 for URL-safe alphanumeric
base62_id = base62_encode(numeric_id) # e.g., "3kQ9mPx"
return base62_id
# Benefits:
# - 7-8 characters instead of 19 digits
# - Still time-ordered (underlying Snowflake)
# - URL-safe without encodingApproach 4b: Nano ID
import string
import secrets
import time
class NanoIDGenerator:
def __init__(self, alphabet=None, size=21):
self.alphabet = alphabet or string.ascii_letters + string.digits + '-_'
self.size = size
def generate(self):
# Add timestamp prefix for ordering (optional)
timestamp_part = base62_encode(int(time.time() * 1000))
# Random suffix for uniqueness
random_part = ''.join(secrets.choice(self.alphabet)
for _ in range(self.size - len(timestamp_part)))
return timestamp_part + random_part
# Example: "1Ky8_dXl9pRb2C4mNvQ7F"Approach 4c: KSUID (K-Sortable Unique ID)
KSUID Structure (160 bits = 20 bytes):
┌─────────────────────────────────────────┐
│ 32-bit timestamp │ 128-bit random payload│
│ (seconds since │ (16 random bytes) │
│ custom epoch) │ │
└─────────────────────────────────────────┘
Base62 encoded: 27 characters
Example: "2LjpHjBdPXKDSJ4gVcNvYjZ9MqT"Comparison Matrix
| Approach | 64-bit | Ordered | Throughput | Complexity | Use Case |
|---|---|---|---|---|---|
| Snowflake | ✅ Yes | ✅ Yes | 4M/sec | Medium | Twitter, Discord, Instagram |
| Ticket Server | ✅ Yes | ⚠️ Rough | 10K/sec | Simple | Flickr, Small scale |
| UUIDv7 | ❌ 128-bit | ✅ Yes | Unlimited | Simple | When 128-bit OK |
| Base62 Snowflake | ✅ Yes* | ✅ Yes | 4M/sec | Medium | URL shorteners |
| KSUID | ❌ 160-bit | ✅ Yes | Unlimited | Simple | Segment, Stripe |
Real-World Implementations
🐦 Twitter Snowflake
Original Snowflake implementation for tweet IDs
41 bits: timestamp
10 bits: machine ID
12 bits: sequence
= 64-bit sortable IDs
Modified Snowflake with PostgreSQL
41 bits: timestamp
13 bits: shard ID
10 bits: sequence
Uses PostgreSQL schemas
💬 Discord
Snowflake for messages, channels, users
Discord epoch: 2015-01-01
5 bits: worker ID
5 bits: process ID
Visible in all Discord IDs
🔧 MongoDB ObjectId
96-bit IDs with embedded timestamp
32 bits: Unix timestamp
40 bits: random value
24 bits: counter
Hex encoded: 24 chars
Handling Edge Cases
⚠️ Critical Considerations
Clock Synchronization
- • Use NTP to keep clocks in sync
- • Detect clock rollback and refuse to generate
- • Consider monotonic clocks
- • Add jitter buffer (wait if clock moves back)
Machine ID Assignment
- • Use ZooKeeper for dynamic assignment
- • Environment variables for static assignment
- • IP address hashing (risk of collisions)
- • Container orchestrator labels (K8s)
Sequence Overflow
- • Wait for next millisecond if sequence exhausted
- • Pre-generate ID pools during low traffic
- • Use larger sequence bits if needed
- • Monitor sequence usage patterns
Final Recommendation
🎯 For Your Requirements
Given the requirements (64-bit, numerical, ordered, 10K+ IDs/sec), Snowflake is the best choice.
Implementation Plan:
- 1. Use 41-bit timestamp (69 years from custom epoch)
- 2. Allocate 10 bits for machine ID (1024 servers)
- 3. Use 12-bit sequence (4096 IDs/ms = 4M IDs/sec per machine)
- 4. Implement clock rollback protection
- 5. Use ZooKeeper/etcd for machine ID coordination
- 6. For alphanumeric: Encode Snowflake IDs with Base62
Summary
Distributed ID generation is a fundamental problem with multiple solutions. Snowflake provides the best balance of:
- ✅ Fits in 64-bit
- ✅ Time-ordered for database indexing
- ✅ No coordination needed after setup
- ✅ Scales to thousands of machines
- ✅ Proven at Twitter, Discord, Instagram scale
For alphanumeric IDs, encoding Snowflake IDs with Base62 gives you shorter, URL-safe identifiers while maintaining all the benefits.
💡 Pro Tip: Start with Snowflake for numerical IDs. If you need shorter strings for URLs or user-facing IDs, Base62-encode them. This gives you the best of both worlds: sortable 64-bit integers in your database and compact strings for your users.