Consistent Hashing
Consistent hashing is a distributed hashing scheme that minimizes key redistribution when nodes are added or removed from a distributed system. It's a fundamental technique used in distributed caches, databases, and load balancers.
Consistent Hash Ring Visualization (interactive demo)
How it works: Each key is hashed to a position on the ring (0-360°). The key is assigned to the first node found clockwise from its position. Virtual nodes improve distribution by giving each physical node multiple positions on the ring.
The Problem
Traditional hashing (hash(key) % N) redistributes most keys when N changes:
4 servers: hash("user123") % 4 = server 0
5 servers: hash("user123") % 5 = server 2 (moved!)
Result: ~80% of keys need to be redistributed
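The ~80% figure is easy to check empirically. A minimal sketch, assuming CRC32 as the hash function and synthetic user IDs as sample keys:
package main

import (
    "fmt"
    "hash/crc32"
)

// Count how many of 10,000 sample keys map to a different server when the
// cluster grows from 4 to 5 servers under hash(key) % N.
func main() {
    const total = 10000
    moved := 0
    for i := 0; i < total; i++ {
        h := crc32.ChecksumIEEE([]byte(fmt.Sprintf("user%d", i)))
        if h%4 != h%5 {
            moved++
        }
    }
    fmt.Printf("moved %d of %d keys (%.0f%%)\n", moved, total, 100*float64(moved)/float64(total))
}
For uniformly distributed hashes, a key stays put only when its hash gives the same remainder modulo 4 and 5 (one case in five), so roughly 80% of keys move.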
How Consistent Hashing Works
Imagine a circle (hash ring) with hash values from 0 to 2³²-1:
- Nodes are placed on the ring using hash(node_id)
- Keys are placed on the ring using hash(key)
- A key belongs to the first node clockwise from its position (see the sketch below)
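Before the B-tree implementation shown later, here is a minimal sketch of that clockwise lookup using a sorted slice and binary search; the CRC32 hash and node names are illustrative choices, not the library's API:
package main

import (
    "fmt"
    "hash/crc32"
    "sort"
)

// Ring is a bare-bones hash ring without virtual nodes: each node occupies a
// single position, and a key maps to the first node clockwise from it.
type Ring struct {
    hashes []uint32          // sorted node positions on the ring
    nodes  map[uint32]string // position -> node ID
}

func NewRing(nodeIDs []string) *Ring {
    r := &Ring{nodes: make(map[uint32]string)}
    for _, id := range nodeIDs {
        h := crc32.ChecksumIEEE([]byte(id))
        r.hashes = append(r.hashes, h)
        r.nodes[h] = id
    }
    sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
    return r
}

func (r *Ring) GetNode(key string) string {
    h := crc32.ChecksumIEEE([]byte(key))
    // Binary search for the first node position >= the key's position.
    i := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= h })
    if i == len(r.hashes) {
        i = 0 // past the largest position: wrap around to the smallest
    }
    return r.nodes[r.hashes[i]]
}

func main() {
    ring := NewRing([]string{"nodeA", "nodeB", "nodeC"})
    fmt.Println("user123 →", ring.GetNode("user123"))
}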
Virtual Nodes
To improve load distribution, each physical node is mapped to multiple positions:
// Physical node A creates 150 virtual nodes
for i := 0; i < 150; i++ {
    hash := crc32.ChecksumIEEE([]byte(fmt.Sprintf("nodeA:%d", i)))
    ring[hash] = "nodeA"
}
This ensures:
- Better load distribution (standard deviation < 10%)
- Smoother rebalancing when nodes change
- More predictable performance
Implementation with B-tree
We use Google's B-tree for O(log n) operations:
type ConsistentHash struct {
    tree     *btree.BTree            // B-tree of virtual nodes for fast lookups
    nodes    map[string]int          // physical nodes
    replicas int                     // virtual nodes per physical node
    hashFunc func(data []byte) uint32 // ring position function, e.g. crc32.ChecksumIEEE
}

// GetNode returns the physical node responsible for key. VirtualNode is a
// btree.Item ordered by its Hash field.
func (ch *ConsistentHash) GetNode(key string) string {
    hash := ch.hashFunc([]byte(key))
    // Find the first virtual node clockwise from the key's position.
    var result string
    ch.tree.AscendGreaterOrEqual(VirtualNode{Hash: hash},
        func(item btree.Item) bool {
            result = item.(VirtualNode).NodeID
            return false // stop after the first match
        })
    if result == "" && ch.tree.Len() > 0 {
        // Wrapped past the largest position: the ring continues at the smallest.
        result = ch.tree.Min().(VirtualNode).NodeID
    }
    return result
}
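Adding a node is symmetric: insert one virtual node per replica. A sketch under the same assumptions, where the nodes map is taken to track how many virtual nodes each physical node owns:
// AddNode places ch.replicas virtual nodes for one physical node on the ring.
func (ch *ConsistentHash) AddNode(nodeID string) {
    for i := 0; i < ch.replicas; i++ {
        // Each replica hashes to its own ring position, e.g. "cache-1:17".
        hash := ch.hashFunc([]byte(fmt.Sprintf("%s:%d", nodeID, i)))
        ch.tree.ReplaceOrInsert(VirtualNode{Hash: hash, NodeID: nodeID})
    }
    ch.nodes[nodeID] = ch.replicas // assumed bookkeeping: virtual nodes per physical node
}
Each ReplaceOrInsert is an O(log n) B-tree insertion, so AddNode (and its mirror image, RemoveNode) costs O(r log n) for r replicas, which is where the figures in the Performance section come from.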
Distribution Quality
With 150 virtual nodes per physical node, keys spread almost evenly across the cluster. In a four-node cluster, for example, each node should handle ~25% of the keys, and the distribution stays within ±5% of that target, ensuring balanced load. A quick way to verify this is sketched below.
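A minimal sketch of such a check, using the NewDefault/AddNode/GetNode API from the code example later on this page; the node names and sample size are arbitrary:
package main

import (
    "fmt"

    "github.com/hawaijar/system-design-go/pkg/consistenthash"
)

func main() {
    ring := consistenthash.NewDefault()
    nodes := []string{"cache-1", "cache-2", "cache-3", "cache-4"}
    for _, n := range nodes {
        ring.AddNode(n)
    }

    // Hash a large sample of synthetic keys and count where each one lands.
    const total = 100000
    counts := make(map[string]int)
    for i := 0; i < total; i++ {
        counts[ring.GetNode(fmt.Sprintf("key-%d", i))]++
    }
    for _, n := range nodes {
        fmt.Printf("%s: %.1f%% of keys\n", n, 100*float64(counts[n])/float64(total))
    }
}
Each printed share should land close to the 25% target.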
Real-World Usage
1. Distributed Cache (Memcached/Redis)
func (cache *DistributedCache) Get(key string) ([]byte, error) {
    node := cache.ring.GetNode(key)
    return cache.clients[node].Get(key)
}
2. Database Sharding
func (db *ShardedDB) GetShard(userID string) *DBConnection {
    shard := db.ring.GetNode(userID)
    return db.connections[shard]
}
3. Load Balancing
func (lb *LoadBalancer) Route(sessionID string) *Backend {
    backend := lb.ring.GetNode(sessionID)
    return lb.backends[backend]
}
Data Migration
When topology changes, consistent hashing tells you where data should go, but doesn't move it:
Adding a Node
// Before: 3 nodes
"user123" → NodeA
"user456" → NodeB
// After: 4 nodes (NodeD added)
"user123" → NodeD // needs migration
"user456" → NodeB // stays put
// Only ~25% of keys move (optimal)
Migration Strategies
- Lazy Migration: Migrate on access (sketched below)
- Active Migration: Background job moves data
- Dual Reads: Check both old and new locations
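A minimal sketch of the lazy strategy, assuming a ring that exposes GetNode and a simple per-node key-value client; the interfaces here are illustrative, not part of the package above:
// HashRing is any consistent-hash ring that can answer "who owns this key?".
type HashRing interface {
    GetNode(key string) string
}

// Client is a minimal per-node storage client.
type Client interface {
    Get(key string) ([]byte, bool)
    Set(key string, value []byte)
}

// lazyGet reads from the node the new ring points at; on a miss it checks the
// node the old ring pointed at and copies the value forward (migrate on access).
func lazyGet(newRing, oldRing HashRing, clients map[string]Client, key string) ([]byte, bool) {
    newNode := newRing.GetNode(key)
    if v, ok := clients[newNode].Get(key); ok {
        return v, true
    }
    if oldNode := oldRing.GetNode(key); oldNode != newNode {
        if v, ok := clients[oldNode].Get(key); ok {
            clients[newNode].Set(key, v) // copy to the new owner
            return v, true
        }
    }
    return nil, false
}
Once a value has been copied to its new owner, later reads never touch the old node; a background sweep can delete the stale copies.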
Performance
Performance Characteristics
- GetNode: O(log n). Find which node owns a key.
- AddNode: O(r log n). Add a new node to the ring.
- RemoveNode: O(r log n). Remove a node from the ring.
Here n is the total number of virtual nodes and r is the number of virtual nodes (replicas) per physical node.
GetNode Operation Details
- Time complexity: O(log n), where n is the total number of virtual nodes
- Measured latency: ~109 ns per lookup with 600 virtual nodes
- Theoretical maximum throughput: roughly 9.2 million operations per second (1 s / 109 ns)
GetNode is the most frequent operation, called on every request. With B-tree implementation, it maintains O(log n) complexity even with thousands of virtual nodes. At ~109ns per lookup, it can handle millions of requests per second.
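A latency number like this comes from a micro-benchmark. A sketch of such a measurement, assuming the NewDefault/AddNode/GetNode API from the code example below (4 nodes x 150 replicas = 600 virtual nodes):
package consistenthash_test

import (
    "fmt"
    "testing"

    "github.com/hawaijar/system-design-go/pkg/consistenthash"
)

func BenchmarkGetNode(b *testing.B) {
    ring := consistenthash.NewDefault()
    for i := 1; i <= 4; i++ {
        ring.AddNode(fmt.Sprintf("cache-%d", i))
    }
    // Pre-build keys so the loop times the lookup, not fmt.Sprintf.
    keys := make([]string, 1024)
    for i := range keys {
        keys[i] = fmt.Sprintf("user:%d", i)
    }
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        ring.GetNode(keys[i%len(keys)])
    }
}
Running go test -bench=GetNode reports ns/op; dividing one second by that number gives the requests-per-second ceiling quoted above.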
Production Considerations
1. Choose Replica Count
- 100-200: Good balance of memory vs distribution
- More replicas = better distribution but more memory
2. Hash Function
- CRC32: Fast, good distribution (recommended)
- MD5/SHA: Cryptographic but slower
- MurmurHash: Excellent distribution, very fast (third-party; see the sketch below)
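Whichever function you pick, it helps to keep it pluggable. A sketch using two standard-library options behind one signature; the HashFunc type is an illustrative assumption, and MurmurHash would come from a third-party package:
package main

import (
    "fmt"
    "hash/crc32"
    "hash/fnv"
)

// HashFunc is the shape a ring needs: bytes in, 32-bit ring position out.
type HashFunc func(data []byte) uint32

func crc32Hash(data []byte) uint32 { return crc32.ChecksumIEEE(data) }

func fnv32aHash(data []byte) uint32 {
    h := fnv.New32a()
    h.Write(data)
    return h.Sum32()
}

func main() {
    key := []byte("user:12345")
    for name, f := range map[string]HashFunc{"crc32": crc32Hash, "fnv-1a": fnv32aHash} {
        fmt.Printf("%s: %d\n", name, f(key))
    }
}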
3. Monitoring
stats := ring.GetStats()
distribution := ring.Distribution(sampleKeys)
// Log if any node has >130% average load
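A sketch of the check itself, assuming Distribution returns a per-node count of the sampled keys; the 130% threshold is the one from the comment above:
// hotNodes returns the nodes handling more than 130% of the average load.
func hotNodes(distribution map[string]int) []string {
    total := 0
    for _, count := range distribution {
        total += count
    }
    if len(distribution) == 0 || total == 0 {
        return nil
    }
    average := float64(total) / float64(len(distribution))
    var hot []string
    for node, count := range distribution {
        if float64(count) > 1.3*average {
            hot = append(hot, node)
        }
    }
    return hot
}
Alert whenever the result is non-empty; a persistently hot node usually points to too few virtual nodes or a skewed key space.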
Code Example
package main
import (
    "fmt"

    "github.com/hawaijar/system-design-go/pkg/consistenthash"
)

func main() {
    // Create ring with 150 virtual nodes per physical node
    ring := consistenthash.NewDefault()

    // Add nodes
    ring.AddNode("cache-1")
    ring.AddNode("cache-2")
    ring.AddNode("cache-3")

    // Route requests
    key := "user:12345"
    node := ring.GetNode(key)
    fmt.Printf("%s → %s\n", key, node)

    // Get replicas for redundancy
    replicas := ring.GetNodes(key, 3)
    fmt.Printf("Replicas: %v\n", replicas)
}
Comparison with Alternatives
Hashing Methods Comparison
- Mod-N Hash: best for static clusters; O(1) memory; O(1) lookup; O(n) redistribution on topology change
- Consistent Hash: best for dynamic clusters; O(vn) memory; O(log vn) lookup; O(k/n) key movement
- Jump Hash: best for fixed growth; O(1) memory; O(ln n) lookup; limited flexibility
- Maglev: best for load balancers; O(M) memory; O(1) lookup; O(M) rebuild on change
- Rendezvous Hash: best for small clusters; O(n) memory; O(n) lookup; O(k/n) key movement
In summary, consistent hashing is the best fit for dynamic clusters: O(vn) memory, O(log vn) lookups, and only O(k/n) key movement when the topology changes.
Advantages
- Minimal data movement
- Smooth scaling
- Good load distribution with virtual nodes
Limitations
- More complex than mod-N (contrast with the Jump Hash sketch below)
- Requires virtual nodes for balance
- Slightly slower lookups
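For contrast, the Jump Hash row above refers to the jump consistent hash algorithm (Lamping and Veach, 2014), which keeps no ring state at all; the few lines below follow the published algorithm:
// jumpHash maps a 64-bit key to one of numBuckets buckets with no lookup
// table, but buckets can only be added or removed at the end of the range.
func jumpHash(key uint64, numBuckets int32) int32 {
    var b, j int64 = -1, 0
    for j < int64(numBuckets) {
        b = j
        key = key*2862933555777941757 + 1
        j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
    }
    return int32(b)
}
That end-of-range restriction is the "limited flexibility" noted in the comparison: you cannot remove an arbitrary node, which is exactly the case a ring handles gracefully.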
Implementation
Explore our Go implementation:
- 📄 Source Code - Core consistent hashing with B-tree
- 🧪 Example Usage - Practical examples and demos
- 📖 Documentation - Features and best practices
Next Steps
- Try our interactive hash ring demo
- Implement consistent hashing in your project
- Learn about Rate Limiting