Consistent Hashing

Consistent hashing is a distributed hashing scheme that minimizes key redistribution when nodes are added or removed from a distributed system. It's a fundamental technique used in distributed caches, databases, and load balancers.

[Interactive visualization: a hash ring (0-360°) with physical nodes A, B, and C, three virtual nodes per physical node (nine total), and test keys user123, session456, and cache7 shown with their assigned nodes.]

How it works: Each key is hashed to a position on the ring (0-360°). The key is assigned to the first node found clockwise from its position. Virtual nodes improve distribution by giving each physical node multiple positions on the ring.

The Problem

Traditional hashing (hash(key) % N) redistributes most keys when N changes:

4 servers: hash("user123") % 4 = server 0
5 servers: hash("user123") % 5 = server 2 (moved!)

Result: ~80% of keys need to be redistributed
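
To make the scale of the problem concrete, here is a small standalone sketch (not part of the library) that hashes 10,000 sample keys with CRC32 and counts how many of them land on a different server when the cluster grows from 4 to 5:

package main

import (
    "fmt"
    "hash/crc32"
)

func main() {
    const before, after = 4, 5 // server counts before and after scaling
    total, moved := 10000, 0

    for i := 0; i < total; i++ {
        h := crc32.ChecksumIEEE([]byte(fmt.Sprintf("user%d", i)))
        // mod-N placement: the same key maps to a different server
        // once N changes from 4 to 5
        if h%before != h%after {
            moved++
        }
    }

    // Expect roughly 80% of keys to move
    fmt.Printf("moved %d of %d keys (%.1f%%)\n",
        moved, total, 100*float64(moved)/float64(total))
}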

How Consistent Hashing Works

Imagine a circle (hash ring) with hash values from 0 to 2³²-1:

  1. Nodes are placed on the ring using hash(node_id)
  2. Keys are placed on the ring using hash(key)
  3. A key belongs to the first node clockwise from its position (see the sketch below)
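
Before introducing virtual nodes and the B-tree, a minimal standalone sketch (a sorted slice plus binary search; all names are illustrative) shows the clockwise lookup, including the wrap-around past 2³²-1 back to the start of the ring:

package main

import (
    "fmt"
    "hash/crc32"
    "sort"
)

type Ring struct {
    positions []uint32          // sorted node positions on the ring
    owner     map[uint32]string // position -> node ID
}

func NewRing(nodes []string) *Ring {
    r := &Ring{owner: make(map[uint32]string)}
    for _, n := range nodes {
        h := crc32.ChecksumIEEE([]byte(n)) // place node at hash(node_id)
        r.positions = append(r.positions, h)
        r.owner[h] = n
    }
    sort.Slice(r.positions, func(i, j int) bool { return r.positions[i] < r.positions[j] })
    return r
}

func (r *Ring) GetNode(key string) string {
    h := crc32.ChecksumIEEE([]byte(key)) // place key at hash(key)
    // first node position clockwise from the key
    i := sort.Search(len(r.positions), func(i int) bool { return r.positions[i] >= h })
    if i == len(r.positions) {
        i = 0 // wrap around to the start of the ring
    }
    return r.owner[r.positions[i]]
}

func main() {
    ring := NewRing([]string{"nodeA", "nodeB", "nodeC"})
    fmt.Println(ring.GetNode("user123"))
}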

Virtual Nodes

To improve load distribution, each physical node is mapped to multiple positions:

// ring maps a virtual node's hash to its physical node ID (map[uint32]string)
// Physical node A creates 150 virtual nodes
for i := 0; i < 150; i++ {
    hash := crc32.ChecksumIEEE([]byte(fmt.Sprintf("nodeA:%d", i)))
    ring[hash] = "nodeA"
}

This ensures:

  • Better load distribution (standard deviation < 10%)
  • Smoother rebalancing when nodes change
  • More predictable performance

Implementation with B-tree

We use Google's B-tree for O(log n) operations:

type ConsistentHash struct {
    tree     *btree.BTree             // B-tree of virtual nodes for fast lookups
    nodes    map[string]int           // physical nodes
    replicas int                      // virtual nodes per physical node
    hashFunc func(data []byte) uint32 // e.g. crc32.ChecksumIEEE
}

func (ch *ConsistentHash) GetNode(key string) string {
    hash := ch.hashFunc([]byte(key))

    // Find the first virtual node >= hash (clockwise from the key)
    var result string
    ch.tree.AscendGreaterOrEqual(VirtualNode{Hash: hash},
        func(item btree.Item) bool {
            result = item.(VirtualNode).NodeID
            return false // stop after first
        })

    // Wrap around: if no virtual node sits at or past the key's position,
    // the key belongs to the first virtual node on the ring
    if result == "" {
        if min := ch.tree.Min(); min != nil {
            result = min.(VirtualNode).NodeID
        }
    }

    return result
}
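
AddNode is not shown in the excerpt above; a sketch of how it might look with the same types (VirtualNode's shape is inferred from GetNode, and the bookkeeping in the nodes map is an assumption):

// Inferred from GetNode above: a virtual node is a ring position plus
// the physical node it belongs to. It must implement btree.Item.
type VirtualNode struct {
    Hash   uint32
    NodeID string
}

func (v VirtualNode) Less(than btree.Item) bool {
    return v.Hash < than.(VirtualNode).Hash // order by ring position
}

// Sketch: adding a physical node inserts `replicas` virtual nodes,
// giving the O(r log n) cost listed in the performance section.
func (ch *ConsistentHash) AddNode(nodeID string) {
    for i := 0; i < ch.replicas; i++ {
        hash := ch.hashFunc([]byte(fmt.Sprintf("%s:%d", nodeID, i)))
        ch.tree.ReplaceOrInsert(VirtualNode{Hash: hash, NodeID: nodeID})
    }
    ch.nodes[nodeID] = ch.replicas // exact bookkeeping is assumed
}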

Distribution Quality

With 150 virtual nodes per physical node:

Key Distribution Across Nodes

  • Node A: 24.8% (-0.8% from target)
  • Node B: 25.3% (+1.2% from target)
  • Node C: 24.5% (-2.0% from target)
  • Node D: 25.4% (+1.6% from target)

Standard deviation: 0.45% (excellent)
Max deviation: 2.0% (within target)

Target: Each node should handle ~25% of keys. With 150 virtual nodes per physical node, distribution stays within ±5% of the target, ensuring balanced load across the cluster.

Real-World Usage

1. Distributed Cache (Memcached/Redis)

func (cache *DistributedCache) Get(key string) ([]byte, error) {
    node := cache.ring.GetNode(key)
    return cache.clients[node].Get(key)
}

2. Database Sharding

func (db *ShardedDB) GetShard(userID string) *DBConnection {
    shard := db.ring.GetNode(userID)
    return db.connections[shard]
}

3. Load Balancing

func (lb *LoadBalancer) Route(sessionID string) *Backend {
    backend := lb.ring.GetNode(sessionID)
    return lb.backends[backend]
}

Data Migration

When topology changes, consistent hashing tells you where data should go, but doesn't move it:

Adding a Node

// Before: 3 nodes
"user123" → NodeA
"user456" → NodeB

// After: 4 nodes (NodeD added)
"user123" → NodeD  // needs migration
"user456" → NodeB  // stays put

// Only ~25% of keys move (optimal)
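
Which keys actually need to move can be computed by comparing lookups before and after the topology change; a sketch using the public API from the code example later in this document (the key list is illustrative):

package main

import (
    "fmt"

    "github.com/hawaijar/system-design-go/pkg/consistenthash"
)

func main() {
    ring := consistenthash.NewDefault()
    ring.AddNode("NodeA")
    ring.AddNode("NodeB")
    ring.AddNode("NodeC")

    keys := []string{"user123", "user456", "user789"}

    // Record ownership under the old topology
    before := make(map[string]string)
    for _, k := range keys {
        before[k] = ring.GetNode(k)
    }

    // Add a node; only keys whose owner changed need migration
    ring.AddNode("NodeD")
    for _, k := range keys {
        if owner := ring.GetNode(k); owner != before[k] {
            fmt.Printf("%s: migrate %s → %s\n", k, before[k], owner)
        }
    }
}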

Migration Strategies

  1. Lazy Migration: Migrate on access (sketched after this list)
  2. Active Migration: Background job moves data
  3. Dual Reads: Check both old and new locations
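
A sketch of the lazy-migration read path (strategy 1); Store and Ring are assumed minimal interfaces for illustration, not part of the library described here:

// Assumed interfaces for this sketch
type Store interface {
    Get(key string) ([]byte, bool)
    Set(key string, val []byte)
}

type Ring interface {
    GetNode(key string) string
}

// getLazy reads from the key's current owner; on a miss it falls back to
// the owner under the previous topology and migrates the value on access.
func getLazy(key string, newRing, oldRing Ring, clients map[string]Store) ([]byte, bool) {
    newNode := newRing.GetNode(key)
    if val, ok := clients[newNode].Get(key); ok {
        return val, true
    }

    oldNode := oldRing.GetNode(key)
    val, ok := clients[oldNode].Get(key)
    if !ok {
        return nil, false // not present anywhere
    }

    clients[newNode].Set(key, val) // migrate on access
    return val, true
}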

Performance

Performance Characteristics

Operation     Complexity    Description                    Time
GetNode       O(log n)      Find which node owns a key     109ns
AddNode       O(r log n)    Add a new node to the ring     86μs
RemoveNode    O(r log n)    Remove a node from the ring    90μs

(n = total virtual nodes, r = virtual nodes per physical node)

GetNode Operation Details

  • Time complexity: O(log n), where n = total virtual nodes
  • Measured latency: 109ns with 600 virtual nodes
  • Throughput: ~9.2 million operations/second (theoretical maximum at that latency)

GetNode is the most frequent operation, called on every request. With the B-tree implementation it maintains O(log n) complexity even with thousands of virtual nodes, and at ~109ns per lookup it can handle millions of requests per second.

Benchmarked on Intel Core i7 @ 2.6GHz with Go 1.21 | Times vary based on hardware and load
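
The GetNode number can be reproduced with a standard Go benchmark; a sketch assuming 4 physical nodes at the default 150 replicas each (600 virtual nodes), using the public API shown in the code example below:

package consistenthash_test

import (
    "fmt"
    "testing"

    "github.com/hawaijar/system-design-go/pkg/consistenthash"
)

func BenchmarkGetNode(b *testing.B) {
    // 4 physical nodes x 150 replicas = 600 virtual nodes
    ring := consistenthash.NewDefault()
    for i := 1; i <= 4; i++ {
        ring.AddNode(fmt.Sprintf("cache-%d", i))
    }

    keys := make([]string, 1024)
    for i := range keys {
        keys[i] = fmt.Sprintf("user:%d", i)
    }

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _ = ring.GetNode(keys[i%len(keys)])
    }
}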

Production Considerations

1. Choose Replica Count

  • 100-200: Good balance of memory vs distribution
  • More replicas = better distribution but more memory

2. Hash Function

  • CRC32: Fast, good distribution (recommended)
  • MD5/SHA: Cryptographic but slower
  • MurmurHash: Excellent distribution, very fast
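
If the ring accepts a pluggable hash function, switching algorithms is a one-line change; a standalone sketch using only the standard library (FNV-1a stands in for MurmurHash, which needs a third-party package):

package main

import (
    "fmt"
    "hash/crc32"
    "hash/fnv"
)

// HashFunc is the shape of hash function a ring would accept.
type HashFunc func(data []byte) uint32

var (
    // CRC32: fast, good distribution, in the standard library
    crc32Hash HashFunc = crc32.ChecksumIEEE

    // FNV-1a: a standard-library alternative
    fnv1aHash HashFunc = func(data []byte) uint32 {
        h := fnv.New32a()
        h.Write(data)
        return h.Sum32()
    }
)

func main() {
    key := []byte("user:12345")
    fmt.Printf("crc32=%d fnv1a=%d\n", crc32Hash(key), fnv1aHash(key))
}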

3. Monitoring

stats := ring.GetStats()
distribution := ring.Distribution(sampleKeys)
// Log if any node has >130% average load
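
The ">130% of average load" check in the comment above could be implemented roughly as follows; the counts map (node → sampled key count) is assumed to come from whatever distribution sampling the ring exposes:

package main

import "fmt"

// checkBalance returns the nodes handling more than 130% of the average load.
func checkBalance(counts map[string]int) []string {
    if len(counts) == 0 {
        return nil
    }
    total := 0
    for _, c := range counts {
        total += c
    }
    avg := float64(total) / float64(len(counts))

    var overloaded []string
    for node, c := range counts {
        if float64(c) > 1.3*avg {
            overloaded = append(overloaded, node)
        }
    }
    return overloaded
}

func main() {
    // Illustrative counts; in practice these come from sampling the ring
    counts := map[string]int{"cache-1": 340, "cache-2": 310, "cache-3": 350}
    fmt.Println(checkBalance(counts)) // empty: all nodes within 130% of average
}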

Code Example

package main

import (
    "fmt"
    "github.com/hawaijar/system-design-go/pkg/consistenthash"
)

func main() {
    // Create ring with 150 virtual nodes per physical node
    ring := consistenthash.NewDefault()
    
    // Add nodes
    ring.AddNode("cache-1")
    ring.AddNode("cache-2")
    ring.AddNode("cache-3")
    
    // Route requests
    key := "user:12345"
    node := ring.GetNode(key)
    fmt.Printf("%s → %s\n", key, node)
    
    // Get replicas for redundancy
    replicas := ring.GetNodes(key, 3)
    fmt.Printf("Replicas: %v\n", replicas)
}

Comparison with Alternatives

Hashing Methods Comparison

Understanding the Methods: Each method makes different trade-offs in memory usage, lookup speed, and flexibility when adding or removing nodes.

Method            Best for           Memory    Lookup       Scaling cost
Mod-N Hash        Static clusters    O(1)      O(1)         O(n) redistribution
Consistent Hash   Dynamic clusters   O(vn)     O(log vn)    O(k/n) movement
Jump Hash         Fixed growth       O(1)      O(ln n)      Limited flexibility
Maglev            Load balancers     O(M)      O(1)         O(M) rebuild
Rendezvous Hash   Small clusters     O(n)      O(n)         O(k/n) movement

(n = nodes, v = virtual nodes per physical node, k = keys, M = Maglev lookup table size)

Consistent Hash in Detail

Best for: Dynamic clusters

  • Memory usage: O(vn)
  • Lookup performance: O(log vn)
  • Dynamic operations: O(k/n) movement

Advantages:
  • Minimal data movement
  • Smooth scaling
  • Good load distribution with virtual nodes

Limitations:
  • More complex than mod-N
  • Requires virtual nodes for balance
  • Slightly slower lookups

When to use Consistent Hash: Ideal for distributed caches, databases, and any system where nodes frequently join or leave. Provides the best balance of features for most use cases.
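
For contrast with the table above, Jump Hash gets its O(1) memory by computing a bucket directly from the key rather than storing a ring; a Go transcription of the published jump consistent hash algorithm (Lamping and Veach), with the caveat that buckets can only be added or removed at the end:

package main

import "fmt"

// jumpHash maps a 64-bit key to a bucket in [0, numBuckets) with no lookup
// table; when numBuckets grows by one, only ~1/numBuckets of keys move.
func jumpHash(key uint64, numBuckets int32) int32 {
    var b, j int64 = -1, 0
    for j < int64(numBuckets) {
        b = j
        key = key*2862933555777941757 + 1
        j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
    }
    return int32(b)
}

func main() {
    fmt.Println(jumpHash(123456789, 10)) // bucket for this key among 10 buckets
}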

Implementation

Explore the Go implementation in the pkg/consistenthash package of github.com/hawaijar/system-design-go, used in the code examples above.
