
Rate Limiting

Discover powerful algorithms that control request flow and protect your systems from overload

Rate limiting is a fundamental technique in system design to control the rate at which requests are processed. It protects services from being overwhelmed, ensures fair resource allocation, and maintains quality of service.

Why Rate Limiting?

  • Prevent DoS attacks: Stop malicious actors from overwhelming your service
  • Fair usage: Ensure all users get reasonable access to resources
  • Cost control: Limit expensive API calls or resource usage
  • Quality of service: Maintain consistent performance for all users

Implementations

We've implemented four different rate limiting algorithms, each with its own trade-offs:

1. Token Bucket

The token bucket algorithm is one of the most popular rate limiting approaches. Think of it as a bucket that holds tokens:

  • Tokens are added at a constant rate (e.g., 10 tokens per second)
  • Each request consumes one or more tokens
  • Requests are rejected when the bucket is empty
  • Burst capacity allows handling spike traffic up to bucket size

type TokenBucket struct {
    rate     float64    // tokens per second
    burst    int        // max tokens in bucket
    tokens   float64    // current tokens
    lastTime time.Time  // last refill time
}

Visual Flow:

Token Bucket

tokens available

[Diagram: token bucket algorithm flow — on each request, refill tokens by elapsed × rate, then allow the request if tokens ≥ 1, otherwise reject it as rate limited]

Core Algorithm Logic:

func (tb *TokenBucket) Allow() bool {
    // 1. Calculate elapsed time and refill tokens (capped at burst)
    elapsed := time.Since(tb.lastTime).Seconds()
    tb.tokens = math.Min(float64(tb.burst), tb.tokens+elapsed*tb.rate)
    tb.lastTime = time.Now()

    // 2. Check if a token is available
    if tb.tokens >= 1 {
        tb.tokens--     // Consume a token
        return true     // Allow request
    }
    return false        // Reject request
}

Pros:

  • Smooth rate limiting
  • Allows burst traffic
  • Memory efficient

Cons:

  • Not strictly accurate over short intervals
  • Requires token calculation on each request
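To make these trade-offs concrete, here is a minimal runnable sketch of the algorithm above (the `NewTokenBucket` constructor and the demo in `main` are illustrative additions, not part of any library):

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// TokenBucket mirrors the struct shown earlier.
type TokenBucket struct {
	rate     float64   // tokens added per second
	burst    int       // maximum tokens the bucket holds
	tokens   float64   // tokens currently available
	lastTime time.Time // time of the last refill
}

// NewTokenBucket starts with a full bucket so bursts work immediately.
func NewTokenBucket(rate float64, burst int) *TokenBucket {
	return &TokenBucket{rate: rate, burst: burst, tokens: float64(burst), lastTime: time.Now()}
}

func (tb *TokenBucket) Allow() bool {
	// Refill tokens based on elapsed time, capped at burst.
	elapsed := time.Since(tb.lastTime).Seconds()
	tb.tokens = math.Min(float64(tb.burst), tb.tokens+elapsed*tb.rate)
	tb.lastTime = time.Now()

	if tb.tokens >= 1 {
		tb.tokens-- // consume one token
		return true
	}
	return false
}

func main() {
	tb := NewTokenBucket(5, 3) // 5 tokens/s, burst of 3

	// The full burst is available immediately...
	for i := 1; i <= 3; i++ {
		fmt.Printf("request %d allowed: %v\n", i, tb.Allow())
	}
	// ...but the fourth back-to-back request finds an empty bucket.
	fmt.Printf("request 4 allowed: %v\n", tb.Allow())
}
```

Note that the bucket starts full; starting it empty would force new clients to wait `1/rate` seconds before their first request.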

2. Fixed Window Counter

Divides time into fixed windows and counts requests per window:

[Diagram: two fixed one-minute windows — Window 1 (12:00:00–12:00:59) with 8/10 requests used, Window 2 (12:01:00–12:01:59) with 2/10 used]

type FixedWindow struct {
    window      time.Duration  // window size (e.g., 1 minute)
    limit       int            // max requests per window
    counter     int            // current window request count
    windowStart time.Time      // current window start time
}

Core Algorithm Logic:

func (fw *FixedWindow) Allow() bool {
    // 1. Calculate current window
    now := time.Now()
    currentWindow := now.Truncate(fw.window) // Rounds down to window start
    
    // 2. Reset counter if new window
    if currentWindow != fw.windowStart {
        fw.counter = 0
        fw.windowStart = currentWindow
    }
    
    // 3. Check if under limit
    if fw.counter < fw.limit {
        fw.counter++   // Increment counter
        return true    // Allow request
    }
    return false       // Reject request
}

The Boundary Problem:

[Diagram: fixed window boundary problem — 10 requests at 11:59:50–59 land in Window 1 and 10 more at 12:00:00–10 land in Window 2, so 20 requests pass within 20 seconds despite a 10-per-minute limit]

3. Sliding Window Log

Stores timestamps of all requests within the current time window:

type SlidingWindowLog struct {
    window     time.Duration
    limit      int
    timestamps []time.Time  // all requests in window
}

Core Algorithm Logic:

func (swl *SlidingWindowLog) Allow() bool {
    now := time.Now()
    cutoff := now.Add(-swl.window) // Window boundary
    
    // 1. Remove expired timestamps
    for len(swl.timestamps) > 0 && swl.timestamps[0].Before(cutoff) {
        swl.timestamps = swl.timestamps[1:] // Remove first element
    }
    
    // 2. Check if under limit
    if len(swl.timestamps) < swl.limit {
        swl.timestamps = append(swl.timestamps, now) // Add current request
        return true  // Allow request
    }
    return false     // Reject request
}

How Sliding Window Log Fixes the Boundary Problem:

[Diagram: a 60-second window sliding from 11:59:05 to 12:00:05 — 5 requests at 11:59:10–40 plus 5 allowed at 11:59:50–59 fill the window, so 5 more at 12:00:00–10 are rejected; no 60-second span ever exceeds 10 requests]

The most accurate approach, but memory intensive: O(n), where n is the number of requests in the window.

4. Sliding Window Counter

Hybrid approach combining fixed windows with a weighted calculation:

Previous window: 80 requests
Current window: 40 requests
Position: 25% into current window

Weighted count = 40 + (80 × 0.75) = 100

type SlidingWindowCounter struct {
    window        time.Duration  // window size (e.g., 1 minute)
    limit         int            // max requests per window
    currentCount  int            // requests in current window
    previousCount int            // requests in previous window
    currentWindow time.Time      // current window start time
}

Core Algorithm Logic:

func (swc *SlidingWindowCounter) Allow() bool {
    now := time.Now()
    currentWindow := now.Truncate(swc.window) // Current window start
    
    // 1. Update counters if new window
    if currentWindow != swc.currentWindow {
        swc.previousCount = swc.currentCount
        swc.currentCount = 0
        swc.currentWindow = currentWindow
    }
    
    // 2. Calculate weighted count
    elapsed := now.Sub(currentWindow)
    weight := 1.0 - float64(elapsed)/float64(swc.window)
    estimatedCount := swc.currentCount + int(float64(swc.previousCount) * weight)
    
    // 3. Check if under limit
    if estimatedCount < swc.limit {
        swc.currentCount++ // Increment current window
        return true        // Allow request
    }
    return false           // Reject request
}

How Sliding Window Counter Approximates the Solution:

[Diagram: sliding window counter at 12:00:15, 25% into the current window — previous window saw 8 requests, current window 2; weight = 1 − (15s / 60s) = 0.75, so estimated count = 2 + 8 × 0.75 = 8, under the limit of 10]
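The weighted estimate above can be checked in a few lines (values taken from the example: 8 previous, 2 current, 15 seconds into a 60-second window):

```go
package main

import "fmt"

func main() {
	previous, current := 8.0, 2.0 // requests in previous/current window
	elapsed, window := 15.0, 60.0 // seconds into the current window

	// The previous window's contribution decays linearly as time passes.
	weight := 1.0 - elapsed/window        // 1 - 0.25 = 0.75
	estimate := current + previous*weight // 2 + 6 = 8

	fmt.Printf("weight=%.2f estimate=%.0f\n", weight, estimate) // weight=0.75 estimate=8
}
```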

Performance Comparison

Algorithm         Latency    Memory
Token Bucket      43 ns/op   0 B/op
Fixed Window      43 ns/op   0 B/op
Sliding Log       45 ns/op   100 B/op
Sliding Counter   51 ns/op   0 B/op

Note: Benchmarks run on Apple M4 Pro. Lower latency is better. All algorithms except Sliding Log have zero memory allocations per operation.

Algorithm Comparison

🪣 Token Bucket

Memory: O(1)
Accuracy: Good
Complexity: Low
Best for: General purpose, APIs

KEY FEATURES

  • Burst handling
  • Simple implementation
  • Memory efficient

🪟 Fixed Window

Memory: O(1)
Accuracy: Fair
Complexity: Very Low
Best for: Simple quotas

KEY FEATURES

  • Simplest algorithm
  • Reset at intervals
  • Minimal overhead

📊 Sliding Log

Memory: O(n)
Accuracy: Excellent
Complexity: Medium
Best for: Critical accuracy

KEY FEATURES

  • Most accurate
  • Request-level tracking
  • Higher memory usage

🔢 Sliding Counter

Memory: O(1)
Accuracy: Very Good
Complexity: Medium
Best for: High-traffic APIs

KEY FEATURES

  • Good balance
  • Approximates sliding log
  • Memory efficient


Production Considerations

1. Distributed Rate Limiting

In a distributed system, you need to coordinate rate limits across multiple servers:

// Using Redis for distributed rate limiting
type RedisRateLimiter struct {
    client *redis.Client
    local  RateLimiter  // fallback
}

2. Rate Limit Headers

Always return rate limit information to clients:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640995200

3. Graceful Degradation

func (rl *RateLimiter) AllowWithFallback(key string) bool {
    // Try distributed rate limiter
    allowed, err := rl.distributed.Allow(key)
    if err != nil {
        // Fall back to local rate limiter
        return rl.local.Allow(key)
    }
    return allowed
}

Code Examples

Basic Usage

// Create a rate limiter
config := ratelimiter.Config{
    Rate:  100.0,  // 100 requests per second
    Burst: 200,    // burst up to 200
}
rl := ratelimiter.NewTokenBucket(config)

// Check if request is allowed
if rl.Allow(ctx, userID) {
    // Process request
} else {
    // Return 429 Too Many Requests
}

API Gateway Example

func RateLimitMiddleware(rl ratelimiter.RateLimiter) func(next http.Handler) http.Handler {
    return func(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            key := getClientIP(r)
            
            if !rl.Allow(r.Context(), key) {
                w.Header().Set("Retry-After", "60")
                http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
                return
            }
            
            next.ServeHTTP(w, r)
        })
    }
}

Implementation

Explore our Go implementations:

Next Steps

  • Learn about Consistent Hashing for distributed systems
  • Implement rate limiting in your own projects
