[Diagram: client devices → rate limiter (token bucket / sliding window) → protected API server; throttled requests are rejected with HTTP 429]

Rate Limiting

Discover powerful algorithms that control request flow and protect your systems from overload

Rate limiting is a fundamental technique in system design to control the rate at which requests are processed. It protects services from being overwhelmed, ensures fair resource allocation, and maintains quality of service.

Why Rate Limiting?

  • Prevent DoS attacks: Stop malicious actors from overwhelming your service
  • Fair usage: Ensure all users get reasonable access to resources
  • Cost control: Limit expensive API calls or resource usage
  • Quality of service: Maintain consistent performance for all users

Implementations

We've implemented four different rate limiting algorithms, each with its own trade-offs:

1. Token Bucket

The token bucket algorithm is one of the most popular rate limiting approaches. Think of it as a bucket that holds tokens:

  • Tokens are added at a constant rate (e.g., 10 tokens per second)
  • Each request consumes one or more tokens
  • Requests are rejected when the bucket is empty
  • Burst capacity allows handling spike traffic up to bucket size

import time

class TokenBucket:
    def __init__(self, rate: float, burst: int):
        self.rate = rate          # tokens per second
        self.burst = burst        # max tokens in bucket
        self.tokens = burst       # start with full bucket
        self.last_time = time.time()  # last refill time

Visual Flow:

[Diagram: request arrives → calculate elapsed time → refill tokens (elapsed × rate) → if tokens ≥ 1, consume one and allow; otherwise reject as rate limited]

Core Algorithm Logic:

def allow(self) -> bool:
    # 1. Calculate elapsed time and refill tokens
    now = time.time()
    elapsed = now - self.last_time
    self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
    self.last_time = now
    
    # 2. Check if tokens available
    if self.tokens >= 1:
        self.tokens -= 1  # Consume token
        return True       # Allow request
    return False          # Reject request

Pros:

  • Smooth rate limiting
  • Allows burst traffic
  • Memory efficient

Cons:

  • Not strictly accurate over short intervals
  • Requires token calculation on each request
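Putting the constructor and `allow()` above together gives a runnable sketch. With a burst of 3 and a slow refill rate, three back-to-back calls drain the bucket and the fourth is rejected:

```python
import time

class TokenBucket:
    def __init__(self, rate: float, burst: int):
        self.rate = rate              # tokens added per second
        self.burst = burst            # max tokens in the bucket
        self.tokens = float(burst)    # start with a full bucket
        self.last_time = time.time()  # last refill timestamp

    def allow(self) -> bool:
        # Refill based on elapsed time, capped at burst capacity
        now = time.time()
        elapsed = now - self.last_time
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last_time = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

rl = TokenBucket(rate=1.0, burst=3)
results = [rl.allow() for _ in range(4)]
print(results)  # first three drain the burst; the fourth is rejected
```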

2. Fixed Window Counter

Divides time into fixed windows and counts requests per window:

[Diagram: Window 1 (12:00:00-12:00:59): 8/10 requests used · Window 2 (12:01:00-12:01:59): 2/10 requests used]

import time

class FixedWindow:
    def __init__(self, window_seconds: int, limit: int):
        self.window_seconds = window_seconds  # window size in seconds
        self.limit = limit                    # max requests per window
        self.counter = 0                      # current window request count
        self.window_start = None              # current window start time

Core Algorithm Logic:

def allow(self) -> bool:
    # 1. Calculate current window
    now = time.time()
    current_window = int(now // self.window_seconds) * self.window_seconds
    
    # 2. Reset counter if new window
    if self.window_start != current_window:
        self.counter = 0
        self.window_start = current_window
    
    # 3. Check if under limit
    if self.counter < self.limit:
        self.counter += 1  # Increment counter
        return True        # Allow request
    return False           # Reject request

The Boundary Problem:

[Diagram: 10 requests at 11:59:50-11:59:59 (end of window 1) followed by 10 requests at 12:00:00-12:00:10 (start of window 2) → 20 requests allowed in about 20 seconds, double the intended rate]
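The failure mode can be reproduced deterministically with a variant of the counter above that takes the timestamp as a parameter (`allow_at` is a testing convenience added here, not part of the original API). With a boundary at t = 120000, ten requests just before it and ten just after are all accepted:

```python
class FixedWindow:
    def __init__(self, window_seconds: int, limit: int):
        self.window_seconds = window_seconds
        self.limit = limit
        self.counter = 0
        self.window_start = None

    def allow_at(self, now: float) -> bool:
        # Same logic as allow(), with the clock injected for the demo
        current_window = int(now // self.window_seconds) * self.window_seconds
        if self.window_start != current_window:
            self.counter = 0
            self.window_start = current_window
        if self.counter < self.limit:
            self.counter += 1
            return True
        return False

fw = FixedWindow(window_seconds=60, limit=10)
late = sum(fw.allow_at(119990 + i) for i in range(10))   # last 10s of window 1
early = sum(fw.allow_at(120000 + i) for i in range(10))  # first 10s of window 2
print(late + early)  # 20 requests accepted within ~20 seconds
```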

3. Sliding Window Log

Stores timestamps of all requests within the current time window:

import time
from typing import List

class SlidingWindowLog:
    def __init__(self, window_seconds: int, limit: int):
        self.window_seconds = window_seconds
        self.limit = limit
        self.timestamps: List[float] = []  # all request timestamps in window

Core Algorithm Logic:

def allow(self) -> bool:
    now = time.time()
    cutoff = now - self.window_seconds  # Window boundary
    
    # 1. Remove expired timestamps
    while self.timestamps and self.timestamps[0] < cutoff:
        self.timestamps.pop(0)  # O(n) on a list; prefer collections.deque in production
    
    # 2. Check if under limit
    if len(self.timestamps) < self.limit:
        self.timestamps.append(now)  # Add current request
        return True   # Allow request
    return False      # Reject request

How Sliding Window Log Fixes the Boundary Problem:

[Diagram: a 60s window sliding from 11:59:05 to 12:00:05; with 5 requests at 11:59:10-40 and 5 at 11:59:50-59 already logged, the 12:00:00-10 burst is rejected, so no 60s window ever exceeds 10 requests]

Most accurate but memory intensive (O(n) where n = requests in window)
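Replaying the same boundary burst (ten requests just before a window boundary at t = 120000, ten just after) through a timestamp-injected version of the log shows the fix; as before, `allow_at` is only a testing convenience:

```python
from typing import List

class SlidingWindowLog:
    def __init__(self, window_seconds: int, limit: int):
        self.window_seconds = window_seconds
        self.limit = limit
        self.timestamps: List[float] = []

    def allow_at(self, now: float) -> bool:
        # Evict timestamps older than the sliding window, then check the count
        cutoff = now - self.window_seconds
        while self.timestamps and self.timestamps[0] < cutoff:
            self.timestamps.pop(0)
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False

swl = SlidingWindowLog(window_seconds=60, limit=10)
late = sum(swl.allow_at(119990 + i) for i in range(10))   # just before the boundary
early = sum(swl.allow_at(120000 + i) for i in range(10))  # burst just after it
print(late, early)  # 10 allowed, then 0: every 60s window stays within the limit
```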

4. Sliding Window Counter

Hybrid approach combining fixed windows with weighted calculation:

Previous window: 80 requests
Current window: 40 requests
Position: 25% into current window

Weighted count = 40 + (80 × 0.75) = 100

import time

class SlidingWindowCounter:
    def __init__(self, window_seconds: int, limit: int):
        self.window_seconds = window_seconds  # window size in seconds
        self.limit = limit                    # max requests per window
        self.current_count = 0                # requests in current window
        self.previous_count = 0               # requests in previous window
        self.current_window = None            # current window start time

Core Algorithm Logic:

def allow(self) -> bool:
    now = time.time()
    current_window = int(now // self.window_seconds) * self.window_seconds
    
    # 1. Update counters if new window
    if self.current_window != current_window:
        self.previous_count = self.current_count
        self.current_count = 0
        self.current_window = current_window
    
    # 2. Calculate weighted count
    elapsed = now - current_window
    weight = 1.0 - (elapsed / self.window_seconds)
    estimated_count = self.current_count + int(self.previous_count * weight)
    
    # 3. Check if under limit
    if estimated_count < self.limit:
        self.current_count += 1  # Increment current window
        return True              # Allow request
    return False                 # Reject request
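The earlier worked example (previous window 80, current window 40, 25% into the current window) can be checked directly against this formula:

```python
window_seconds = 60
elapsed = 15                               # 25% into the current window
previous_count, current_count = 80, 40

# Fraction of the previous window that still overlaps the sliding window
weight = 1.0 - (elapsed / window_seconds)  # 0.75
estimated = current_count + int(previous_count * weight)
print(weight, estimated)  # 0.75 100
```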

How Sliding Window Counter Approximates the Solution:

[Diagram: at 12:00:15, 25% into the current window, weight = 1 − (15s / 60s) = 0.75; estimated count = 2 (current) + 8 × 0.75 = 8 requests, under the limit of 10]

Performance Comparison

  • Token Bucket: 43 ns/op, 0 B/op
  • Fixed Window: 43 ns/op, 0 B/op
  • Sliding Log: 45 ns/op, 100 B/op
  • Sliding Counter: 51 ns/op, 0 B/op

Note: Benchmarks run on Apple M4 Pro. Lower latency is better. All algorithms except Sliding Log have zero memory allocations per operation.

Algorithm Comparison

🪣Token Bucket

Memory: O(1)
Accuracy: Good
Complexity: Low
Best for: General purpose, APIs

KEY FEATURES

  • Burst handling
  • Simple implementation
  • Memory efficient

🪟Fixed Window

Memory: O(1)
Accuracy: Fair
Complexity: Very Low
Best for: Simple quotas

KEY FEATURES

  • Simplest algorithm
  • Reset at intervals
  • Minimal overhead

📊Sliding Log

Memory: O(n)
Accuracy: Excellent
Complexity: Medium
Best for: Critical accuracy

KEY FEATURES

  • Most accurate
  • Request-level tracking
  • Higher memory usage

🔢Sliding Counter

Memory: O(1)
Accuracy: Very Good
Complexity: Medium
Best for: High-traffic APIs

KEY FEATURES

  • Good balance
  • Approximates sliding log
  • Memory efficient


Production Considerations

1. Distributed Rate Limiting

In a distributed system, you need to coordinate rate limits across multiple servers:

# Using Redis to share one counter across servers (fixed-window INCR sketch)
import time
import redis
from typing import Optional

class RedisRateLimiter:
    def __init__(self, redis_client: redis.Redis, limit: int,
                 window_seconds: int, local_limiter: Optional[object] = None):
        self.client, self.limit = redis_client, limit
        self.window_seconds = window_seconds
        self.local = local_limiter  # local fallback limiter

    def distributed_allow(self, key: str) -> bool:
        # All servers increment the same per-window key
        window = int(time.time() // self.window_seconds)
        count = self.client.incr(f"ratelimit:{key}:{window}")
        if count == 1:  # first hit in this window: set an expiry so keys clean up
            self.client.expire(f"ratelimit:{key}:{window}", self.window_seconds * 2)
        return count <= self.limit

2. Rate Limit Headers

Always return rate limit information to clients:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640995200
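A small helper along these lines can build the response headers; `rate_limit_headers` and its window-aligned reset time are illustrative assumptions, not a standard API:

```python
import time

def rate_limit_headers(limit: int, remaining: int, window_seconds: int) -> dict:
    # Reset at the start of the next fixed window, as a Unix timestamp
    reset = (int(time.time()) // window_seconds + 1) * window_seconds
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset),
    }

headers = rate_limit_headers(limit=100, remaining=45, window_seconds=60)
print(headers)
```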

3. Graceful Degradation

def allow_with_fallback(self, key: str) -> bool:
    try:
        # Try distributed rate limiter
        return self.distributed_allow(key)
    except Exception:
        # Fall back to local rate limiter
        return self.local.allow() if self.local else False

Code Examples

Basic Usage

# Create a rate limiter
rl = TokenBucket(
    rate=100.0,   # 100 requests per second
    burst=200     # burst up to 200
)

# Check if request is allowed
if rl.allow():
    # Process request
    print("Request allowed")
else:
    # Return 429 Too Many Requests
    print("Rate limit exceeded")

API Gateway Example

from flask import Flask, request, jsonify
from functools import wraps

def rate_limit_middleware(limiter_factory):
    limiters = {}  # one rate limiter per client IP

    def decorator(f):
        @wraps(f)
        def decorated_function(*args, **kwargs):
            client_ip = request.remote_addr
            limiter = limiters.setdefault(client_ip, limiter_factory())

            if not limiter.allow():
                return jsonify({
                    "error": "Rate limit exceeded",
                    "retry_after": 60
                }), 429

            return f(*args, **kwargs)
        return decorated_function
    return decorator

# Usage example: each client IP gets its own token bucket
app = Flask(__name__)

@app.route('/api/data')
@rate_limit_middleware(lambda: TokenBucket(rate=100.0, burst=200))
def get_data():
    return jsonify({"data": "Hello World"})

Implementation

The Python implementations above provide clear, readable reference code for all four rate limiting algorithms. Key benefits:

  • Cleaner syntax - easier to understand the core algorithm logic
  • Better readability - intuitive time handling and data structures
  • Type hints - improved code documentation and IDE support
  • Pythonic patterns - follows Python best practices

For production use, consider:

  • Using collections.deque instead of list for better performance in Sliding Window Log
  • Adding proper error handling and logging
  • Implementing distributed coordination with Redis or similar
  • Adding metrics and monitoring
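For the first point, a deque-based variant of the Sliding Window Log is a near drop-in change (the class name here is just for illustration):

```python
import time
from collections import deque

class DequeSlidingWindowLog:
    def __init__(self, window_seconds: int, limit: int):
        self.window_seconds = window_seconds
        self.limit = limit
        self.timestamps: deque = deque()

    def allow(self) -> bool:
        now = time.time()
        cutoff = now - self.window_seconds
        # popleft() is O(1), unlike list.pop(0) which is O(n)
        while self.timestamps and self.timestamps[0] < cutoff:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False

swl = DequeSlidingWindowLog(window_seconds=60, limit=3)
results = [swl.allow() for _ in range(4)]  # fourth call exceeds the limit of 3
print(results)
```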

Next Steps

  • Learn about Consistent Hashing for distributed systems
  • Implement rate limiting in your own projects
