Rate Limiting
Discover powerful algorithms that control request flow and protect your systems from overload
Rate limiting is a fundamental technique in system design to control the rate at which requests are processed. It protects services from being overwhelmed, ensures fair resource allocation, and maintains quality of service.
Why Rate Limiting?
- Prevent DoS attacks: Stop malicious actors from overwhelming your service
- Fair usage: Ensure all users get reasonable access to resources
- Cost control: Limit expensive API calls or resource usage
- Quality of service: Maintain consistent performance for all users
Implementations
We've implemented four different rate limiting algorithms, each with its own trade-offs:
1. Token Bucket
The token bucket algorithm is one of the most popular rate limiting approaches. Think of it as a bucket that holds tokens:
- Tokens are added at a constant rate (e.g., 10 tokens per second)
- Each request consumes one or more tokens
- Requests are rejected when the bucket is empty
- Burst capacity allows handling spike traffic up to bucket size
```python
import time

class TokenBucket:
    def __init__(self, rate: float, burst: int):
        self.rate = rate              # tokens per second
        self.burst = burst            # max tokens in bucket
        self.tokens = burst           # start with full bucket
        self.last_time = time.time()  # last refill time
```
Core Algorithm Logic:
```python
def allow(self) -> bool:
    # 1. Calculate elapsed time and refill tokens
    now = time.time()
    elapsed = now - self.last_time
    self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
    self.last_time = now

    # 2. Check if tokens available
    if self.tokens >= 1:
        self.tokens -= 1  # Consume token
        return True       # Allow request
    return False          # Reject request
```
Pros:
- Smooth rate limiting
- Allows burst traffic
- Memory efficient
Cons:
- Not strictly accurate over short intervals
- Requires token calculation on each request
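A quick usage sketch, assuming the TokenBucket class above, showing how a full bucket absorbs a burst before settling back to the refill rate:

```python
import time

tb = TokenBucket(rate=5.0, burst=10)  # refill 5 tokens/s, hold at most 10

# A full bucket absorbs an immediate burst of requests...
results = [tb.allow() for _ in range(12)]
print(results.count(True))  # typically 10; the burst is capped by bucket size

# ...then throughput settles to roughly the refill rate
time.sleep(1)
print(tb.allow())  # True again after ~5 tokens refill
```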
2. Fixed Window Counter
Divides time into fixed windows and counts requests per window:
```python
import time

class FixedWindow:
    def __init__(self, window_seconds: int, limit: int):
        self.window_seconds = window_seconds  # window size in seconds
        self.limit = limit                    # max requests per window
        self.counter = 0                      # current window request count
        self.window_start = None              # current window start time
```
Core Algorithm Logic:
```python
def allow(self) -> bool:
    # 1. Calculate current window
    now = time.time()
    current_window = int(now // self.window_seconds) * self.window_seconds

    # 2. Reset counter if new window
    if self.window_start != current_window:
        self.counter = 0
        self.window_start = current_window

    # 3. Check if under limit
    if self.counter < self.limit:
        self.counter += 1  # Increment counter
        return True        # Allow request
    return False           # Reject request
```
The Boundary Problem: with a limit of 100 requests per minute, a client can send 100 requests at 0:59 and another 100 at 1:01. Each window individually stays within its limit, yet 200 requests pass within about two seconds.
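To see this concretely in code, here is a small, hedged demonstration that pins time.time with unittest.mock so requests land just before and just after a window boundary (it assumes the FixedWindow class above):

```python
from unittest import mock

fw = FixedWindow(window_seconds=60, limit=100)

with mock.patch("time.time", return_value=59.0):  # 1s before the boundary
    before = sum(fw.allow() for _ in range(100))

with mock.patch("time.time", return_value=61.0):  # 1s after the boundary
    after = sum(fw.allow() for _ in range(100))

print(before + after)  # 200 -> twice the limit passed within ~2 seconds
```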
3. Sliding Window Log
Stores timestamps of all requests within the current time window:
```python
import time
from typing import List

class SlidingWindowLog:
    def __init__(self, window_seconds: int, limit: int):
        self.window_seconds = window_seconds
        self.limit = limit
        self.timestamps: List[float] = []  # all request timestamps in window
```
Core Algorithm Logic:
```python
def allow(self) -> bool:
    now = time.time()
    cutoff = now - self.window_seconds  # Window boundary

    # 1. Remove expired timestamps
    while self.timestamps and self.timestamps[0] < cutoff:
        self.timestamps.pop(0)  # Remove oldest entry (O(n) on a list)

    # 2. Check if under limit
    if len(self.timestamps) < self.limit:
        self.timestamps.append(now)  # Add current request
        return True                  # Allow request
    return False                     # Reject request
```
How Sliding Window Log Fixes the Boundary Problem: the window is anchored to each incoming request rather than to fixed boundaries, so the 0:59/1:01 burst from the example above falls inside a single 60-second window and the second batch is rejected.
Most accurate but memory intensive (O(n) where n = requests in window)
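Since list.pop(0) shifts every remaining element, the log is usually kept in a collections.deque instead. A minimal sketch of that variant, keeping the same interface as the class above:

```python
import time
from collections import deque

class SlidingWindowLogDeque:
    def __init__(self, window_seconds: int, limit: int):
        self.window_seconds = window_seconds
        self.limit = limit
        self.timestamps: deque = deque()  # oldest timestamp on the left

    def allow(self) -> bool:
        now = time.time()
        cutoff = now - self.window_seconds
        while self.timestamps and self.timestamps[0] < cutoff:
            self.timestamps.popleft()  # O(1), unlike list.pop(0)
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```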
4. Sliding Window Counter
Hybrid approach combining fixed windows with weighted calculation:
Previous window: 80 requests
Current window: 40 requests
Position: 25% into the current window
Weighted count = 40 + (80 × 0.75) = 100

```python
import time

class SlidingWindowCounter:
    def __init__(self, window_seconds: int, limit: int):
        self.window_seconds = window_seconds  # window size in seconds
        self.limit = limit                    # max requests per window
        self.current_count = 0                # requests in current window
        self.previous_count = 0               # requests in previous window
        self.current_window = None            # current window start time
```
Core Algorithm Logic:
```python
def allow(self) -> bool:
    now = time.time()
    current_window = int(now // self.window_seconds) * self.window_seconds

    # 1. Update counters if new window
    #    (simplified: assumes windows advance one at a time; after a long
    #    idle gap the previous count would be stale)
    if self.current_window != current_window:
        self.previous_count = self.current_count
        self.current_count = 0
        self.current_window = current_window

    # 2. Calculate weighted count
    elapsed = now - current_window
    weight = 1.0 - (elapsed / self.window_seconds)
    estimated_count = self.current_count + int(self.previous_count * weight)

    # 3. Check if under limit
    if estimated_count < self.limit:
        self.current_count += 1  # Increment current window
        return True              # Allow request
    return False                 # Reject request
```
How Sliding Window Counter Approximates the Solution: requests in the previous window are assumed to be evenly distributed, so the fraction of that window still overlapping the sliding window can be estimated with a single weight instead of storing every timestamp.
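Plugging the numbers from the worked example into the weighting step as a standalone check (values copied from above, not part of the class):

```python
window_seconds = 60
previous_count, current_count = 80, 40
elapsed = 15  # 25% into the current window

weight = 1.0 - elapsed / window_seconds            # 0.75
estimated = current_count + int(previous_count * weight)
print(estimated)  # 100 -> at the limit, so the next request is rejected
```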
Performance Comparison
Note: Benchmarks run on Apple M4 Pro. Lower latency is better. All algorithms except Sliding Log have zero memory allocations per operation.
Algorithm Comparison
| Algorithm | Key Features |
| --- | --- |
| 🪣 Token Bucket | Burst handling, simple implementation, memory efficient |
| 🪟 Fixed Window | Simplest algorithm, resets at intervals, minimal overhead |
| 📊 Sliding Log | Most accurate, request-level tracking, higher memory usage |
| 🔢 Sliding Counter | Good balance, approximates sliding log, memory efficient |
Production Considerations
1. Distributed Rate Limiting
In a distributed system, you need to coordinate rate limits across multiple servers:
```python
# Using Redis for distributed rate limiting
import redis
from typing import Optional

class RedisRateLimiter:
    def __init__(self, redis_client: redis.Redis,
                 local_limiter: Optional[object] = None):
        self.client = redis_client
        self.local = local_limiter  # fallback
```
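A minimal sketch of the distributed check itself, using a fixed-window counter shared through Redis (the limit and window_seconds attributes are assumed additions to the class above; redis-py's incr and expire provide the atomicity):

```python
import time

def allow(self, key: str) -> bool:
    # One counter per client per window, shared by every server.
    window = int(time.time() // self.window_seconds)
    redis_key = f"ratelimit:{key}:{window}"
    count = self.client.incr(redis_key)  # atomic increment across servers
    if count == 1:                       # first request in this window
        self.client.expire(redis_key, self.window_seconds)
    return count <= self.limit
```

2. Rate Limit Headers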
Always return rate limit information to clients:
```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1640995200
```
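As a hedged sketch of wiring this up in Flask (the framework used later in this post), an after_request hook can attach the headers; the values here are illustrative and would come from the limiter's state for the current client in practice:

```python
from flask import Flask

app = Flask(__name__)

@app.after_request
def add_rate_limit_headers(response):
    # Illustrative values; derive them from your rate limiter in practice.
    response.headers["X-RateLimit-Limit"] = "100"
    response.headers["X-RateLimit-Remaining"] = "45"
    response.headers["X-RateLimit-Reset"] = "1640995200"
    return response
```

3. Graceful Degradation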
```python
def allow_with_fallback(self, key: str) -> bool:
    try:
        # Try distributed rate limiter
        return self.distributed_allow(key)
    except Exception:
        # Fall back to local rate limiter (fail closed if none exists)
        return self.local.allow() if self.local else False
```
Code Examples
Basic Usage
```python
# Create a rate limiter
rl = TokenBucket(
    rate=100.0,  # 100 requests per second
    burst=200    # burst up to 200
)

# Check if request is allowed
if rl.allow():
    # Process request
    print("Request allowed")
else:
    # Return 429 Too Many Requests
    print("Rate limit exceeded")
```
API Gateway Example
```python
from flask import Flask, request, jsonify
from functools import wraps

def rate_limit_middleware(rate_limiter):
    def decorator(f):
        @wraps(f)
        def decorated_function(*args, **kwargs):
            # In production you would look up a per-client limiter keyed
            # by client_ip rather than share one global limiter.
            client_ip = request.remote_addr
            if not rate_limiter.allow():
                return jsonify({
                    "error": "Rate limit exceeded",
                    "retry_after": 60
                }), 429
            return f(*args, **kwargs)
        return decorated_function
    return decorator

# Usage example
app = Flask(__name__)
rl = TokenBucket(rate=100.0, burst=200)

@app.route('/api/data')
@rate_limit_middleware(rl)
def get_data():
    return jsonify({"data": "Hello World"})
```
Implementation
The Python implementations above provide clear, readable code for all four rate limiting algorithms. Key benefits of these Python implementations:
- Cleaner syntax - easier to understand the core algorithm logic
- Better readability - intuitive time handling and data structures
- Type hints - improved code documentation and IDE support
- Pythonic patterns - follows Python best practices
For production use, consider:
- Using collections.deque instead of list for better performance in the Sliding Window Log (see the deque sketch in that section)
- Adding proper error handling and logging
- Implementing distributed coordination with Redis or similar
- Adding metrics and monitoring (see the sketch below)
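For the metrics point, a minimal sketch of wrapping any of the limiters above with counters (the wrapper class and counter names are illustrative):

```python
class InstrumentedLimiter:
    def __init__(self, limiter):
        self.limiter = limiter
        self.allowed = 0   # export these counters to your metrics system
        self.rejected = 0

    def allow(self) -> bool:
        ok = self.limiter.allow()
        if ok:
            self.allowed += 1
        else:
            self.rejected += 1
        return ok

rl = InstrumentedLimiter(TokenBucket(rate=100.0, burst=200))
```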
Next Steps
- Learn about Consistent Hashing for distributed systems
- Implement rate limiting in your own projects