Communication Patterns

Understanding different communication patterns is crucial for system design. Each pattern has its own trade-offs in terms of latency, complexity, resource usage, and real-time capabilities.

Table of Contents

Overview of Communication Patterns

Request-Response Patterns

  • HTTP/HTTPS (Traditional)
  • Long Polling
  • HTTP/2 Server Push

Streaming Patterns

  • Server-Sent Events (SSE)
  • WebSocket (Bidirectional)
  • WebRTC (Peer-to-Peer)

1. HTTP/HTTPS (Traditional Request-Response)

Standard HTTP/HTTPS Communication

Client ↔ Server sequence:

1. TCP 3-way handshake: SYN → SYN-ACK → ACK
2. TLS handshake (HTTPS): certificate exchange
3. HTTP request: GET /api/data HTTP/1.1
4. HTTP response: HTTP/1.1 200 OK + data
5. Connection close: TCP 4-way handshake

One request = one response = connection closed
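
A minimal client-side sketch of this request-response cycle using the fetch API; the /api/data endpoint is just a placeholder:

// One request, one response: send the request, wait for the full response, done.
async function fetchData(): Promise<unknown> {
  const response = await fetch("https://example.com/api/data", {
    method: "GET",
    headers: { Accept: "application/json" },
  });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }
  return response.json(); // parse the body once the full response has arrived
}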

✅ Advantages

  • Simple and well-understood
  • Stateless protocol
  • Wide browser/server support
  • Works with proxies/firewalls

❌ Disadvantages

  • Connection overhead for each request
  • No real-time updates
  • Client must initiate all communication
  • Inefficient for frequent updates

2. Long Polling

Long Polling Pattern

Client ↔ Server sequence:

1. Client sends a request (timeout: 30s); the server holds the connection open (waiting...)
2. An event occurs on the server
3. Server responds immediately: HTTP 200 + event data
4. Client sends a new request immediately; if no events arrive within the 30s timeout...
5. Server returns an empty response: HTTP 204 No Content

Connection held open until data is available or the timeout expires
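
A rough client-side sketch of this loop, assuming a hypothetical /events endpoint that holds the request open and answers 200 with data or 204 on timeout:

// Repeatedly issue a request that the server holds open until data arrives or times out.
async function longPoll(handle: (event: unknown) => void): Promise<void> {
  while (true) {
    try {
      const response = await fetch("https://example.com/events");
      if (response.status === 200) {
        handle(await response.json()); // event delivered
      }
      // 204 No Content: the server timed out with no events; loop and reconnect
    } catch {
      // network error: back off briefly before reconnecting
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
  }
}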

✅ Advantages

  • Near real-time updates
  • Works with HTTP/1.1
  • No special protocols needed
  • Reduces unnecessary polling

❌ Disadvantages

  • Keeps connections open (resource intensive)
  • Reconnection overhead
  • Timeout handling complexity
  • Not truly bidirectional

3. Server-Sent Events (SSE)

Server-Sent Events (EventSource)

Client ↔ Server sequence:

1. Client connects with EventSource: GET /events, Accept: text/event-stream
2. Server responds: HTTP 200 OK, Content-Type: text/event-stream (persistent keep-alive connection)
3. Server pushes events over the open connection:
   data: {"message": "update1"}
   data: {"message": "update2"}
   data: {"message": "update3"}
4. Periodic heartbeat comments (": keep-alive") keep the connection from timing out

One-way stream: Server → Client (HTTP connection stays open)
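
On the client this maps directly onto the EventSource API; a minimal sketch (the /events URL is a placeholder):

// The browser opens GET /events with Accept: text/event-stream and keeps it open.
const source = new EventSource("https://example.com/events");

source.onmessage = (event: MessageEvent) => {
  // Each "data: {...}" line from the server arrives as one message
  console.log("update:", JSON.parse(event.data));
};

source.onerror = () => {
  // EventSource reconnects automatically; this fires when the connection drops
  console.warn("connection lost, retrying...");
};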

✅ Advantages

  • Real-time server-to-client updates
  • Automatic reconnection
  • Simple API (EventSource)
  • Works over HTTP
  • Built-in event IDs for replay

❌ Disadvantages

  • Server-to-client only (unidirectional)
  • Text data only (no binary)
  • Limited to ~6 connections per domain over HTTP/1.1
  • No custom request headers via the EventSource API

4. WebSocket

WebSocket (Full Duplex)

Client ↔ Server sequence:

1. Client requests an upgrade: GET /ws HTTP/1.1, Upgrade: websocket
2. Server switches protocols: HTTP 101 Switching Protocols, Upgrade: websocket
3. Full-duplex WebSocket connection; either side can send at any time:
   Client: {"action": "subscribe"}
   Server: {"data": "real-time update"}
   Client: {"action": "send_message"}
   Server broadcast: {"type": "broadcast"}
4. Ping/Pong frames keep the connection alive

Bidirectional: Client ↔ Server (persistent TCP connection)
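
A minimal browser-side sketch of this exchange; the wss://example.com/ws endpoint and message shapes are placeholders:

// The constructor performs the HTTP Upgrade handshake shown above.
const ws = new WebSocket("wss://example.com/ws");

ws.onopen = () => {
  // Client → Server: subscribe once the connection is established
  ws.send(JSON.stringify({ action: "subscribe" }));
};

ws.onmessage = (event: MessageEvent) => {
  // Server → Client: real-time updates arrive as frames on the same connection
  console.log("server says:", JSON.parse(event.data));
};

ws.onclose = () => {
  // No built-in reconnection: the application has to reconnect itself
  console.warn("socket closed");
};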

✅ Advantages

  • Full-duplex bidirectional communication
  • Low latency real-time updates
  • Supports binary data
  • Less overhead than HTTP polling
  • Wide browser support

❌ Disadvantages

  • More complex to implement
  • Requires stateful connections
  • Proxy/firewall complications
  • No automatic reconnection
  • Scaling challenges (sticky sessions)

5. gRPC (Remote Procedure Call)

gRPC - High Performance RPC

gRPC client ↔ gRPC server over a single multiplexed HTTP/2 connection.

4 RPC types:
1. Unary: request → response
2. Server streaming: request → stream
3. Client streaming: stream → response
4. Bidirectional streaming: stream ↔ stream

Messages use Protocol Buffers: binary serialization, strongly typed
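
The four RPC shapes can be sketched as a hypothetical TypeScript client interface; the service and method names are illustrative, not generated from a real .proto file:

interface UserRequest { id: number }
interface UserReply { id: number; name: string }

// How the four gRPC call types typically surface in a generated client.
interface UserServiceClient {
  // 1. Unary: one request, one response
  getUser(req: UserRequest): Promise<UserReply>;
  // 2. Server streaming: one request, a stream of responses
  watchUser(req: UserRequest): AsyncIterable<UserReply>;
  // 3. Client streaming: a stream of requests, one response
  importUsers(reqs: AsyncIterable<UserRequest>): Promise<UserReply>;
  // 4. Bidirectional streaming: both sides stream independently
  syncUsers(reqs: AsyncIterable<UserRequest>): AsyncIterable<UserReply>;
}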

✅ Advantages

  • High performance (often several times faster than JSON over HTTP/1.1)
  • Strongly typed with Protocol Buffers
  • Built-in streaming support
  • HTTP/2 multiplexing
  • Auto-generated client/server code
  • Efficient binary serialization
  • Language agnostic

❌ Disadvantages

  • Not browser-native (requires gRPC-Web or a proxy)
  • Binary format not human-readable
  • Steeper learning curve
  • Limited ecosystem vs REST
  • Requires HTTP/2 support
  • Schema versioning complexity

6. WebRTC (Peer-to-Peer)

WebRTC - P2P Communication over UDP

📌 Key Point: WebRTC uses UDP (not TCP) for media transport to achieve ultra-low latency. UDP's lack of guaranteed delivery is actually beneficial for real-time media where dropping old packets is better than waiting for retransmission.

WebRTC connection establishment with STUN. Participants: Peer A (behind NAT, 192.168.1.100), a signaling server (WebSocket/HTTP), a STUN server (stun.l.google.com:19302), and Peer B (behind NAT, 10.0.0.50).

1. Both peers send STUN requests to discover their public endpoints ("Your IP: 203.0.113.45:51234" for Peer A, "Your IP: 198.51.100.78:43210" for Peer B)
2. Peer A creates an offer + ICE candidates (public IP from STUN)
3. Peer A sends the offer to the signaling server
4. The signaling server forwards the offer to Peer B
5. Peer B creates an answer + ICE candidates
6. Peer B sends the answer to the signaling server
7. The signaling server forwards the answer to Peer A
8. Direct P2P connection (UDP): audio/video over SRTP, data channel over SCTP, 🔒 DTLS encryption for security

Peer-to-Peer: direct connection after signaling (no server for data)

🌐 Deep Dive: STUN vs TURN Servers

🔍 STUN Server

Purpose: NAT Discovery & Public IP Detection

  • Tells clients their public IP address
  • Determines NAT type and behavior
  • Enables direct P2P connections
  • Low cost - just UDP packet reflection
  • Works for ~80% of connections

Example: stun.l.google.com:19302

🔄 TURN Server

Purpose: Relay Server (Fallback)

  • Relays traffic when direct P2P fails
  • Handles symmetric NATs & firewalls
  • Uses server bandwidth for media
  • Higher latency than direct connection
  • Required for ~20% of connections

Fallback: When STUN-assisted P2P fails

🔗 The Connection Process: Peers first contact STUN to discover their public endpoints. They exchange these "ICE candidates" via the signaling server. If direct UDP connection succeeds, they communicate P2P. If blocked by NAT/firewalls, they fall back to TURN relay.
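
A condensed sketch of the offer side of this process using the browser RTCPeerConnection API; the signaling transport (sendToPeer) is a placeholder you would implement over WebSocket or HTTP:

// The STUN server lets the browser discover its public IP/port (ICE candidates).
const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
});

// Hypothetical signaling helper: relay SDP/candidates via your own server.
declare function sendToPeer(message: object): void;

pc.onicecandidate = (event) => {
  if (event.candidate) sendToPeer({ candidate: event.candidate });
};

async function startCall(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));

  const offer = await pc.createOffer();   // create offer
  await pc.setLocalDescription(offer);    // starts ICE candidate gathering
  sendToPeer({ offer });                  // send via the signaling server
  // The remote peer answers; apply it with pc.setRemoteDescription(answer)
}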

✅ Advantages

  • Lowest latency (direct P2P)
  • No server bandwidth for media
  • Built-in audio/video codecs
  • End-to-end encryption
  • NAT traversal capabilities

❌ Disadvantages

  • Complex setup (SDP, ICE, STUN/TURN)
  • Requires signaling server
  • May need TURN relay servers
  • Browser compatibility issues
  • Difficult debugging

Comparison Table

Pattern      | Direction               | Real-time      | Protocol  | Use Cases
HTTP/HTTPS   | Request-Response        | No             | HTTP      | REST APIs, Web pages
Long Polling | Client → Server         | Near real-time | HTTP      | Notifications, Updates
SSE          | Server → Client         | Yes            | HTTP      | Live feeds, Dashboards
WebSocket    | Bidirectional           | Yes            | WS/WSS    | Chat, Gaming, Trading
gRPC         | Bidirectional Streaming | Yes            | HTTP/2    | Microservices, APIs
WebRTC       | P2P Bidirectional       | Yes            | SRTP/SCTP | Video calls, Screen sharing

When to Use What?

Use HTTP/HTTPS

  • RESTful APIs
  • CRUD operations
  • File uploads/downloads
  • Traditional web apps

Use Long Polling

  • Fallback for older browsers
  • Simple notifications
  • Infrequent updates
  • Behind restrictive proxies

Use SSE

  • Live news feeds
  • Stock price updates
  • Server monitoring
  • Progress indicators

Use WebSocket

  • Chat applications
  • Multiplayer games
  • Collaborative editing
  • Trading platforms

Use gRPC

  • Microservice communication
  • Internal APIs
  • High-performance systems
  • Mobile app backends

Use WebRTC

  • Video conferencing
  • Screen sharing
  • P2P file transfer
  • Low-latency gaming

Performance & Resource Comparison

Resource Usage & Performance

Metric           | HTTP      | Long Poll | SSE    | WebSocket | gRPC      | WebRTC
Latency          | High      | Medium    | Low    | Very Low  | Very Low  | Ultra Low
Server Resources | Low       | High      | Medium | Medium    | Low       | Low*
Complexity       | Simple    | Simple    | Simple | Medium    | Medium    | Complex
Scalability      | Excellent | Poor      | Good   | Medium    | Excellent | Good**

  * Low server resources for media after connection establishment
  ** Good scalability for P2P, but TURN servers may be needed

Summary

Each communication pattern serves different needs:

  • HTTP/HTTPS: Best for traditional request-response scenarios
  • Long Polling: Simple real-time updates when WebSocket isn't available
  • SSE: Perfect for server-to-client streaming (news feeds, notifications)
  • WebSocket: Ideal for bidirectional real-time communication (chat, gaming)
  • gRPC: Excellent for high-performance microservice communication
  • WebRTC: Essential for peer-to-peer media streaming (video calls)

Choose based on your specific requirements for latency, directionality, browser support, and infrastructure complexity.

Communication Reliability Patterns

Communication protocols are only as reliable as the patterns you use to handle failures. These patterns ensure your systems gracefully handle network partitions, service outages, and cascading failures.

🔄 Timeouts & Retries

📌 Key Point: Timeouts prevent hanging requests, retries handle transient failures, but both need careful configuration to avoid making things worse.

⏱️ Timeout Strategies

  • Connection Timeout: 3-5 seconds
  • Read Timeout: 10-30 seconds
  • Total Request Timeout: 1-2 minutes
  • Service-specific: Database (5s), Cache (100ms)
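
A sketch of enforcing such a timeout on the client with AbortController; the 30-second default mirrors the read-timeout range above and the URL is a placeholder:

// Abort the request if the server has not responded within `timeoutMs`.
async function fetchWithTimeout(url: string, timeoutMs = 30_000): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetch(url, { signal: controller.signal });
  } finally {
    clearTimeout(timer); // always clean up the timer
  }
}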

🔁 Retry Strategies

  • Exponential Backoff: 1s, 2s, 4s, 8s...
  • Jitter: ±25% randomization
  • Max Retries: Usually 3-5 attempts
  • Retry Budget: Limit retry rate (e.g., 10%)

🎲 Deep Dive: Why Jitter Prevents Retry Storms

❌ Without Jitter

All clients retry at exact same intervals:

T+1s: 1000 clients retry
T+2s: 1000 clients retry
T+4s: 1000 clients retry

Result: Synchronized load spikes overwhelm recovering service

✅ With ±25% Jitter

Randomized retry intervals spread the load:

T+0.75-1.25s: Clients spread out
T+1.5-2.5s: Clients spread out
T+3-5s: Clients spread out

Result: Smooth load distribution allows recovery

🔧 Implementation Example
const baseDelay = Math.pow(2, attempt) * 1000 // 1s, 2s, 4s...
const jitter = baseDelay * 0.25 * (Math.random() * 2 - 1) // uniform within ±25% of baseDelay
const actualDelay = baseDelay + jitter

This spreads a 1000ms retry into the 750-1250ms range, preventing a thundering herd.
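
Putting the pieces together, a rough sketch of a full retry helper combining exponential backoff, ±25% jitter, and a max-attempt cap; only use it around idempotent operations (see the mistakes below):

async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxRetries = 4,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt >= maxRetries) throw err;       // give up after the last retry
      const base = Math.pow(2, attempt) * 1000;   // 1s, 2s, 4s, 8s...
      const jitter = base * 0.25 * (Math.random() * 2 - 1); // ±25% jitter
      await new Promise((resolve) => setTimeout(resolve, base + jitter));
    }
  }
}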

⚠️ Common Mistakes

  • Retry Storms: All clients retry simultaneously
  • No Jitter: Synchronized retries amplify load spikes
  • Retrying Non-Idempotent Operations: Payment processing, user creation
  • No Circuit Breaking: Retrying when service is clearly down

⚡ Circuit Breaker Pattern

Circuit Breaker State Transitions

  • CLOSED: normal operation, requests pass through
  • OPEN: failing fast, no requests sent
  • HALF-OPEN: testing recovery, limited requests allowed

Transitions: CLOSED → OPEN when the failure threshold is reached; OPEN → HALF-OPEN when the timeout expires; HALF-OPEN → CLOSED when the success threshold is met; HALF-OPEN → OPEN if requests are still failing.
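
A minimal in-memory sketch of these transitions (thresholds match the configuration listed below; the request-volume threshold is omitted for brevity, so this is not production-ready):

class CircuitBreaker {
  private failures = 0;
  private successes = 0;
  private state: "CLOSED" | "OPEN" | "HALF_OPEN" = "CLOSED";
  private openedAt = 0;

  constructor(
    private failureThreshold = 5,
    private successThreshold = 3,
    private timeoutMs = 60_000,
  ) {}

  async call<T>(operation: () => Promise<T>): Promise<T> {
    if (this.state === "OPEN") {
      if (Date.now() - this.openedAt < this.timeoutMs) {
        throw new Error("circuit open: failing fast"); // no request sent
      }
      this.state = "HALF_OPEN"; // timeout expired: test recovery
    }
    try {
      const result = await operation();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }

  private onSuccess(): void {
    if (this.state === "HALF_OPEN" && ++this.successes >= this.successThreshold) {
      this.state = "CLOSED"; // success threshold met
      this.successes = 0;
    }
    this.failures = 0;
  }

  private onFailure(): void {
    if (this.state === "HALF_OPEN" || ++this.failures >= this.failureThreshold) {
      this.state = "OPEN"; // still failing, or failure threshold reached
      this.openedAt = Date.now();
      this.failures = 0;
      this.successes = 0;
    }
  }
}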

✅ Benefits

  • Prevents cascading failures
  • Fast failure responses
  • Automatic recovery testing
  • Resource protection

🔧 Configuration

  • Failure threshold: 5 failures
  • Success threshold: 3 successes
  • Timeout: 60 seconds
  • Request volume: 20 requests

🎯 Use Cases

  • Database connections
  • External API calls
  • Microservice communication
  • Third-party integrations

🌊 Cascading Failures

📌 Definition: When the failure of one component causes the failure of other components, creating a domino effect that can bring down entire systems.

Common Cascading Failure Scenarios:

🔥 Retry Amplification

Service A fails → All clients retry → 10x load on Service A → Service A crashes completely

Solution: Exponential backoff, circuit breakers, retry budgets

⚡ Resource Exhaustion

Database slow → Connection pool exhausted → Web servers hang → Load balancer marks them unhealthy

Solution: Timeouts, connection limits, bulkhead pattern

📈 Thundering Herd

Cache expires → All requests hit database → Database overloaded → More cache misses

Solution: Cache warming, staggered expiration, circuit breakers
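
Staggered expiration is just a jittered TTL, so keys cached at the same moment do not all expire together; a small sketch (the cache.set call is a placeholder for whatever cache client you use):

// Spread a nominal 10-minute TTL over roughly 10-12 minutes per key.
function jitteredTtl(baseSeconds: number, spread = 0.2): number {
  return Math.round(baseSeconds * (1 + Math.random() * spread));
}

// e.g. cache.set("user:42", value, { ttl: jitteredTtl(600) });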

🛡️ Prevention Strategies

  • Bulkhead Pattern: Isolate critical resources
  • Load Shedding: Drop low-priority requests
  • Graceful Degradation: Reduce functionality, stay up
  • Health Checks: Proactive failure detection
  • Rate Limiting: Protect downstream services

🚨 Detection & Recovery

  • Monitoring: Error rates, latency, throughput
  • Alerting: Early warning systems
  • Automated Rollback: Quick recovery
  • Chaos Engineering: Proactive failure testing
  • Runbooks: Clear incident response

💀 Dead Letter Queues

📌 Purpose: Capture messages that cannot be processed after multiple retry attempts, preventing message loss and infinite retry loops.

✅ Benefits

  • Prevents message loss
  • Stops infinite retries
  • Enables failure analysis
  • Manual recovery possible
  • Maintains system stability

🔄 Processing Flow

  1. Message fails processing
  2. Retry with exponential backoff
  3. After max retries → DLQ
  4. Monitor DLQ for patterns
  5. Fix issues and replay
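
A rough consumer-side sketch of that flow; the message shape, deliveryCount field, and queue helpers are placeholders for whatever your broker provides:

interface Message { body: unknown; deliveryCount: number }

declare function processMessage(msg: Message): Promise<void>;
declare function sendToDeadLetterQueue(msg: Message, reason: string): Promise<void>;

const MAX_DELIVERIES = 5;

// Called for each delivery attempt by the (hypothetical) queue consumer.
async function handleDelivery(msg: Message): Promise<void> {
  try {
    await processMessage(msg);            // normal path: acknowledge on success
  } catch (err) {
    if (msg.deliveryCount >= MAX_DELIVERIES) {
      // Give up: park the message for analysis and manual replay
      await sendToDeadLetterQueue(msg, String(err));
    } else {
      throw err;                          // reject: the broker redelivers with backoff
    }
  }
}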