Database Concepts: Scaling Strategies

Understanding how databases scale is fundamental to system design. This guide covers the core concepts of replication, sharding, and distributed query challenges.

Replication vs Sharding

Replication: Copying Data for Availability

πŸ”„ What is Replication?

Replication creates identical copies of your data across multiple database instances.

Primary Database    β†’    Replica 1    β†’    Replica 2
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Users: 1M       │────▢│ Users: 1M │────▢│ Users: 1M β”‚
β”‚ Orders: 5M      β”‚     β”‚ Orders: 5Mβ”‚     β”‚ Orders: 5Mβ”‚
β”‚ Products: 10K   β”‚     β”‚ Products: β”‚     β”‚ Products: β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚    10K    β”‚     β”‚    10K    β”‚
                        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
     (Writes)              (Reads)          (Reads)

Purpose: Handle more read traffic and provide backup copies.
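This read/write split is usually enforced in the application or a proxy. A minimal sketch in Go — the `Router` type and the address strings are illustrative, not part of any driver:

```go
package main

import "fmt"

// Router sends writes to the primary and spreads reads across
// replicas round-robin. The addresses are placeholders; a real
// setup would hold one connection pool per instance.
type Router struct {
	primary  string
	replicas []string
	next     int
}

// Target returns which instance should serve the statement.
func (r *Router) Target(isWrite bool) string {
	if isWrite || len(r.replicas) == 0 {
		return r.primary // all writes go to the primary
	}
	addr := r.replicas[r.next%len(r.replicas)]
	r.next++
	return addr
}

func main() {
	r := &Router{
		primary:  "primary:3306",
		replicas: []string{"replica1:3306", "replica2:3306"},
	}
	fmt.Println(r.Target(true))  // primary:3306
	fmt.Println(r.Target(false)) // replica1:3306
	fmt.Println(r.Target(false)) // replica2:3306
}
```

Note that reads served this way are only eventually consistent: a replica may lag the primary by a replication delay.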

Sharding: Splitting Data for Scale

⚑ What is Sharding?

Sharding splits your data across multiple database instances based on a shard key.

Application Request (user_id = 12345)
                    ↓
            Shard Key: user_id
                    ↓
    β”Œβ”€β”€β”€ Hash(12345) % 3 = 0 ────┐
    ↓                            ↓
Shard 0          Shard 1          Shard 2
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚Users:    β”‚    β”‚Users:    β”‚    β”‚Users:    β”‚
β”‚ 1-333K   β”‚    β”‚ 334K-666Kβ”‚    β”‚ 667K-1M  β”‚
β”‚Orders:   β”‚    β”‚Orders:   β”‚    β”‚Orders:   β”‚
β”‚ for usersβ”‚    β”‚ for usersβ”‚    β”‚ for usersβ”‚
β”‚ 1-333K   β”‚    β”‚ 334K-666Kβ”‚    β”‚ 667K-1M  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Purpose: Store more data and handle more total throughput.
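The routing step in the diagram (`Hash(12345) % 3`) can be sketched directly. FNV-1a below is an arbitrary stable hash, not a claim about what any particular sharding layer uses:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a shard key to one of n shards, mirroring the
// Hash(user_id) % 3 step in the diagram. Any stable hash works;
// what matters is that the same key always lands on the same shard.
func shardFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()) % n
}

func main() {
	for _, id := range []string{"12345", "67890", "11111"} {
		fmt.Printf("user %s -> shard %d\n", id, shardFor(id, 3))
	}
}
```

A caveat worth noting: with plain modulo hashing, changing the shard count remaps almost every key, which is why production systems lean on range-based shards or consistent hashing when resharding.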

Key Differences Summary

Aspect              Sharding                      Replication
─────────────────────────────────────────────────────────────────────
Purpose             Scalability                   Availability
Data Distribution   Different data per node       Same data on all nodes
Storage Capacity    Increases with nodes          Same as single node
Query Complexity    Cross-shard joins difficult   All data available locally
Failure Impact      Lose portion of data          Data still available
Write Scaling       Distributes write load        All writes to primary

MySQL: Replication & Sharding

MySQL Replication Types

πŸ“– Master-Slave
  β€’ Writes: Only to master
  β€’ Reads: From slaves
  β€’ Consistency: Eventually consistent
  β€’ Use case: Read-heavy workloads
πŸ”„ Master-Master
  β€’ Writes: To any master
  β€’ Reads: From any master
  β€’ Consistency: Conflict resolution needed
  β€’ Use case: High availability, geo-distribution

MySQL Sharding with Vitess

πŸš€ Vitess Architecture

Application
     ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                      VTGate (Proxy)                        β”‚
β”‚  β€’ Query routing   β€’ Connection pooling   β€’ Query planning β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
     ↓                    ↓                    ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Shard 0 β”‚          β”‚ Shard 1 β”‚          β”‚ Shard 2 β”‚
β”‚β”Œβ”€β”€β”€β”€β”€β”€β”€β”β”‚          β”‚β”Œβ”€β”€β”€β”€β”€β”€β”€β”β”‚          β”‚β”Œβ”€β”€β”€β”€β”€β”€β”€β”β”‚
β”‚β”‚Primaryβ”‚β”‚          β”‚β”‚Primaryβ”‚β”‚          β”‚β”‚Primaryβ”‚β”‚
β”‚β””β”€β”€β”€β”€β”€β”€β”€β”˜β”‚          β”‚β””β”€β”€β”€β”€β”€β”€β”€β”˜β”‚          β”‚β””β”€β”€β”€β”€β”€β”€β”€β”˜β”‚
β”‚β”Œβ”€β”€β”€β”€β”€β”€β”€β”β”‚          β”‚β”Œβ”€β”€β”€β”€β”€β”€β”€β”β”‚          β”‚β”Œβ”€β”€β”€β”€β”€β”€β”€β”β”‚
β”‚β”‚Replicaβ”‚β”‚          β”‚β”‚Replicaβ”‚β”‚          β”‚β”‚Replicaβ”‚β”‚
β”‚β””β”€β”€β”€β”€β”€β”€β”€β”˜β”‚          β”‚β””β”€β”€β”€β”€β”€β”€β”€β”˜β”‚          β”‚β””β”€β”€β”€β”€β”€β”€β”€β”˜β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
 users:               users:               users:
 1-333K               334K-666K            667K-1M

Features:

  • Automatic resharding: Move data between shards
  • Query routing: Send queries to correct shards
  • Connection pooling: Efficient MySQL connections
  • Backup/restore: Consistent snapshots across shards

Multi-Master Replication

🌐 Multi-Master Setup

       US East              US West              Europe
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚   Master 1  │◄────▢│   Master 2  │◄────▢│   Master 3  β”‚
   β”‚             β”‚      β”‚             β”‚      β”‚             β”‚
   β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚      β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚      β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
   β”‚ β”‚ Users   β”‚ β”‚      β”‚ β”‚ Users   β”‚ β”‚      β”‚ β”‚ Users   β”‚ β”‚
   β”‚ β”‚ Orders  β”‚ β”‚      β”‚ β”‚ Orders  β”‚ β”‚      β”‚ β”‚ Orders  β”‚ β”‚
   β”‚ β”‚ Productsβ”‚ β”‚      β”‚ β”‚ Productsβ”‚ β”‚      β”‚ β”‚ Productsβ”‚ β”‚
   β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚      β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚      β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         ↑                      ↑                      ↑
    East Coast            West Coast              European
    Applications          Applications            Applications

Benefits:

  • Geographic distribution: Low latency worldwide
  • High availability: No single point of failure
  • Write scaling: Multiple masters accept writes

Challenges:

  • Conflict resolution: Same record updated simultaneously
  • Network partitions: Split-brain scenarios
  • Consistency: Eventually consistent across masters

MySQL Write Throughput Challenge

Critical Point: Even with Vitess, MySQL's fundamental limitation remains: writes only happen on the primary node, so each shard still has a single point of write operations.

For scaling write throughput in a single region, you have limited options:

⚠️ MySQL Write Scaling Options
1. Multi-Master Replication
  β€’ Multiple nodes accept writes
  β€’ True write throughput scaling
  β€’ Conflict resolution complexity
  β€’ Network partition challenges
2. Alternative Approaches
  β€’ Write buffering/batching
  β€’ Asynchronous processing
  β€’ Move to NoSQL (MongoDB, Cassandra)
  β€’ Event sourcing patterns

Reality Check:

Vitess Sharding:               Multi-Master:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Shard 1: [Primary] β”‚        β”‚ Master 1: βœ“ Writes β”‚
β”‚         [Replica]  β”‚   VS   β”‚ Master 2: βœ“ Writes β”‚
β”‚ Shard 2: [Primary] β”‚        β”‚ Master 3: βœ“ Writes β”‚
β”‚         [Replica]  β”‚        β”‚ (All can write)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
     ↑                                ↑
 Single writer per shard      Multiple concurrent writers

Cross-Shard Joins: The Efficiency Challenge

When using sharded systems like Vitess, cross-shard joins are problematic and should be avoided when possible.

How Cross-Shard Joins Work

⚠️ Cross-Shard Join Process

-- Query: Get user details with their recent orders
SELECT u.name, u.email, o.order_id, o.total 
FROM users u 
JOIN orders o ON u.user_id = o.user_id 
WHERE u.region = 'US'

What Vitess has to do:

1. Scatter Phase:
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚ Query sent to ALL shards (because we don't know     β”‚
   β”‚ which shards contain matching users/orders)         β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                           ↓
   Shard 1    Shard 2    Shard 3    Shard 4
    β”Œβ”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”
    β”‚Users β”‚    β”‚Users β”‚    β”‚Users β”‚    β”‚Users β”‚
    β”‚Ordersβ”‚    β”‚Ordersβ”‚    β”‚Ordersβ”‚    β”‚Ordersβ”‚
    β””β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”˜

2. Gather Phase:
   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
   β”‚ Collect partial results from each shard            β”‚
   β”‚ Vitess proxy performs the JOIN operation           β”‚
   β”‚ Sort, filter, and return final results             β”‚
   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
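The two phases above can be modeled in a few lines. Shards are simulated here as in-memory slices, and the merge-plus-sort step stands in for the work the proxy does — this is a sketch of the pattern, not of Vitess internals:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Row is one record living on some shard.
type Row struct {
	UserID int
	Name   string
}

// scatterGather queries every shard concurrently (scatter), then
// merges and sorts the partial results centrally (gather) -- the
// proxy cannot know in advance which shards hold matching rows.
func scatterGather(shards [][]Row, match func(Row) bool) []Row {
	var mu sync.Mutex
	var out []Row
	var wg sync.WaitGroup
	for _, shard := range shards {
		wg.Add(1)
		go func(shard []Row) { // scatter: one fetch per shard
			defer wg.Done()
			var partial []Row
			for _, r := range shard {
				if match(r) {
					partial = append(partial, r)
				}
			}
			mu.Lock() // gather: merge partials at the proxy
			out = append(out, partial...)
			mu.Unlock()
		}(shard)
	}
	wg.Wait()
	// The proxy must also sort, since shard arrival order is arbitrary.
	sort.Slice(out, func(i, j int) bool { return out[i].UserID < out[j].UserID })
	return out
}

func main() {
	shards := [][]Row{
		{{1, "ana"}, {4, "bob"}},
		{{2, "cam"}},
		{{3, "dee"}},
	}
	rows := scatterGather(shards, func(Row) bool { return true })
	fmt.Println(len(rows)) // 4
}
```

Even in this toy version the cost structure is visible: every shard is touched, all partial results travel to one place, and the slowest shard gates the response.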

Efficiency Problems

🐌 Performance Issues
  β€’ Network overhead: Data from all shards
  β€’ Memory usage: Vitess proxy holds all data
  β€’ No index usage: Can't use MySQL join optimizations
  β€’ Serial processing: Wait for slowest shard
  β€’ Bandwidth waste: Moving large datasets
πŸ“Š Complexity Issues
  β€’ Query planning: Vitess must understand joins
  β€’ Transaction boundaries: ACID across shards
  β€’ Deadlocks: Cross-shard dependencies
  β€’ Limited SQL support: Not all joins supported
  β€’ Debugging difficulty: Multi-shard query plans

Design Patterns to Avoid Cross-Shard Joins

1. Denormalization:

-- Instead of joining across shards
-- users (shard by user_id) + orders (shard by user_id)
-- Store user info in orders table

CREATE TABLE orders (
  order_id BIGINT PRIMARY KEY,
  user_id BIGINT,
  user_name VARCHAR(100),  -- Denormalized
  user_email VARCHAR(100), -- Denormalized
  total DECIMAL(10,2),
  created_at TIMESTAMP
); -- Shard by user_id

2. Application-Level Joins:

// Fetch from multiple shards in application
users := fetchUsers(userIDs)      // From user shards
orders := fetchOrders(userIDs)    // From order shards
result := joinInMemory(users, orders) // App does the join
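The `joinInMemory` step above is typically a hash join: build a map over the smaller side, then probe it once per row of the other side. A sketch with illustrative `User`/`Order` types (the snippet above elides their definitions):

```go
package main

import "fmt"

type User struct {
	ID   int
	Name string
}

type Order struct {
	ID     int
	UserID int
	Total  float64
}

type UserOrder struct {
	Name  string
	Order Order
}

// joinInMemory indexes users by ID (the build side), then probes
// the map once per order: O(len(users) + len(orders)) overall.
func joinInMemory(users []User, orders []Order) []UserOrder {
	byID := make(map[int]User, len(users))
	for _, u := range users {
		byID[u.ID] = u
	}
	var out []UserOrder
	for _, o := range orders {
		if u, ok := byID[o.UserID]; ok {
			out = append(out, UserOrder{Name: u.Name, Order: o})
		}
	}
	return out
}

func main() {
	users := []User{{1, "ana"}, {2, "bob"}}
	orders := []Order{{10, 1, 9.99}, {11, 2, 5.00}, {12, 3, 1.00}}
	joined := joinInMemory(users, orders)
	fmt.Println(len(joined)) // 2: order 12 has no matching user
}
```

Always build the map over the smaller dataset; joining two large result sets this way can exhaust application memory, which is why these joins work best on pre-filtered data.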

3. Materialized Views:

-- Pre-computed join results
CREATE TABLE user_order_summary (
  user_id BIGINT PRIMARY KEY,
  total_orders INT,
  total_spent DECIMAL(12,2),
  last_order_date DATE
); -- Updated via events/batch jobs

Performance Comparison:

Same-Shard Join:     1-10ms    (MySQL native optimization)
Cross-Shard Join:    100-1000ms (Network + coordination overhead)
Application Join:    50-200ms   (Parallel fetches + in-memory join)

MySQL vs PostgreSQL Comparison

Feature              MySQL                             PostgreSQL
───────────────────────────────────────────────────────────────────────────────
Replication Setup    Simple master-slave, built-in     Streaming replication, WAL-based
Multi-Master         Galera Cluster (third-party)      BDR, Postgres-XL (extensions)
Sharding Tools       Vitess (mature, YouTube-proven)   Citus (extension), manual sharding
Cross-Shard Queries  Limited in Vitess                 Better support in Citus
Ecosystem Maturity   Very mature for scaling           Growing, more features
Best For             Web apps, proven scale            Complex queries, analytics

When to Use What

Decision Matrix

Scenario                         Vitess         Multi-Master   Both
─────────────────────────────────────────────────────────────────────
Single region, large dataset     βœ“ Perfect      Not needed     Overkill
Multi-region, small dataset      Not needed     βœ“ Perfect      Overkill
Global scale, large dataset      Helps          Helps          βœ“ Best
High write availability needed   Doesn't help   βœ“ Perfect      If also large
Latency-sensitive writes         Doesn't help   βœ“ Perfect      If also large

Summary

  • Replication: Same data, multiple copies β†’ Availability & read scaling
  • Sharding: Different data, partitioned β†’ Storage & write scaling
  • Vitess: MySQL sharding solution, proven at YouTube scale
  • Multi-Master: Write scaling via multiple active nodes
  • Cross-shard joins: Avoid in production, use denormalization instead
  • Choose based on: Data size, geography, write patterns, availability needs

For specific system design patterns that use these concepts, see the individual design pages.