Database Concepts: Scaling Strategies
Understanding how databases scale is fundamental to system design. This guide covers the core concepts of replication, sharding, and distributed query challenges.
Replication vs Sharding
Replication: Copying Data for Availability
π What is Replication?
Replication creates identical copies of your data across multiple database instances.
Primary Database β Replica 1 β Replica 2
βββββββββββββββββββ βββββββββββββ βββββββββββββ
β Users: 1M ββββββΆβ Users: 1M ββββββΆβ Users: 1M β
β Orders: 5M β β Orders: 5Mβ β Orders: 5Mβ
β Products: 10K β β Products: β β Products: β
βββββββββββββββββββ β 10K β β 10K β
βββββββββββββ βββββββββββββ
(Writes) (Reads) (Reads)Purpose: Handle more read traffic and provide backup copies.
Sharding: Splitting Data for Scale
β‘ What is Sharding?
Sharding splits your data across multiple database instances based on some key.
Application Request (user_id = 12345)
β
Shard Key: user_id
β
ββββ Hash(12345) % 3 = 0 βββββ
β β
Shard 0 Shard 1 Shard 2
ββββββββββββ ββββββββββββ ββββββββββββ
βUsers: β βUsers: β βUsers: β
β 1-333K β β 334-666K β β 667K-1M β
βOrders: β βOrders: β βOrders: β
β for usersβ β for usersβ β for usersβ
β 1-333K β β 334-666K β β 667K-1M β
ββββββββββββ ββββββββββββ ββββββββββββPurpose: Store more data and handle more total throughput.
Key Differences Summary
| Aspect | Sharding | Replication |
|---|---|---|
| Purpose | Scalability | Availability |
| Data Distribution | Different data per node | Same data on all nodes |
| Storage Capacity | Increases with nodes | Same as single node |
| Query Complexity | Cross-shard joins difficult | All data available locally |
| Failure Impact | Lose portion of data | Data still available |
| Write Scaling | Distributes write load | All writes to primary |
MySQL: Replication & Sharding
MySQL Replication Types
π Master-Slave
- β’ Writes: Only to master
- β’ Reads: From slaves
- β’ Consistency: Eventually consistent
- β’ Use case: Read-heavy workloads
π Master-Master
- β’ Writes: To any master
- β’ Reads: From any master
- β’ Consistency: Conflict resolution needed
- β’ Use case: High availability, geo-distribution
MySQL Sharding with Vitess
π Vitess Architecture
Application
β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β VTGate (Proxy) β
β β’ Query routing β’ Connection pooling β’ Query planning β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β β β
βββββββββββ βββββββββββ βββββββββββ
β Shard 0 β β Shard 1 β β Shard 2 β
βββββββββββ βββββββββββ βββββββββββ
ββPrimaryββ ββPrimaryββ ββPrimaryββ
βββββββββββ βββββββββββ βββββββββββ
βββββββββββ βββββββββββ βββββββββββ
ββReplicaββ ββReplicaββ ββReplicaββ
βββββββββββ βββββββββββ βββββββββββ
βββββββββββ βββββββββββ βββββββββββ
users: users: users:
1-333K 334K-666K 667K-1MFeatures:
- Automatic resharding: Move data between shards
- Query routing: Send queries to correct shards
- Connection pooling: Efficient MySQL connections
- Backup/restore: Consistent snapshots across shards
Multi-Master Replication
π Multi-Master Setup
US East US West Europe
βββββββββββββββ βββββββββββββββ βββββββββββββββ
β Master 1 βββββββΆβ Master 2 βββββββΆβ Master 3 β
β β β β β β
β βββββββββββ β β βββββββββββ β β βββββββββββ β
β β Users β β β β Users β β β β Users β β
β β Orders β β β β Orders β β β β Orders β β
β β Productsβ β β β Productsβ β β β Productsβ β
β βββββββββββ β β βββββββββββ β β βββββββββββ β
βββββββββββββββ βββββββββββββββ βββββββββββββββ
β β β
East Coast West Coast European
Applications Applications ApplicationsBenefits:
- Geographic distribution: Low latency worldwide
- High availability: No single point of failure
- Write scaling: Multiple masters accept writes
Challenges:
- Conflict resolution: Same record updated simultaneously
- Network partitions: Split-brain scenarios
- Consistency: Eventually consistent across masters
MySQL Write Throughput Challenge
Critical Point: Even with Vitess, MySQL's fundamental limitation remains - writes only happen on the primary node. This creates a single point of write operations per shard.
For scaling write throughput in a single region, you have limited options:
β οΈ MySQL Write Scaling Options
1. Multi-Master Replication
- β’ Multiple nodes accept writes
- β’ True write throughput scaling
- β’ Conflict resolution complexity
- β’ Network partition challenges
2. Alternative Approaches
- β’ Write buffering/batching
- β’ Asynchronous processing
- β’ Move to NoSQL (MongoDB, Cassandra)
- β’ Event sourcing patterns
Reality Check:
Vitess Sharding: Multi-Master:
βββββββββββββββββββββββ βββββββββββββββββββββββ
β Shard 1: [Primary] β β Master 1: β Writes β
β [Replica] β VS β Master 2: β Writes β
β Shard 2: [Primary] β β Master 3: β Writes β
β [Replica] β β (All can write) β
βββββββββββββββββββββββ βββββββββββββββββββββββ
β β
Single writer per shard Multiple concurrent writersCross-Shard Joins: The Efficiency Challenge
When using sharded systems like Vitess, cross-shard joins are problematic and should be avoided when possible.
How Cross-Shard Joins Work
β οΈ Cross-Shard Join Process
-- Query: Get user details with their recent orders
SELECT u.name, u.email, o.order_id, o.total
FROM users u
JOIN orders o ON u.user_id = o.user_id
WHERE u.region = 'US'What Vitess has to do:
1. Scatter Phase:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Query sent to ALL shards (because we don't know β
β which shards contain matching users/orders) β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β
Shard 1 Shard 2 Shard 3 Shard 4
βββββββ βββββββ βββββββ βββββββ
βUsersβ βUsersβ βUsersβ βUsersβ
βOrdersβ βOrdersβ βOrdersβ βOrdersβ
βββββββ βββββββ βββββββ βββββββ
2. Gather Phase:
βββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Collect partial results from each shard β
β Vitess proxy performs the JOIN operation β
β Sort, filter, and return final results β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββEfficiency Problems
π Performance Issues
- β’ Network overhead: Data from all shards
- β’ Memory usage: Vitess proxy holds all data
- β’ No index usage: Can't use MySQL join optimizations
- β’ Serial processing: Wait for slowest shard
- β’ Bandwidth waste: Moving large datasets
π Complexity Issues
- β’ Query planning: Vitess must understand joins
- β’ Transaction boundaries: ACID across shards
- β’ Deadlocks: Cross-shard dependencies
- β’ Limited SQL support: Not all joins supported
- β’ Debugging difficulty: Multi-shard query plans
Design Patterns to Avoid Cross-Shard Joins
1. Denormalization:
-- Instead of joining across shards
-- users (shard by user_id) + orders (shard by user_id)
-- Store user info in orders table
CREATE TABLE orders (
order_id BIGINT,
user_id BIGINT,
user_name VARCHAR(100), -- Denormalized
user_email VARCHAR(100), -- Denormalized
total DECIMAL(10,2),
created_at TIMESTAMP
) -- Shard by user_id2. Application-Level Joins:
// Fetch from multiple shards in application
users := fetchUsers(userIDs) // From user shards
orders := fetchOrders(userIDs) // From order shards
result := joinInMemory(users, orders) // App does the join3. Materialized Views:
-- Pre-computed join results
CREATE TABLE user_order_summary (
user_id BIGINT,
total_orders INT,
total_spent DECIMAL(12,2),
last_order_date DATE
) -- Updated via events/batch jobsPerformance Comparison:
Same-Shard Join: 1-10ms (MySQL native optimization)
Cross-Shard Join: 100-1000ms (Network + coordination overhead)
Application Join: 50-200ms (Parallel fetches + in-memory join)MySQL vs PostgreSQL Comparison
| Feature | MySQL | PostgreSQL |
|---|---|---|
| Replication Setup | Simple master-slave, built-in | Streaming replication, WAL-based |
| Multi-Master | Galera Cluster (third-party) | BDR, Postgres-XL (extensions) |
| Sharding Tools | Vitess (mature, YouTube-proven) | Citus (extension), manual sharding |
| Cross-Shard Queries | Limited in Vitess | Better support in Citus |
| Ecosystem Maturity | Very mature for scaling | Growing, more features |
| Best For | Web apps, proven scale | Complex queries, analytics |
When to Use What
Decision Matrix
| Scenario | Vitess | Multi-Master | Both |
|---|---|---|---|
| Single region, large dataset | β Perfect | Not needed | Overkill |
| Multi-region, small dataset | Not needed | β Perfect | Overkill |
| Global scale, large dataset | Helps | Helps | β Best |
| High write availability needed | Doesn't help | β Perfect | If also large |
| Latency-sensitive writes | Doesn't help | β Perfect | If also large |
Summary
- Replication: Same data, multiple copies β Availability & read scaling
- Sharding: Different data, partitioned β Storage & write scaling
- Vitess: MySQL sharding solution, proven at YouTube scale
- Multi-Master: Write scaling via multiple active nodes
- Cross-shard joins: Avoid in production, use denormalization instead
- Choose based on: Data size, geography, write patterns, availability needs
For specific system design patterns that use these concepts, see the individual design pages.