Sharding & Partitioning

Splitting a database into smaller pieces either by rows (horizontal) or by columns (vertical)

Overview

Sharding (horizontal partitioning) splits a database into smaller, more manageable pieces called shards. Each shard contains a subset of the data and can be on a different server.

This is essential for scaling beyond what a single database server can handle, but introduces significant complexity.

Key Concepts

Shard Key

The field used to determine which shard data belongs to. Common choices: user_id, geographic region, hash of ID.

Horizontal Partitioning (Sharding)

Splitting rows across multiple databases. User 1-1000 on Shard1, 1001-2000 on Shard2.

Vertical Partitioning

Splitting columns across tables. User profile in Table1, user preferences in Table2.

Hash-Based Sharding

Use hash function on shard key to determine shard. Good distribution but hard to rebalance.

Range-Based Sharding

Split by ranges (IDs 1-1M on Shard1, 1M-2M on Shard2). Easy to add shards but can create hotspots.

How It Works

Example: Shard by user_id

Shard 1: Users 1-1,000,000 Shard 2: Users 1,000,001-2,000,000 Shard 3: Users 2,000,001-3,000,000

Query Flow:

  1. Get user 1,234,567
  2. Calculate shard: user_id / 1M = Shard 2
  3. Query only Shard 2
  4. Return result

Cross-Shard Query: SELECT * FROM users WHERE created_at > '2024-01-01' Problem: Must query ALL shards and merge results

  • Much slower, avoid if possible

Use Cases

Applications with massive data (billions of rows)

Multi-tenant applications (shard by tenant_id)

Social media (shard by user_id)

Time-series data (shard by time range)

Geographic applications (shard by region)

Best Practices

Choose shard key carefully (even distribution, avoid hotspots)

Shard by entity that naturally partitions (user_id, tenant_id)

Avoid cross-shard queries and joins

Implement application-level routing

Plan for resharding from the start

Use consistent hashing for easier rebalancing

Monitor shard sizes and rebalance proactively

Consider managed sharding solutions (Vitess, Citus)

Interview Tips

What Interviewers Look For

  • Explain horizontal vs vertical partitioning

  • Discuss shard key selection: must distribute evenly, align with query patterns

  • Mention challenges: cross-shard queries, joins, transactions

  • Talk about sharding strategies: hash-based, range-based, geographic

  • Explain consistent hashing for rebalancing

  • Discuss when to shard: when single DB can't handle load (typically 100s of GBs or millions of QPS)

  • Mention alternatives: scale vertically first, use read replicas, then shard as last resort

AI Tutor

Ask about the topic

Sign in Required

Please sign in to use the AI tutor

Sign In
Sharding & Partitioning - Module 5: Scaling the Database | System Design | Revise Algo