Databases & Data Storage
Sharding
Definition
Sharding is a form of horizontal database partitioning that splits one table’s rows into multiple smaller tables, known as partitions or shards. Each shard has the same schema but holds a different subset of the rows. The shards are spread across multiple servers, so no single server stores the whole table.
Why It Matters
Sharding is a method for horizontal scaling ("scaling out"). It allows a database to grow beyond the limits of a single server by distributing the data and the query load across many machines. It is a key technique for building massively scalable applications.
Contextual Example
A social media app with billions of users might shard its `Users` table by country. Rows for users in the US live on one set of servers, rows for users in Germany on another, and so on. A lookup for a German user then touches only the German shard, so both the data and the query load are spread across many machines.
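A minimal sketch of the routing step in this example, assuming a hypothetical lookup table from country code to shard host. The host names, the `DEFAULT_SHARD` fallback, and the function name are illustrative assumptions, not part of any real system.

```python
# Hypothetical mapping from country code to the host holding that country's users.
SHARD_BY_COUNTRY = {
    "US": "users-shard-us.example.internal",
    "DE": "users-shard-de.example.internal",
}

# Assumed fallback shard for countries without a dedicated entry.
DEFAULT_SHARD = "users-shard-other.example.internal"


def shard_for_country(country_code: str) -> str:
    """Return the shard host that stores `Users` rows for the given country."""
    return SHARD_BY_COUNTRY.get(country_code.upper(), DEFAULT_SHARD)
```

A query for a user would first call `shard_for_country` with the user's country and then run against the returned host only.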
Common Misunderstandings
- Sharding is different from replication. Replication copies data to provide redundancy, while sharding splits up data to provide scalability.
- Sharding is not transparent to the application. It adds significant complexity, because the application logic (or a routing layer in front of the database) must work out which shard contains the data it needs.
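The lookup step described in the last point can be sketched with a hash of the shard key. This is one common approach and only an illustration: the shard names, the choice of `user_id` as the shard key, and the function name are assumptions.

```python
import hashlib

# Hypothetical shard identifiers; a real deployment would map these to servers.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for_key(user_id: str, shards: list[str] = SHARDS) -> str:
    """Pick a shard deterministically from a hash of the shard key."""
    # A stable hash is used so every process maps the same key to the same shard
    # (Python's built-in hash() is randomized per process for strings).
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

Every read or write for a given `user_id` goes through `shard_for_key` first, which is the extra logic the application has to carry. With simple modulo hashing like this, changing the number of shards moves most keys to a different shard, so it is a sketch of the idea and not a resharding strategy.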