// PUBLISHED 18.04.26
// TIME 10 MINS
// TAGS
#SYSTEM DESIGN #HIGH TRAFFIC #BACKEND ARCHITECTURE #DATABASE DESIGN
// AUTHOR
Spectre Command

Most startups don't need database sharding. There. That's the most useful thing this post can tell you upfront.

But the question of when you do need it — and what it actually costs to implement — is worth understanding before you hit the wall, not after. Because by the time sharding becomes urgent, you're usually operating under pressure, and pressure is a terrible time to make irreversible architectural decisions.

Here's what database sharding is, how it works, and the honest framework for deciding whether it belongs in your near-term roadmap.

What Database Sharding Actually Is

A database shard is a horizontal partition of your data. Instead of one database holding all your rows, you split the dataset across multiple database instances — each instance (a "shard") holding a subset of the data.

The key word is horizontal. Sharding is not the same as replication, where you copy the same data to multiple servers for read scaling or redundancy. In sharding, each record lives in exactly one shard. The total dataset is distributed, not duplicated.

The mechanism that makes this work is the shard key — the field you use to determine which shard a given record belongs to. A common example: shard by user_id. Users 1–1,000,000 go to Shard A. Users 1,000,001–2,000,000 go to Shard B. Your application (or a routing layer) knows which shard to query for a given user.
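
In application code, that routing step can be as small as a lookup table and one function. A rough sketch follows; the shard boundaries, connection strings, and function name are hypothetical, and most production systems put this in a shared routing layer rather than scattering it through application code.

    # Minimal sketch of range-based shard routing by user_id.
    # Shard boundaries and connection strings are hypothetical.
    SHARD_RANGES = [
        (1,         1_000_000, "postgresql://shard-a/app"),
        (1_000_001, 2_000_000, "postgresql://shard-b/app"),
    ]

    def shard_for_user(user_id: int) -> str:
        """Return the connection string of the shard that owns this user."""
        for low, high, dsn in SHARD_RANGES:
            if low <= user_id <= high:
                return dsn
        raise LookupError(f"no shard configured for user_id={user_id}")

    # shard_for_user(1_250_000) -> "postgresql://shard-b/app"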

Simple in concept. Genuinely complex in practice.

[→ Read: How to build a backend that scales from 100 to 10 million users]

The Problem Sharding Solves — and Why Most Teams Don't Have It Yet

Sharding exists to solve one specific problem: a single database server that can no longer handle your write volume or data volume, and where vertical scaling (bigger hardware) is no longer a viable or cost-effective option.

Read scaling is a different problem with different solutions. If you have heavy read load, read replicas solve that cleanly and at a fraction of the operational complexity. Connection pooling, query optimisation, and caching layers address most of the database performance issues that startups encounter at early to mid-scale.

A useful heuristic: if your database is struggling, sharding is probably not the first answer. It's usually the last answer after you've exhausted the simpler ones.

The rough sequence most production systems follow before sharding becomes necessary:

  1. Query optimisation and proper indexing — this alone fixes the majority of "the database is slow" problems we see in audit engagements.
  2. A caching layer for frequently-read data.
  3. Read replicas to offload read traffic from the primary.
  4. Connection pooling.
  5. Vertical scaling (larger instance, more RAM, faster disks).
  6. And only after all of the above — sharding for write-heavy workloads that have outgrown a single primary.

Most startups are stuck somewhere between step one and step three. Sharding is step six.

The Real Cost of Sharding

This is where the blog posts usually get dishonest. They explain how sharding works, show you the architecture diagrams, and stop there. The operational reality deserves more attention.

Cross-shard queries become painful. If your shard key is user_id and you need to run an analytics query across all users — say, "show me all orders placed in the last 24 hours across all regions" — you now have to query every shard and aggregate the results in your application layer. Either that, or you maintain a separate analytics database that aggregates across shards. Neither is free.
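
To make the cost concrete, here is roughly what that scatter-gather looks like for a count of orders in the last 24 hours. The shard list, table, and column names are hypothetical, and psycopg2 is assumed as the client library.

    import psycopg2  # assumed client library

    # Hypothetical list of shard connection strings.
    SHARD_DSNS = ["postgresql://shard-a/app", "postgresql://shard-b/app"]

    def orders_last_24h() -> int:
        """Count recent orders by querying every shard and summing in the app."""
        total = 0
        for dsn in SHARD_DSNS:                       # one round trip per shard
            with psycopg2.connect(dsn) as conn:
                with conn.cursor() as cur:
                    cur.execute(
                        "SELECT count(*) FROM orders "
                        "WHERE created_at >= now() - interval '24 hours'"
                    )
                    total += cur.fetchone()[0]
        return total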

Transactions get complicated. In a single database, ACID transactions are straightforward. Across shards, any operation that touches records in two different shards requires a distributed transaction, which is a significantly harder problem to solve correctly. In practice, most teams redesign their data model to avoid cross-shard transactions rather than implement them — which is often the right call, but it constrains how you can model your domain.
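
What that redesign usually looks like in practice: pick a shard key that keeps everything a transaction touches on one shard. A hypothetical sketch, building on the routing function above.

    import psycopg2  # assumed client library

    # shard_for_user() is the range-routing helper from the earlier sketch.
    def create_order(user_id: int, amount: int) -> None:
        dsn = shard_for_user(user_id)           # orders and wallets share a shard key
        with psycopg2.connect(dsn) as conn:     # one shard, one ordinary ACID transaction
            with conn.cursor() as cur:
                cur.execute(
                    "INSERT INTO orders (user_id, amount) VALUES (%s, %s)",
                    (user_id, amount),
                )
                cur.execute(
                    "UPDATE wallets SET balance = balance - %s WHERE user_id = %s",
                    (amount, user_id),
                )
    # A transfer between two different users can still land on two shards; that is
    # the constraint on the data model described above.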

Rebalancing is not painless. Your shards will not stay balanced forever. If you shard by user_id range and your power users are all in the upper ID range, one shard gets hammered while the others sit idle — a "hot shard" problem. Fixing it means rebalancing data across shards while the system is live. That's a non-trivial operational exercise.

Schema migrations get harder. Running a migration on one database is already a careful process on a production system. Running coordinated migrations across twelve shards, ensuring they complete consistently, is a different category of problem.
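
At its simplest, coordination means something like the loop below: apply the same DDL to every shard and stop on the first failure so the fleet doesn't drift. The shard list and DDL are hypothetical; real setups usually rely on a migration tool with per-shard version tracking.

    import psycopg2  # assumed client library

    SHARD_DSNS = [f"postgresql://shard-{i}/app" for i in range(12)]   # hypothetical
    MIGRATION = "ALTER TABLE orders ADD COLUMN promo_code text"       # hypothetical DDL

    def migrate_all_shards() -> None:
        for dsn in SHARD_DSNS:
            with psycopg2.connect(dsn) as conn:
                with conn.cursor() as cur:
                    cur.execute(MIGRATION)
            print(f"applied on {dsn}")
        # If shard 7 fails here, shards 0-6 have a column that shards 7-11 lack, and
        # the application has to tolerate both schemas until the rollout is repaired.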

Your application layer has to know about it. Unlike read replicas (which are often transparent to the application), sharding usually requires the application to participate in routing decisions. That's code your team now owns and maintains.

None of this means sharding is the wrong choice when you actually need it. Tokopedia, Gojek, and Traveloka all run sharded databases at scale because they have the traffic and data volumes that genuinely require it. But they also have dedicated platform engineering teams managing that infrastructure. That context matters.

[→ Read: Monolith vs modular monolith vs microservices: the honest decision framework]

Sharding Strategies: The Four Main Approaches

When sharding is the right call, how you shard matters as much as whether you shard. There are four primary strategies, and each has a different set of trade-offs.

Range-based sharding partitions data by a continuous range of the shard key value — user IDs 1 to 1M on Shard A, 1M to 2M on Shard B, and so on. It's simple to understand and implement, but vulnerable to the hot shard problem if your data isn't evenly distributed across the range.

Hash-based sharding applies a hash function to the shard key and uses the result to determine placement. This distributes data more evenly, which reduces hot shards, but it destroys range locality — you can no longer efficiently query "all users with IDs between X and Y" because those records are now scattered.

Directory-based sharding maintains a lookup table that maps shard keys to shards. This is the most flexible approach and allows you to rebalance shards without changing your hashing logic. The trade-off is the lookup table itself becomes a dependency — a bottleneck and a single point of failure if not handled carefully.
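
Range placement looks like the earlier user_id sketch. For contrast, here is a rough sketch of hash-based and directory-based placement; the shard count and lookup table are hypothetical, and the hash uses a deterministic digest because Python's built-in hash() isn't stable across processes.

    import hashlib

    NUM_SHARDS = 8  # hypothetical

    def hash_shard(user_id: int) -> int:
        """Hash-based placement: even spread, but range queries now hit every shard."""
        digest = hashlib.sha256(str(user_id).encode()).digest()
        return int.from_bytes(digest[:8], "big") % NUM_SHARDS

    # Directory-based placement: an explicit mapping you can edit to rebalance,
    # at the cost of the directory becoming a critical dependency.
    SHARD_DIRECTORY = {"tenant_42": "shard-3", "tenant_99": "shard-1"}  # hypothetical

    def directory_shard(tenant_id: str) -> str:
        return SHARD_DIRECTORY[tenant_id]  # in production: a replicated store, not a dict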

Geographic sharding partitions data by region — Southeast Asian users on one cluster, Australian users on another. This is particularly relevant for companies operating across multiple markets with data residency requirements. Indonesia's data localisation regulations under Government Regulation No. 71 of 2019 (PP 71/2019) require certain categories of personal data to be stored on infrastructure physically located in Indonesia. Geographic sharding can be part of how you comply with that, though the regulatory picture is more nuanced than just where the database sits.

When Your Startup Actually Needs to Start Thinking About Sharding

Specific signals matter more than vague thresholds, but here are the concrete ones worth paying attention to.

  1. Your write throughput has exceeded what a single primary can handle, even after connection pooling and hardware upgrades.
  2. You're seeing consistent replication lag on your read replicas, and it's impacting user experience.
  3. Your largest tables have grown past the point where a single-server B-tree index can serve queries within acceptable latency.
  4. Your data volume is approaching the practical storage limits of a single instance, and vertical scaling costs have become disproportionate.

In terms of rough order of magnitude: for most well-optimised database setups on decent hardware, you can handle tens of thousands of write transactions per second before you genuinely exhaust single-node capacity. Many startups that feel they need sharding are running at a fraction of that — and their actual problem is unoptimised queries, missing indexes, or unnecessary write amplification in their application code.
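
If you want a number rather than a feeling, you can sample the write rate directly from PostgreSQL's statistics views. A rough sketch, assuming psycopg2: the row-level counters in pg_stat_database aren't exactly write transactions per second, but they give a usable order-of-magnitude figure.

    import time
    import psycopg2  # assumed client library

    QUERY = """
        SELECT sum(tup_inserted + tup_updated + tup_deleted)
        FROM pg_stat_database
    """

    def writes_per_second(dsn: str, interval: float = 10.0) -> float:
        """Sample the cumulative write counters twice and take the difference."""
        conn = psycopg2.connect(dsn)
        conn.autocommit = True  # each sample in its own transaction, so stats refresh
        try:
            with conn.cursor() as cur:
                cur.execute(QUERY)
                first = cur.fetchone()[0]
                time.sleep(interval)
                cur.execute(QUERY)
                second = cur.fetchone()[0]
        finally:
            conn.close()
        return float(second - first) / interval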

A practical test: before pursuing sharding, run a proper database performance audit. Look at your slow query log. Examine your write patterns. Check whether your schema design is creating unnecessary lock contention. We've worked with teams who were convinced they needed sharding and found, after a structured audit, that three index changes and a query rewrite cut their database load by 60 percent. That bought them 18 months of headroom without touching the architecture.
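
Alongside the slow query log, the pg_stat_statements extension (where it's enabled) is a quick way to find the queries worth auditing first. A sketch follows; the timing column names are for PostgreSQL 13 and later, while older versions call them total_time and mean_time.

    import psycopg2  # assumed client library

    # Requires the pg_stat_statements extension to be installed and enabled.
    TOP_QUERIES = """
        SELECT query, calls, mean_exec_time, rows
        FROM pg_stat_statements
        ORDER BY total_exec_time DESC
        LIMIT 10
    """

    def slowest_queries(dsn: str) -> list:
        """Return the ten statements consuming the most total execution time."""
        with psycopg2.connect(dsn) as conn:
            with conn.cursor() as cur:
                cur.execute(TOP_QUERIES)
                return cur.fetchall()  # candidates for indexing or rewriting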

[→ Read: How to run a technical debt audit]

A Concrete Example: Sharding Decision for a Payments Platform

Consider a fintech startup processing payments across Indonesia — peer-to-peer transfers, bill payments, e-wallet top-ups. They come to us at around 500,000 active users and 200,000 transactions per day, worried about whether their PostgreSQL single-node setup will survive projected growth.

At 200,000 transactions per day, they're writing roughly 2–3 records per transaction (the transaction record, a ledger entry, a notification event). That's 400,000–600,000 writes per day, which averages to under 10 writes per second. A well-configured PostgreSQL instance can comfortably handle 5,000–10,000 writes per second. They have two to three orders of magnitude of headroom.

The right conversation isn't sharding — it's ensuring their indexes are correct, their connection pooling is configured properly, and they have a read replica absorbing their reporting queries. That architecture will take them past 5 million users without fundamental change.

Now imagine they've grown to 5 million active users and are processing 50 million transactions per day — the kind of volume GoPay was handling in its growth phase. At that scale, write throughput genuinely becomes a single-node constraint, and the case for sharding, probably by user_id with a consistent hash, becomes real and defensible.
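
The back-of-envelope arithmetic behind both scenarios fits in a few lines. The writes-per-transaction multiplier and the peak-to-average ratio are assumptions to adjust for your own traffic shape, and the single-node ceiling is the rough figure quoted earlier, not a benchmark.

    SECONDS_PER_DAY = 86_400
    WRITES_PER_TXN = 3            # transaction record + ledger entry + notification event
    PEAK_TO_AVERAGE = 8           # assumed peak factor; measure your own traffic shape
    SINGLE_NODE_CEILING = 10_000  # rough writes/sec for a well-configured primary

    def peak_writes_per_second(txns_per_day: int) -> float:
        return txns_per_day * WRITES_PER_TXN / SECONDS_PER_DAY * PEAK_TO_AVERAGE

    for txns in (200_000, 50_000_000):
        peak = peak_writes_per_second(txns)
        print(f"{txns:,}/day -> ~{peak:,.0f} peak writes/sec "
              f"({peak / SINGLE_NODE_CEILING:.0%} of the single-node ceiling)")
    # 200,000/day    -> ~56/sec      (about 1% of the ceiling)
    # 50,000,000/day -> ~13,900/sec  (past the ceiling)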

The architecture decision should follow the traffic, not anticipate it by three years.

FAQ

Q: What's the difference between sharding and partitioning?

A: Partitioning is typically done within a single database instance — PostgreSQL table partitioning, for example, splits one logical table into physical sub-tables on the same server. It improves query performance and manageability but doesn't distribute load across multiple servers. Sharding distributes data across multiple separate database instances. Partitioning is often a useful step before sharding and can buy you significant headroom on its own.
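
For concreteness, this is roughly what declarative range partitioning looks like in PostgreSQL. The table and partition bounds are hypothetical, and everything still lives on one server.

    # Hypothetical PostgreSQL declarative partitioning: one logical table,
    # two physical partitions, all on the same instance.
    CREATE_PARENT = """
        CREATE TABLE orders (
            id          bigint      NOT NULL,
            user_id     bigint      NOT NULL,
            created_at  timestamptz NOT NULL
        ) PARTITION BY RANGE (created_at)
    """
    CREATE_PARTITIONS = [
        """CREATE TABLE orders_2026_q1 PARTITION OF orders
               FOR VALUES FROM ('2026-01-01') TO ('2026-04-01')""",
        """CREATE TABLE orders_2026_q2 PARTITION OF orders
               FOR VALUES FROM ('2026-04-01') TO ('2026-07-01')""",
    ]
    # Queries still target "orders"; the planner prunes partitions that can't match.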

Q: Can managed databases like Amazon RDS or Google Cloud SQL handle sharding for me?

A: Not automatically, no. RDS and Cloud SQL manage replication, backups, failover, and vertical scaling, but they don't shard your data across instances on your behalf. Amazon Aurora has some features that push in this direction for read scaling, and Google Spanner is a distributed database that handles horizontal scaling transparently — but Spanner is a different product category with different cost and complexity trade-offs. For most startups, managed databases like RDS are the right choice well before sharding is relevant.

Q: Is MongoDB or Cassandra easier to shard than PostgreSQL?

A: MongoDB and Cassandra have sharding (or in Cassandra's case, distributed architecture) built into their core design. PostgreSQL and MySQL require more explicit work to shard, whether through application-level routing, Citus, or tools like Vitess. That said, "easier to shard" shouldn't drive your database choice. The database that fits your data model and query patterns is more important than one that theoretically scales more easily — because most teams never reach the scale where sharding is necessary regardless of which database they chose.

Q: If we shard now while we're small, won't that make scaling easier later?

A: This is the most common trap. Implementing sharding before you need it adds immediate complexity, slows down development, and optimises for a future scale problem that may never materialise in the form you anticipated. Your shard key choice may turn out to be wrong for your actual access patterns — and changing a shard key on a live system is painful. Build with clean boundaries and a data model that could accommodate sharding later. Don't implement the sharding itself until the signals are there.

Q: As a non-technical founder, how do I know if my CTO is recommending sharding prematurely?

A: Ask two questions. First: what have we already tried before reaching this conclusion? The answer should include read replicas, caching, query optimisation, and vertical scaling. If sharding is being proposed as a first response to performance issues, that's a flag. Second: what's our current write throughput and how does it compare to the limits of our current setup? If the answer is vague, push for numbers. Real performance problems have measurable symptoms.


Getting the database architecture wrong in either direction is expensive — too early and you're carrying operational complexity that slows your team down; too late and you're doing emergency architecture work under production pressure. The honest answer is that most startups reading this are further from needing sharding than they think, and the simpler scaling levers are worth pulling first.

When you do hit genuine write-scale constraints, the decision about how to shard — and what to extract into separate data stores — is one worth getting external perspective on before committing. The choices you make at that stage are difficult to unwind.


// END_OF_LOG SPECTRE_SYSTEMS_V1
