Sharding - some dirty little secrets
10/22/2024

Boaz Palgi
Regatta CEO and Co-Founder

A few years ago, Erez Webman and I wrote a white paper describing the dirty little secrets of scale-out sharding. Scale-out sharding is widely used to cope with the limitations of single-node writer database systems, but it has many problematic issues. In this blog, I will summarize the issues we covered in that whitepaper, and encourage you to read the full whitepaper for a more comprehensive treatment.
Large data sets and high-throughput applications challenge any single-node database: the data outgrows the server’s storage, and heavy workloads exhaust its CPU, I/O resources, or memory (RAM). Upgrading the server with more storage, more memory, and a faster CPU may buy time, but it is expensive, and there is a hard limit to the capacity and workload a single server can handle. The contemporary solution is often scale-out sharding.
With scale-out sharding, data is divided across multiple servers (“shards”) in a cluster according to a sharding criterion. Each server operates as an independent database that can easily handle its part of the total data.
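To make the sharding criterion concrete, here is a minimal sketch (in Python, with hypothetical node names, not Regatta code) of hash-based routing, one common way data is divided across shards:

```python
import hashlib

# Hypothetical cluster of independent database nodes (illustration only).
SHARDS = ["db-node-0", "db-node-1", "db-node-2", "db-node-3"]

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

def node_for(key: str) -> str:
    """The application itself must pick the right node for every query."""
    return SHARDS[shard_for(key, len(SHARDS))]
```

Note that the routing logic lives in the application, not the database: every read and write must go through a helper like `node_for`, which is exactly the awareness of data placement discussed below.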
However, scale-out sharding requires the application code to be aware of the exact data placement across the independent database servers (shards) so that it can route and manage the workload across them. Sharding breaks developers’ expectations around serializability, externalizable behavior, and overall ACID system attributes.
If you are building an application, you do not want to be in the business of designing and developing concurrency control and durability yourself. Implementing ACID and other data integrity guarantees in application code consumes a significant amount of development and financial resources. Attempts to maintain ACID-like guarantees at the application layer are known to be hard or even impossible, and in any case complex, painful, and usually buggy. Even experienced developers agree that it is much easier to rely on the database than to attempt to achieve ACID at the application level.
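A toy sketch (with in-memory stand-ins for shards, purely illustrative) shows why application-level atomicity across shards is fragile: a failure between two independent writes leaves the data inconsistent, and the application gets no help from the shards in detecting or repairing it.

```python
class Shard:
    """Toy in-memory stand-in for an independent database node."""
    def __init__(self, balances):
        self.balances = dict(balances)

class CrashBetweenWrites(Exception):
    """Simulated process failure between the two shard writes."""

def transfer(src: Shard, dst: Shard, amount: int, crash_midway: bool = False):
    # Debit on one shard, credit on another: two independent writes
    # with no atomicity guarantee spanning them.
    src.balances["acct"] -= amount
    if crash_midway:
        raise CrashBetweenWrites  # failure after the debit, before the credit
    dst.balances["acct"] += amount

a, b = Shard({"acct": 100}), Shard({"acct": 0})
try:
    transfer(a, b, 40, crash_midway=True)
except CrashBetweenWrites:
    pass
# The debit happened but the credit did not: 40 has simply vanished.
```

Recovering from this requires the application to implement its own write-ahead logging, retry, and reconciliation logic, which is precisely the complex, error-prone machinery a transactional database would otherwise provide.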
Contrary to popular belief, scale-out sharding configurations typically do not offer the notion of a single, large database. Scale-out sharding does not try to make many nodes act like one big database; rather, each node operates as an independent sub-database.
As a result, some basic functionality that can be expected from a single-node database cannot be achieved in multi-node sharded databases. For example, transactions and analytics operations that cross shard boundaries may not work, or may not work well. Sharded databases may also suffer other issues, including per-node performance bottlenecks, inefficient workload execution, and even volume skews that exceed a node’s available capacity. Changing a sharded cluster’s configuration, such as adding or removing servers, is generally challenging and requires significant modifications that developers must make in the application code.
In the Dirty Little Secrets of Scale-out Sharding whitepaper, Erez and I dive deeper into these and related topics:
If the accuracy of database operations is important, then the database should guarantee ACID. Some developers believe it is surely possible to build applications on modern databases that do not guarantee ACID, by writing code and logic that somehow guarantees correctness and ordering of operations at the application level. In practice this is often not possible at all, and it inevitably introduces tradeoffs and complexity into both the process and the application logic; it is known to be hard or even impossible, and in any case complex, painful, and usually buggy.
Why are multi-node databases challenged to guarantee strong ACID?
In this section of the whitepaper, Erez and I analyze several examples of the impact of executing queries on scale-out sharded distributed databases. We review why application developers must do a lot of hard work and write a lot of complex, error-prone application code merely to attempt to execute such queries. We also illustrate why, in many cases, executing such queries at the application level is simply not possible at all.
Regatta frees developers from the burden of dealing with the underlying data infrastructure challenges, allowing them to focus 100% on the business logic.
You can download the whitepaper here.
