Optimized Data Layouts

Disk I/Os are a major cause of database performance issues. Disks are dramatically slower than RAM, but each single disk access also moves a much larger chunk of data than a single RAM access. As a result, the database’s data layout (the persistent data structure that the database uses for storing table data on disk) is a critical factor in database performance. Most databases are built on one type of data layout. Unfortunately, no single data layout is optimal for every case.

Generally, the optimal data layout depends on the disk media type (i.e., magnetic HDD or flash SSD), the type of data stored, and the nature of the workload. Therefore, “single layout” databases tend to perform well with a limited set of combinations of media, data type, and workload, and under-perform elsewhere. For example, some relational databases whose data layouts are based on slotted pages may seriously under-perform when rows are larger than a few KB, and can get even worse in scenarios where variable-size rows are frequently updated. Some modern databases that use a variation of the LSM tree for their data layout may work fine with some workloads on magnetic media. However, running those LSM-based data layouts on flash media often fails to deliver the performance improvements expected from those faster and more expensive disks.
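To make the slotted-page problem concrete, the following minimal Python sketch models a single slotted page. The 8 KB page, 24-byte header and 4-byte slot entries are assumed illustrative values rather than any particular database's on-disk format, and overflow pages are not modeled. It counts how many rows of a given size fit on one page and shows how growing a variable-size row can force it off its page.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 8192   # assumed page size in bytes
HEADER_SIZE = 24   # assumed fixed page-header size in bytes
SLOT_SIZE = 4      # assumed bytes per slot-directory entry (offset + length)


@dataclass
class SlottedPage:
    """Minimal slotted page: the slot directory grows from the front of the
    page while row bytes fill in from the back. Free space is tracked as a
    single count, i.e. the page is assumed to be compacted when needed."""
    slots: list = field(default_factory=list)  # row length per slot
    used_bytes: int = 0                         # bytes occupied by row data

    def free_space(self) -> int:
        return PAGE_SIZE - HEADER_SIZE - len(self.slots) * SLOT_SIZE - self.used_bytes

    def insert(self, row_len: int):
        """Return the new slot id, or None if the row plus its slot entry does not fit."""
        if row_len + SLOT_SIZE > self.free_space():
            return None
        self.slots.append(row_len)
        self.used_bytes += row_len
        return len(self.slots) - 1

    def update(self, slot: int, new_len: int) -> bool:
        """Resize a row in place. False means the grown row no longer fits
        and must be moved to another page (e.g. via a forwarding pointer)."""
        growth = new_len - self.slots[slot]
        if growth > self.free_space():
            return False
        self.slots[slot] = new_len
        self.used_bytes += growth
        return True


def rows_per_page(row_len: int) -> int:
    """How many rows of a given size fit on one empty page."""
    page = SlottedPage()
    while page.insert(row_len) is not None:
        pass
    return len(page.slots)
```

Under these assumptions, a page holds 78 rows of 100 bytes but only 2 rows of 3,000 bytes, leaving 2,160 bytes (about a quarter of the page) unused, and a 9,000-byte row does not fit at all. Growing one of the two 3,000-byte rows to 5,500 bytes fails, so that row would have to be relocated, which is the extra I/O that frequently updated variable-size rows cause.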

In contrast, Regatta incorporates multiple data layouts that perform optimally across a wide range of data types, media types, and workloads. Therefore, Regatta, as a single database, supports a variety of combinations, including fixed-size small rows, variable-size rows, high-ingest time-series workloads, ad-hoc analytics-ready data, and so forth. In addition, Regatta stores data in a media-type-aware manner, which allows it to achieve maximum efficiency and the highest performance whether data is stored on magnetic disk or on flash media.

Traditional databases were developed when the only persistent media for online data storage was magnetic disk (HDD). In the past decade, flash media (SSD) has gained market share, especially in workloads that require high I/O performance. The common, somewhat oversimplified market notion is that “flash SSD is a faster and more expensive disk than HDD”. The reality is more complex: HDDs and SSDs are dramatically different. An SSD can easily perform tens of thousands of small random-access I/Os per second (IOPS), completely outperforming HDDs, which typically achieve around 100 IOPS. On the other hand, for workloads requiring large and/or sequential I/Os, an HDD can effortlessly reach SSD-like performance at a fraction of the cost of an SSD. That is essentially why HDDs are not going to disappear anytime soon.

The latency of a magnetic-media I/O barely changes with the size of that I/O, because it is dominated by seek time and rotational delay. Thus, data layouts optimized for magnetic media can sometimes achieve better performance by trading many small random-access I/Os for fewer large I/Os, at the cost of reading more data overall. Ironically, since SSD performance is sensitive to I/O size, applying such HDD-optimized data layouts to flash media forces the SSD to read more data than necessary, which may meaningfully degrade its I/O performance.
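The trade-off can be illustrated with a simple cost model in which each I/O costs a fixed access time plus a transfer time proportional to its size. The device parameters (8 ms access and 200 MB/s for the HDD, 0.1 ms and 2,000 MB/s for the SSD) are assumed round numbers, not measurements, and the model ignores SSD queue depth and concurrency, which would favor small reads on flash even more. It compares fetching 100 scattered 4 KiB records with 100 random reads against reading the entire 64 MiB region that contains them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    access_ms: float       # fixed cost per I/O (seek + rotation on HDD, command latency on SSD)
    bandwidth_mb_s: float  # sustained transfer rate in MB/s (1 MB = 10**6 bytes)

    def read_ms(self, nbytes: int) -> float:
        """Time for one I/O: fixed access cost plus transfer time."""
        return self.access_ms + nbytes / (self.bandwidth_mb_s * 1e6) * 1e3


# Illustrative, assumed parameters; real devices vary widely.
HDD = Device("HDD", access_ms=8.0, bandwidth_mb_s=200.0)
SSD = Device("SSD", access_ms=0.1, bandwidth_mb_s=2000.0)

RECORD = 4 * 1024           # 4 KiB records
REGION = 64 * 1024 * 1024   # the records are scattered across a 64 MiB region
N_RECORDS = 100


def many_small_reads(dev: Device) -> float:
    """Fetch each record with its own random I/O, issued one at a time."""
    return N_RECORDS * dev.read_ms(RECORD)


def one_large_read(dev: Device) -> float:
    """Read the whole region sequentially and pick the records out in memory."""
    return dev.read_ms(REGION)


for dev in (HDD, SSD):
    small, large = many_small_reads(dev), one_large_read(dev)
    better = "one large read" if large < small else "many small reads"
    print(f"{dev.name}: small={small:.1f} ms, large={large:.1f} ms -> {better}")
```

With these assumed numbers the model reports about 802 ms versus 344 ms on the HDD, so the single large read wins, and about 10 ms versus 34 ms on the SSD, where the small reads win. The same access pattern therefore calls for opposite layouts on the two media.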

On the other hand, flash-optimized data layouts can leverage flash media to perform fast, small, concurrent random-access I/Os. While that works well on flash, this class of data layouts translates poorly to magnetic media, where every small random I/O pays the full mechanical positioning cost. Clearly, different data-layout strategies are required for magnetic and flash media.

Unlike many databases that rely on a single HDD-optimized type of data layout regardless of the underlying media, Regatta supports data layouts that were specifically designed and optimized for flash media. When persistent RAM (e.g., Optane Memory or other NVRAM solutions) is installed in the servers, Regatta uses it to accelerate disk access even further. Additionally, Regatta is architected to support Optane Memory as a storage medium with NVRAM-optimized data layouts.

Not only does Regatta optimize for performance, it also minimizes the use of scarce disk I/O resources. To further reduce I/O resource consumption, Regatta works directly with raw disks, bypassing the various file-system I/O overheads.

The choice of optimal data layout also depends on the workload accessing the data. For instance, column-stores can deliver high performance for scan-heavy analytics workloads, but may perform poorly as a target for OLTP operations. IoT/time-series workloads, where a potentially large number of events is streamed into the database, should be stored in a layout optimized for spatial locality, since consecutive events (e.g., from the same IoT sensor) are likely to be accessed together.
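A back-of-the-envelope page count shows why the workload drives the layout choice. The sketch below assumes fixed-width columns, 8 KB pages, and rows that never span pages; it is an idealized model, not a description of Regatta's storage format. It compares a row-store and a column-store for an analytic scan of one column and for an OLTP-style fetch of one whole row.

```python
import math

PAGE = 8192  # assumed page size in bytes


def pages_touched(n_rows: int, n_cols: int, col_bytes: int, layout: str, op: str) -> int:
    """Count pages read for one operation under an idealized fixed-width layout.

    layout: "row"    -> each page holds whole rows (all columns together)
            "column" -> each column is stored in its own run of pages
    op:     "scan_one_column" -> aggregate a single column over every row
            "fetch_row"       -> read every column of one row (typical OLTP access)
    """
    if op not in ("scan_one_column", "fetch_row"):
        raise ValueError(f"unknown op {op!r}")
    if layout == "row":
        rows_per_page = PAGE // (n_cols * col_bytes)
        if op == "scan_one_column":
            return math.ceil(n_rows / rows_per_page)  # every page holds part of the column
        return 1                                       # the whole row sits on one page
    if layout == "column":
        values_per_page = PAGE // col_bytes
        if op == "scan_one_column":
            return math.ceil(n_rows / values_per_page)  # only that column's pages
        return n_cols                                   # one page from each column
    raise ValueError(f"unknown layout {layout!r}")


for layout in ("row", "column"):
    scan = pages_touched(1_000_000, 20, 8, layout, "scan_one_column")
    fetch = pages_touched(1_000_000, 20, 8, layout, "fetch_row")
    print(f"{layout}: pages read for scan={scan}, for fetch_row={fetch}")
```

For one million rows of twenty 8-byte columns, scanning a single column reads 19,608 pages in the row layout but only 977 in the column layout, while fetching one complete row reads 1 page versus 20.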

As mentioned, most databases rely on a single type of data layout and therefore support some workloads better than others. However, most businesses have multiple types of data and different workloads. Running all of these workloads in a single database that has only one type of data layout would be far from optimal in terms of performance, and often simply impossible. As a result, most organizations have to deploy and maintain a range of different point-solution “siloed” databases. Unfortunately, this approach introduces considerable complexity and cost. For example, it is generally impossible to perform coherent operations, whether analytical or transactional, across siloed databases, even though data in one database is often related to data residing in others. Overcoming these limitations, where possible at all, is typically hard. For instance, ETL-ing data from multiple databases into yet another siloed analytical database is operationally complex and expensive, and typically takes a lot of time. Therefore, such analytics cannot run on the most up-to-date data.

Regatta utilizes various data layouts to support different types of workloads. Regatta can represent tables as row-stores or column-stores. Furthermore, it provides extensive support for “sequential” workloads that are typical of IoT. Different workloads can coexist in the same database, eliminating silos. Data can relate to other data regardless of whether each piece is stored as a row-store, column-store, sequential-store, etc. All of Regatta’s functionality, whether analytical or transactional, works on the entire dataset, regardless of how it is represented and what mixture of data layouts is in use.

The choice of data representation (e.g., column-store, row-store, sequential-store) can be made at partition granularity. For instance, Regatta can maintain a table in which some partitions are row-stores while others are column-stores. Furthermore, Regatta’s Fractured Mirrors functionality allows multiple simultaneous representations of the same table (or partition). For example, it is possible to define a table with a row-store representation distributed across one set of nodes (node-pool1), simultaneously represent the same table as a column-store distributed across another set of nodes (node-pool2), and then run an OLTP workload on the row-store version in node-pool1 and analytics on the column-store version in node-pool2. Since these workloads are segregated, the analytics activities do not impact the transactional ones. Regatta’s unique column-store mechanisms make it possible to maintain a completely up-to-date column-oriented mirror on node-pool2, while the OLTP workload on node-pool1 pays no penalty for updating the column-store.
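The Fractured Mirrors idea, one logical table kept in two physical representations, can be sketched in a few lines of Python. This toy keeps both mirrors in one process rather than on separate node pools, and it uses a delta log that is merged into the column mirror at read time, which is one generic way to keep writes cheap while scans stay current. The class and method names are hypothetical, and the delta-merge scheme is an illustrative assumption, not Regatta's actual mechanism.

```python
class FracturedMirrorTable:
    """Toy table stored in two representations of the same rows.

    Hypothetical sketch: the row mirror serves OLTP point reads and writes,
    the column mirror serves scans. Writes touch only the row mirror and a
    delta log; the column mirror folds in pending deltas right before a scan,
    so every scan reflects all earlier writes.
    """

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = {}                                 # row mirror: pk -> row dict
        self.col_data = {c: [] for c in self.columns}  # column mirror: column -> values
        self.pos = {}                                  # pk -> position in column lists
        self.delta = []                                # pending (pk, row) changes

    def upsert(self, pk, row):
        """OLTP write: update the row mirror and append to the delta log."""
        self.rows[pk] = dict(row)
        self.delta.append((pk, dict(row)))

    def get(self, pk):
        """OLTP point read, served by the row mirror."""
        return self.rows[pk]

    def _merge_delta(self):
        """Apply pending changes to the column mirror, in write order."""
        for pk, row in self.delta:
            if pk in self.pos:                  # existing row: overwrite in place
                i = self.pos[pk]
                for c in self.columns:
                    self.col_data[c][i] = row[c]
            else:                               # new row: append to every column
                self.pos[pk] = len(self.col_data[self.columns[0]])
                for c in self.columns:
                    self.col_data[c].append(row[c])
        self.delta.clear()

    def column_sum(self, column):
        """Analytic scan, served by the column mirror: reads a single column."""
        self._merge_delta()
        return sum(self.col_data[column])


t = FracturedMirrorTable(["region", "amount"])
t.upsert(1, {"region": "eu", "amount": 10})
t.upsert(2, {"region": "us", "amount": 5})
t.upsert(1, {"region": "eu", "amount": 30})   # update pk 1
print(t.get(1)["amount"], t.column_sum("amount"))
```

In this sketch the writer never touches the column lists; the cost of reorganizing data into columnar form is paid on the analytics side, which in the arrangement described above would run on a separate node pool.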

Most relational databases can store small BLOBs. Storing larger BLOBs in the database is usually inefficient or impossible, so they must generally be stored externally. Unlike other databases, Regatta can optimally store small (KBs), medium (MBs), large (GBs), and huge (100s of GBs and even TBs) BLOBs as an integral part of the database. While BLOBs can appear anywhere in the data model, Regatta stores all but the smallest BLOBs using separate, optimized data layouts that guarantee optimal access performance. How these BLOBs are stored depends on various factors, such as their size and the media type used for their storage.
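A size- and media-dependent BLOB placement rule might look like the following sketch. Every threshold in it (the 4 KB inline limit, 256 KB chunks on SSD, 8 MB chunks on HDD) and the function name plan_blob_storage are hypothetical, chosen only to mirror the HDD/SSD trade-off described earlier; they are not taken from Regatta.

```python
KB, MB, GB = 1024, 1024**2, 1024**3

# All thresholds below are hypothetical, chosen only for illustration.
INLINE_LIMIT = 4 * KB                          # BLOBs up to this size stay inside the row
CHUNK_SIZE = {"ssd": 256 * KB, "hdd": 8 * MB}  # per-media chunk size for larger BLOBs


def plan_blob_storage(size: int, media: str) -> dict:
    """Return a hypothetical placement plan for a BLOB of `size` bytes on `media`.

    Small chunks suit SSDs, which handle many small random I/Os well;
    large chunks amortize the per-I/O seek and rotation cost on HDDs.
    """
    if media not in CHUNK_SIZE:
        raise ValueError(f"unknown media type {media!r}")
    if size <= INLINE_LIMIT:
        return {"placement": "inline", "chunk_size": None, "chunks": 0}
    chunk = CHUNK_SIZE[media]
    n_chunks = -(-size // chunk)  # ceiling division
    return {"placement": "separate", "chunk_size": chunk, "chunks": n_chunks}
```

For example, a 2 KB BLOB stays inline, a 10 MB BLOB becomes 40 chunks on SSD but only 2 on HDD, and a 100 GB BLOB on HDD becomes 12,800 chunks of 8 MB.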

While Regatta’s underlying data-layout technology is highly sophisticated, these complexities are completely hidden from users, who can optionally express their needs as simple policies.