One of the first things people commonly do when learning a new language or framework is build a basic CRUD (Create, Read, Update, Delete) app. To me, this is a proof point of the universal yin/yang pairing of app + data that underpins every app.
Many decades ago, back in the mists of time, came the realization that decoupling the data from the app had a lot of upside. It meant multiple apps could access the same data. It meant that app iterations could be decoupled from persistence, which accelerated iteration velocity. Frankly, it also made code simpler – because you could count on something else to take care of atomicity, consistency, isolation, and durability of data – which is a hard technical problem. It’s also a problem that should be solved once, and not by the devs of every single app. Thus was born SQL – and its durability throughout so much change highlights that those ideas were good ones.
But over time there has been an explosion of complexity in the data ecosystem, and over and over, attempts to push the complexity back up to the dev. Which is weird, when you think about it.
Why is this? My PoV:
- New app design patterns drive the emergence of databases that suit the pattern. Apps that were monoliths loved monolithic databases (think Oracle/SQL Server/Postgres). Event-driven apps love streaming databases (think Kafka). Web services using RESTful APIs led to the emergence of graph databases. Microservices app patterns led to an explosion of simple key-value stores like Redis (and streaming too). The emergence of Node.js and more modern client/server apps alongside the explosion of MongoDB and document databases doesn’t seem to be a coincidence.
- Developers have a serious case of FOMO. The devs I know and love are always looking for “what is the new thing?” I think this is a function of their deep inherent curiosity (a characteristic that is self-selected for in the dev community) and, less positively, an insecurity about the new dev generation coming up behind them – and wanting to stay current. This means they are often trying and experimenting with new databases as they play with new frameworks.
The difficulty that gets constantly re-learned is that data and databases have a “mass” that is part of their nature. This is the story that plays out over and over:
- You start building a new app with a particular pattern – and almost arbitrarily pick a database based on FOMO, app-pattern fit, or just because it’s what you have experience with.
- The app is a success or a failure. If it’s a failure, it doesn’t matter – it’s gone the way of the dodo – so let’s assume it’s a success.
- The first part of success is that the app doesn’t live on a private instance or on a laptop, but is now a production system – regardless of where it runs. That means some rules, some governance, and almost always some separation of duties (the people who operationalize data platforms and app platforms are often not the same people).
- The app lives on and starts getting used by more people, more APIs, more apps. That all means the data grows and grows – and the mass starts to accumulate. You can iterate on the app (regardless of its architecture), but the data – it just keeps growing.
- The performance needs keep getting bigger…. and Bigger…. and BIGGER.
- Additional demands start to appear that were not even a glimmer in the eye of the original MVP. People and apps want insights from the data that were never anticipated. This is good, but also horrifying – because these demands are often oppositional to the original goals of the app.
Then things start getting messy.
- The original app/data choice was almost arbitrary. That doesn’t translate into “ignorant” or “wrong” – but it was impossible to predict success, so adapting to the new demands is the winning strategy, not trying to “out-think” them. So, people start to adapt.
- To deal with performance, all sorts of hacks start to appear. Caching layers appear – which help in some cases but never really solve the problem. Sharding (argh) appears – but you can never get the shard key right; it’s only a question of how wrong you are. Then, inevitably, there’s the pain of rebalancing. And all of these break the most important original expectation of the data layer – that it honors the abstraction principles. If ACID behaviors were originally expected, breaking them means the app code gets completely overloaded dealing with the lack of consistency that the performance hacks (sharding/caching) create. Argh. What was once nice and clean is now a mess.
- The new requirements/APIs/users are a delight and a horror show. To deal with them, all sorts of hacks start to appear. Common examples I see are:
- Beautiful event streaming patterns that worked great for the initial MVP, but where all of a sudden there is a need to bring very structured relational data into the picture. At their core, streaming databases are a form of key-value store with no relational model. So what tends to happen is a messy union: the traditional relational SQL database gets swamped by the stream, so filters and data pipelines are needed to trim it down. Dammit, why can’t we just have a relational SQL database that can reason over the entire stream? And now the iteration velocity of the original app suffers. Why? Shouldn’t the concerns be decoupled? Yeah, but they never are. What was once nice and clean is now a mess.
- The first phase of “I need to analyze/integrate these data” starts with querying the system of record directly (which can be an old monolithic database, a modern document database, a distributed relational database – whatever), but that quickly runs into “STOP! What you’re doing is affecting the application behavior itself.” So the inevitable journey of ETL/data pipelines to datalakes and datawarehouses begins. It’s inevitable because it works – at least to the extent that it buffers the system of record and the app, but NOT to the extent that it fixes the problem that the data in the datalake/datawarehouse is definitionally “old”. You can now see all the datalakes/datawarehouses working on making this better (almost universally with some high-throughput caching mechanism). And the dev now has to deal with a ton of data interfaces where at the start there was one. What was once nice and clean is now a mess.
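The caching hack above can be made concrete with a minimal sketch. This is a hypothetical cache-aside pattern in Python, with plain dicts standing in for a real database and cache: once a cache sits in front of the store, keeping the two consistent becomes the app’s job on every write path, which is exactly the consistency burden the database used to carry.

```python
# Minimal cache-aside sketch (hypothetical; in-memory dicts stand in for a
# real database and cache). The point: with a cache in front of the store,
# consistency is now the app's responsibility, not the database's.

database = {"user:1": {"name": "Ada", "plan": "free"}}
cache = {}

def read_user(key):
    # Cache-aside read: check the cache first, fall back to the database.
    if key in cache:
        return cache[key]
    value = database[key]
    cache[key] = value
    return value

def update_user_buggy(key, value):
    # Bug: updates the database but forgets to invalidate the cache.
    database[key] = value

def update_user(key, value):
    # "Correct" version: every write path must remember to invalidate.
    database[key] = value
    cache.pop(key, None)

read_user("user:1")                                   # warms the cache
update_user_buggy("user:1", {"name": "Ada", "plan": "pro"})
stale = read_user("user:1")
print(stale["plan"])                                  # -> free (stale!)

update_user("user:1", {"name": "Ada", "plan": "pro"})
print(read_user("user:1")["plan"])                    # -> pro
```

One forgotten invalidation anywhere in the codebase reintroduces stale reads – and real deployments add expiry, concurrent writers, and multiple cache nodes on top of this.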
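The “definitionally old” point in the ETL bullet can also be sketched in a few lines. This is a hypothetical watermark-based batch extract in Python (in-memory lists stand in for the system of record and the warehouse): anything written after the last run is invisible downstream until the next run, so the analytical copy always lags the app’s database.

```python
# Hypothetical watermark-based batch ETL sketch. Rows written after the last
# extract are invisible in the warehouse until the next scheduled run -- the
# analytical copy is definitionally behind the system of record.

system_of_record = [
    {"id": 1, "total": 40, "updated_at": 100},
    {"id": 2, "total": 75, "updated_at": 180},
]
warehouse = []
last_watermark = 0

def run_etl():
    # Extract everything modified since the previous watermark, load it
    # into the warehouse, then advance the watermark.
    global last_watermark
    batch = [r for r in system_of_record if r["updated_at"] > last_watermark]
    warehouse.extend(batch)
    if batch:
        last_watermark = max(r["updated_at"] for r in batch)

run_etl()
print(len(warehouse))                          # -> 2: caught up... for now

# The app keeps writing between ETL runs.
system_of_record.append({"id": 3, "total": 12, "updated_at": 250})

# Analytics queries the warehouse and simply cannot see row 3 yet.
print(any(r["id"] == 3 for r in warehouse))    # -> False (stale)

run_etl()                                      # the next run picks it up
print(any(r["id"] == 3 for r in warehouse))    # -> True
```

Shrinking the batch interval narrows the window but never closes it – which is why the datalake/datawarehouse vendors keep layering streaming ingest and caching on top.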
These are just a couple of examples of how a data layer that was once simple and clean gets messy. That in turn pushes complexity up into the app part of the stack and back to the dev.
As we started development on Regatta, we began by asking and answering the question that underpins this pattern: “Why are there so many database architectures? Are the design triggers for forking so wildly really intrinsic?”
What if you had a database that was relational with strong ACID properties, normal semantics, and SQL interfaces, but that could also scale out without a long list of caveats? What if it could have the ingress and scaling behavior of key-value stores and event databases – but without abandoning SQL behaviors?
What if that same database could handle unstructured data as well as it does structured data?
What if that same database had radically new consensus and concurrency mechanisms that meant that complex analytical queries didn’t interfere with rapid inserts and updates?
What in the underpinning algorithms is stopping all that from happening?
That’s what we believe we have solved with Regatta. It’s an ambitious target, but the dream is worth it. The dream is that this new category of database system – OLxP databases – adapts to the changing needs of the app and the dev. That in turn means simplification, cleaner code, less infrastructure, and lower cost. Usually, when people span what are distinct “best of breed” domains today, it means being “average” across those domains. That’s not the case here. We think Regatta will be a best-of-breed transactional distributed relational database at the same time that it sets records for key analytical workloads.
Skepticism is healthy – particularly when we are aiming to solve problems that are viewed as intrinsic. The best way for us to prove it to you is for you to sign up for our Early Access Program and put us through our paces!