This article is about the next evolutionary step in the development of data processing systems. The topic is ambitious, so let me first say a little about myself. For more than 10 years I have been working on projects in the field of CRDT and data synchronization. During this time I have worked for universities, Y Combinator startups, and well-known international companies. My project for the last three years is Replicated Object Notation (RON), a new data representation format that combines the capabilities of object notation (like JSON or YAML), a network protocol, and an oplog/binlog. You may have heard of other projects working in the same direction, such as Datanet, Automerge, and others. You may also have read Local-first software, the most comprehensive manifesto of this area of computer science. Its authors are the wonderful Ink & Switch team, including M. Kleppmann, widely known for the "book with the boar" (Designing Data-Intensive Applications). Or you may have heard my talks on this topic at various conferences.
The ideas of this article echo what Pat Helland has been writing in recent years (Immutability Changes Everything, etc.). They are also adjacent to the IPFS and DAT projects, with which I am involved.
So. Classic databases are built on a linear operation log (WAL). Transactions are built from this log, and so is master-slave replication. The theory of replication over a linear log was worked out in the early 1980s, with the participation of the well-known L. Lamport. In classic legacy systems with one large central database, all this works well. This is how Oracle, PostgreSQL, MySQL, DB2, and other classic SQL databases work. This is also how many key-value databases work, for example LevelDB/RocksDB.
But linearization does not scale. When the system becomes distributed, it all starts to break down. Figuratively speaking, a linear system is something like a Greek phalanx: everything must move in step, and for that it helps if the ground is level everywhere. This is not always the case: somewhere the electricity is cut off, somewhere the network is slow. Google Spanner did show that with a sufficiently large budget the ground can be made level absolutely everywhere, but note that even Google sometimes goes down completely, for completely ridiculous reasons.
Among Russian speakers, Bartunov has developed this thesis in his talks. To scale such a system, an asynchronous communication mechanism is added: a message queue, for example, is the standard solution today. Or an intermediate option, where linear Kafka is placed at the center as the master.
Among popular systems, Cassandra is a notable exception. By falling back to last-write-wins consistency, it can operate completely anarchically, that is, without regard to write order. This is great for storing weakly structured data, such as address books from iPhones. Apple is very fond of Cassandra.
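As a minimal sketch of the last-write-wins idea (the class and field names are my own illustration, not Cassandra's API): each write carries a timestamp, and merging replicas keeps the newest write regardless of the order in which writes arrive.

```python
class LWWRegister:
    """Last-write-wins register: each write carries a timestamp,
    and merge keeps the value with the newest timestamp."""

    def __init__(self, value=None, ts=0.0):
        self.value = value
        self.ts = ts

    def set(self, value, ts):
        self.value = value
        self.ts = ts

    def merge(self, other):
        # Write order does not matter: the latest timestamp wins,
        # so replicas converge without any coordination.
        if other.ts > self.ts:
            self.value, self.ts = other.value, other.ts

# Two replicas written concurrently, merged afterwards:
a = LWWRegister(); a.set("home: SPb", ts=1.0)
b = LWWRegister(); b.set("home: Ekb", ts=2.0)
a.merge(b)
print(a.value)  # → home: Ekb
```

Merging in the opposite order (`b.merge(a)`) yields the same value, which is exactly why write order can be ignored.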
But there is one interesting level between linearization and total anarchy: causal consistency, or causal integrity. It is roughly the same as happened-before in computer science, or Minkowski geometry in physics: the light cone of the past, the light cone of the future, and the transitivity of cause-and-effect relationships. It is the most rigorous model that is still consistent with the physics of distributed systems. Unlike linearization, it allows parallel events that do not affect each other. At the same time, it still provides strict order where that makes sense.
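The happened-before relation can be sketched with vector clocks; this is a generic textbook illustration, not RON's actual clock format:

```python
def happened_before(a, b):
    """True if the event with vector clock `a` causally precedes `b`:
    a <= b component-wise, and a != b."""
    keys = set(a) | set(b)
    return all(a.get(k, 0) <= b.get(k, 0) for k in keys) and a != b

def concurrent(a, b):
    # Neither event precedes the other: they are causally independent
    # and may be applied in either order.
    return not happened_before(a, b) and not happened_before(b, a)

# Replica r1 writes twice; replica r2 writes once without seeing them:
e1 = {"r1": 1}
e2 = {"r1": 2}   # e1 happened-before e2
e3 = {"r2": 1}   # concurrent with both
print(happened_before(e1, e2), concurrent(e2, e3))  # → True True
```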
The trouble is that database theory long ignored this remarkable level of consistency, and its full capabilities began to be revealed only with the advent of CRDTs (Conflict-free Replicated Data Types). This theory assumes that each data structure exists in multiple replicas. The connection between replicas is not always there, so changes sometimes have to be merged. Unlike in git, CRDT data structures can always be merged automatically. In other respects, git is a very good example for explaining the properties of CRDT-based repositories.
RON is the most convenient notation for representing CRDT objects. There are four types of atoms in RON: UUID, INT, STRING, and FLOAT. A sequence of atoms is called a RON tuple. A RON operation is a tuple with a pair of metadata UUIDs. In this pair, one UUID is the operation's own ID, and the second points to another, previously created operation. Thanks to these IDs and links, you can assemble any data structure from operations. Like LEGO bricks, if LEGO bricks were numbered so that nothing could get mixed up.
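A hypothetical, much-simplified model of such an operation (the field names and the readable IDs like `"1+A"` are my own stand-ins, not RON's real UUID or wire format):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    """Simplified RON-style operation: its own ID, a reference to a
    previously created op, and a tuple of atoms as payload."""
    id: str       # the op's own ID (readable stand-in for a UUID)
    ref: str      # ID of the op this one builds on
    atoms: tuple  # payload atoms: INT, STRING, FLOAT or UUID

# Three ops linked via id/ref, like numbered LEGO bricks.
# Note the op suffixed +B comes from a different replica:
ops = [
    Op(id="1+A", ref="0",   atoms=("lww",)),         # create object
    Op(id="2+A", ref="1+A", atoms=("name", "RON")),  # set a field
    Op(id="3+B", ref="1+A", atoms=("year", 2020)),   # another replica
]

# Reconstruct the object state by following refs to the creation op:
root = ops[0].id
state = {a[0]: a[1] for a in (op.atoms for op in ops if op.ref == root)}
print(state)  # → {'name': 'RON', 'year': 2020}
```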
Continuing the analogy with git, in RON the very concept of a branch is generalized to the point that almost everything consists of branches. RON itself is built on operations, although it does its best to pretend to be object notation. Therefore, at the heart of any replica is a log of operations, as in a normal database. However, this log is only partially ordered by happened-before, and the order of operations may differ between replicas. From any point in the log we can branch off a new version, a new branch. In the same way, we can graft another branch onto our own (merge it in). In this scheme, both a database and a patch look the same: both are branches. And we can merge them in the same way. Whether it is a different version of the data or a different set of data, it is still a branch of the oplog. For example, if the "exchange rate" reference table is included in our database, it will be a separate small branch that we continually merge into our database.
Each branch is a partially ordered set of operations, and we can perform the usual set operations on these sets: merge is a union; a common ancestor is an intersection; a diff is a symmetric difference (XOR). It turns out that a database can look like a matryoshka doll, because nesting is possible. A database can be a mix of different databases, or an amendment (branch) of another database, or all of that together. The connection between replicas, importantly, is not lost: everything can be traced back to the original if desired. Algebraically!
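This set algebra can be sketched directly with Python sets, modeling each branch as a set of operation IDs (an illustration of the idea, not the RON implementation, which also preserves the partial order):

```python
# Two branches of the same oplog, reduced to their op-ID membership:
main   = {"1+A", "2+A", "3+A"}
branch = {"1+A", "2+A", "4+B"}

merge    = main | branch   # union: all ops from both branches
ancestor = main & branch   # intersection: the common history
diff     = main ^ branch   # symmetric difference: what differs

print(sorted(merge))     # → ['1+A', '2+A', '3+A', '4+B']
print(sorted(ancestor))  # → ['1+A', '2+A']
print(sorted(diff))      # → ['3+A', '4+B']
```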
The only problem is to make all this work fast enough without taking up too much space. In particular, besides the text form, RON has more efficient binary variants, RONv and RONp, for this purpose. But the database is not limited to the oplog: it is built on top of it, like a house on a foundation.
And now the main question – why? Why develop new databases on new principles when the old ones seem to work fine?
First, data networks provide connectivity without centralization. Let me explain with the example of medical databases. Say you were unlucky enough to end up in an ambulance in St. Petersburg, while your medical records are in Yekaterinburg. It happens. Naturally, you would want the records to be synchronized in real time, both now and later, when you return to Yekaterinburg.
Of course, one could create a single central medical database. But what happens when some failure occurs? Stop practicing medicine? When Google or Amazon go offline, a lot of things stop working. And besides, a central database will certainly leak to the public one day. Of course, it is interesting to read how Alexey Navalny was poisoned three times with chemical weapons by FSB agents. But I am not that immortal, and I would like to see more reliable and secure information systems in healthcare.
The second aspect is local availability. Even Google and Amazon sometimes go offline. If the data lives on the local network or directly on the device, it stops being available only when the network or the device itself breaks, at which point everything stops working anyway. A synchronized replica on the device will also work in the field, in the taiga, and in the middle of nowhere in an industrial zone. This is relevant for industrial applications.
The third aspect is collaboration. Thanks to automatic merges, CRDT is currently the most convenient technology for implementing collaborative applications. In the era of the coronavirus, this is more relevant than ever!
The fourth aspect is reasonable consistency requirements. Anarchy in the style of Cassandra is, of course, crude. But the linearity of the log is of limited value even in financial systems. As the ACIDRain study showed, in real ACID/SQL systems linearizability is treated as a fetish: the transaction isolation settings actually used allow various exploitable anomalies, and in practice safety is implemented by other means. And if the business logic talks to the database via RPC, there is nothing to discuss at all.
The fifth aspect is data integrity. If we think about how data integrity is guaranteed in a classic blockchain, we realize that it is a simple vote (by PoW or PoS). Data networks instead implement global cross-signing and massive anchoring. It is like in git, only global. These tools are much more powerful than PoW; to date, only the Linux kernel enjoys a comparable degree of protection. If you think about it, in the era of deepfakes it may become impossible to distinguish reality from illusion without the help of cryptography. We have become too dependent on electronic means of communication. But that statement, of course, deserves a separate article.
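The basic building block behind git-style anchoring is a hash chain: each entry hashes its parent, so tampering with any earlier entry invalidates every hash after it. A minimal sketch of the idea (my own illustration, not the actual git or RON hashing scheme):

```python
import hashlib

def commit(parent_hash, payload):
    """Chain an entry to its parent: the hash covers both the parent's
    hash and this entry's payload, so history cannot be rewritten
    without changing every downstream hash."""
    h = hashlib.sha256()
    h.update(parent_hash.encode())
    h.update(payload.encode())
    return h.hexdigest()

c1 = commit("", "op 1")
c2 = commit(c1, "op 2")
c3 = commit(c2, "op 3")

# Tampering with "op 1" changes c1, and therefore c2 and c3 as well:
tampered = commit(commit("", "op 1 tampered"), "op 2")
print(tampered != c2)  # → True
```

Anchoring and cross-signing then pin the head of such a chain in other systems, so that rewriting one replica's history would require rewriting everyone else's.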
To sum up.
In the modern world there are not merely many kinds of databases, there are a great many databases, and we critically depend on them. RON lets you combine data into a single live network without pulling it into a single data center and without creating a single point of failure (SPoF). These are the main advantages of centralization without its characteristic accompanying disadvantages.
Storage and transmission, database and network, are two sides of the same coin, and we need the whole coin, not just one side.
If you are interested in this topic, follow the RON project (ru, en); a documentation release for a new version is coming soon. Interestingly, the documentation is prepared in the Darwin version control system, which runs on the RON oplog.