Cloud storage consensus: the early bird technique
Consensus is one of the complex pieces of a distributed system. Consensus is what leads to consistency. SStorage uses a peculiar technique that is partially decentralized which I am going to demonstrate and explain here.
Why? Because it is encrypted cloud
The fact that SStorage is an end-to-end encrypted cloud was an obstacle to lots of things. Algorithms that rely on the client and server being able to decide the shortest way to apply an operation are not applicable. So how to do it? I took inspiration from multi-level cache CPUs:
- Data is written to the lower media atomically using compare-and-swap, which is actually implemented as a compare-copy-on-write.
- Operations that overlap are cancelled, only leaving the one that is first in sequence.
- The operations need now to be retried with the new state.
This means that the storage is lock-free, a nice property for a secure cloud storage. It also means that it is possible to move back to a previous state of the data. It also means that small changes are far faster and more likely to succeed than large ones, particularly on contested data.
How? Using the d_system
The d_system
is here use to assign a consistent ordering: each node of the system will assign very close timestamps, but cannot assign ever the same timestamp. This property is used to check if, within a synchronisation frame of the system, operations overlapped.
The system provides for a way to process longer commits using a multi-stage approach that is based on the same premises. Instead of being validated immediately, the data validation is delayed to the last stage of the commit. This allows to push several changes up to a large size into the cluster.
In case of overlap, the first in the frame is accepted always accepted. This is what allows to keep the data safe in our privacy centered cloud storage.
The time-frame is determined by a more classic consensus that is calculated out of line with the rest of the storage system. The time frame is composed for each node of an offset to the master node and a confidence interval from the system. The time frame computation also acts as a heartbeat for the system: failing to answer synchronization queries will result in the pruning of the node from the tree.
Conclusion
I will publish later more articles on how I used the d_system
to make a complete end-to-end cloud storage for privacy centered use. In particular the features related to using this lock-free set of primitives for implementing file-systems.
I invite you to check the information at https://nekoit.xyz/, join us on Discord or Telegram, or follow me on Mastodon Archivist@social.linux.pizza to receive notifications of my followup posts.