C++ is one of the most expressive programming languages in its semantics. With a small amount of assembly, it can do nearly anything a computer can do. But some things are missing from the standard library. Networking, the basis of most distributed systems, is one of them. Asynchronous operations are not well supported either.
The pillars of cloud storage and security
Several things are critical for a good framework for distributed computing.
- A good grasp of networking, as well as the ability to abstract it
- Interchange formats to exchange information between computers
Likewise, for security, one needs:
- Data cleaning tools
- A way to ensure that data that needs cleaning is cleaned
Getting those points right is the bare minimum for a system to maintain data privacy in the cloud. But to also store data safely, it needs to implement algorithms and procedures to manipulate that data.
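One way to "ensure that data that needs cleaning is cleaned" is to let the type system enforce it. The sketch below is only an illustration of that idea, not gplib code: the Unclean wrapper, its clean() member and the strip_control_chars cleaner are all hypothetical names, and what "cleaning" means (sanitizing, wiping, redacting) is left to the function passed in.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical wrapper: holds a value that must be cleaned before use.
// The raw value is private, so there is no way to read it directly.
template <typename T>
class Unclean {
public:
    explicit Unclean(T raw) : raw_(std::move(raw)) {}

    // The only way out: consume the wrapper and pass the raw value
    // through a cleaning function, returning whatever it produces.
    template <typename Cleaner>
    auto clean(Cleaner&& cleaner) && {
        return std::forward<Cleaner>(cleaner)(std::move(raw_));
    }

private:
    T raw_;
};

// Example cleaner (an assumption for this sketch): drop ASCII control
// characters and DEL, keep everything else.
inline std::string strip_control_chars(std::string s) {
    std::string out;
    for (unsigned char c : s) {
        if (c >= 0x20 && c != 0x7f) out.push_back(static_cast<char>(c));
    }
    return out;
}

// Usage: the tab and the newline are removed on the way out.
inline std::string unclean_example() {
    Unclean<std::string> input(std::string("a\tb\n"));
    std::string cleaned = std::move(input).clean(strip_control_chars);
    assert(cleaned == "ab");
    return cleaned;
}
```

Because clean() is the only accessor, code that forgets to clean the data simply does not compile, which is the property the second bullet asks for.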
For that, we built a framework, the gplib: a general-purpose freestanding library for Unix system development.
Freestanding means that it is only minimally bound to the traditional standard library: it shares only the type traits library and the concepts library with it. The goal is to free the development style from the conventions of the standard library, allowing for a more flexible way to handle threads, tasks and operations in the system.
An end-to-end encrypted cloud storage with an open-source client needs predictable code, so being freestanding also removes dependencies from the code, which makes the analysis of security and logic flaws easier. It also needs to include its own cryptographic primitives and algorithms.
The gplib uses green threads, implemented using stackful coroutines. Those green threads are scheduled using a provided scheme and are associated with the filesystem in a class that manages the entire system.
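To make the idea of green threads on top of stackful coroutines concrete, here is a minimal sketch built on the POSIX ucontext functions (getcontext, makecontext, swapcontext). It is not the gplib implementation: the names Task, Scheduler, spawn, yield and run are hypothetical, the round-robin policy merely stands in for the provided scheduling scheme, and the 64 KiB stack size is an arbitrary assumption.

```cpp
#include <ucontext.h>  // POSIX: getcontext, makecontext, swapcontext

#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// All names below are hypothetical; this is not the gplib API.
struct Task {
    ucontext_t ctx{};
    std::vector<char> stack;     // the coroutine's private stack
    std::function<void()> body;  // code the green thread runs
    bool done = false;
};

namespace {
Task* g_current = nullptr;  // green thread currently running, if any

// Entry point of every green thread. Returning from it resumes ctx.uc_link.
void trampoline() {
    g_current->body();
    g_current->done = true;
}
}  // namespace

class Scheduler {
public:
    // Create a green thread with its own stack and queue it.
    void spawn(std::function<void()> body) {
        auto task = std::make_unique<Task>();
        task->body = std::move(body);
        task->stack.resize(kStackSize);
        const int rc = getcontext(&task->ctx);
        assert(rc == 0);
        (void)rc;
        task->ctx.uc_stack.ss_sp = task->stack.data();
        task->ctx.uc_stack.ss_size = task->stack.size();
        task->ctx.uc_link = &scheduler_ctx_;  // where to go when body returns
        makecontext(&task->ctx, trampoline, 0);
        ready_.push_back(std::move(task));
    }

    // Called from inside a green thread: hand control back to the scheduler.
    void yield() {
        assert(g_current != nullptr);
        swapcontext(&g_current->ctx, &scheduler_ctx_);
    }

    // Round-robin: resume each ready task until all of them have finished.
    void run() {
        while (!ready_.empty()) {
            std::unique_ptr<Task> task = std::move(ready_.front());
            ready_.pop_front();
            g_current = task.get();
            swapcontext(&scheduler_ctx_, &task->ctx);
            g_current = nullptr;
            if (!task->done) ready_.push_back(std::move(task));
        }
    }

private:
    static constexpr std::size_t kStackSize = 64 * 1024;  // assumed size
    ucontext_t scheduler_ctx_{};
    std::deque<std::unique_ptr<Task>> ready_;
};

// Two green threads that interleave by yielding to each other.
std::vector<std::string> interleaving_demo() {
    std::vector<std::string> log;
    Scheduler sched;
    sched.spawn([&] { log.push_back("a1"); sched.yield(); log.push_back("a2"); });
    sched.spawn([&] { log.push_back("b1"); sched.yield(); log.push_back("b2"); });
    sched.run();
    return log;  // a1, b1, a2, b2
}
```

Each task keeps its own stack, so it can yield from any call depth. That is what distinguishes a stackful coroutine from the stackless coroutines of C++20.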
The system is a computer-wide component. Data in gplib is represented using the normal native properties of the system. Files are accessed using the normal file_system API. The system is the piece of the distribution engine that manages asynchronous and local processes.
Nothing leaves the system without going through filesystem calls or d_system transfers. This matters for data privacy, because it differentiates what is inside the system from what lies beyond its bounds.
The system is used on the client side; it is connected to the d_system by a discriminated connection.
The d_system is the part of the system that is actually defined as the cloud: everything in it communicates using CBOR, and the CBOR API is also what transmits the filesystem mappings and sharing information. Within SStorage, it is mainly used for routing and cluster management. A d_system is organized by a sequence of mastery. It is not designed for master/slave.
For a master/slave architecture, the entire d_system would be the master.
The sequence of mastery means that the leader is the node with the smallest absolute arrival counter in the cluster. The leader is responsible for adding new nodes and addresses to the cluster.
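The rule above can be sketched in a few lines. This is only an illustration under assumptions: Node, Cluster, join, leave and leader are hypothetical names, and the whole cluster is collapsed into one local object, whereas in a real d_system the membership would be replicated across nodes and only the leader would add members.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cluster member: an address plus its arrival counter.
struct Node {
    std::string address;
    std::uint64_t arrival;
};

class Cluster {
public:
    // Add a node and stamp it with the next arrival counter.
    // Counters are never reused, so they stay "absolute".
    std::uint64_t join(std::string address) {
        const std::uint64_t arrival = next_arrival_++;
        nodes_.push_back(Node{std::move(address), arrival});
        return arrival;
    }

    // Remove a node, for example after it has failed.
    void leave(const std::string& address) {
        nodes_.erase(std::remove_if(nodes_.begin(), nodes_.end(),
                                    [&](const Node& n) { return n.address == address; }),
                     nodes_.end());
    }

    // The leader is the node with the smallest arrival counter.
    const Node& leader() const {
        assert(!nodes_.empty());
        return *std::min_element(nodes_.begin(), nodes_.end(),
                                 [](const Node& a, const Node& b) {
                                     return a.arrival < b.arrival;
                                 });
    }

private:
    std::vector<Node> nodes_;
    std::uint64_t next_arrival_ = 0;
};
```

When the current leader leaves, leadership passes to the next-oldest node without any vote, because the order of succession is already fixed by the arrival counters.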
Cloud storage nodes are handled as slave nodes of the main d_system infrastructure, which manages transactions and operations. Communications between the storage nodes and the d_system are only lightly encrypted because they are not critical (they only carry data that is already encrypted), but they are authenticated. Communications between d_system nodes, however, are strongly encrypted.
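To give an idea of what a CBOR message between d_system nodes could look like on the wire, here is a small encoder sketch that follows the CBOR encoding rules of RFC 8949 for unsigned integers, text strings and map headers. The function names and the "node"/"arrival" message layout are assumptions made for this example; they are not the gplib CBOR API or the actual d_system message format.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Write a CBOR head: major type (0..7) plus an unsigned argument,
// using the shortest encoding, with multi-byte arguments in big-endian.
void put_head(Bytes& out, std::uint8_t major, std::uint64_t n) {
    assert(major < 8);
    const std::uint8_t mt = static_cast<std::uint8_t>(major << 5);
    int extra = 0;  // number of argument bytes following the first byte
    if (n < 24) {
        out.push_back(static_cast<std::uint8_t>(mt | n));
        return;
    } else if (n <= 0xffu) {
        out.push_back(mt | 24); extra = 1;
    } else if (n <= 0xffffu) {
        out.push_back(mt | 25); extra = 2;
    } else if (n <= 0xffffffffu) {
        out.push_back(mt | 26); extra = 4;
    } else {
        out.push_back(mt | 27); extra = 8;
    }
    for (int i = extra - 1; i >= 0; --i) {
        out.push_back(static_cast<std::uint8_t>(n >> (8 * i)));
    }
}

// Major type 0: unsigned integer.
void put_uint(Bytes& out, std::uint64_t v) { put_head(out, 0, v); }

// Major type 3: text string (assumed to be valid UTF-8 by the caller).
void put_text(Bytes& out, const std::string& s) {
    put_head(out, 3, s.size());
    out.insert(out.end(), s.begin(), s.end());
}

// Major type 5: map with a known number of key/value pairs.
void put_map_header(Bytes& out, std::uint64_t pairs) { put_head(out, 5, pairs); }

// Hypothetical message: {"node": id, "arrival": counter}.
Bytes encode_node_announcement(std::uint64_t id, std::uint64_t arrival) {
    Bytes out;
    put_map_header(out, 2);
    put_text(out, "node");
    put_uint(out, id);
    put_text(out, "arrival");
    put_uint(out, arrival);
    return out;
}
```

With this sketch, encode_node_announcement(7, 500) produces 18 bytes: 0xa2 for the two-pair map, the two keys as length-prefixed text, 0x07 for the id, and 0x19 0x01 0xf4 for the counter 500.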
I will publish more articles later on how I used both system and d_system to build a complete end-to-end encrypted cloud storage for privacy-centered use.