A rough overview of filesystems on Unix-like Operating Systems
What are the ingredients of a filesystem?
On any modern system, filesystems have 3 main components:
- Files
- Directories
- Permissions on the above
On Unix-inspired filesystems, they are linked as presented below:
Composition (black diamonds here) represents data that is embedded in the item, while aggregation (empty diamonds) represents links pointing to file contents, which we call i-nodes, or more commonly inodes.
You can represent a simple filesystem that way:
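To make the relationship between directories, names and inodes concrete, here is a hypothetical toy model in Python. The class and field names (`Inode`, `ToyFS`, `entries`) are illustrative assumptions and do not mirror any real kernel or on-disk format. The point is that a directory entry is only a name pointing at an inode, while permissions and contents live in the inode itself.

```python
from dataclasses import dataclass, field

# Hypothetical toy model: class and field names are illustrative and do not
# mirror any real kernel or on-disk format.

@dataclass
class Inode:
    number: int
    mode: int                                     # permission bits, e.g. 0o644
    is_dir: bool = False
    data: bytes = b""                             # contents of a regular file
    entries: dict = field(default_factory=dict)   # directories only: name -> inode number

class ToyFS:
    def __init__(self):
        self.inodes = {}                          # inode number -> Inode
        self.root = self._alloc(0o755, is_dir=True)

    def _alloc(self, mode, is_dir=False):
        number = len(self.inodes) + 1
        self.inodes[number] = Inode(number, mode, is_dir)
        return number

    def add(self, parent, name, mode, is_dir=False, data=b""):
        # A directory entry is only a name pointing at an inode number;
        # permissions and contents are stored in the inode.
        number = self._alloc(mode, is_dir)
        self.inodes[number].data = data
        self.inodes[parent].entries[name] = number
        return number

    def lookup(self, path):
        # Walk the path one component at a time, starting from the root.
        number = self.root
        for part in path.strip("/").split("/"):
            number = self.inodes[number].entries[part]
        return self.inodes[number]

fs = ToyFS()
docs = fs.add(fs.root, "docs", 0o755, is_dir=True)
fs.add(docs, "file.a", 0o644, data=b"hello")
print(fs.lookup("/docs/file.a").data)   # b'hello'
```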
How are open files represented by the kernel?
On Unix-like systems, a process opens a file by obtaining a file descriptor (fd) to it. This descriptor belongs to the process. Through it, the process can access the metadata of the file, and the descriptor owns another structure, the vnode, which holds a reference to the inode pointing to the data in the file.
Note that the descriptor does not hold a link to the metadata of the file but a copy of it, masked with the permissions of the file descriptor.
All of this is nice and abstract, but let's see what the actual elements contain.
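On a real system you can inspect some of these elements. The following sketch assumes a POSIX system (Linux, for example) and uses Python's `os.stat` and `os.fstat`, which wrap `stat(2)` and `fstat(2)`. It looks up the same file twice, once by name and once through an open descriptor, and prints the inode number, link count, size and permission bits. The `describe` helper is a hypothetical name used only for this example.

```python
import os
import stat
import tempfile

def describe(st):
    # Hypothetical helper: pick out a few fields stored in (or derived from) the inode.
    return {
        "inode": st.st_ino,
        "links": st.st_nlink,
        "size": st.st_size,
        "perms": stat.filemode(st.st_mode),
    }

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "file.a")
    with open(path, "wb") as f:
        f.write(b"hello")
    os.chmod(path, 0o640)

    by_path = describe(os.stat(path))      # metadata reached through the name
    fd = os.open(path, os.O_RDONLY)
    try:
        by_fd = describe(os.fstat(fd))     # metadata reached through the descriptor
    finally:
        os.close(fd)

print(by_path)
print(by_fd)
```

Both lookups land on the same inode, so the two dictionaries are equal.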
Reference counting and unlinking
Unix-inspired filesystems have a feature named hardlinks. A hardlink is a case where two files (two names in the filesystem) share the same inode and therefore share the same contents.
Now let's have a thought experiment about what must happen when you delete one of the files.
Let's say we delete file.a. Three things may happen:
1. file.b is also deleted, and the inode is cleared of its contents
2. file.b stays on disk, and so does the inode
3. file.b stays on disk, but the inode is cleared
The first and third propositions don't make much sense, because they would duplicate the behaviour of another feature that most filesystems (Unix-like ones included) already have: symbolic links, which are roughly comparable to Shortcuts on NT operating systems like Microsoft Windows.
So you must be able to know when you delete the last file that points to the inode. There is a simple solution for that: we count the number of references to the inode, and we delete the inode only when that count reaches 0.
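The counting rule can be sketched as a hypothetical toy model in Python. `LinkTable` and its methods are illustrative names, not a real API. Each inode keeps a count of the names that point to it. Removing a name decrements that count, and the data is freed only when the count reaches 0. This yields the second outcome from the list: file.b keeps the data after file.a is deleted.

```python
# Hypothetical toy model of link counting: names map to inode numbers,
# and an inode's data is freed only when no name refers to it any more.

class LinkTable:
    def __init__(self):
        self.names = {}    # name -> inode number
        self.nlink = {}    # inode number -> number of names pointing at it
        self.data = {}     # inode number -> contents
        self._next = 1

    def create(self, name, contents):
        ino = self._next
        self._next += 1
        self.names[name] = ino
        self.nlink[ino] = 1
        self.data[ino] = contents
        return ino

    def link(self, existing, new):
        if new in self.names:
            raise FileExistsError(new)
        ino = self.names[existing]
        self.nlink[ino] += 1           # one more name refers to the inode
        self.names[new] = ino

    def unlink(self, name):
        ino = self.names.pop(name)     # remove the name itself
        self.nlink[ino] -= 1
        if self.nlink[ino] == 0:       # last reference gone: free the inode
            del self.nlink[ino]
            del self.data[ino]

t = LinkTable()
ino = t.create("file.a", b"hello")
t.link("file.a", "file.b")   # count is now 2
t.unlink("file.a")           # count is 1, file.b still reaches the data
t.unlink("file.b")           # count is 0, inode freed
print(ino in t.data)         # False
```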
Now on to a more complicated case: a filesystem with open files:
All of the vnodes refer to the same inode. Just like the files, vnodes are references to the inode, and through it to the file data. This means that the inode here is referred to not twice, but 4 times in total.
I will not cover the mechanics of creating and modifying files extensively. Instead I will focus on two mechanics and their associated system calls:
- link (linkat): makes a hardlink to a file
- unlink (unlinkat): "removes" a file
link
Link (or, more precisely, linkat) takes 2 directories A and B and a path in each (Ap and Bp). Both directories can be the same. It increments the reference counter of the inode of the file A/Ap, then creates a file B/Bp with the same inode.
This is pretty simple in its function.
This increments the number of references, making the file able to survive the loss of one more reference.
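Here is a minimal sketch that assumes Linux and Python's `os` module. When `os.link` receives `src_dir_fd` and `dst_dir_fd`, it calls `linkat(2)`, which matches the "two directories plus a path in each" description above. The names Ap and Bp come from the text. The two temporary directories stand in for A and B, and both are created under the same temporary root, because hardlinks cannot cross filesystems.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as a_dir, tempfile.TemporaryDirectory() as b_dir:
    with open(os.path.join(a_dir, "Ap"), "wb") as f:
        f.write(b"shared contents")

    a_fd = os.open(a_dir, os.O_RDONLY)   # descriptor for directory A
    b_fd = os.open(b_dir, os.O_RDONLY)   # descriptor for directory B
    try:
        before = os.stat("Ap", dir_fd=a_fd).st_nlink
        # With src_dir_fd/dst_dir_fd, Python uses linkat(2) under the hood.
        os.link("Ap", "Bp", src_dir_fd=a_fd, dst_dir_fd=b_fd)
        after = os.stat("Ap", dir_fd=a_fd).st_nlink
        same_inode = (os.stat("Ap", dir_fd=a_fd).st_ino
                      == os.stat("Bp", dir_fd=b_fd).st_ino)
        with open(os.path.join(b_dir, "Bp"), "rb") as f:
            contents = f.read()
    finally:
        os.close(a_fd)
        os.close(b_dir and b_fd)

print(before, after, same_inode, contents)   # 1 2 True b'shared contents'
```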
unlink
Unlink does the exact reverse of link: it removes a link. But remember that a vnode is a link too. This means that deleting an open file does not actually delete it. It only makes the file unreachable for anyone who does not already hold a descriptor to it, and the file is deleted once the last descriptor is closed.
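The next sketch demonstrates "deleting an open file" on a Linux system through Python's `os` module (`os.unlink` wraps `unlink(2)`, and with `dir_fd` it uses `unlinkat(2)`). The name disappears from the directory and `fstat` reports a link count of 0, yet the contents can still be read through the descriptor that was already open.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "file.a")
    with open(path, "wb") as f:
        f.write(b"still here")

    fd = os.open(path, os.O_RDONLY)      # the open descriptor is an extra reference
    os.unlink(path)                      # removes the last name
    name_gone = not os.path.exists(path)
    links_left = os.fstat(fd).st_nlink   # no directory entry points to the inode any more
    data = os.read(fd, 100)              # contents are still reachable through the descriptor
    os.close(fd)                         # last reference dropped: storage can be reclaimed

print(name_gone, links_left, data)       # True 0 b'still here'
```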
A last word: reflinks
Another form of link exists on some filesystems: links used for copy-on-write purposes, which we call reflinks.
A reflink starts out like a hardlink, because both files share the same data. Unlike a hardlink, though, each file has its own inode, and writing new data to one of them creates a duplicate of that data instead of changing both files.
The duplication generally happens at the level of file blocks. As a result, after changes to one file or the other, the two files may have only some of their blocks in common.
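Block-level copy-on-write can be illustrated with a hypothetical toy model in Python. `BlockStore` and its methods are illustrative names, and real filesystems track shared extents very differently. Each file is a list of block ids, and blocks carry reference counts. A reflink copies only the list of ids. A write to a shared block allocates a new block for the writer, so the other file is left untouched.

```python
# Hypothetical toy model of block-level copy-on-write sharing.
# Each file is a list of block ids; blocks are stored once and refcounted.

class BlockStore:
    def __init__(self):
        self.blocks = {}    # block id -> bytes
        self.refs = {}      # block id -> number of files using it
        self._next = 0

    def _new_block(self, data):
        bid = self._next
        self._next += 1
        self.blocks[bid] = data
        self.refs[bid] = 1
        return bid

    def create(self, chunks):
        return [self._new_block(c) for c in chunks]

    def reflink(self, file_blocks):
        # The clone gets the same block ids; only the reference counts change.
        for bid in file_blocks:
            self.refs[bid] += 1
        return list(file_blocks)

    def write_block(self, file_blocks, index, data):
        bid = file_blocks[index]
        if self.refs[bid] > 1:
            # Shared block: copy on write, leaving the other file untouched.
            self.refs[bid] -= 1
            file_blocks[index] = self._new_block(data)
        else:
            self.blocks[bid] = data

    def read(self, file_blocks):
        return b"".join(self.blocks[b] for b in file_blocks)

store = BlockStore()
a = store.create([b"aaaa", b"bbbb", b"cccc"])
b = store.reflink(a)
store.write_block(b, 1, b"XXXX")
print(store.read(a), store.read(b), sorted(set(a) & set(b)))
# b'aaaabbbbcccc' b'aaaaXXXXcccc' [0, 2]
```

After one write, the two files still share blocks 0 and 2 and differ only in the middle block.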
The copy-on-write bookkeeping itself stays inside the filesystem driver. What the driver exposes to the rest of the kernel is essentially whether it supports reflinks.
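On Linux, a dedicated reflinkat system call was proposed but never merged. Reflinks are instead created through the FICLONE and FICLONERANGE ioctls, which is what cp --reflink uses. There is no standard around reflinks.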
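The sketch below assumes Linux and Python's `fcntl` module. It tries to reflink one file onto another with the `FICLONE` ioctl and falls back to an ordinary copy when the filesystem refuses (for example ext4 or tmpfs, or when source and destination are on different filesystems). `clone_or_copy` is a hypothetical helper name. Recent Python versions expose the constant as `fcntl.FICLONE`. The literal fallback `0x40049409` is `_IOW(0x94, 9, int)` and assumes the generic ioctl encoding used on x86 and ARM.

```python
import fcntl
import os
import shutil
import tempfile

# FICLONE from <linux/fs.h>; the literal assumes the x86/ARM ioctl encoding.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)

def clone_or_copy(src, dst):
    """Hypothetical helper: reflink src to dst if possible, else copy the bytes.
    Returns True when a reflink was made."""
    with open(src, "rb") as s, open(dst, "wb") as d:
        try:
            # The ioctl is issued on the destination, with the source fd as argument.
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return True
        except OSError:
            # e.g. EOPNOTSUPP without reflink support, EXDEV across filesystems.
            shutil.copyfileobj(s, d)
            return False

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file.a")
    dst = os.path.join(tmp, "file.b")
    with open(src, "wb") as f:
        f.write(b"shared blocks")
    reflinked = clone_or_copy(src, dst)
    with open(dst, "rb") as f:
        copied = f.read()

print(reflinked, copied)
```

The value of `reflinked` depends on the filesystem under the temporary directory: it is True on Btrfs or XFS with reflink support, and False on most other filesystems. The resulting contents are identical in both cases.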
If you like my content and want more, please donate. If everyone that finds my content useful paid $1 every week, I would be able to produce content for 1 week a month without relying on other sources of income for that week.