
A rough overview of filesystems on Unix-like Operating Systems

What are the ingredients of a filesystem?

On any modern system, a filesystem has 3 main components:

  • Files
  • Directories
  • Permissions on the above

On Unix-inspired filesystems, these are linked as presented below:

Diagram representing how directory data is composed of file metadata and links to file data as well as directory metadata and their directory data

Composition (black diamonds here) represents data that is embedded in the item. Aggregation (empty diamonds) represents links pointing to file contents; we call those i-nodes or, more commonly, inodes.

You can represent a simple filesystem that way:

Representation of a filesystem: the Directory points (INODE) to a Directory data, which in turn contains a File which points (INODE) to the File data

How are open files represented by the kernel?

On Unix-like systems, a process opens a file by obtaining a file descriptor (fd) to it. This descriptor is linked to the process. The descriptor can access the metadata of the file, and owns another structure, the vnode, which contains an inode pointing to the data in the file.

The previous picture has been changed to incorporate another block, named "kernel space", showing that a process is linked to a file descriptor, a file descriptor is linked to a file, and a file descriptor is also linked to a vnode which is linked by an INODE to a File Data

It is to be noted that the descriptor doesn't own a link to the metadata of the file but a copy of it masked with the permissions of the file descriptor.
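One visible effect of the descriptor carrying its own permissions can be sketched in Python, whose os module wraps the underlying system calls. This is only an illustration, assuming a POSIX system; the scratch file and variable names are made up for the example. A descriptor opened read-only refuses writes even though the file's own permissions would let its owner write.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "file.a")
with open(path, "w") as f:
    f.write("hello")

# The file itself is readable and writable by its owner...
os.chmod(path, 0o600)

# ...but this particular descriptor is opened read-only.
fd = os.open(path, os.O_RDONLY)
try:
    os.write(fd, b"more")
    write_error = None
except OSError as exc:
    # The kernel rejects the write based on the descriptor's mode.
    write_error = exc.errno
finally:
    data = os.read(fd, 100)  # reading still works
    os.close(fd)
```

On the systems I know of, the rejected write reports EBADF ("bad file descriptor"), which says a lot about where the check happens: on the descriptor, not on the file.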

All of this is nice and abstract, but let's see what the actual elements contain.

Reference counting and unlinking

Unix-inspired filesystems have a feature named hardlinks. A hardlink is a case where two files share the same inode and therefore share the same contents.

A picture with a directory containing 2 files that share the same inode
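The situation in the picture can be reproduced with a few lines of Python. This is a minimal sketch, assuming a POSIX system and a temporary directory on a filesystem that supports hardlinks; the names file.a and file.b simply mirror the picture.

```python
import os
import tempfile

tmpdir = tempfile.mkdtemp()
file_a = os.path.join(tmpdir, "file.a")
file_b = os.path.join(tmpdir, "file.b")

with open(file_a, "w") as f:
    f.write("first version")

# Create file.b as a hardlink to file.a.
os.link(file_a, file_b)

# Both names report the same inode number.
same_inode = os.stat(file_a).st_ino == os.stat(file_b).st_ino

# Writing through one name is visible through the other,
# because both names lead to the same data.
with open(file_a, "w") as f:
    f.write("second version")
with open(file_b) as f:
    content_b = f.read()
```

Here same_inode ends up true and reading file.b gives back what was written through file.a.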

Now let's have a thought experiment about what must happen when you delete one of the files.

Let's say we delete file.a. One of 3 things may happen:

  1. file.b is also deleted and the inode is cleared of its contents
  2. file.b stays on disk, so does the inode
  3. file.b stays on disk, the inode is cleared

The 1st and 3rd propositions don't make much sense: they would give us behaviour identical to another feature that most filesystems (Unix-like ones included) already have, namely symbolic links, which are comparable to Shortcuts on NT operating systems like Microsoft Windows.

So you must be able to know when you delete the last file that points to the inode. For that there is a simple solution:

Picture of the Toy Story Buzz Everywhere meme: Reference counting, Reference counting everywhere

So we count the number of references to the file's inode, and we delete said inode only when the count reaches 0.
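The on-disk part of that counter is visible from user space as the st_nlink field returned by stat. The sketch below, assuming a POSIX system and made-up file names, watches it go up and down; note that st_nlink only counts names in directories, not open files.

```python
import os
import tempfile

tmpdir = tempfile.mkdtemp()
file_a = os.path.join(tmpdir, "file.a")
file_b = os.path.join(tmpdir, "file.b")

with open(file_a, "w") as f:
    f.write("shared contents")

counts = [os.stat(file_a).st_nlink]      # a single name so far
os.link(file_a, file_b)
counts.append(os.stat(file_a).st_nlink)  # two names, same inode
os.unlink(file_a)
counts.append(os.stat(file_b).st_nlink)  # one name left

# The data survived the removal of file.a.
with open(file_b) as f:
    survivor = f.read()
```

This is proposition 2 from the list above: file.b stays, and so does the inode.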

Now off to a more complicated case, a filesystem with open files:

The picture of the file system from before now contains 3 file descriptors A, B, and C. A and B refer to the vnode 1 and C refers to the vnode 2. vnode 1 refers to the file.a while vnode 2 refers to the file.b, which both share the same inode.

All of the vnodes refer to the same inode. Just like the files, vnodes are references to the file data. This means that the inode is here referred to not twice, but 4 times in total: 2 files and 2 vnodes.
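To make the counting concrete, here is a toy model in Python. It is purely hypothetical: ToyInode and its fields are invented for this example and do not mirror any real kernel structure. It only shows that names on disk and vnodes in memory both keep the data alive.

```python
class ToyInode:
    """Hypothetical model of an inode that tracks who refers to it."""

    def __init__(self):
        self.names = set()   # directory entries (hardlinks)
        self.vnodes = set()  # in-kernel references from open files

    def refcount(self):
        return len(self.names) + len(self.vnodes)

    def can_free(self):
        # The data may only go away once nothing refers to it.
        return self.refcount() == 0


inode = ToyInode()
inode.names.update({"file.a", "file.b"})
inode.vnodes.update({"vnode 1", "vnode 2"})
total = inode.refcount()

# Removing both names is not enough while files are still open.
inode.names.clear()
freeable_with_open_files = inode.can_free()

# Once the vnodes are gone too, the data can be reclaimed.
inode.vnodes.clear()
freeable_at_the_end = inode.can_free()
```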

I will not cover the mechanics of creating and modifying files extensively, instead focusing on two mechanics and associated system calls:

  • link (linkat): makes a hardlink to a file
  • unlink (unlinkat): "removes" a file

Link takes 2 directories A and B, and a path in each (Ap and Bp); both directories can be the same. It increments the reference counter of the inode of the file A/Ap, then creates a file B/Bp with the same inode.

This is pretty simple in its function.

This increments the number of references, making the file able to survive the loss of one more reference.
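Python exposes the two-directory form of link through the src_dir_fd and dst_dir_fd parameters of os.link. The sketch below assumes a platform where those parameters are supported (Linux is one) and uses two throwaway directories standing in for A and B; Ap and Bp are the paths from the description above.

```python
import os
import tempfile

# Two hypothetical directories A and B on the same filesystem.
dir_a = tempfile.mkdtemp()
dir_b = tempfile.mkdtemp()

with open(os.path.join(dir_a, "Ap"), "w") as f:
    f.write("data")

# The directories are passed as open descriptors.
fd_a = os.open(dir_a, os.O_RDONLY)
fd_b = os.open(dir_b, os.O_RDONLY)
try:
    # Both paths are resolved relative to their directory descriptor.
    os.link("Ap", "Bp", src_dir_fd=fd_a, dst_dir_fd=fd_b)
    stat_a = os.stat("Ap", dir_fd=fd_a)
    stat_b = os.stat("Bp", dir_fd=fd_b)
finally:
    os.close(fd_a)
    os.close(fd_b)
```

After the call, A/Ap and B/Bp report the same inode number and a link count of 2.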

Unlink does the reverse of link: it removes a link. But remember that a vnode is a link too. This means that deleting an open file does not actually delete it: the file becomes unreachable except through an existing descriptor, and it only gets deleted once it is closed.
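This behaviour is easy to observe from user space. A minimal sketch, assuming a POSIX system and a made-up scratch file:

```python
import os
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "file.a")
with open(path, "w") as f:
    f.write("still here")

fd = os.open(path, os.O_RDONLY)
os.unlink(path)                     # remove the only name

name_exists = os.path.exists(path)  # the name is gone
links_left = os.fstat(fd).st_nlink  # no directory entry remains
data = os.read(fd, 100)             # yet the data is still readable
os.close(fd)                        # the last reference goes away here
```

The name disappears right away, while the contents stay readable through the descriptor until it is closed.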

Another form of filesystem hardlink exists on some filesystems: hardlinks used for copy-on-write purposes, which we call reflinks.

The basis of the reflink is that it starts out like a hardlink, but writing new data to it creates a duplicate of the data instead of changing both files.

The duplication generally happens at the level of file blocks, meaning that both files may still share some blocks after one of them has been changed.
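Block-level sharing can be pictured with a toy model in Python. Everything here is hypothetical: ToyFile, reflink and write_block are names invented for the illustration, and a real filesystem tracks extents on disk rather than Python objects.

```python
class ToyFile:
    """Hypothetical model: a file is a list of references to blocks."""

    def __init__(self, blocks):
        self.blocks = blocks

    def reflink(self):
        # The clone gets its own list, but the list points at the
        # very same block objects: no data is copied at this point.
        return ToyFile(list(self.blocks))

    def write_block(self, index, data):
        # Copy-on-write: only the written block gets replaced,
        # and only in the file being written to.
        self.blocks[index] = bytearray(data)


original = ToyFile([bytearray(b"AAAA"), bytearray(b"BBBB"), bytearray(b"CCCC")])
clone = original.reflink()
clone.write_block(1, b"bbbb")

# Which blocks are still physically shared between the two files?
shared = [a is b for a, b in zip(original.blocks, clone.blocks)]
```

After the write, the first and last blocks are still shared, while the middle block exists in two versions, one per file.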

The only part of this operation that the filesystem driver exposes to the kernel is the advertisement that the filesystem supports reflinks.

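On Linux, there is no dedicated reflinkat system call in the mainline kernel; reflinks are requested through the FICLONE and FICLONERANGE ioctl operations. There is no standard around reflinks.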
