A different kind of filesystem

I have a fair amount of data that needs archiving. It's on the order of a few tens of gigabytes: small enough to fit on a hard disk, but I want it backed up and available on at least two computers in different locations (home and office). Snapshots and rollbacks would be nice; I'd also rather have peer-to-peer synching than a central master copy.

If you're like me, this is starting to sound like a DVCS. Trouble is, version control systems are designed to have a working copy outside the repository, so my data would essentially be duplicated; storing large files would also be tricky.

DVCS for data storage

So what about building a filesystem on top of a DVCS? FUSE is perfect for the front end, and a bare Git repository can store the data. This approach has several advantages:

  • storage is abstracted as blobs, trees and commits;
  • we get snapshots and rollbacks for free;
  • efficient synching between repositories, also for free.
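The "for free" claims follow from Git's object model. The rough, stdlib-only illustration below is not SpaghettiFS or Dulwich code. It computes object ids the way Git does for loose objects:

- a blob is raw file content;
- a tree lists names with modes and ids;
- a commit points at a root tree plus its parent commits.

The author line and timestamp used for commits are placeholder assumptions.

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Git names every object by SHA-1 over '<type> <size>', a NUL byte, then the payload."""
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def blob_id(data: bytes) -> str:
    """A blob is file content only: no name, no metadata."""
    return git_object_id("blob", data)


def tree_id(entries: dict[str, tuple[str, str]]) -> str:
    """A tree maps name -> (mode, hex id); mode '40000' marks a subtree."""
    def order(name: str) -> str:
        # Git sorts subtree names as if they ended in '/'.
        return name + "/" if entries[name][0] == "40000" else name

    payload = b"".join(
        f"{entries[n][0]} {n}".encode() + b"\0" + bytes.fromhex(entries[n][1])
        for n in sorted(entries, key=order)
    )
    return git_object_id("tree", payload)


def commit_id(tree: str, parents: list[str], message: str,
              ident: str = "Example <user@example.com> 0 +0000") -> str:
    """A commit is a snapshot: one root tree plus its parent commit(s).
    The default ident (name, email, timestamp) is a placeholder assumption."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return git_object_id("commit", ("\n".join(lines) + "\n").encode())


# Usage: two snapshots of a one-file tree.
readme = blob_id(b"hello\n")
root_v1 = tree_id({"README": ("100644", readme)})
c1 = commit_id(root_v1, [], "first snapshot")
root_v2 = tree_id({"README": ("100644", blob_id(b"hello again\n"))})
c2 = commit_id(root_v2, [c1], "second snapshot")
```

Because ids depend only on content, identical data is stored once. A snapshot is just a commit id, and rolling back means reading an older commit's tree. Synching then reduces to copying the objects the other repository doesn't have yet.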

So, after a quick and fruitless search for existing implementations, I set about writing my own, using Python, Dulwich and fusepy.

SpaghettiFS

The first working code stored files as blobs and folders as trees, just like a normal Git repository. It worked, but was inefficient for large files. Now files are split into small blocks, which are linked from a tree that is essentially an inode. Folder entries reference the inodes, just like in a typical filesystem.
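A minimal sketch of the block-splitting idea follows. It rests on several hypothetical choices made for illustration, not on SpaghettiFS's actual on-disk layout:

- a 64 KiB block size (`BLOCK_SIZE`);
- zero-padded block names such as `b00000000`;
- an in-memory dict standing in for the bare repository's object store.

The inode is an ordinary Git tree whose entries are the file's blocks, in order.

```python
import hashlib

# Hypothetical block size, chosen only for illustration.
BLOCK_SIZE = 64 * 1024


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Git's object name: SHA-1 over '<type> <size>', a NUL byte, then the payload."""
    header = f"{obj_type} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def store(objects: dict, obj_type: str, payload: bytes) -> str:
    """Add an object to an in-memory dict standing in for the repo's object store."""
    oid = git_object_id(obj_type, payload)
    objects[oid] = (obj_type, payload)
    return oid


def write_inode(objects: dict, data: bytes, block_size: int = BLOCK_SIZE) -> str:
    """Split data into blocks, store each as a blob, and return the id of the
    tree (the 'inode') that lists those blocks in order."""
    payload = b""
    for index, start in enumerate(range(0, len(data), block_size)):
        blob = store(objects, "blob", data[start:start + block_size])
        # Zero-padded (hypothetical) names make Git's name ordering match block order.
        payload += f"100644 b{index:08d}".encode() + b"\0" + bytes.fromhex(blob)
    return store(objects, "tree", payload)


def inode_blocks(objects: dict, inode: str) -> list[str]:
    """Return the blob ids an inode tree points to, in block order."""
    obj_type, payload = objects[inode]
    if obj_type != "tree":
        raise ValueError(f"{inode} is not a tree")
    blocks, pos = [], 0
    while pos < len(payload):
        nul = payload.index(b"\0", pos)                 # end of "<mode> <name>"
        blocks.append(payload[nul + 1:nul + 21].hex())  # 20-byte binary SHA-1
        pos = nul + 21
    return blocks


def read_inode(objects: dict, inode: str) -> bytes:
    """Reassemble a file by concatenating its blocks."""
    return b"".join(objects[blob][1] for blob in inode_blocks(objects, inode))


# Usage: a small block size makes the effect of an edit easy to see.
objects: dict = {}
data = bytes(i % 251 for i in range(2560))     # three blocks of at most 1024 bytes
v1 = write_inode(objects, data, block_size=1024)
edited = data[:1500] + b"X" + data[1501:]      # change one byte in the second block
v2 = write_inode(objects, edited, block_size=1024)
```

Because blocks are content-addressed, changing one byte of the file produces exactly one new blob and one new inode tree. The unchanged blocks are shared between both versions, which also keeps synching cheap for large files.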

So it works. Several gigabytes of my data already live happily in such a filesystem. Synching is nearly painless. Reading and writing are still slow but usable, and there's plenty of room for improvement. Some POSIX filesystem features (symlinks, rename, permissions) have yet to be implemented.

Check out the code on GitHub, feel free to use the issue tracker, and please let me know if you find SpaghettiFS useful.

Created: 18 Nov 2009, 23:14