Deconstructing Time Machine

Time Machine is Apple's big new feature in Leopard - a convenient backup program that solves most people's backup needs with a very simple interface. It works because it's well designed, it gets most things right, and it's built on a solid technology base.

What it Does

Every hour, Time Machine springs into action and makes a backup of your files. To the user, that's almost all there is to it. It does its job behind the scenes and requires no action on the part of the user (they don't even have to stop working). It doesn't complain if the backup volume is temporarily disconnected - it just backs up as soon as the disk is available again. You can specify the backup disk and which folders (if any) to skip, and that's pretty much it. A few more options might be useful - for example, a limit on the disk space used for backups - but for the vast majority of users this bare minimum is enough. Anything more would add confusion, and backup is one thing that should be as simple and clear as possible.

Now, by itself, this would be a good backup program. But what happens when you actually want to restore your backed-up data? Time Machine has a visually impressive interface for recovering information you accidentally erased; it's out of the way when you don't need it, and simple and intuitive when you do. It's also easy to restore the whole disk after a catastrophic failure: the Leopard install DVD has an option to restore the system from a Time Machine backup. You don't need to understand how it works; you can just trust it to do its job properly (even if you didn't back up your system folder, I'd bet that it does the right thing - installs it from the DVD - so you end up with a functional system).

How it Works

Time Machine uses "off-the-shelf" technology wherever possible. The backup is made on an HFS+ filesystem, just like the original disk. When backing up over the network, the backup is stored in a disk image - disk images have been used in Mac OS X since its first version (most notably for distributing programs as .dmg files). Where existing technology was not enough, it was sensibly extended: Apple implemented hard-linked directories (a very bold, potentially dangerous thing to do, but ultimately I think it's a good solution) and created a new storage format for disk images - instead of a single file, the image is made up of many small "bands" that can be added and removed dynamically to create a sparse volume. Other than that, it's standard stuff: for example, when backing up over the network, a remote volume is mounted using AFP (Apple Filing Protocol), and a disk image on that volume is mounted locally; from that point on, it's just like backing up to a locally attached external disk. And, of course, this is all done automatically.
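
To make the band idea concrete, here is a toy model in Python. It is only a sketch of the general technique, not Apple's on-disk format: the class name BandedImage, the tiny band size, and the in-memory dictionary standing in for the band files are all assumptions made for illustration.

```python
class BandedImage:
    """Toy sparse volume: storage is split into fixed-size bands, and a band
    only exists once something has been written into it."""

    def __init__(self, band_size):
        self.band_size = band_size
        self.bands = {}  # band index -> bytearray (stand-in for one band file)

    def write(self, offset, data):
        # Create bands on demand, only for the region actually written.
        for pos, byte in enumerate(data, start=offset):
            index, within = divmod(pos, self.band_size)
            band = self.bands.setdefault(index, bytearray(self.band_size))
            band[within] = byte

    def read(self, offset, length):
        # Regions that have no band read back as zeros.
        out = bytearray()
        for pos in range(offset, offset + length):
            index, within = divmod(pos, self.band_size)
            band = self.bands.get(index)
            out.append(band[within] if band is not None else 0)
        return bytes(out)

    def discard(self, index):
        # Dropping a band gives its space back; the volume size is unchanged.
        self.bands.pop(index, None)


image = BandedImage(band_size=4)   # hypothetical, unrealistically small band
image.write(6, b"abcd")            # touches bands 1 and 2 only
print(sorted(image.bands))         # [1, 2]
```

The point of the design is that the image only costs as much space as the bands that hold data, and that a change touches a few small pieces instead of one huge file.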

Time Machine only backs up new and modified files. It can find them because it uses the new FSEvents framework, which tells it, most of the time, where in the filesystem changes have happened. On the rare occasions when FSEvents can't track these changes, Time Machine has to do a full traversal of the filesystem.
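
The "targeted scan, with a full traversal as fallback" logic can be sketched in a few lines of Python. This does not call FSEvents and is not how Time Machine is implemented. The function name files_to_back_up, the changed_dirs argument (a list of directories reported as changed, or None when the change history can't be trusted), and the use of modification times to spot changed files are all assumptions for the sake of the example.

```python
import os
import tempfile


def files_to_back_up(root, last_backup_time, changed_dirs=None):
    """Return files (relative to root) modified after last_backup_time."""
    if changed_dirs is None:
        # No reliable change history: fall back to walking everything.
        dirs_to_scan = [dirpath for dirpath, _, _ in os.walk(root)]
    else:
        # Only look inside the directories reported as changed.
        dirs_to_scan = [os.path.join(root, d) for d in changed_dirs]

    found = []
    for directory in dirs_to_scan:
        with os.scandir(directory) as entries:
            for entry in entries:
                if not entry.is_file(follow_symlinks=False):
                    continue
                if entry.stat(follow_symlinks=False).st_mtime > last_backup_time:
                    found.append(os.path.relpath(entry.path, root))
    return sorted(found)


def demo():
    """Build a small tree with fixed modification times and scan it twice."""
    with tempfile.TemporaryDirectory() as root:
        for rel, mtime in [("docs/a.txt", 100), ("docs/b.txt", 300), ("pics/c.txt", 300)]:
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as f:
                f.write("x")
            os.utime(path, (mtime, mtime))
        targeted = files_to_back_up(root, 200, changed_dirs=["docs"])
        full = files_to_back_up(root, 200)  # change history unavailable
        return targeted, full
```

With a change list, only the docs directory is examined; without one, the whole tree is walked. That is why the fallback case is so much slower on a large disk.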

Backups are stored as directory and file trees that look just like the original. Since unmodified files and folders are hard-linked, little space is wasted. Clearing out old backups is also easy - one directory tree is removed, but directories and files that are still linked from other trees are not destroyed (that's how hard links work: the data is freed only when the last link to it is removed).
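
Here is a minimal Python sketch of the hard-linked snapshot idea. It differs from Time Machine in one important way: it hard-links individual files only, because ordinary filesystems and APIs don't allow hard-linked directories (this is closer to what rsync-style snapshot tools do). The function names and the "same size and modification time means unchanged" test are assumptions for illustration.

```python
import os
import shutil
import tempfile


def unchanged(current, backed_up):
    """Assumed heuristic: same size and same mtime (to the second)."""
    a, b = os.stat(current), os.stat(backed_up)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def snapshot(source, dest, previous=None):
    """Copy source into dest, hard-linking files unchanged since previous."""
    for dirpath, _, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        target_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and unchanged(src, old):
                os.link(old, dst)        # no new data, just another name
            else:
                shutil.copy2(src, dst)   # new or modified: store a real copy


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "source")
        first = os.path.join(tmp, "backup-1")
        second = os.path.join(tmp, "backup-2")
        os.makedirs(source)
        with open(os.path.join(source, "same.txt"), "w") as f:
            f.write("untouched")
        with open(os.path.join(source, "edited.txt"), "w") as f:
            f.write("v1")
        snapshot(source, first)

        with open(os.path.join(source, "edited.txt"), "w") as f:
            f.write("version 2")
        snapshot(source, second, previous=first)

        shared = os.path.samefile(os.path.join(first, "same.txt"),
                                  os.path.join(second, "same.txt"))
        copied = not os.path.samefile(os.path.join(first, "edited.txt"),
                                      os.path.join(second, "edited.txt"))

        shutil.rmtree(first)  # "clear out" the older backup
        with open(os.path.join(second, "same.txt")) as f:
            survived = f.read()
        return shared, copied, survived
```

Both snapshots look like complete copies of the source, yet the untouched file is stored once, and deleting the first snapshot leaves the second one fully intact.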

The backups are automatically "thinned" - hourly backups are kept only for the last 24 hours, daily backups for the last month, and weekly backups for everything older. It makes a lot of sense - the farther back in time you go, the less you care about exactly when the backup was made.
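
The thinning rule is easy to express as code. The sketch below is my reading of the policy, not Apple's implementation: treating "a month" as 30 days, grouping old backups by ISO week, and keeping the earliest backup in each day or week are all assumptions, as is the function name backups_to_keep.

```python
from datetime import datetime, timedelta


def backups_to_keep(timestamps, now):
    """Return the backups that survive thinning, oldest first."""
    kept = {}
    for ts in sorted(timestamps):
        age = now - ts
        if age <= timedelta(hours=24):
            bucket = ("hourly", ts)            # keep every recent backup
        elif age <= timedelta(days=30):
            bucket = ("daily", ts.date())      # one per calendar day
        else:
            iso = ts.isocalendar()
            bucket = ("weekly", iso[0], iso[1])  # one per ISO week
        # The first (earliest) backup seen in a bucket is the one kept.
        kept.setdefault(bucket, ts)
    return sorted(kept.values())


now = datetime(2008, 1, 31, 12)
backups = [
    datetime(2007, 12, 3, 9), datetime(2007, 12, 5, 9),    # same week, long ago
    datetime(2008, 1, 21, 9), datetime(2008, 1, 21, 10),   # same day, last month
    datetime(2008, 1, 31, 10), datetime(2008, 1, 31, 11),  # last 24 hours
]
survivors = backups_to_keep(backups, now)
```

In this example, both backups from the last 24 hours survive, the two from January 21 collapse into one, and the two from early December collapse into one for that week.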

The Scaffolding

I like how everything is out in the open. You can see how Time Machine does its work; it even logs a lot of interesting information, both in the system log and in its log files in the backup folders (for example, it says when FSEvents can't reliably tell which files have been modified). This gives me confidence in the application - I mostly understand how it works, so I know when I can trust it, and I can also use it more efficiently (e.g. when I know it needs to back up something big, I connect to the server with a FireWire cable instead of Wi-Fi and start a backup manually).

The Bad

I do have a couple of complaints about Time Machine. One is that it doesn't give you much control beyond its simple GUI (I'm not your average user; I want more). People have to discover undocumented features and then hope for the best if they decide to use them. This is a general problem with Apple software - people have been hacking on it and around it for a long time, navigating in the dark, sometimes with scary results. I think Apple should provide some kind of information - not full support, because that doesn't make economic sense - but at least enough to help advanced users tweak their systems at their own risk.

The other problem is that you can only (officially) use it over the network with a Leopard server or an Xsan volume - AirPort Disk support is missing, even though it had been advertised. There are several possible explanations for this, but Apple should be more communicative about the issue, because people have spent money on AirPort base stations expecting Time Machine to work with them.