Keeping your data safe

Most of us assume that when we save a file on our computer, the job is done, and our work is safe. Sadly it's not that simple!

Most obviously, we can accidentally delete a file, or a hard disk can simply fail - so we need a backup.


Other bad things can happen as well. There's a phenomenon often called 'bit rot'. This is the tendency for stored data to change all on its own. It's sometimes ascribed to cosmic rays - I'm not sure of the truth in that - or to tiny errors and decay on the hard drive. Whatever the cause, sometimes a '1' can spontaneously turn into a '0' when you're not looking. If that '1' was part of a video, you may not even notice, though in the wrong place it might stop the video from playing. If it's in your company's 'mission critical' production database, it could make your company grind to a halt.

10-15 years ago, a typical hard drive held 10 or 20GB of data. Now 1000GB drives are common. With so much more data stored, the chance that one of these small errors crops up somewhere in it is many times higher.

Another relatively common cause of computer failure is the sudden power cut. If a computer is halfway through saving a file when the power goes off, that file can be left corrupt, and you may not be able to open it afterwards.


The Good News

The good news is that there are things we can do about this.

1) The obvious solution - keep good backups. But you need to keep old backups too. If 'bit rot' strikes, or you just mistakenly delete a file, you may not notice straight away, and so you might want to recover a file from a year-old backup. You can't do that if you only have yesterday's backup. If you only have space to store recent backups, I'd advise making an occasional archive as well. The best strategy depends on whether it's for home or business, how important your data is, how much there is, how often it changes, and how much you are prepared to spend on it.

2) Have a UPS (Uninterruptible Power Supply). This is a fancy battery that allows the computer to close down gracefully if the power goes off. Like a car battery, it can be quite heavy and bulky. It always feels quite wasteful of resources buying such a bulky thing just in case, but they are worth having on servers, or even desktop PCs running critical databases.

3) There are ways of storing data that can avoid most of this data corruption. Our problem is that the underlying file systems - the methods used to store and retrieve data - on the average hard drive are not as fault tolerant as they could be. But there are more robust file systems available. Here's an example of how they work. First, you can store two or more copies of the data, on separate hard drives (most servers do something like this already). At the same time, the computer calculates and stores a checksum for each piece of data. Simplistically, if you save the number "12", you also save a checksum alongside it: 1+2=3. Later, the computer comes along and recalculates the checksum from the stored data, then tests whether it still matches the stored checksum. If the stored data has mysteriously changed to "11", the error is detectable because 1+1 does not equal 3. The final cunning part is that because we saved two copies of the information to start with, the faulty version can be corrected by replacing it with a good version. Don't panic - we've mostly managed without this sort of safety net up until now just by keeping backups, but for some businesses, or people with very large amounts of data, it's worth exploring.
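
For the curious, the checksum-and-second-copy idea in point 3 can be sketched in a few lines of Python. This is only a toy illustration, not how any particular file system is implemented: the function names (digit_sum, store, is_intact, scrub) and the use of small dictionaries to stand in for hard drives are all my own assumptions, and the checksum is the simplistic "add up the digits" one from the example above.

```python
def digit_sum(value: str) -> int:
    """Toy checksum from the text: add up the decimal digits ("12" -> 3)."""
    return sum(int(ch) for ch in value)


def store(value: str) -> list[dict]:
    """Save two copies (imagine two drives), each with its own checksum."""
    return [{"data": value, "checksum": digit_sum(value)} for _ in range(2)]


def is_intact(copy: dict) -> bool:
    """Recalculate the checksum and compare it with the stored one."""
    return digit_sum(copy["data"]) == copy["checksum"]


def scrub(copies: list[dict]) -> str:
    """Check every copy, overwrite bad ones from a good one, return the data."""
    good = next((c for c in copies if is_intact(c)), None)
    if good is None:
        raise ValueError("every copy is corrupt; restore from a backup")
    for c in copies:
        if not is_intact(c):
            c["data"] = good["data"]
            c["checksum"] = good["checksum"]
    return good["data"]


copies = store("12")
copies[0]["data"] = "11"         # simulate bit rot on the first drive
assert not is_intact(copies[0])  # 1+1 does not equal 3, so it is detected
recovered = scrub(copies)        # the bad copy is repaired from the good one
print(recovered)
```

Note how weak the toy checksum is: if "12" rotted into "21", the digits would still add up to 3 and nothing would be noticed. Real fault-tolerant file systems use much stronger checksums for exactly this reason, but the detect-then-repair-from-a-good-copy logic is the same idea.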


A few years ago, most people's lifetime accumulation of data was fairly small: a little email, some Word documents, and a few years' worth of photos taken with a low resolution camera. Before long, people will want to keep decades' worth of high resolution photos, video, music, and more. I believe this sort of fault-tolerance will gradually filter down to everyday use, as we accumulate ever more data that we want to store for decades.