How long will your data last before it gets lost or destroyed?
My generation is the last that doesn’t have their whole life electronically documented. When I started going to school, only two of my classmates had a computer. My digital life started roughly at age fifteen. So half of my life is digitally documented. But the question is for how long?
I recently found my old password-locked .docx diary. I was able to guess the password (the one password I always used before I started using a password manager [1]). It was on a seven-year-old hard drive, in a folder called old that came from a backup of an even older hard drive. It made me think about the best way to store information that I want to access in 30 years, and about how long data usually lasts.
Initially, I thought that the storage medium itself would be the bottleneck in data longevity. After consulting Wikipedia [2], it seems that a storage medium that is kept well and left unused might outlive you. The only exception to that rule is flash memory. The reading devices then become the main issue with physical media longevity.
Storage media longevity is poorly documented on the Internet. There is no definitive answer for how long data will last on a well-stored, unused device. Cloud companies sponsor most of the research, and that research focuses on heavily utilised HDDs and SSDs [3].
You can still easily read high-density floppy discs (the 3.5" ones that were the last kind before CDs); you only have to order a USB floppy disc reader online for about $20 [4].
If you’ve got a 5.25" disc, the USB reader will cost you double that. So if you stored your data on a 5.25" floppy disc in 1979 [5] and kept the disc in a dry, dark place, you might still be able to recover the data now, 40 years later, for only $40. But if you stored the data on one of the less common floppy disc variants, it’s going to be much, much harder to extract.
If you tuck your hard drive away in a dry, non-magnetic place, the data on it has a chance of lasting anywhere between 10 and 50 years [6]. But since the magnetic field degrades gradually rather than failing at one discrete moment, you are better off refreshing the data at least every five years.
USB sticks and SSDs use the same method for storing data [7]. The most important factor for us is “retention”. Retention means how long a NAND memory cell can keep its programmed state without being connected to a power source [8].
For USB sticks, the thing that’s most likely to break is the circuitry that connects the control unit and the flash memory. If you store the USB stick well, the data might last up to 20 years [9].
The Internet became generally available in 1989, when the first private internet service providers began offering their services [10]. But Gmail only launched in 2004 [11], and Gmail is what I call the earliest cloud storage that is still available today. I joined Gmail in 2008, and those first emails are my earliest digital trace that is still stored in the cloud.
Since I backed up my old hard drives to Dropbox, some of the data I now have in the cloud is much older than 2008. But those cloud copies have still spent less time in the cloud than those first Gmail emails.
Sixteen years of Gmail’s existence is not long enough to count cloud storage as a viable long-term digital storage solution.
Data format and encryption
The worst way to store your data is in a proprietary binary format, meaning the data is encoded in a custom way on your storage media. The best example, although extreme, is old video games. They often had both custom storage and a custom file format, and they are not playable or readable on newer devices [12].
The best way to store your data is in plain text in UTF-8 or ASCII encoding. That’s one of the guiding philosophies of Unix systems [13]. Plain text data created on a Unix machine in 1970 is still readable today, which gives the format the best chance to survive.
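If you are not sure whether a file you plan to keep really is plain text, a quick check is to see whether its bytes decode as UTF-8 (ASCII is a subset of UTF-8, so it passes too). The following is only a minimal Python sketch; the function name is_plain_utf8 is my own, and treating any NUL byte as a sign of a binary file is an assumption, not a rule.

```python
from pathlib import Path


def is_plain_utf8(path: Path) -> bool:
    """Return True if the file decodes cleanly as UTF-8 and has no NUL bytes."""
    data = path.read_bytes()
    # Assumption: plain text files do not contain NUL bytes, binary files often do.
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A .txt diary should pass this check, while a .docx file (which is a zip archive underneath) should not.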
When it comes to audio-visual data, I don’t have any suggestions. I think that the current VLC player can play any video file that I have ever owned, and I don’t think that’s going to change any time soon. But then again, digital video is still a new data format compared to ASCII text. MPEG-1 was developed in 1991 [14].
I love encryption. But encrypting the data will significantly lower the chances of you or someone else being able to recover it decades from now, which is, of course, the point of encryption. Encryption is not suitable for storing data that your children might want to look at one day. Imagine that your great-grandma had encrypted those black and white family pictures.
Encryption may exacerbate the problem of preserving data since decoding adds complexity even when the relevant software is available.
What can you do
Organise your data. The clearer your folder structure is, the more likely you are to ever look at the data again.
Store your data in multiple places. If the data is only in one spot, it is bound to be lost sooner or later.
Every five years or so, you should copy the data from your previous storage to a new storage medium. Magnetic hard drives need their magnetic field refreshed, and flash drives need the charge in their NAND cells refreshed.
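When you do that periodic copy, it helps to confirm that every file arrived intact. Here is a rough Python sketch of one way to do it, comparing SHA-256 checksums of the original and the copy. The function names copy_and_verify and sha256_of, and the folder names in the usage note, are made up for illustration; this is not a replacement for a proper backup tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, target: Path) -> list[Path]:
    """Copy every file under source into target.

    Returns the relative paths whose copy does not match the original;
    an empty list means every checksum matched.
    """
    mismatched = []
    for original in sorted(source.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(source)
        copy = target / relative
        copy.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(original, copy)  # copy2 also tries to keep timestamps
        if sha256_of(original) != sha256_of(copy):
            mismatched.append(relative)
    return mismatched
```

Calling copy_and_verify(Path("old_drive"), Path("new_drive")) with your own paths returns the list of files that need another look. The target should be outside the source folder, otherwise the copies would be picked up as new originals.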
Read more about how I use a password manager in Security is like ogres.
Digital Archaeology: Recovering your Digital History | The New York Public Library
backup - How much time until an unused hard drive loses its data? - Super User