paulsiu

Long-term archiving


Recently, one of my CD-Rs failed. Despite claims of decades of archival life, CD-R seems to be hit or miss. I have CD-Rs that are 10 years old and still read great, and others that died after 2 years or so. There doesn't seem to be a brand that's consistently better or worse. I suspect that even if you stick with one brand, their supplier will change over time, so you still won't get consistency.

This brings me to the point of archiving: what would be the best media for archiving? Currently, I have multiple hard disks that I copy over to fresh disks every x number of years. I also keep 2 copies and store one of them off site. Hard disks are a poor archival medium, but by keeping multiple copies and replacing the disks, I am hoping that at least one copy of the data will survive. Even though that is a lot of hard disks, it is still cheaper than a whole stack of DVDs, and more practical (I can't sit there and burn 100 DVDs!).

What do you folks think?

Paul


Given the rapid change in computers, I would think that the most important thing is to realize that you'll be changing types of backup media every 5 to 10 years.

If you are burning to CD-R or DVD-R, tools like PAR may be helpful as well.

Even though that is a lot of hard disks, it is still cheaper than a whole stack of DVDs, and more practical (I can't sit there and burn 100 DVDs!).

I don't quite get it. Are you working as a sysadmin in a company, and do you need to preserve more than 450 GiB of data per year for Sarbanes-Oxley compliance or something? In that case you'll just have to pay the price for Ultrium tape drives, offsite storage with controlled climate, etc. Shortcuts along the lines of "I had some DVDs, but they failed" will not look good if the IRS ever comes visiting...

If you're a private person, then it should be manageable. External hard disks and DVD+R discs, as you're already using, are the tools. DVD+R has better error correction than DVD-R, and much better practical longevity than CD-R. (DVDs have the information layer sandwiched between tough plastic; CDs have it just below the thin layer of paint on the top side.)

Good software, like a WinRAR / QuickPAR combination, or a solid commercial backup solution, will help a lot. Perhaps Firestreamer RM[1] together with NTBackup, or perhaps Dantz Retrospect?

Keep all your data online and well sorted on a hard disk (or RAID array) on your PC / file server, and then do complete backups of the 'living' files from time to time to DVD+R or a couple of external hard disks. Doesn't that work for you?

[1] http://www.cristalink.com/fsrm/Default.aspx

and much better practical longevity than CD-R. (DVDs have the information layer sandwiched between tough plastic; CDs have it just below the thin layer of paint on the top side.)

Unfortunately, that's not the only factor for DVD longevity.

Sure it helps, but given the bulk of cheap, low-quality media sold these days, and perhaps because DVDs push the dye closer to its limits, it seems that acceptable-quality DVDs are more likely to fail than CDs of the same level of fabrication (Taiyo Yuden, Verbatim).

Anyway, it's too early to judge DVD+/-R longevity, since the technology is too young and it changes too fast: once you've identified better-than-average media, you have to begin your search anew with the new generation of media (from 2x to 4x to 8x to 16x to ...?).

So don't use optical media as your sole pseudo-"backup"/disaster recovery solution.

Personally, like 270673, I'd advise it as a secondary set of copies after offline HDDs.


What people have to understand is that the whole "digital way" is inherently less reliable than the old analog way. Digital doesn't really fail gracefully, and the increase in storage density doesn't allow lifetimes of centuries, as old paper or parchment does.

The big plus, on the other hand, is the ease of migration/copying.

In your example, 100 DVDs may be a bad solution. Try getting 2 external HDs. Save your stuff on both of them, including parity information. That should cost you maybe $250 for two 400 GB drives. Now you just have to spare 15 minutes of your time every year to hook them up and start a comparison/check run.
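That yearly comparison run is easy to script. A minimal sketch (the directory paths are hypothetical placeholders) that hashes every file on one disk and checks it against the copy on the other:

```python
import hashlib
from pathlib import Path

def file_digest(path):
    """SHA-256 of a file, read in 1 MiB chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def compare_trees(root_a, root_b):
    """Return relative paths under root_a whose twin under root_b is missing or differs."""
    mismatches = []
    root_a, root_b = Path(root_a), Path(root_b)
    for path_a in sorted(root_a.rglob("*")):
        if path_a.is_file():
            path_b = root_b / path_a.relative_to(root_a)
            if not path_b.is_file() or file_digest(path_a) != file_digest(path_b):
                mismatches.append(str(path_a.relative_to(root_a)))
    return mismatches
```

Run it as, say, `compare_trees("/mnt/backup_a", "/mnt/backup_b")` (mount points are whatever your drives show up as); an empty result means both copies still agree byte for byte.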

Alternatively, buy 2 new disks every 2 years and copy your old data over. The increase in density will allow you to fit the future data on them, too.

There IS work/expenditure involved, but try to see the bigger picture: you are dealing with amounts of data undreamed of even 2 decades ago, and you have to either invest a bit of your time or some modest amount of money at regular intervals to keep it safe.

Just compare that to the effort of keeping that half TB in other kinds of storage (like 250,000 books or 1,000 VHS tapes). No other medium lets you keep it 100% intact with less effort.

DVD+R has better error correction than DVD-R, and much better practical longevity than CD-R. (DVDs have the information layer sandwiched between tough plastic; CDs have it just below the thin layer of paint on the top side.)

Yes, in theory this (and the fact that DVD+/-R has more error correction than CD-R) should make it more reliable, but sadly I have had far more failures with recordable DVD media than I ever had with CD-R. I know that many other people have had similar experiences, so I'm still very dubious about using DVDs for long-term data storage.

Unfortunately, that's not the only factor for DVD longevity.

[...] it seems that acceptable-quality DVDs are more likely to fail than CDs of the same level of fabrication (Taiyo Yuden, Verbatim).

You may very well be correct. There are sensible arguments to be made for the longevity of both CDs and DVDs.

For me, when CDs have failed in the past, damage such as scratches or delamination of the top layer has been the cause. That's why I feel that DVD+R has better "practical longevity"; but of course everybody's usage habits and experiences are different.

I understand the arguments about DVD+R's superior error correction, CD-R's lower bit density, etc. It's outside my possibilities to quantify these issues and really evaluate their impact for each system - in other words, a techie's opinion about media longevity is more of a feeling, and less of a fact. :-)


Regarding parity, can someone enlighten me on parity information? Does this generate an extra file along with your regular file, so that when there is a bad sector, you can use the parity to recover?

Does anyone recommend a decent compression program? WinRAR was mentioned. Is this the best?


The best media for backup depends mostly on how much data you are backing up, and secondly how often. A basic chart for backup goes something like this:

0-1.3GB CD-R

0.7-9GB DVD-R

4-500GB Hard disk

250-10,000GB Tape

Changing out 1-2 CD/DVDs is a simple chore, but going over two discs increases user interaction and chances of error dramatically. Also, proper storage of many discs for transport to an offsite location becomes more difficult.

HD vs Tape is one of those old arguments that I don't want to get into very far, especially because they both cover different niches better. Backing up to a single hard drive is usually a better choice than a single tape for a couple of reasons.

1. For a 250GB unit, a tape and hard drive are usually about the same cost ($70). But the tape requires the purchase of a drive that usually costs ~$1000+ versus $40 for an external USB hard drive enclosure.

2. In five years, your tape drive will probably have been broken or replaced, meaning digging around on eBay for a replacement, but a SATA drive will always have an interface readily available.

3. How many hard drives have you seen go bad sitting in a (static-free, unmoving, climate controlled) box? Probably less than I've seen tapes that silently fail on recording.

Anything over 1-2 hard drives runs into the same chore/error prone mess that CD/DVDs do, and getting very large drives quickly loses any cost benefits. But this is where tapes shine. For very large data sets, a tape drive is the easiest and most reliable method of backing up. You may have 10 tapes, but they all go in a box together and get shipped off.

A few other notes about backups:

1. Any backup that may be relied upon for over a year should have at least 2 copies, preferably in different locations. This can be as simple as a rotating backup where data for the past two months is backed up at least once a month so that there will be two copies of any given month.

2. Offsite storage should be far away from you. Preferably 100+ miles.

3. Small data sets can also use flash media, or online methods.

4. Don't underestimate the value of online backups. (Godaddy 100GB storage + 1TB transfer for $7/month with FTP)

5. If the data is sensitive, it should always be encrypted before going offsite.

6. Don't make up the backup plan for the company. Always get in writing what each department needs to have backed up, how often, and for how long. (Don't get fired because some idiot told you they don't need certain data.)

7. Consider sending periodic full backups on media offsite, and sending nightly incrementals over FTP somewhere.

Regarding parity, can someone enlighten me on parity information? Does this generate an extra file along with your regular file, so that when there is a bad sector, you can use the parity to recover?

Does anyone recommend a decent compression program? WinRAR was mentioned. Is this the best?

That is exactly what PAR does. PAR2 is the best method, and QuickPAR is arguably the best program for it. The incredible thing about PAR2 is that you can have a CD full of data with some PAR2 files whose file allocation table gets corrupted so you can't see where files begin/end. Rip the raw data from the CD to a file, make a copy of the file with a .par2 extension, and then feed that into QuickPAR. It will search the .par2 file for all of the PAR2 data, and then use that to reconstruct all of the original files out of the raw data dump (assuming total data loss is less than the total available parity data).
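PAR2 itself uses Reed-Solomon codes, but the core idea can be illustrated with the simplest possible parity scheme: one XOR block that can rebuild any single lost data block. This is a toy sketch of the principle, not the actual PAR2 algorithm:

```python
def xor_parity(blocks):
    """Parity block = byte-wise XOR of equal-sized data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def recover_missing(blocks, parity):
    """Rebuild the one block marked None by XOR-ing the parity with the survivors."""
    rebuilt = bytearray(parity)
    for block in blocks:
        if block is not None:
            for i, byte in enumerate(block):
                rebuilt[i] ^= byte
    return bytes(rebuilt)
```

Real PAR2 generalizes this: with N recovery blocks you can rebuild any N lost data blocks, which is why a handful of .par2 files can repair arbitrary damage up to their combined size.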

Yep, it is that cool. Of course, hard drives tend to go completely bad instead of suffering the bit rot you see on CD/DVD. And how drives handle errors on tapes is often not consistent, so there is no telling with those.

WinRAR is pretty good, but you are supposed to pay for it. Also, it is possible for RARLAB to go out of business with no one able to decompress the latest RAR features. I usually use 7-Zip, as it has high compression ratios, is free, and is open source. If you like, you can copy the source and .exe onto every backup you make so that it will always be possible to decompress, even 20 years into the future.

4. Don't underestimate the value of online backups. (Godaddy 100GB storage + 1TB transfer for $7/month with FTP)

7. Consider sending periodic full backups on media offsite, and sending nightly incrementals over FTP somewhere.

FTP can be very unreliable, at least the way the majority of the servers are set up. I would never trust my data with a third party server without knowing how it was configured.

Also, it is possible for RARLAB to go out of business and no one to bother being able to decompress the latest RAR's features.

Sorry, but that's really far-fetched. First of all, WinRAR is not going to disappear very soon, and even if it does, I don't see millions of people being stuck with RAR files and nobody doing anything about it.

WinRAR will not just disappear; it's used by so many that it will continue to be a commercially interesting product.

Even if the whole company and their sources were to disappear, people would still be able to open RAR files 10, even 25 years later. Even now you can run programs that are over 25 years old, and if you don't have compatible hardware or a compatible OS, then you simply use old hardware, hardware emulation, or OS emulation. I'm, for example, still able to run the COBOL compiler that I used for my programming studies in 1985, and I don't even need emulation to run it on a brand-new Core 2 Duo with XP/Vista.

I wouldn't be surprised to see people able to open WinRAR files even 100 years later. The point is this: even if WinRAR were to disappear, you would still have plenty of time to find solutions or to convert the data to a different format.

Personally, I avoid storing important data in compressed form; it can in some cases decrease the chances of recovery, and besides, it's much easier to go through your archive when you don't store it in huge compressed files of several hundred MB. Of course you have to look at this case by case, but in my case the majority of the data is already highly compressed (image data, MP3, videos, etc.), so I don't waste all that much space if I don't compress.

As for whether DVD is reliable or not: ask yourself if you want to store your 4.5 GB of backup data on 1 DVD or 7 CDs...

What people have to understand is that the whole "digital way" is inherently less reliable than the old analog way. Digital doesn't really fail gracefully, and the increase in storage density doesn't allow lifetimes of centuries, as old paper or parchment does.

That's such a misconception, but hey, you're not the only one; a lot of people think like you.

Now let's look at analog. How many people have their important documents, photo albums and other things protected against disaster? Not many, I can tell you that. A good example is the Hurricane Katrina disaster.

But even a local fire can mean disaster, or the guy who breaks into your house. The problem is that lots of people have made the switch to digital, yet few have made the mental switch. Take backups, for example. It's only in the last few years that non-IT people have started to talk about RAID configurations; before that they didn't give a dime about making backups. Even today I would be a rich man if I got a cent for every person on this planet who doesn't make any decent backups of his or her digital photographs. Does that mean digital is less reliable? No, it's the people who use this new medium who haven't made the complete mental switch, or refuse to make it (often laziness or ignorance). Because let's be honest: how many times have you heard those sob stories of someone who lost all his data? These are all people who refuse to face reality, namely that hard drives are little machines, and as we all know about machines, they all break down after a while.

But even those who made the switch haven't learned to use some common sense. Take RAID, for example. RAID in itself is not a backup solution. If you delete a file on drive A (intentionally or by accident), then the same file will be removed on the other drive(s). If you want to go back in time, like going back to a Photoshop file you worked on 3 days ago, then you're also out of luck.

So good backups start with a plan: for example, what is it you want to save, how many times per day do you want to save, and how far do you want to be able to go back in time?

Another example: people wonder whether DVD is reliable or not. Let me ask you this: are you going to check the quality of your DVD backups 10 years from now, or are you going to check every 6 or 12 months? Are you going to use solutions like PAR2, quality media, quality burners and quality burns, proper storage conditions, multiple backups of the same data on different brands of media, etc.?

The point I'm trying to make is this: you all have it in your own hands. Digital is not less reliable than analog if you approach it the right way. Don't avoid what you have to do because you are too lazy or think "it won't happen to me", because then you WILL lose data. Be honest with yourself and you have little risk of ever losing your digital data, although there is no 100% failsafe solution for anything in life.

That's such a misconception, but hey, you're not the only one; a lot of people think like you.

Now let’s look at analog. How many people have their important documents, photo albums and other things protected against disaster? Not many, I can tell you that. A good example is the Katrina hurricane disaster.

But even a local fire can mean disaster or the guy who breaks into your house.

YOU don't get it.

I wasn't talking about the PROCESS, I was talking about the single medium.

My whole point was that while a single medium is more likely to fail, the digital nature allows perfect redundancy at little to no effort.

Regarding parity, can someone enlighten me on parity information? Does this generate an extra file along with your regular file, so that when there is a bad sector, you can use the parity to recover?

Does anyone recommend a decent compression program? WinRAR was mentioned. Is this the best?

Paulsiu, try going to www.wikipedia.org and enter "parity" in the search. I think you want the entry for "error detecting code" in line 5 (error correction is the next step up). Generally, error correction is built into the media design, i.e. into the specification for DVD+R or into the hard disk itself, and its operation is transparent to you. However, the errors can be too much for the built-in system, and then you'll get a read error. This error could affect just a few files, or render the whole DVD / hard disk unreadable.
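As a minimal illustration of the detection side, a single even-parity bit lets you notice (but not repair) any one-bit error. This is just a sketch of the concept, nothing like the heavy-duty codes DVD+R actually uses:

```python
def add_parity(bits):
    """Append an even-parity bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]

def looks_intact(bits_with_parity):
    """True if the parity check passes; any single flipped bit makes it fail."""
    return sum(bits_with_parity) % 2 == 0
```

Error-correcting codes go one step further by adding enough redundancy to locate which bit flipped, which is what lets a drive fix small errors without you ever seeing them.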

About compression: yes, WinRAR is a very good program with extensive capabilities; I have used it heavily for years. 7-Zip (www.7-zip.org) is freeware, the current champion in compression ratio, and also very good. QuickPAR (.co.uk) is a very nice program for adding extra file-level error correction capability.

But before you roll your own solution from scratch, consider just buying something good. Dantz Retrospect comes highly recommended, and doesn't have to be expensive for home usage.

Another very good suggestion from this thread is to use two different backup systems. I personally use an external hard disk and Mozy Backup, an online service. Prices on online backup have changed since, with Amazon offering its S3 backend service to developers, so look around. Anyway, having 2 systems, such as DVD and hard disk, or hard disk and an online service, much improves your chances of retrieving data.

One last thing that hasn't been mentioned enough: test your backup systems. At regular intervals, perform a full restore to a temporary directory and compare the files to the source versions. It's the only way to know for sure that your backup system still works and secures all files...
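That restore test is easy to script with the standard library. A sketch, where the source and restore directories are hypothetical names you would substitute for your own:

```python
import filecmp
import os

def verify_restore(source_dir, restore_dir):
    """Byte-compare every file in source_dir against the restored copy.

    Returns (matched, mismatched, errors) filename lists; 'errors' holds
    files that could not be compared, e.g. missing from the restore.
    shallow=False forces a full content read rather than trusting stat() data.
    """
    names = [n for n in os.listdir(source_dir)
             if os.path.isfile(os.path.join(source_dir, n))]
    return filecmp.cmpfiles(source_dir, restore_dir, names, shallow=False)
```

Anything that lands in the mismatched or errors lists means the backup does not actually reproduce the source, which is exactly what this test exists to catch.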


I can at least warn you about Princo DVDs. Only 1 out of 5 of my Princo DVDs that are 1 year old is still readable. My Fujifilm and Verbatim DVDs seem to be OK so far.

Edited by stefanpi

FTP can be very unreliable, at least the way the majority of the servers are set up. I would never trust my data with a third party server without knowing how it was configured.

Indeed. I probably should have mentioned to treat FTP like DVD: you don't know if anything was corrupted while uploading. Fortunately, a couple of small PAR files will fix that.

Most FTP servers from large hosting companies are going to be at least as reliable as shipping a disc, tape, etc. to an offsite location. In any case, you should always have a local backup too. (Always have the original plus two copies.)

I wouldn't be surprised to see people able to open WinRAR files even 100 years later. The point is this: even if WinRAR were to disappear, you would still have plenty of time to find solutions or to convert the data to a different format.

I have serious doubts that a competent consumer would be able to open a RAR file in 100 years, though I also doubt there would be the need. RARLAB regularly adds features to RAR that break read compatibility with older versions and third-party products. If they went out of business at the right time, chances are good there would be a lot of not-easily-readable files in 10 years. No company updates their old backups when they change to new methods; they assume that when the backups are unreadable, they won't want to read them, so this isn't really worth mentioning.

Still, none of this is likely to be an issue. I just thought I would mention it as a method to further my opinions about the benefits of open source products. ;)

Personally, I avoid storing important data in compressed form; it can in some cases decrease the chances of recovery, and besides, it's much easier to go through your archive when you don't store it in huge compressed files of several hundred MB. Of course you have to look at this case by case,

If you are likely to need to retrieve a single file regularly, then a single large compressed file is inconvenient. If not, it offers a lot of convenience, as operations on a single large file are almost always easier than on many small files. But yeah, spending time compressing mostly incompressible files is silly.

