All of us at Storage Switzerland preach that backup and archive are two different practices, and we have been doing so for probably more than 20 years. But to see the results of a survey during one of our recent webinars, you would think it was a novel idea. Once more into the breach.
We asked the webinar attendees what they do for file system archiving and gave them four choices: no archiving, using secondary storage, part of their backup, and in the cloud. The biggest surprise for me was that no one answered the cloud. Archiving seldom-used data to cold storage in the cloud makes a lot of sense both financially and technically, so I did not expect that answer to get zero votes. It was encouraging to see that 13 percent of the attendees are doing some kind of archiving on their secondary storage. Another 20 percent said that they do no archiving. I actually expected that to be the most popular answer. Many people look at the cost of archiving and weigh it against the cost of doing nothing. Doing nothing is easier, even if it costs more. So I was surprised to see that only 20 percent said they have no archiving practice.
Then I saw the answer that won the poll by a large margin. Sixty-six percent of the attendees said their archiving system was part of their backup. I just wanted to scream. Backups are not archives. Archives are not backups. Really old backups do not magically turn into archives; in fact, they make really lousy backups. It becomes harder to use them for what they were designed to do (i.e., restores), and it is nearly impossible to use them for retrievals.
Solving the No Budget Problem
If you already know that backups and archives are different and just can’t get any budget to do proper archiving, check out the on-demand version of the webinar where we did the survey. We talk about some novel approaches to that problem that might actually help you make your case.
If you don’t know what I’m talking about when I say that backups and archives are completely different concepts, read on. A backup is a copy of your data that you use to restore the data back to a previous, known good state. You use it to restore your file or database server to the way it looked yesterday or last week. You can even use backups to restore your entire data center when it burns down, although using traditional backups for this purpose is probably not a good idea. You might want to look into a disaster recovery as a service (DRaaS) product for that purpose.
Archives, on the other hand, are used to collect and retrieve related pieces of information over long periods of time. An email archive, for example, contains all of the emails sent and received by a company over a particular period of time, typically three or seven years depending on your industry. It allows you to retrieve all emails sent by one or more person(s) to one or more other person(s) over the entire period of the archive. It allows you to retrieve all emails that contain a particular word or group of words, such as all emails that contain the name of a particular project or product you are being sued about or are suing someone else about.
Restore vs. Retrieve
The best way to understand the difference between backups and archives is to understand the difference between a restore and a retrieve. To do a restore, you need the name of a server; the name of a database, filesystem, directory, or file; and a single point in time. An example of this would be to restore the purchasing database on the Apollo server to the way it looked last Thursday.
When doing a retrieve, you have none of these pieces of information. What you have is the content that you are looking for, such as all files or emails with the word umptysquat in them that were created over the past five years. These files and emails were stored on multiple servers in multiple formats and were created at many different times over the five-year period. It is technically possible to do a retrieve with backups, but to say it is significantly harder is an understatement.
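The contrast between the two operations can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular backup or archive product works: the `BackupCopy` and `ArchivedItem` records and the `restore` and `retrieve` functions are hypothetical names invented for this example. The point is the shape of the inputs: a restore is keyed by server, object, and point in time, while a retrieve is keyed by content and a time range.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BackupCopy:
    """Hypothetical record: one backed-up copy of one object on one server."""
    server: str
    path: str        # database, filesystem, directory, or file name
    taken_on: date
    data: bytes


@dataclass(frozen=True)
class ArchivedItem:
    """Hypothetical record: one indexed file or email in an archive."""
    source: str      # where the item originally lived
    created_on: date
    text: str


def restore(backups, server, path, point_in_time):
    """Return the newest copy of server/path taken on or before point_in_time."""
    candidates = [
        b for b in backups
        if b.server == server and b.path == path and b.taken_on <= point_in_time
    ]
    if not candidates:
        raise LookupError(f"no backup of {server}:{path} on or before {point_in_time}")
    return max(candidates, key=lambda b: b.taken_on)


def retrieve(archive, keyword, since):
    """Return every archived item created on or after `since` that mentions keyword.

    No server, path, or single point in time is needed: the query is by content.
    """
    needle = keyword.lower()
    return [
        item for item in archive
        if item.created_on >= since and needle in item.text.lower()
    ]
```

Note that `restore` cannot even be called without knowing where the data lived and when, which is exactly the information you lack when someone asks for every mention of umptysquat over five years.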
To see how much harder a retrieve from backups is, consider a project I once participated in: satisfying an electronic discovery request for three years of emails for a customer that did not have an archive system. We had to restore three years of weekly full backups (156 restores) and then run 156 queries against Exchange for the data we were looking for. This request could have been satisfied in five minutes if the customer had had an archive system. Instead it took a team of 15 consultants working around the clock for three months. That’s 150 man-hours a day times 90 days, for a total of 13,500 billable hours at $200 an hour, or $2.7M. With an archive system, which would have cost $250,000 at the high end and could have been reused for many more such requests, the customer could have done a single query.
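For anyone who wants to check or adapt the math, here is the same arithmetic as a short Python sketch. Every figure comes from the story above except `hours_per_consultant_per_day`, which is an assumption inferred from the 150 hours-a-day total spread across 15 consultants; the variable names are made up for this example.

```python
# Figures from the e-discovery story; variable names are illustrative only.
years_of_email = 3
weekly_fulls_per_year = 52
consultants = 15
hours_per_consultant_per_day = 10   # assumption: implied by 150 hours a day / 15 people
project_days = 90
hourly_rate_usd = 200
archive_cost_high_end_usd = 250_000

# One restore (and one Exchange query) per weekly full backup.
restores_needed = years_of_email * weekly_fulls_per_year

hours_per_day = consultants * hours_per_consultant_per_day
billable_hours = hours_per_day * project_days
backup_retrieval_cost_usd = billable_hours * hourly_rate_usd

# How many times over the archive system would have paid for itself.
cost_ratio = backup_retrieval_cost_usd / archive_cost_high_end_usd

print(f"restores needed:       {restores_needed}")
print(f"billable hours:        {billable_hours:,}")
print(f"cost using backups:    ${backup_retrieval_cost_usd:,}")
print(f"cost ratio vs archive: {cost_ratio:.1f}x")
```

Swap in your own rates and retention period to build the same case for your environment.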
Please don’t use your backup system as an archive system. If I’ve piqued your curiosity, go check out this webinar where we talk about this problem, and see if you can get your company to change its practice in this area.