Analyst Opinion: Backup Terminology Matters

If technology vendors don’t agree on what generic industry terms mean, how in the world are customers supposed to compare similar products?

Wait. Maybe that’s the plan.

Consider the terms backup, archive, and continuous, for example. The terms backup and archive are not interchangeable. Archives are not old backups, nor is an archive where one stores backups. So why do so many backup vendors seem to use these terms this way?

Similarly, the word continuous means something. Continuous backup (officially referred to as continuous data protection, or CDP) means a very specific thing, and that very specific thing does not include snapshots. So when a backup vendor says “we offer continuous backups as often as every n minutes,” it makes absolutely no sense to those who know what those terms mean.

In a recent column my colleague Joseph Ortiz explained just how different backups and archives are, and in a future column I will explain just how different continuous backups are from periodic backups (even if the period is very short). But right now the discussion is why vendors keep incorrectly using industry terms that have been well defined for years. (All three of these terms, for example, have official SNIA definitions that were agreed upon over 15 years ago.)

Can anything be done about this? Because it matters. It’s hard enough to get an average IT person to understand the difference between backup and archive. It’s even harder to get them to understand what truly continuous backup is. But consider for a moment a person who finally has a proper understanding of these terms, and who then gets a briefing from a vendor that says it has continuous backups every five minutes that you can archive to tape after a week. Imagine how confused that person would be.

The vendor could have said that they have near-continuous backups that you can move to tape after a week and everything would be fine. But no, they say they have continuous backups, and they say you can archive them to tape, or that you can store them in your archive.

This is not just semantics. The hordes of people who think their old backups are archives have been responsible for billions of dollars in wasted IT spending and in lost lawsuits. This particular confusion must go away! (In contrast, the continuous issue is less important, but it’s still important.)

This problem is not new. Windows has a bit that tells you whether a file needs backing up, and it is called the archive bit. Twenty-five years ago there was a backup product called SM-arch. And the UNIX tar command, which is used to make backups, is short for tape archive.

But the modern-day requirements of actual archive systems have made it more important. So the question is: is there anything we can do about it? That’s our role at Storage Switzerland: to define the terms, to educate IT professionals not only on the terms but also on how vendors might misuse them, and then, most importantly, to show how to use the technology to solve IT problems.

W. Curtis Preston (aka Mr. Backup) is an expert in backup & recovery systems, a space he has been working in since 1993. He has written three books on the subject: Backup & Recovery, Using SANs and NAS, and Unix Backup & Recovery. Mr. Preston is a writer and has spoken at hundreds of seminars and conferences around the world. Preston’s mission is to arm today’s IT managers with truly unbiased information about today’s storage industry and its products.

4 comments on “Analyst Opinion: Backup Terminology Matters”
  1. Hi Preston, I understand the nuance of the definitions you are using. I understand backups on tape don’t magically turn into wine after the passage of time (great analogy I got from you by the way). However, in looking up the SNIA definition for “archive” it says, in part:

    1. [Data Management] A collection of data objects, perhaps with associated metadata, in a storage system whose primary purpose is the long-term preservation and retention of that data. – See more at: http://www.snia.org/education/dictionary/a#sthash.OMJUPu0V.dpuf

    So, it would seem one could argue that moving backup data to tape for long-term storage would be considered “archiving”, even if it doesn’t meet your definition of archiving which better addresses the need for searchability for legal discovery, etc., available within a robust archiving solution. I think we just need better terms/definitions and that “archive” is a bit broad and can apply to many aspects of data preservation. Or am I missing something?

    Totally agree on the CDP though! “Near CDP” is definitely more accurate for most solutions.

    • wcurtispreston says:

      I think the key there is “primary purpose.” I can haul fertilizer in a sports car, but that doesn’t make it a dump truck.

      The primary purpose of backup systems is to restore files/databases/systems to the way they looked a relatively short time ago, not to hold onto data for anything longer than a year. It is certainly not designed to hold onto data for ten+ years. Yes, it can do it. But it’s not designed to do it, and it’s REALLY bad at it.

      To do a restore you need: name of the system being restored, name of the entity w/in that system (e.g. filesystem/database), name of the entity w/in that entity (filename/tablename), & the single date to which you want it restored. When you go to do a retrieve 10 years later, you have none of those things. You want all the files/emails that got created on any system during a range of dates (e.g. last seven years) that have a particular thing in common (they all came from Fred or contained the word Excalibur). You don’t know which of your company’s 15 Exchange systems the emails went from or to, so if you did this w/backups you’re looking at 15 restores × 52 weekly backups × 7 years. That’s 5460 restores. You’re going to need multiple different versions of Windows and Exchange to restore these backups to, because they are version dependent. Once you do each of the 5460 restores, you’ll need to do system-wide queries against them to find the emails you are looking for and create a PST or EDB file from it. You will have tons of duplicate data within all those files, too. In addition to being a really costly way to do this, it’s also a risky way to do it. If the judge thinks you are deliberately stalling, he can order an adverse inference instruction to the jury and you’re done.
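
      The restore count above can be checked with a minimal sketch. The function and parameter names are hypothetical, chosen only for illustration, and the sketch assumes one restore per Exchange system per weekly backup, as in the figures quoted above.

```python
def restores_needed(systems: int, backups_per_year: int, years: int) -> int:
    """Restores required if every system's backup from every week must be restored.

    Assumption (for illustration only): one restore per system per weekly backup.
    """
    return systems * backups_per_year * years


# Figures from the comment above: 15 Exchange systems, 52 weekly backups, 7 years.
total = restores_needed(systems=15, backups_per_year=52, years=7)
print(total)
```

      Changing any one input scales the total linearly, which is why a longer retention window or more mail systems quickly makes a backup-based retrieval impractical.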

      I worked for a consulting company that got the job of doing this for a client. They spent about $1.5M just in consulting to perform the herculean effort necessary to make it happen — and it was only 3 years. If they had a time machine (and remember it doesn’t matter when you invent a time machine), they would have gone back in time and installed an email archive system.

      So, yes. A backup can be used as an archive. Just like a sports car can haul fertilizer. But it is not its primary purpose — and it’s really bad at it.

  2. Tim Wessels says:

    Well, it all reminds me of the endless discussions about the definition of cloud computing even after NIST promulgated a perfectly reasonable definition of cloud computing. Some people claimed that cloud computing could mean anything you wanted, which was nonsensical. Vendors were quick to “cloud wash” their products and services to claim they were a player in the new cloud computing market. That said, I think jumping on the cloud computing bandwagon is different than confusing backup and archive, which have longer histories in information technology. And it is inexcusable for vendors to fail to recognize the differences between backup and archive and properly reflect it in the marketing of their products and services.

    A backup is a copy of something. A backup is made so you will not lose access to something that you need, if the copy you are using becomes unavailable as a result of human activity, machine malfunction or an act of nature. A backup is a safeguard that is stored for recovery purposes, which is the other side of the backup “coin” and is the reason for making a backup.

    An archive is not a copy of something. It is the thing itself. An archive may need to be immutable if it has some legal or financial purpose or is part of a historical record. Some archives are more “active” than others, which tends to dictate how they are stored. Data durability is an important requirement for making an archive as it may need to be accessible for decades.

    Thanks for bringing up the subject of backup and archive. Their distinctions are important in information technology.

  3. wcurtispreston says:

    We’re mostly in agreement. SOMETIMES an archive is a copy, as in an email archive. It’s technically a copy of the email system. But generally, you’re right.

    I like this way of saying it:

    Backup is a secondary copy of primary data and archive is the primary copy of secondary data.
