One of the things that makes a backup person’s head spin around is when they hear someone say, “my data is on RAID, so I don’t need to back it up.” The same would be true of erasure coding and replication. All three of these data protection techniques are important pieces of the puzzle, but none of them replaces backup.
Once upon a time, disk drive failures always resulted in a restore. If a drive failed, you reached for your latest backup tape. You experienced downtime while you waited to replace the drive, and then more downtime while you waited for the restore to complete. Ah, those were the days. (Cue Wayne’s World-style dissolve.)
I can remember the first large restore that I did with my first commercial backup product, SM-Arch from Software Moguls. I sped to the data center and ran to the server, where I had to insert the DDS-2 tape in order to start the restore. I also remember the incredibly low transfer rate I was getting due to a misunderstanding about how the software compression feature worked. I remember this vividly because we had already suffered hours of downtime just to replace the disk drive, and now the rest of IT was just waiting for the backup guy to restore a few gigabytes of information. That was a long few hours.
Later that year I experienced another large restore when some errant code accidentally deleted thousands of users’ home directories. Even if we had been using RAID, that restore would still have been necessary. RAID does not protect against accidental or purposeful deletion or corruption of data at the file level.
RAID prevents outages caused by the loss of a single piece of media. If the bank I worked in had been using RAID, that incredibly stressful restore caused by a failed disk drive would have never happened. We would’ve waited for the appropriate time, opened up the server, replaced the failed disk drive, and then restarted the server. (We also didn’t have hot-swappable disk drives.) But again, we would have still needed to do the second restore where errant code deleted data. That’s because RAID isn’t backup.
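To make the distinction concrete, here is a toy sketch of single-parity protection in the spirit of RAID 5. It is only an illustration under simplifying assumptions: the three "drives" are hypothetical four-byte blocks, and real arrays stripe data and rotate parity across drives. It shows that parity can rebuild a lost drive, and that it just as faithfully preserves a destructive write.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data drives, each holding one four-byte block.
data_drives = [b"home", b"dirs", b"data"]
parity = xor_blocks(data_drives)

# Case 1: drive 1 physically fails. XOR of the survivors and the parity
# block reproduces the missing contents, so no restore is needed.
rebuilt = xor_blocks([data_drives[0], data_drives[2], parity])
print(rebuilt)  # b'dirs'

# Case 2: errant code overwrites drive 1 with zeros. The array treats this
# as a normal write and recomputes parity to match the new contents.
overwritten = [data_drives[0], b"\x00" * 4, data_drives[2]]
new_parity = xor_blocks(overwritten)

# Rebuilding now returns the zeros, not the original data.
rebuilt_after_delete = xor_blocks([overwritten[0], overwritten[2], new_parity])
print(rebuilt_after_delete)  # b'\x00\x00\x00\x00'
```

In the second case the array is working exactly as designed. It has no record of what the block used to contain, which is why the deleted home directories still had to come back from a backup.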
Although erasure coding is a more modern data protection system than RAID, it still operates on the same basic principle: it protects against the loss of one or more pieces of media, nodes in the cluster, or even the loss of an entire site. But if someone goes in and drops a table in a database or deletes an entire directory, erasure coding will simply make sure that change is immediately represented on all nodes of the cluster.
Replication by itself works the same way. It automatically replicates all changes in one site to one or more other servers or sites. If you lose a drive, an array, a server, or a site, replication is your friend. Depending on your bandwidth and latency, the replicated destination can be kept almost exactly in sync with the replication source, so you will suffer little or no data loss as a result. However, get hit with ransomware and the rogue encryption of your files will spread to every replicated site. As mentioned in a previous blog post, snapshots by themselves do not protect against drive loss. Replication does not protect against user error or malware; however, the two together can actually protect against both.
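The interplay between replication and snapshots can be sketched in a few lines. This is a simplified model, not how any particular product works: the `Site` class and the `replicate` and `take_snapshot` helpers are hypothetical names invented for illustration, and the snapshots here are full copies, whereas real snapshots are typically pointer-based and far more space-efficient.

```python
import copy

class Site:
    """A hypothetical storage site: live data plus point-in-time snapshots."""
    def __init__(self):
        self.data = {}
        self.snapshots = []

def take_snapshot(site):
    """Record the site's current data and return the snapshot's index."""
    site.snapshots.append(copy.deepcopy(site.data))
    return len(site.snapshots) - 1

def replicate(source, targets):
    """Copy the source's live data and snapshot history to every target."""
    for target in targets:
        target.data = copy.deepcopy(source.data)
        target.snapshots = copy.deepcopy(source.snapshots)

primary, remote = Site(), Site()
primary.data["report.txt"] = "quarterly numbers"
snap_id = take_snapshot(primary)
replicate(primary, [remote])

# Ransomware encrypts the file; replication dutifully ships the damage.
primary.data["report.txt"] = "ENCRYPTED"
replicate(primary, [remote])
print(remote.data["report.txt"])  # ENCRYPTED

# The primary site is then lost entirely. The replicated snapshot on the
# remote site still holds the earlier, unencrypted version.
primary = None
recovered = copy.deepcopy(remote.snapshots[snap_id])
print(recovered["report.txt"])  # quarterly numbers
```

The live replica alone would have left only the encrypted file, and snapshots kept only on the primary would have vanished with it. Recovery works here because the snapshot supplies the history and replication moves that history off the failed site.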
There is nothing wrong with RAID, erasure coding, or replication. They are each vital parts of most people’s data protection system; however, they do not replace backup. RAID and erasure coding are designed only to protect against physical failures such as the loss of a hard drive, not logical corruption like the deletion of a file. And none of these systems has the concept of history; therefore, whatever happens ends up happening everywhere. There is no back button. That is what backup is for: it allows you to back up time.
Sponsored by Commvault