Does Object Storage kill RAID?

Like a lot of technologies that are past their prime, RAID will continue to serve a function for a very long time. It’s an inexpensive way to string together multiple drives and protect yourself against the loss of one or more of those drives. There is a huge amount of inertia behind RAID, volumes, file systems, etc. Let’s face it, the IT world gets them.

But when it comes to storing petabytes of unstructured data, RAID systems have more limitations than advantages. The biggest issue is the rebuild time of multi-terabyte drives, which is measured in days, and unfortunately drives are only getting bigger. If you’re running RAID 3, 4, or 5 and you lose a second drive during the rebuild, you lose data. The whole point of RAID 6 is to mitigate the risk of a double-disk failure during a multi-day rebuild! And yes, it’s possible to lose three drives in the same RAID array, and RAID 6 won’t help you if that happens during your rebuild. And of course, the array’s performance is degraded for the entire rebuild.
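To put a rough number on that rebuild window, here is a back-of-the-envelope sketch. It assumes drive failures are independent and happen at a constant rate derived from an annual failure rate (AFR). Real-world failures are often correlated (same batch, same age, same rebuild stress), so treat the result as an optimistic lower bound. The function name and the 12-drive, 2% AFR inputs are illustrative assumptions, not figures from any vendor.

```python
import math

def p_second_failure(drives_in_array: int, rebuild_hours: float,
                     annual_failure_rate: float) -> float:
    """Probability that at least one surviving drive fails during a rebuild.

    Assumes independent failures at a constant (exponential) rate, which is
    a simplification: real failures are frequently correlated.
    """
    # Convert an annual failure rate (e.g. 0.02 for 2%) into a per-hour rate.
    hourly_rate = -math.log(1.0 - annual_failure_rate) / (365 * 24)
    survivors = drives_in_array - 1
    # P(no survivor fails) = exp(-survivors * rate * time), so take the complement.
    return 1.0 - math.exp(-survivors * hourly_rate * rebuild_hours)

if __name__ == "__main__":
    for hours in (12, 48, 96):
        p = p_second_failure(drives_in_array=12, rebuild_hours=hours,
                             annual_failure_rate=0.02)
        print(f"{hours:>3} h rebuild: {p:.3%} chance of a second drive failure")
```

The exact percentage matters less than the shape of the curve: for small values the risk grows roughly in proportion to rebuild time, and bigger drives mean longer rebuilds.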

Another issue RAID does not address is the undetectable bit error rate of SATA drives, which is 10^-14 per bit read, or one undetected error for every 10^14 bits (about 12.5 TB) read. Think about that: reading a 10 TB drive (and those are shipping in quantity now) from end to end means you should expect close to one undetected error. With this in mind, IT professionals really need to consider spreading the risk out over more than a couple of disk drives.
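The arithmetic behind that claim is easy to sketch. This assumes an error rate of 10^-14 per bit read, decimal terabytes (10^12 bytes, as drive vendors count them), and independent errors, so the number of errors on a full read is approximately Poisson-distributed. The function names are just for illustration.

```python
import math

TB = 10**12  # decimal terabyte, the unit drive vendors use

def expected_bit_errors(bytes_read: float, uber: float = 1e-14) -> float:
    """Expected number of undetected bit errors when reading `bytes_read` bytes."""
    return bytes_read * 8 * uber

def p_at_least_one_error(bytes_read: float, uber: float = 1e-14) -> float:
    """Poisson approximation: P(at least one error) = 1 - exp(-expected errors)."""
    return 1.0 - math.exp(-expected_bit_errors(bytes_read, uber))

if __name__ == "__main__":
    # Average amount you can read per error at this rate: 1 / (8 * 1e-14) bytes.
    print(f"one error per ~{1 / (8 * 1e-14) / TB:.1f} TB read")
    for size_tb in (4, 10, 12):
        n = size_tb * TB
        print(f"{size_tb:>2} TB full read: {expected_bit_errors(n):.2f} expected errors, "
              f"{p_at_least_one_error(n):.0%} chance of at least one")
```

For a full read of a 10 TB drive, that works out to 0.8 expected errors, or roughly a 55% chance of hitting at least one.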

There is also nothing in RAID that deals with the issue of geographical dispersion. It is up to some higher-level process to get the data to some other location, and this process is complicated at best. It’s also extremely expensive. It forces you to make a number of complicated decisions, such as whether to replicate asynchronously or synchronously, replicate the entire volume or just part of it, and how many different locations to replicate to. And since replication also replicates human errors and corruption, you’ll need some type of version control or backup software.

Enter Object Storage

Object Storage takes away a lot of those questions and addresses the human error issue as well. Here’s how it works: treat data as objects and specify what type of protection each object should have. Want to be able to survive the simultaneous loss of three different data centers? No problem. Just configure your object storage system to do that.
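Here’s a minimal sketch of what such a per-object policy boils down to when the system protects data with full copies: to survive the simultaneous loss of three data centers, an object needs copies in at least four distinct sites. The StoredObject class, the copies_required setting, and the place and repair functions are hypothetical stand-ins, not any product’s API. The repair step re-creates lost copies on spare sites using a surviving copy as the source.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    key: str
    copies_required: int                          # hypothetical per-object protection setting
    sites: set[str] = field(default_factory=set)  # sites currently holding a full copy

    def is_readable(self, failed_sites: set[str]) -> bool:
        # Readable as long as at least one site holding a copy is still up.
        return bool(self.sites - failed_sites)

def place(obj: StoredObject, candidate_sites: list[str]) -> None:
    """Put one full copy in each of `copies_required` distinct sites."""
    distinct = list(dict.fromkeys(candidate_sites))  # de-duplicate, keep order
    if len(distinct) < obj.copies_required:
        raise ValueError("not enough distinct sites to satisfy the policy")
    obj.sites = set(distinct[: obj.copies_required])

def repair(obj: StoredObject, failed_sites: set[str], spare_sites: list[str]) -> None:
    """Re-create lost copies on spare sites, using any surviving copy as the source."""
    survivors = obj.sites - failed_sites
    if not survivors:
        raise RuntimeError(f"{obj.key}: every copy is gone; nothing to copy from")
    for site in spare_sites:
        if len(survivors) >= obj.copies_required:
            break
        if site not in survivors and site not in failed_sites:
            survivors.add(site)  # stands in for copying the object from a surviving site
    obj.sites = survivors
```

For example, an object created with copies_required=4 and placed across five candidate sites stays readable after any three of its four sites go dark, and repair then brings it back to four copies using the one survivor.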

Object storage systems provide this protection either by keeping multiple copies (replication) or by erasure coding. Erasure coding is a parity-based method of protecting data, but that’s where its similarity to RAID ends. Explaining its details is beyond the scope of this blog post, but suffice it to say that it can use a parity scheme that spans geographic locations (without the rebuild issues RAID has), and it ensures that each object remains readable even if some of those locations are lost. The multiple-copies method is easier to understand: it simply makes sure each object is copied to n locations. If you lose one or more of those locations (or a node within one), the system copies the missing object(s) to another location, using one of the surviving locations as the source.
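To show the basic idea, and only the basic idea, here is the simplest possible erasure code: k data fragments plus a single XOR parity fragment, each of which could live on a different node or site. This is essentially RAID 5’s parity math applied per object rather than per drive array. Production object stores generally use Reed-Solomon-style codes with several parity fragments so they can survive multiple simultaneous losses. The encode and decode names are illustrative.

```python
from __future__ import annotations
from functools import reduce
from typing import Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal data fragments and append one XOR parity fragment."""
    frag_len = max(1, -(-len(data) // k))     # ceiling division, at least 1 byte
    padded = data.ljust(frag_len * k, b"\0")  # zero-pad so all fragments are equal length
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    return frags + [reduce(xor_bytes, frags)]  # k data fragments + 1 parity fragment

def decode(fragments: list[Optional[bytes]], k: int, length: int) -> bytes:
    """Rebuild at most one missing fragment (None) from the survivors, then reassemble."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code can only recover one lost fragment")
    frags = list(fragments)
    if missing:
        # The XOR of all k+1 fragments is zero, so the lost one is the XOR of the rest.
        frags[missing[0]] = reduce(xor_bytes, [f for f in frags if f is not None])
    return b"".join(frags[:k])[:length]       # drop the parity fragment and the padding
```

Losing any one of the five fragments produced by encode(data, 4), data or parity, still lets decode return the original bytes. Losing two raises an error, which is exactly why real systems add more parity.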

User error, corruption, and malicious attacks are easier to overcome as well. Each new version of an object creates a new object, and previous versions are still available. A restore can be as simple as changing a pointer to the older version of the affected object(s).
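Here’s a minimal in-memory sketch of that versioning model, assuming a pointer-based restore as described above. VersionedBucket and its methods are hypothetical. Some real systems instead restore by writing the older version back as a new version, but the key property is the same: nothing is ever overwritten.

```python
class VersionedBucket:
    """Hypothetical in-memory bucket: every put appends a new immutable version."""

    def __init__(self) -> None:
        self._versions: dict[str, list[bytes]] = {}  # key -> all versions, oldest first
        self._current: dict[str, int] = {}           # key -> index of the version served on get

    def put(self, key: str, data: bytes) -> int:
        versions = self._versions.setdefault(key, [])
        versions.append(data)                        # never overwrite; old versions remain
        self._current[key] = len(versions) - 1
        return self._current[key]

    def get(self, key: str) -> bytes:
        return self._versions[key][self._current[key]]

    def restore(self, key: str, version: int) -> None:
        """Restore by moving the current-version pointer; no data is copied."""
        if not 0 <= version < len(self._versions[key]):
            raise IndexError(f"{key}: no such version {version}")
        self._current[key] = version
```

If ransomware (or a user) overwrites report.docx, a single restore call pointing back at version 0 makes the clean copy current again, and the bad version is still there for forensics.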

Object storage systems can also scale out almost without limit, as there are no volumes or file systems to manage. There are no forklift upgrades that require migrating large amounts of data at once. Simply add a new node to the system, let it replicate copies onto the new node, and retire the old one.

Object storage may not have killed RAID, but it sure makes RAID look ill-prepared for the future.

W. Curtis Preston (aka Mr. Backup) is an expert in backup & recovery systems, a space he has been working in since 1993. He has written three books on the subject: Backup & Recovery, Using SANs and NAS, and Unix Backup & Recovery. Mr. Preston is a writer and has spoken at hundreds of seminars and conferences around the world. Preston’s mission is to arm today’s IT managers with truly unbiased information about today’s storage industry and its products.

Posted in Blog
6 comments on “Does Object Storage kill RAID?”
  1. Tim Wessels says:

    Well, who is offering a way to erasure code objects so that the data and parity fragments can be dispersed to multiple physical sites? My guess would be Cleversafe (IBM), since they rely exclusively on erasure coding to protect data objects, and possibly Amplidata (HGST). Cleversafe and Amplidata have a litigious history that likely touches on such matters. Cloudian HyperStore supports erasure coding objects within a single cluster location and then replicating the erasure coded objects to a cluster in another location. The issue with dispersing erasure coded data and parity fragments across multiple sites is one of increased latency on reads, and if a single site goes off-line, there may not be sufficient fragments in the remaining site to allow for reading the object. That said, work on developing hierarchical erasure codes looks like it will address the issues of latency and recovery of missing fragments when erasure coded objects are dispersed to multiple sites.

    • wcurtispreston says:

      I perhaps should have said that it CAN span multiple geographic locations. Theoretically, you could ensure that each site always has enough chunks to be self-sufficient, without the latency issue you describe. It’s probably easier, though, to just replicate the entire data set.

  2. Tim Wessels says:

    Well, the “undetectable” error rate on large SATA HDDs is not really an issue when it comes to using object-based storage (OBS) to protect data. Any worthwhile OBS software should be capable of doing a “repair-on-read” and be able to perform continuous “health checks” on stored data objects. Object-based storage systems can accommodate the failings of SATA HDDs without putting stored data at risk. OBS clusters can protect against the loss of HDDs, entire storage server nodes, and possibly an entire site, if that has been baked into the architecture.

  3. wcurtispreston says:

    I’m not sure why you put “undetectable” in quotes. It is undetectable, not “undetectable.” 🙂 The whole point of the UBER is that the error will not be detected by the drive.

    Both RAID-based systems and object-based systems can detect the error at a higher level if they do corruption checking on the objects themselves, but an object-based system has this built in. An object-based system is also better set up to replace a single object that has become corrupted due to a flipped bit somewhere.

  4. Robert Cox says:

    Full disclosure: I am a NetApp marketing guy. NetApp’s object storage solution, StorageGRID, allows objects to be erasure coded so that the data and parity fragments can be dispersed to multiple physical sites. StorageGRID also supports erasure coding at the node level, so drive failures don’t create unwanted network traffic.

  5. Good article, Curtis. We agree 🙂

    Retrieving the object and comparing it to the object index and original hash is a great way to mitigate UBER. If the hashes don’t match, go grab another copy (or copies).
