Is Commodity Storage Hardware really less expensive?

In his recent blog, Hitachi Data Systems’ CTO, Hu Yoshida, discusses the top 10 business trends that will impact IT in 2015. Number four on his list is “Software Defined Everything”. One of the theoretical selling points of software defined initiatives is that they allow IT planners to build infrastructures out of commodity hardware, which should save the data center a significant sum of money versus custom-built hardware. Don’t believe it: there is a cost associated with making commodity hardware reliable, and it must be taken into account before IT planners decide that a software defined solution leveraging commodity hardware actually delivers the promised savings.

What is the Real Cost of Commodity Hardware?

The key to the software defined crowd’s claim is that it’s OK to use less reliable (a.k.a. commodity) hardware because software defined solutions can build in multiple points of redundancy. For the most part this is accurate. But that redundancy costs money.

For example, many software defined storage solutions use a replication strategy to provide both data redundancy and compute redundancy. In a three-node starter system, a virtual machine’s data on one node is replicated to the two other nodes. If node one’s server, or the storage inside it, fails, the VM can be restarted on another node.

This sounds great, but take it to scale and it gets expensive. If you stay with the above replication model, the commodity infrastructure will require 3X the storage capacity. That means 3X the network activity and potentially 2X the compute capacity, all for the privilege of using commodity hardware.
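To put rough numbers on that, here is a minimal sketch of the capacity math (the 10 TB figure is an assumption for illustration, not a measurement):

```python
# Back-of-the-envelope: raw capacity required to hold a given amount
# of usable data under three-way replication.

usable_tb = 10          # usable VM data, in TB (assumed for illustration)
replicas = 3            # every block is stored on three nodes

raw_tb = usable_tb * replicas
extra_pct = (raw_tb - usable_tb) / usable_tb * 100

print(f"{usable_tb} TB usable requires {raw_tb} TB raw "
      f"({extra_pct:.0f}% extra capacity)")
# 10 TB usable requires 30 TB raw (200% extra capacity)
```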

Compare this software defined enablement of commodity hardware to a more traditional, less headline-grabbing SAN architecture backed by a quality SAN array. Data is protected on the device and consumes no additional server CPU or storage network bandwidth. Using RAID 5, RAID 6 or some form of erasure coding, the storage system can provide redundancy for an extra 20% to 30% of capacity, not the extra 200% (3X total) that triple replication requires.
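The same arithmetic for parity-based protection, as a rough sketch (the 8+1, 8+2 and 10+3 group widths are common choices, assumed here for illustration):

```python
# Extra raw capacity required by parity/erasure protection: k data
# units plus m redundancy units cost an extra m/k.

schemes = {
    "RAID 5 (8+1)":        (8, 1),
    "RAID 6 (8+2)":        (8, 2),
    "Erasure code (10+3)": (10, 3),
    "3-way replication":   (1, 2),   # one copy of data, two extra copies
}

for name, (k, m) in schemes.items():
    print(f"{name}: survives {m} failure(s), extra capacity {m / k:.0%}")
# RAID 5 (8+1): survives 1 failure(s), extra capacity 12%
# RAID 6 (8+2): survives 2 failure(s), extra capacity 25%
# Erasure code (10+3): survives 3 failure(s), extra capacity 30%
# 3-way replication: survives 2 failure(s), extra capacity 200%
```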

Also keep in mind that this environment delivers the attribute IT values most: predictability. The storage network and storage CPUs are dedicated to the task at hand.

For more on the top business trends impacting IT, join Hu and me for our webinar on December 10th at 12pm ET, 11am CT, 9am PT.

Click To Register

Twelve years ago George Crump founded Storage Switzerland with one simple goal: to educate IT professionals about all aspects of data center storage. He is the primary contributor to Storage Switzerland and a heavily sought-after public speaker. With over 25 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, virtualization, cloud and enterprise flash. Prior to founding Storage Switzerland he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.

10 comments on “Is Commodity Storage Hardware really less expensive?”
  1. Tim Wessels says:

    Well, there appears to be a misconception about what “commodity” hardware is when it is used to build scalable object storage clusters that protect data using replication or erasure codes. Commodity hardware is not used to store high-volume, transaction-oriented, primary data. Commodity storage hardware typically uses off-the-shelf components for things like the chassis, mainboard, processor, disk drives and memory. That said, commodity storage hardware doesn’t mean it is prone to failure, but when failures do happen, the storage cluster software can work around component and complete node failures. It may have a lower price tag because the vendor isn’t offering a “bullet-proof” solution for data storage. Because object storage clusters can scale horizontally from a handful to thousands of computers, it would not be affordable or technically possible to use high-end block and file storage systems. Stating that commodity storage hardware might not be less expensive than using a SAN or hardware RAID is making an “apples to oranges” type of comparison. And as we all learned in our elementary schooling, you cannot compare apples to oranges, and neither should Mr. Crump.

  2. George Crump says:

    Tim, thanks for your comment. I have no issue with commodity hardware in general, nor do I have a problem with it in the specific use case of object storage clusters. But primary storage systems are absolutely beginning to use commodity hardware to drive down price. Look at the pitch of almost every software defined storage vendor and even most turnkey storage system startups. In fact, an overwhelming number of startups are using Super Micro servers as their “storage systems”. Again, nothing wrong with that, but there is a cost associated with it that the IT pro needs to take into account.

  3. Robin Harris says:

    George,

    The example of the cost of 3x replication is flawed. Virtually all newer object storage systems use advanced erasure codes – sometimes called rateless or fountain codes – that can survive three or four fragment losses (much better than RAID 6) with less than twice the data expansion. The original Google File System relied on raw 3x data replication, but that is no longer needed.
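    For instance, a quick sketch of the arithmetic (the 10+4 fragment layout is an illustrative assumption, not any particular product’s default):

    ```python
    # Expansion factor for a k-of-n erasure code: n fragments are stored,
    # any k of them can reconstruct the object.

    k, m = 10, 4                 # 10 data fragments + 4 coded fragments (assumed)
    expansion = (k + m) / k      # raw bytes stored per usable byte

    print(f"Survives {m} lost fragments at {expansion:.1f}x expansion "
          f"(vs 3.0x for triple replication)")
    # Survives 4 lost fragments at 1.4x expansion (vs 3.0x for triple replication)
    ```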

    The 3-node minimum is generally due to the need for a cluster quorum after a node failure. That said, customers who will never need more than a few hundred terabytes are not good candidates for scale-out storage. But as Big Data and the IoT drive data volumes up, those customers will become rarer every year.

    Robin Harris

  4. George Crump says:

    Robin,

    Thanks for commenting. I guess I should have clarified: I was talking about the replication model that we see in primary storage systems and solutions, VSAN being one example. I wrote about the advantages of erasure coding here: https://storageswiss.com/2014/12/03/backup-2-0-primary-storage-backup-software-and-backup-hardware/. As I said in the prior comment, I think commodity storage makes a lot more sense in the object, long-term storage use case. By the way, I am not saying commodity hardware makes no sense at all in primary storage, just that there are some downsides to its use, and those need to be weighed against purpose-built solutions.

    George

  5. Noam says:

    George, I believe you have over-generalized. We at Zadara Storage offer RAID-5 and RAID-6 on our systems, hence offering the same low overhead that purpose-built storage systems do. Given this counterexample, it is incorrect to draw the conclusion that SDS on commodity hardware necessitates lower storage utilization.

    Reliability-wise, we are no worse than purpose-built systems, and arguably better, thanks to self-healing (something which hardware alone cannot do).

    • George Crump says:

      Noam, thanks for commenting. I respectfully disagree: I am generalizing, but not over-generalizing. Zadara has a unique solution to the problem, one that I like, by the way. As I have stated in my other responses, I am not anti-commoditization; I am merely saying there is a cost associated with it.

      Initiatives like software defined, hyperconvergence and the cloud should drive down the cost of IT, BUT they should also make IT better. Remember, there are two problems that IT faces: not enough budget AND not enough staff. Commoditization helps to a degree with problem 1 but does little to help with problem 2. In fact, you could make a case that it makes it worse. If the above initiatives can also help with problem 2 and simplify IT while providing greater insight, then we have a winner.

  6. Wim says:

    Hi George, you are partly correct. Typical software-defined storage systems like VSAN do indeed have to use replication to solve the redundancy issue. The same goes for Ceph, for example, as they all store small objects (e.g. 64KB) on the backend. If you were to erasure code these objects into even smaller objects, your metadata would just explode. So basically, for them replication is the only (costly) option.

    If you apply a log-based approach and collect multiple writes of a VM into a single object (e.g. 1,000-4,000 4k writes), you end up with objects which you can perfectly erasure code and store on commodity hardware. That is what we are doing with the open-source project Open vStorage (http://openvstorage.com/). We use SSDs/PCIe flash inside the host as Tier 1 and commodity hardware as Tier 2. On the Tier 2 storage you can run almost any object store, so Ceph + erasure coding is an option. This combination is suited for primary storage and will be much faster and cheaper than the traditional SAN you propose.
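    As a rough illustration of the metadata difference (the 4KB write size, 4MB object size and 1 TB total are assumptions for illustration):

    ```python
    # Metadata entries needed to track 1 TB of VM writes: per-write
    # objects versus 4 MB log-aggregated objects (illustrative sizes).

    total_bytes = 1 << 40              # 1 TiB of writes (assumed)
    write_size = 4 * 1024              # 4 KB writes (assumed)
    object_size = 4 * 1024 * 1024      # ~1,000 writes rolled into a 4 MB object

    print(f"Per-write objects: {total_bytes // write_size:,} entries")
    print(f"4 MB log objects:  {total_bytes // object_size:,} entries")
    # Per-write objects: 268,435,456 entries
    # 4 MB log objects:  262,144 entries
    ```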

    • Tim Wessels says:

      Hi Wim…I like the approach you are taking with Open vStorage and object storage. Do you need object storage that is AWS S3 compatible? Open vStorage looks like a way to compete with what Nutanix does. Do you agree?

  7. Wim says:

    Hi Tim,

    Yes, you need an AWS S3-compatible object store for the Tier 2 storage.

    I agree. Open vStorage is an open-source storage project, but we will release a paid appliance with OpenStack and Open vStorage. This offering will indeed compete with what Nutanix is doing.

  8. George Crump says:

    Hi Wim, thanks for commenting. As I mentioned in the other responses, I’m not saying that commodity-based systems are inherently bad and that IT pros should only buy name-brand gear. My point, as is often the case with my personal blogs, is to bring to the attention of IT pros some of the areas that I see being overlooked when a particular approach is taken. Commodity solutions are just fine, as long as you know, and design around, the downsides. Open vStorage is an example of a solution that an IT pro should consider to help resolve some of the potential downsides of choosing OpenStack to enable a commodity path.

