The Problems with Hyperscale Storage

Direct-attached storage (DAS) is the default storage “infrastructure” for data-intensive workloads like Elastic, Hadoop, Kafka and Splunk. The problem, as we detailed in the last blog, is that using DAS creates a brittle, siloed environment. Compute nodes can’t easily be redeployed from one application to another, and storage can’t easily be moved to take advantage of nodes with more powerful CPUs. The result is over-provisioning and underutilized resources. Large cloud providers work around this problem through sheer economies of scale, but enterprises with a lower server count can’t cost-justify adding another rack of servers to keep pace.

Consequently, the enterprise is forced either to limit the amount of data it has available for analysis or to develop a complicated tiering strategy. One alternative enterprises may consider is a new type of shared storage infrastructure (not legacy SAN or NAS), especially in light of recent networking advances like NVMe over Fabrics (NVMe-oF). To take full advantage of the I/O performance and capacity of NVMe drives, those drives need to be on a network fabric, accessible to a large number of servers, rather than trapped inside a single server chassis.

The Shared Storage Challenge to Big Data / Machine Learning

The reasons most data-intensive workloads are built on DAS, instead of the more traditional shared enterprise storage (SAN/NAS), are to scale out cost-effectively, to eliminate the redundant data management already built into data-intensive applications, and to keep data in close proximity to the processor working on it for high performance. Legacy enterprise storage, along with network latency, can create a performance bottleneck that lengthens workload processing time. NVMe-oF may go a long way toward reducing the latency penalty of shared storage for data-intensive applications, but it does not overcome the other problem organizations face when trying to create a pooled set of resources from legacy enterprise storage appliances and arrays: premium cost.

Industry-standard, commodity storage media and enclosures are typically far less expensive than media bundled with enterprise storage systems, because they omit the high-availability options and software features that enterprise storage vendors factor into the overall cost of their infrastructure. In practice, workloads like Elastic, Hadoop, Kafka and Splunk don’t need those capabilities, as the applications themselves have that functionality built in.
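Hadoop illustrates the point: redundancy lives in the application layer, not the storage hardware. As a sketch (the value shown is HDFS’s common default, not something taken from this post), the relevant setting in `hdfs-site.xml` tells HDFS to keep multiple copies of every block across the cluster, which is why array-level RAID or mirroring underneath is unnecessary:

```xml
<!-- hdfs-site.xml: HDFS replicates each block in software, so the
     commodity drives beneath it need no array-level protection -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- three copies of every block, the common default -->
    <value>3</value>
  </property>
</configuration>
```

Elastic, Kafka and Splunk make equivalent choices (index replicas, partition replicas, index clustering), so paying an enterprise array to do the same work twice buys little.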

Finally, an enterprise shared storage system can’t overcome the challenge of automatically connecting to the network or orchestrating the attachment of compute to storage resources. Organizations with data-intensive applications need a more holistic approach to disaggregating resources and then bringing them back together in a dynamic fashion.

This process, referred to as composable infrastructure, enables organizations to dynamically define a group of servers and a group of drives and allocate them to a specific task for as long as needed. Composable infrastructure also enables organizations to disassemble these configurations just as quickly and reallocate the resources to new workloads. Disaggregating compute from storage, combined with the ability to quickly compose infrastructure, enables organizations to significantly improve resource utilization, especially utilization of servers. The result is the elasticity and adaptability of the cloud, on premises, at a significantly lower cost, with performance equivalent to direct-attached storage and easier migration to new server and storage technologies.
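The compose/decompose cycle described above can be sketched in a few lines. This is an illustrative model only (all names are hypothetical, not any vendor’s API): servers and drives sit in shared free pools, a composition binds some of each to a workload, and decomposing returns them for reuse.

```python
# Illustrative sketch of composable infrastructure: shared pools of
# servers and drives, bound to workloads on demand and released later.
class ResourcePool:
    def __init__(self, servers, drives):
        self.free_servers = set(servers)
        self.free_drives = set(drives)
        self.compositions = {}  # workload name -> (servers, drives)

    def compose(self, workload, n_servers, n_drives):
        # Bind free servers and drives to a workload.
        if len(self.free_servers) < n_servers or len(self.free_drives) < n_drives:
            raise RuntimeError("not enough free resources")
        servers = {self.free_servers.pop() for _ in range(n_servers)}
        drives = {self.free_drives.pop() for _ in range(n_drives)}
        self.compositions[workload] = (servers, drives)
        return servers, drives

    def decompose(self, workload):
        # Release the workload's resources back into the free pools.
        servers, drives = self.compositions.pop(workload)
        self.free_servers |= servers
        self.free_drives |= drives

pool = ResourcePool(servers=[f"srv{i}" for i in range(8)],
                    drives=[f"nvme{i}" for i in range(24)])
pool.compose("elastic", n_servers=4, n_drives=12)
pool.decompose("elastic")  # all 8 servers and 24 drives are free again
```

The point of the sketch is the lifecycle, not the bookkeeping: because the binding is logical, the same drives can be reattached to faster servers later without physically moving anything.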

In the next blog, Storage Switzerland will cover in more detail why a fast, efficient networking infrastructure like NVMe-oF is only the beginning. We’ll detail what composable infrastructure is, how it is architected, and why it is ideal for data-intensive workloads.

In the meantime, register for our live 15-minute webinar “Composing Infrastructure for Elastic, Hadoop, Kafka and Splunk” on May 29th at 4:00 pm ET / 1:00 pm PT. Pre-register and receive a copy of Storage Switzerland’s latest eBook “Is NVMe-oF Enough to Fix the Hyperscale Problem?”


Twelve years ago George Crump founded Storage Switzerland with one simple goal: to educate IT professionals about all aspects of data center storage. He is the primary contributor to Storage Switzerland and a sought-after public speaker. With over 25 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, virtualization, cloud and enterprise flash. Prior to founding Storage Switzerland he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.
