Disaggregated Hadoop Clusters – DriveScale Briefing Note

Posted on July 15, 2016 by George Crump

Web-scale applications are designed to run on dozens, if not thousands, of small commodity servers that expect direct-attached storage. As a result, storage performance and capacity are directly tied to the purchase of more compute (servers). Over time, almost every scale-out cluster ends up with either too much capacity or too much compute, wasting data center floor space, power and budget. DriveScale is a new solution whose goal is to bring storage flexibility back to the scale-out data center.
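A small worked example shows why buying compute and storage together strands resources. The sketch below assumes a hypothetical server model with a fixed 24 cores and 12 drive bays per node (illustrative figures only). It sizes a cluster for whichever resource is the bottleneck and reports what is left idle.

```python
import math
from dataclasses import dataclass

# Hypothetical server model: every node ships with a fixed amount of
# compute and direct-attached storage (figures are illustrative only).
CORES_PER_SERVER = 24
DRIVES_PER_SERVER = 12


@dataclass
class CoupledPlan:
    servers: int
    idle_cores: int
    idle_drives: int


def plan_coupled(cores_needed: int, drives_needed: int) -> CoupledPlan:
    """Size a cluster where compute and storage can only be bought together."""
    # Buy enough servers to satisfy whichever resource is the bottleneck.
    servers = max(
        math.ceil(cores_needed / CORES_PER_SERVER),
        math.ceil(drives_needed / DRIVES_PER_SERVER),
    )
    # Whatever the bottleneck did not require is stranded.
    return CoupledPlan(
        servers=servers,
        idle_cores=servers * CORES_PER_SERVER - cores_needed,
        idle_drives=servers * DRIVES_PER_SERVER - drives_needed,
    )


# A storage-heavy workload: modest compute, lots of capacity.
storage_heavy = plan_coupled(cores_needed=96, drives_needed=240)
print(storage_heavy)  # CoupledPlan(servers=20, idle_cores=384, idle_drives=0)

# A compute-heavy workload: the opposite resource is stranded.
compute_heavy = plan_coupled(cores_needed=480, drives_needed=24)
print(compute_heavy)  # CoupledPlan(servers=20, idle_cores=0, idle_drives=216)
```

In both cases the cluster grows to 20 servers, but one of the two resources sits largely unused.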

A Virtualized Scale-Out Architecture

In the DriveScale design, IT planners continue to buy the server nodes of their choice, but with more focus on compute capabilities. That focus should lead to smaller servers with plenty of CPU capacity but few storage media bays. The result is a pool of compute resources.

Next, the scale-out customer buys SAS-attached JBODs (just a bunch of disks), the simplest form of storage array. A JBOD is essentially a shelf of disks with no real intelligence beyond its SAS connectivity. The JBODs form a storage pool.

In the middle sits the DriveScale hardware, a 1U appliance that essentially turns the SAS JBODs into Ethernet-attached devices. The DriveScale software then composes logical nodes. A logical node defines how many CPUs and how many disks that node needs to operate. For example, the product can create a logical node by assigning two CPUs from a single four-CPU server in the compute pool and 12 drives from two of the JBODs within the storage pool.
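The composition step can be modeled as simple bookkeeping over two pools. The following is a minimal sketch, not DriveScale's software or API: the classes `Server`, `Jbod`, `LogicalNode` and the function `compose_node` are hypothetical, and the sketch assumes CPUs come from one server while drives may be spread across several JBODs, as in the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Server:  # hypothetical compute-pool member
    name: str
    cpus: int
    allocated_cpus: int = 0

    @property
    def free_cpus(self) -> int:
        return self.cpus - self.allocated_cpus


@dataclass
class Jbod:  # hypothetical storage-pool member
    name: str
    drives: int
    allocated_drives: int = 0

    @property
    def free_drives(self) -> int:
        return self.drives - self.allocated_drives


@dataclass
class LogicalNode:
    name: str
    cpus: dict[str, int] = field(default_factory=dict)    # server name -> CPUs
    drives: dict[str, int] = field(default_factory=dict)  # JBOD name -> drives


def compose_node(name, servers, jbods, cpus_needed, drives_needed):
    """Carve a logical node out of the compute and storage pools.

    CPUs come from a single server; drives may span several JBODs.
    Both checks run before anything is allocated, so a failure leaves
    the pools untouched.
    """
    server = next((s for s in servers if s.free_cpus >= cpus_needed), None)
    if server is None:
        raise ValueError("no server has enough free CPUs")
    if sum(j.free_drives for j in jbods) < drives_needed:
        raise ValueError("storage pool has too few free drives")

    node = LogicalNode(name)
    server.allocated_cpus += cpus_needed
    node.cpus[server.name] = cpus_needed

    remaining = drives_needed
    for jbod in jbods:
        take = min(jbod.free_drives, remaining)
        if take:
            jbod.allocated_drives += take
            node.drives[jbod.name] = take
            remaining -= take
        if remaining == 0:
            break
    return node


# Two CPUs from a four-CPU server plus 12 drives drawn from two JBODs.
servers = [Server("srv1", cpus=4)]
jbods = [Jbod("jbod1", drives=8), Jbod("jbod2", drives=8)]
node = compose_node("node1", servers, jbods, cpus_needed=2, drives_needed=12)
print(node.cpus)    # {'srv1': 2}
print(node.drives)  # {'jbod1': 8, 'jbod2': 4}
```

A real system would also map the chosen drives to the host over the network; here only the allocation record is kept.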

The result is scale-out virtualization. Similar to storage virtualization, it virtualizes the scale-out architecture's resources (compute and storage) so that they can be pooled and allocated to exact specifications. Most importantly, both pools can scale independently.
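Independent scaling means each pool is sized against its own demand. The sketch below assumes hypothetical building blocks (24-core compute nodes and 60-drive JBODs, illustrative figures rather than DriveScale specifications) and shows that growing capacity adds only JBODs.

```python
import math

# Hypothetical building blocks (illustrative figures, not vendor specs).
CORES_PER_COMPUTE_NODE = 24
DRIVES_PER_JBOD = 60


def plan_disaggregated(cores_needed: int, drives_needed: int) -> tuple[int, int]:
    """Size the compute pool and the storage pool independently."""
    compute_nodes = math.ceil(cores_needed / CORES_PER_COMPUTE_NODE)
    jbods = math.ceil(drives_needed / DRIVES_PER_JBOD)
    return compute_nodes, jbods


before = plan_disaggregated(96, 240)
after = plan_disaggregated(96, 360)  # capacity grows by half
print(before)  # (4, 4)
print(after)   # (4, 6): only the storage pool expands
```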

Ending Silos of Clusters

DriveScale is not limited to a single cluster; it can span multiple cluster types. For example, it can create a virtual Hadoop cluster alongside a virtual Splunk cluster, each sharing resources from the same pools of storage and compute. If a big Hadoop job comes in, the system can de-allocate compute from the other virtual cluster and allocate it to the Hadoop cluster.
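The reallocation described above can be sketched as moving compute nodes between two virtual clusters that draw on a shared pool. This is a hypothetical model: `VirtualCluster` and `rebalance` are illustrative names, not part of any DriveScale interface, and a real system would drain running workloads before reassigning a node.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualCluster:  # hypothetical model of a virtual cluster
    name: str
    nodes: set[str] = field(default_factory=set)  # compute nodes assigned


def rebalance(donor: VirtualCluster, recipient: VirtualCluster, count: int) -> list[str]:
    """Move `count` compute nodes from one virtual cluster to another.

    Only the assignment bookkeeping is updated; draining workloads
    is left out of this sketch.
    """
    if count > len(donor.nodes):
        raise ValueError(f"{donor.name} only has {len(donor.nodes)} nodes")
    moved = sorted(donor.nodes)[:count]  # deterministic choice for the example
    for node in moved:
        donor.nodes.remove(node)
        recipient.nodes.add(node)
    return moved


hadoop = VirtualCluster("hadoop", {"c01", "c02", "c03"})
splunk = VirtualCluster("splunk", {"c04", "c05", "c06", "c07"})

# A large Hadoop job arrives: borrow two compute nodes from Splunk.
moved = rebalance(splunk, hadoop, 2)
print(moved)                 # ['c04', 'c05']
print(sorted(hadoop.nodes))  # ['c01', 'c02', 'c03', 'c04', 'c05']
```

When the job finishes, calling `rebalance(hadoop, splunk, 2)` would return capacity in the same way.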

StorageSwiss Take

Scale-out architectures specifically, and their designs in general, look great on the IT whiteboard. Need more resources? Add another node. For the first dozen or so nodes, the architecture works well. But as these clusters scale, they become inefficient, wasting storage or compute resources and consuming precious data center floor space. DriveScale fundamentally solves these problems by virtualizing the cluster itself, which makes resources granular so they can be allocated as necessary to ensure smooth operations.

About DriveScale

DriveScale is leading the charge in bringing hyperscale computing capabilities to mainstream enterprises. Its composable data center architecture transforms rigid data centers into flexible and responsive scale-out deployments. Using DriveScale, data center administrators can deploy independent pools of commodity compute and storage resources, automatically discover available assets, and combine and recombine these resources as needed. The solution is provided via a set of on-premises and SaaS tools that coordinate between multiple levels of infrastructure. With DriveScale, companies can more easily support Hadoop deployments of any size as well as other modern application workloads. DriveScale was founded by a team with deep roots in IT architecture that has built enterprise-class systems such as Cisco UCS and Sun UltraSparc. Based in Sunnyvale, California, the company was founded in 2013. Investors include Pelion Venture Partners, Nautilus Venture Partners and Ingrasys, a wholly owned subsidiary of Foxconn. For more information, visit www.drivescale.com or follow them on Twitter at @DriveScale_Inc.

About George Crump

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.

