Disaggregated Hadoop Clusters – DriveScale Briefing Note

Web-scale applications are designed to run on dozens, if not thousands, of small commodity servers, which expect direct-attached storage. As a result, storage performance and capacity are directly tied to the purchase of more compute (servers). Over time almost every scale-out cluster ends up with either too much capacity or too much compute, wasting data center floor space, power and budget. DriveScale is a new solution whose goal is to bring storage flexibility back to the scale-out data center.

A Virtualized Scale-Out Architecture

In the DriveScale design, IT planners continue to buy the server nodes of their choice, but with more of a focus on compute capabilities. That focus should lead to smaller servers with plenty of CPU power but few storage media bays. The result is a pool of compute resources.

Then, the scale-out customer buys SAS-attached JBODs (just a bunch of disks), the simplest form of an array. A JBOD is essentially a shelf of disks with no real intelligence other than its SAS connectivity. The JBODs form a storage pool.

In the middle of this is the DriveScale hardware, a 1U appliance that essentially turns the SAS JBODs into Ethernet-attached devices. The next step is for the DriveScale software to compose logical nodes. A logical node defines how many CPUs and how many disks are necessary for that node to operate. For example, the product can create a logical node by assigning two CPUs from a single four-CPU server in the compute pool and 12 drives from two of the JBODs within the storage pool.
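The composition step described above can be pictured as a simple resource allocator over the two pools. The sketch below is purely illustrative and is not DriveScale's actual software interface: the names `Pools`, `compose_node`, and `LogicalNode` are invented for this example.

```python
from dataclasses import dataclass

@dataclass
class LogicalNode:
    # A composed node: CPUs drawn from one server, drives from one or more JBODs.
    name: str
    cpus: int
    drives: int

class Pools:
    """Hypothetical compute and storage pools (invented for illustration)."""
    def __init__(self, server_cpus, jbod_drives):
        self.server_cpus = server_cpus  # free CPUs remaining per server
        self.jbod_drives = jbod_drives  # free drives remaining per JBOD
        self.nodes = []

    def compose_node(self, name, cpus, drives):
        # Draw all CPUs from a single server, as in the article's example.
        server = next(i for i, free in enumerate(self.server_cpus) if free >= cpus)
        self.server_cpus[server] -= cpus
        # Drives may span multiple JBODs.
        remaining = drives
        for j, free in enumerate(self.jbod_drives):
            take = min(free, remaining)
            self.jbod_drives[j] -= take
            remaining -= take
            if remaining == 0:
                break
        if remaining:
            raise RuntimeError("storage pool exhausted")
        node = LogicalNode(name, cpus, drives)
        self.nodes.append(node)
        return node

# The article's example: two CPUs from a four-CPU server,
# 12 drives spanning two JBODs.
pools = Pools(server_cpus=[4], jbod_drives=[8, 8])
node = pools.compose_node("node1", cpus=2, drives=12)
```

After composing, the server still has two free CPUs and the second JBOD four free drives, so further logical nodes can be carved from the same pools.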

The result is scale-out virtualization which, much like storage virtualization, virtualizes the scale-out architecture's resources (compute and storage) so that they can be pooled and allocated to exact specifications. Most importantly, both pools can scale independently.

Ending Silos of Clusters

DriveScale is not limited to a single cluster. It can span to handle multiple cluster types. It can create a virtual Hadoop cluster alongside a virtual Splunk cluster, for example, each sharing resources from the same pools of storage and compute. If a big Hadoop job comes in, the system can de-allocate compute from the other virtual cluster and then allocate it to the Hadoop cluster.
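That reallocation step can be sketched as a simple transfer between virtual clusters drawing on one shared compute pool. Again, this is a hypothetical illustration; the `reallocate` helper and its signature are invented for this example and are not DriveScale's real interface.

```python
def reallocate(clusters, donor, recipient, cpus):
    """Move CPUs between virtual clusters sharing one compute pool.

    Hypothetical helper for illustration only. `clusters` maps a
    cluster name to its currently allocated CPU count.
    """
    if clusters[donor] < cpus:
        raise ValueError("donor cluster has too few CPUs")
    clusters[donor] -= cpus      # de-allocate from the donor cluster
    clusters[recipient] += cpus  # allocate to the recipient cluster
    return clusters

# A big Hadoop job comes in: borrow 8 CPUs from the Splunk cluster.
clusters = {"hadoop": 16, "splunk": 16}
reallocate(clusters, donor="splunk", recipient="hadoop", cpus=8)
# clusters is now {"hadoop": 24, "splunk": 8}
```

The point of the sketch is only that the total pool size stays fixed while the boundary between virtual clusters moves.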

StorageSwiss Take

Scale-out architectures specifically, and their designs in general, look great on the IT whiteboard. Need more resources? Add another node. And for your first dozen or so nodes, the architecture works well. But, as they scale, they become inefficient, wasting storage or compute resources and consuming precious data center floor space. DriveScale addresses these problems by virtualizing the cluster itself, allowing resources to be allocated granularly, as necessary, to ensure smooth operations.

About DriveScale

DriveScale is leading the charge in bringing hyperscale computing capabilities to mainstream enterprises. Its composable data center architecture transforms rigid data centers into flexible and responsive scale-out deployments. Using DriveScale, data center administrators can deploy independent pools of commodity compute and storage resources, automatically discover available assets, and combine and recombine these resources as needed. The solution is provided via a set of on-premises and SaaS tools that coordinate between multiple levels of infrastructure. With DriveScale, companies can more easily support Hadoop deployments of any size as well as other modern application workloads. DriveScale was founded by a team with deep roots in IT architecture that has built enterprise-class systems such as Cisco UCS and Sun UltraSPARC. Based in Sunnyvale, California, the company was founded in 2013. Investors include Pelion Venture Partners, Nautilus Venture Partners and Ingrasys, a wholly owned subsidiary of Foxconn. For more information, visit www.drivescale.com or follow them on Twitter at @DriveScale_Inc.

Eleven years ago George Crump founded Storage Switzerland with one simple goal: to educate IT professionals about all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 25 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud and Enterprise Flash. Prior to founding Storage Switzerland he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.
