Solving Hadoop Storage Challenges by Converging Storage and Compute

Posted on July 26, 2012 by Eric Slack

Cleversafe’s object-based, scale-out storage architecture uses a unique information dispersal algorithm to distribute data objects across multiple storage nodes (called Slicestor ** nodes). These nodes can be located in a single data center or spread around the world. By combining a proprietary erasure coding process with tier-one data encryption, Cleversafe can create a secure, redundant infrastructure that scales without practical limits. Its ability to scale efficiently (without creating multiple copies of each object) makes it well suited to cloud providers and cloud-based services with very large potential capacity requirements, such as photo storage websites (Shutterfly is a Cleversafe client).
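Cleversafe's dispersal algorithm and erasure code are proprietary and not described here, so the sketch below substitutes a textbook k-of-n scheme in the spirit of Rabin's information dispersal algorithm. Each group of k bytes defines a polynomial over the prime field GF(257), each of n slices stores one evaluation of that polynomial, and any k slices are enough to rebuild the original. The names `disperse` and `rebuild` and the parameters `k` and `n` are hypothetical. Encryption is left out: in this sketch the first k slices hold the plain bytes, so it provides no confidentiality on its own.

```python
P = 257  # prime modulus: every byte value 0..255 is an element of GF(257)


def _interpolate(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate, at x, the unique polynomial of degree < len(points)
    passing through the given (xi, yi) points, working mod P."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x - xm) % P
                den = den * (xj - xm) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yj * num * pow(den, P - 2, P)) % P
    return total


def disperse(data: bytes, k: int, n: int) -> dict[int, list[int]]:
    """Hypothetical k-of-n dispersal: returns n slices keyed by x = 1..n."""
    if not 0 < k <= n < P:
        raise ValueError("need 0 < k <= n < 257")
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices: dict[int, list[int]] = {x: [] for x in range(1, n + 1)}
    for off in range(0, len(padded), k):
        # The k bytes are the polynomial's values at x = 1..k (systematic form).
        group = list(zip(range(1, k + 1), padded[off:off + k]))
        for x in range(1, n + 1):
            # Symbols fall in 0..256; a real system would use GF(2^8) to keep them byte-sized.
            slices[x].append(_interpolate(group, x))
    return slices


def rebuild(slices: dict[int, list[int]], k: int, length: int) -> bytes:
    """Recover the original bytes from any k surviving slices."""
    chosen = sorted(slices)[:k]
    if len(chosen) < k:
        raise ValueError(f"need {k} slices, only {len(chosen)} survive")
    out = bytearray()
    for g in range(len(slices[chosen[0]])):
        points = [(x, slices[x][g]) for x in chosen]
        # Re-evaluating at x = 1..k yields the original data bytes.
        out.extend(_interpolate(points, x) for x in range(1, k + 1))
    return bytes(out[:length])
```

With k = 4 and n = 6, each slice holds a quarter of the data, so raw storage is 6/4 = 1.5 times the object size and any two slices can be lost without losing the object.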

In its latest product release, Cleversafe is adding an embedded compute capability to its dispersed storage architecture, enabling users to run Hadoop MapReduce on the CPU and memory resources available on each node. By distributing the compute function the same way it distributes data sets, Cleversafe’s converged architecture can deliver significant benefits to Big Data environments, such as improved scalability, lower storage costs, reduced network congestion and increased reliability.

For comparison, the Hadoop Distributed File System (HDFS) uses a single server (the NameNode) to manage file system metadata and, by default, stores three copies of every data block for redundancy. Cleversafe’s data dispersal architecture provides data reliability without storing full copies of data objects, which keeps storage consumption to a minimum and enables larger effective capacity. It also removes the single point of failure that the lone metadata server represents.
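The capacity argument comes down to simple arithmetic. The sketch below compares the raw capacity needed under HDFS's default three-way replication with a generic k-of-n dispersal scheme; the 10-of-16 configuration and the 100 TB data set are hypothetical values chosen for illustration, not figures from Cleversafe. Dispersal still stores some redundancy (the extra n - k slices); what it avoids is storing whole copies.

```python
def replication_raw_tb(usable_tb: float, copies: int = 3) -> float:
    """Raw capacity when every block is stored `copies` times (HDFS default: 3)."""
    return usable_tb * copies


def dispersal_raw_tb(usable_tb: float, width: int, threshold: int) -> float:
    """Raw capacity for threshold-of-width dispersal: each object is stored as
    `width` slices, any `threshold` of which can rebuild it."""
    return usable_tb * width / threshold


def losses_tolerated(width: int, threshold: int) -> int:
    """Slices that can be lost before an object becomes unreadable."""
    return width - threshold


usable = 100.0  # hypothetical: 100 TB of data to protect
print(f"3x replication: {replication_raw_tb(usable):.0f} TB raw, survives 2 lost copies")
print(f"10-of-16 dispersal: {dispersal_raw_tb(usable, 16, 10):.0f} TB raw, "
      f"survives {losses_tolerated(16, 10)} lost slices")
```

For 100 TB of data this works out to 300 TB of raw capacity for replication versus 160 TB for the 10-of-16 layout, while the dispersed layout tolerates six lost slices instead of two lost copies.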

Cleversafe’s object-based architecture eliminates the metadata server entirely by storing metadata alongside the data objects and spreading both across the nodes in the cluster. This removes a potential limit on metadata processing and allows the cluster to scale larger. It also improves performance, because growing data sets are served by many nodes instead of being bottlenecked behind a single metadata server.
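Cleversafe has not published how dsNet locates slices without a metadata server, so the sketch below uses a common stand-in technique, rendezvous (highest-random-weight) hashing, to illustrate the general principle. Any client can compute which nodes hold an object's slices from the object name alone, and each slice carries its own copy of the object's metadata, so no central lookup is needed. The `Cluster`, `SliceRecord` and `place` names are hypothetical, and in-memory dictionaries stand in for real storage nodes.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SliceRecord:
    """One stored slice; the object's metadata travels with every slice."""
    object_name: str
    object_size: int
    slice_index: int
    payload: bytes


def place(object_name: str, nodes: list[str], width: int) -> list[str]:
    """Rendezvous hashing: rank nodes by hash(object, node) and keep the top `width`.
    Every client computes the same answer without asking a central server."""
    def score(node: str) -> bytes:
        return hashlib.sha256(f"{object_name}|{node}".encode()).digest()
    return sorted(nodes, key=score, reverse=True)[:width]


class Cluster:
    def __init__(self, node_names: list[str], width: int):
        self.stores: dict[str, dict[str, SliceRecord]] = {n: {} for n in node_names}
        self.width = width

    def put(self, name: str, slices: list[bytes], size: int) -> None:
        if len(slices) != self.width:
            raise ValueError(f"expected {self.width} slices, got {len(slices)}")
        targets = place(name, list(self.stores), self.width)
        for idx, (node, payload) in enumerate(zip(targets, slices)):
            self.stores[node][name] = SliceRecord(name, size, idx, payload)

    def stat(self, name: str) -> int:
        """Look up an object's size by asking only the nodes that should hold it."""
        for node in place(name, list(self.stores), self.width):
            record = self.stores[node].get(name)
            if record is not None:  # any surviving slice can answer
                return record.object_size
        raise KeyError(name)
```

Because placement is a pure function of the object name and the node list, losing a node does not lose the ability to find or describe an object, as long as one of its slices survives.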

Cleversafe’s dsNet system replaces HDFS and is accessed through an API, which makes the substitution transparent to MapReduce. For distributed data to be usable by MapReduce, the SliceStream protocol must create ‘computationally operable’ data chunks, in a process similar to the way it parses data objects for dispersal. These chunks are then stored in logical ‘vaults’ within the Cleversafe cluster, which lets users keep other data sets that are not destined for Hadoop processing in the same cluster.
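The SliceStream protocol's chunking rules are not described in this briefing, so the sketch below only illustrates, as an assumption, what ‘computationally operable’ could mean for line-oriented data: each chunk boundary is pushed forward to the next record separator, so no record is split across chunks and every chunk can be handed to a map task on its own. This is similar in spirit to how Hadoop's line-oriented input handling deals with records that straddle split boundaries. The function names `record_aligned_chunks` and `map_word_count` and the newline separator are hypothetical choices.

```python
def record_aligned_chunks(data: bytes, target: int, sep: bytes = b"\n") -> list[bytes]:
    """Split `data` into chunks of roughly `target` bytes without cutting a record."""
    if target <= 0 or not sep:
        raise ValueError("target must be positive and sep non-empty")
    chunks: list[bytes] = []
    start = 0
    while start < len(data):
        end = min(start + target, len(data))
        if end < len(data):
            # Extend to the end of the record containing byte end-1
            # (a separator that ends exactly at `end` also counts).
            cut = data.find(sep, max(start, end - len(sep)))
            end = len(data) if cut == -1 else cut + len(sep)
        chunks.append(data[start:end])
        start = end
    return chunks


def map_word_count(chunk: bytes) -> int:
    """A stand-in map task: it only ever needs the records in its own chunk."""
    return sum(len(line.split()) for line in chunk.splitlines())
```

Since every chunk holds whole records, the per-chunk results can simply be summed, which is what lets the map step run independently wherever each chunk is stored.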

Traditional Big Data architectures separate storage and compute, so large quantities of data must be moved across the network before complex analyses can run. Cleversafe’s dsNet with MapReduce can eliminate some of this inefficiency by distributing the compute process and locating it with the data storage. This greatly reduces data movement, improves overall scalability and reduces performance bottlenecks. Cleversafe plans to make this product generally available in Q4 2012.
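The data-movement argument can be made concrete with a toy model under deliberately simplified assumptions: four storage nodes each hold a chunk of log data, and the job is a word count. In the first function every chunk crosses the network to a separate compute cluster; in the second, the map and a local combine step run on each storage node and only the serialized partial counts are sent. The node names, data and JSON wire format are hypothetical, and real savings depend on how much a job's map phase shrinks its input.

```python
import json
from collections import Counter


def ship_data_to_compute(node_chunks: dict[str, bytes]) -> tuple[Counter, int]:
    """Separate storage and compute: every byte of input crosses the network."""
    moved = 0
    total: Counter = Counter()
    for chunk in node_chunks.values():
        moved += len(chunk)  # raw chunk sent to the compute cluster
        total.update(chunk.split())
    return total, moved


def ship_compute_to_data(node_chunks: dict[str, bytes]) -> tuple[Counter, int]:
    """Converged storage and compute: map + combine run where the chunk lives."""
    moved = 0
    total: Counter = Counter()
    for chunk in node_chunks.values():
        partial = Counter(chunk.split())  # computed locally on the storage node
        wire = json.dumps({w.decode(): n for w, n in partial.items()}).encode()
        moved += len(wire)  # only the small summary crosses the network
        total.update({w.encode(): n for w, n in json.loads(wire).items()})
    return total, moved
```

With four nodes each holding 1,000 copies of a 23-byte log line, the first approach moves 92,000 bytes while the second moves well under 1% of that and produces identical counts; for jobs whose map output is as large as their input, the gap narrows.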

Storage Swiss Take

Hadoop has been described as “bringing the compute process to the data”; unfortunately, traditional storage infrastructures that consolidate data don’t support this very well. Distributed data storage is the right first step, but why not put the compute operations inside the storage nodes themselves? This is what Cleversafe has done, and in the process it claims to have reduced cost, improved scalability and maintained performance.

** “Slicestor” is a trademark of Cleversafe, Inc.

Cleversafe is not a client of Storage Switzerland

About Eric Slack

Eric is an Analyst with Storage Switzerland and has over 25 years of experience in high-technology industries. He’s held technical, management and marketing positions in the computer storage, instrumentation, digital imaging and test equipment fields. He has spent the past 15 years in the data storage field, with storage hardware manufacturers and as a national storage integrator, designing and implementing open systems storage solutions for companies in the Western United States. Eric earned degrees in electrical/computer engineering from the University of Colorado and marketing from California State University, Humboldt. He and his wife live in Colorado and have twins in college.

Tagged with: Apache Hadoop, Big data, Cleversafe, Data center, Hadoop, Hardware, HDFS, MapReduce, Object Storage
Posted in Briefing Note
