Scale-out storage has a fundamental weakness, according to startup Qumulo: it can grow easily, but its data management requirements grow along with its capacity. This means that as storage systems grow they get less efficient and more expensive to run, forcing administrators to focus on managing data instead of understanding that data, its value and how it’s being used. Qumulo developed a next-generation, data-aware, scale-out NAS solution to address this problem of data management at scale, one that the company says makes “data visible but storage invisible”.
Founded by the inventors of scale-out NAS from EMC Isilon, Qumulo leveraged extensive customer research when developing its technology. Before building a product, the company held over 600 meetings with users, asking them what was wrong with storage in their environments. They heard things like “capacity planning is a wild a** guess” or “we drop files into a folder and lose them forever”. Comments like these prompted the company to develop its Qumulo Core ‘data-aware’ storage platform.
Qumulo Core is a software-only solution that integrates the Qumulo Scalable File System (QSFS) with an object-based, hybrid architecture. To address the data management challenges that existing storage has, QSFS captures information about data as it’s being stored, building a ‘footprint’ of the data.
To leverage this information, Qumulo built a patented database directly into the inodes of the file system that’s designed to efficiently answer queries against that data. This approach gives the system real-time visibility into which data is more valuable, what applications are using it and why it’s growing. These real-time analytics also enable the system to manage data transfer between flash and hard drives more efficiently.
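To make the idea concrete, here is a minimal sketch of how metadata kept alongside each directory entry could answer a usage query without walking the whole tree. It is an illustration only, not Qumulo's patented design: the `Node` class, the `add_file` and `usage` helpers and the choice of byte and file counts as the tracked statistics are all assumptions made for this example.

```python
class Node:
    """Hypothetical directory entry carrying running totals for its subtree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        self.total_bytes = 0   # bytes stored anywhere below this directory
        self.total_files = 0   # files stored anywhere below this directory

    def child(self, name):
        # Create the subdirectory on first use.
        if name not in self.children:
            self.children[name] = Node(name, self)
        return self.children[name]


def add_file(root, path, size):
    """Record a new file and update the totals on every ancestor directory."""
    parts = [p for p in path.split("/") if p]
    node = root
    for part in parts[:-1]:          # the last part is the file name itself
        node = node.child(part)
    while node is not None:          # walk back up to the root
        node.total_bytes += size
        node.total_files += 1
        node = node.parent


def usage(root, path):
    """Answer 'how much is stored under this directory' from stored totals."""
    node = root
    for part in [p for p in path.split("/") if p]:
        node = node.children[part]
    return node.total_bytes, node.total_files


root = Node("")
add_file(root, "/proj/a/x.mov", 100)
add_file(root, "/proj/b/y.mov", 50)
print(usage(root, "/proj"))
```

The cost is paid at write time, one update per level of directory depth, so a query against any directory is a lookup rather than a scan of everything beneath it.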
Qumulo Core is designed to be run on commodity hardware (physical or virtual) or sold as the Qumulo Q0626 appliance. That appliance is available in 1U nodes, currently configured with 2 x 800GB SSDs, 4 x 6TB HDDs and 2 x 10GbE connections. Each cluster starts with four nodes ($50,000 list for the cluster) and can scale up to 1000+ nodes with a single namespace. Users access data via NFS, SMB or REST. Data is written to flash first, and then ‘aged out’ to disk drives. Support for replication is coming in a future release.
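For a rough sense of what the entry configuration provides, the node specs above can be turned into raw capacity figures. This is back-of-the-envelope arithmetic only: it uses decimal units, the function name is made up for this example, and it ignores data-protection overhead, which the company did not detail, so usable capacity will be lower than these raw numbers.

```python
# Node configuration as quoted for the Q0626 appliance.
SSD_PER_NODE = 2
SSD_GB = 800
HDD_PER_NODE = 4
HDD_TB = 6
MIN_NODES = 4
LIST_PRICE_USD = 50_000


def raw_capacity_tb(nodes):
    """Return (flash TB, disk TB) of raw capacity, before any protection overhead."""
    flash_tb = nodes * SSD_PER_NODE * SSD_GB / 1000
    disk_tb = nodes * HDD_PER_NODE * HDD_TB
    return flash_tb, disk_tb


flash_tb, disk_tb = raw_capacity_tb(MIN_NODES)
print(f"flash: {flash_tb:.1f} TB raw, disk: {disk_tb} TB raw")
print(f"list price per raw disk TB: ${LIST_PRICE_USD / disk_tb:,.0f}")
```

A four-node starter cluster works out to 6.4TB of raw flash in front of 96TB of raw disk, or roughly $521 per raw disk terabyte at list price.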
Qumulo is being sold into environments with large, unstructured data sets including media and entertainment, oil and gas and other use cases in the commercial high-performance computing space. These companies need scalability, performance and the ability to store data for extended periods of time. Most storage systems can’t answer all three of these demands in a single system. Qumulo believes its solution can.
The Qumulo Core architecture is designed to scale for all data types, from large files that require high throughput to small-file, transactional workloads that require high IOPS. It also addresses the problems of data availability and performance during large capacity drive rebuilds.
Faster drive rebuilds
Every drive is divided into 5GB chunks that are logically dispersed around the cluster. This enables drives to be rebuilt at the chunk level, restoring only the chunks that have been written to, and allows the rebuild process to run at the block level, separate from the file system. With this architecture, all physical drives participate in the rebuild process and block contention with the file system is eliminated.
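A small sketch shows why chunking spreads the rebuild work. It assumes a simple "least-loaded drive first" placement policy, which is our assumption and not something Qumulo described; the function names are hypothetical, and a real system would also have to avoid placing a chunk on a drive that already holds related protection data.

```python
import heapq

CHUNK_GB = 5  # chunk size quoted by the company


def chunks_per_drive(drive_tb):
    """Number of 5GB chunks on a drive, using decimal units."""
    return drive_tb * 1000 // CHUNK_GB


def plan_rebuild(written_chunks, surviving_load):
    """Assign each written chunk to the currently least-loaded surviving drive.

    written_chunks: ids of chunks on the failed drive that actually held data.
    surviving_load: mapping of drive name to chunks it already stores.
    """
    heap = [(load, drive) for drive, load in surviving_load.items()]
    heapq.heapify(heap)
    plan = {}
    for chunk in written_chunks:
        load, drive = heapq.heappop(heap)
        plan[chunk] = drive
        heapq.heappush(heap, (load + 1, drive))
    return plan


print(chunks_per_drive(6))  # chunks on one 6TB drive
plan = plan_rebuild(range(6), {"d1": 0, "d2": 0, "d3": 0})
print(plan)
```

A 6TB drive holds 1,200 such chunks. Because unwritten chunks never enter the plan, a half-empty drive costs roughly half the rebuild work, and every surviving drive takes a share of it.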
When a drive fails, those chunks are rebuilt and distributed to other drives in the system, not to a specific hot-spare. And, since all new writes during this rebuild process go to flash first, new write activity, when it does interrupt that rebuild, is sequential. Those disk drives are not forced to deal with the random writes that kill traditional scale-out NAS rebuild performance. According to the company, the system can rebuild a failed 6TB drive, under load, in 17 hours.
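To put the 17-hour figure in perspective, it can be converted into a sustained rebuild rate. The calculation below assumes the drive was completely full and uses decimal units; the company did not say how much of the 6TB was written, so the real rate may be lower. The function names are ours.

```python
def rebuild_rate_mb_s(drive_tb, hours):
    """Average rebuild throughput in MB/s, assuming a completely full drive."""
    return drive_tb * 1e12 / (hours * 3600) / 1e6


def seconds_per_chunk(drive_tb, hours, chunk_gb=5):
    """Average time to rebuild one chunk under the same full-drive assumption."""
    chunks = drive_tb * 1000 / chunk_gb
    return hours * 3600 / chunks


print(f"{rebuild_rate_mb_s(6, 17):.1f} MB/s")
print(f"{seconds_per_chunk(6, 17):.0f} s per 5GB chunk")
```

Under those assumptions the cluster sustains about 98 MB/s of rebuild traffic, or one 5GB chunk every 51 seconds, while still serving its normal workload.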
When we asked Qumulo what its objective was in designing this product, one of the things the company said was “to build a NetApp that could scale”. Using an object-based, scale-out architecture and a software-only format can certainly provide that scale. But making a storage system big is not enough. As Qumulo found out talking with storage users, storage that’s too big to manage isn’t a viable solution for handling data growth.
So instead, with the Qumulo Scalable File System, the company is trying to be smarter about how it stores that data. The approach is to capture and present more information in real time to keep data management from overwhelming the storage user. It’s certainly early, but this technology seems to get a lot of things right.