Data continues to grow, and the number of storage systems required to support it is growing right along with it. Nowhere is that growth worse than in secondary storage; some studies indicate that secondary data now accounts for as much as 60% of total data center capacity. This reality, plus the additional use cases for secondary data, has led to secondary storage system sprawl. Today the average data center has storage systems dedicated to backup storage, archive storage, replicated data and, increasingly, analytics data. Cohesity recently announced its Cohesity Data Platform to help eliminate the sprawl of secondary storage systems.
It is not just the storage hardware that has sprawled out of control; the software that drives data to these secondary storage devices has too. And, of course, all of these siloed pieces of software and hardware have to be managed separately. As a result, secondary storage is expensive and complex. The data center needs a single hardware platform that can scale to meet not only the classic demands of secondary storage, such as backup, replication and archive, but also the new demands created by processes like big data analytics.
The Cohesity Data Platform
Cohesity’s Data Platform is set to address the problem of secondary data sprawl by delivering a storage platform built on a distributed systems foundation. The scale-out storage system is sold in 2U blocks. Each block has four nodes with a total of 96TB of HDD capacity, 6.2TB of flash and eight 10GbE ports. The system scales one block at a time, as the data center requires. Cohesity’s data services include cloning/snapshots and global deduplication. The platform can also leverage the cloud as an archive or burst storage tier.
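To make the block-at-a-time scaling concrete, here is a rough sizing sketch that uses only the per-block figures quoted above. The names `Block`, `cluster_totals` and `blocks_needed` are invented for this illustration and are not part of any Cohesity tool. The numbers are raw capacity, so the effects of deduplication and any protection overhead are deliberately left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One 2U block, using the per-block figures quoted in the article."""
    rack_units: int = 2
    nodes: int = 4
    hdd_tb: float = 96.0
    flash_tb: float = 6.2
    ports_10gbe: int = 8


def cluster_totals(blocks: int, block: Block = Block()) -> dict:
    """Raw totals for a cluster of `blocks` identical blocks (hypothetical helper)."""
    if blocks < 1:
        raise ValueError("a cluster needs at least one block")
    return {
        "rack_units": block.rack_units * blocks,
        "nodes": block.nodes * blocks,
        "hdd_tb": block.hdd_tb * blocks,
        # Round to one decimal to hide floating-point noise from 6.2 * n.
        "flash_tb": round(block.flash_tb * blocks, 1),
        "ports_10gbe": block.ports_10gbe * blocks,
    }


def blocks_needed(raw_hdd_tb: float, block: Block = Block()) -> int:
    """Smallest number of blocks whose raw HDD capacity covers `raw_hdd_tb`."""
    return max(1, math.ceil(raw_hdd_tb / block.hdd_tb))


if __name__ == "__main__":
    n = blocks_needed(200)  # e.g. 200TB of raw secondary data
    print(n, cluster_totals(n))
```

Because capacity arrives in whole blocks, a requirement just over a block boundary rounds up to the next block, which is the usual trade-off of a building-block design.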
High Performance, Highly Available Secondary Storage
Cohesity is one of the few secondary storage systems that provides complete fault tolerance and non-disruptive operations. The solution also stands out in that it provides auto-tiering between HDDs and SSDs to support random I/O operations. Few secondary storage platforms can leverage SSDs at all, but support for random I/O is critical now that these platforms are being asked to stand in for primary storage. For example, many data protection solutions can provision backed-up virtual machine data stores as live volumes directly from the secondary storage platform. Reliability is also becoming more important because solutions like copy data management software and archive software are limiting the growth of copy data by reducing the number of data copies. Fewer copies of data will reduce secondary storage requirements, but each remaining copy matters more, which increases the importance of secondary storage system reliability.
Supporting all of these different secondary storage use cases also means supporting multiple interfaces to the storage system. Cohesity supports distributed NFS, HDFS (Hadoop), CIFS, SMB and iSCSI. These protocols are not emulated; they are supported through software adapters.
Cohesity’s consolidation also includes software. Its snapshot/clone functionality features enhanced metadata management. This allows the solution to support an almost unlimited number of snapshots with no performance impact, enabling truly continuous data protection. Many storage systems claim high snapshot counts, but because of poor metadata management they can’t come close to those claims without a significant impact on performance. Organizations can set the policy that is right for them, instead of shaping their policies around other products’ limitations, which may allow snapshots only a couple of times a day.
In addition to the snapshot capabilities, Cohesity will also provide data protection capabilities. In the first release it will be able to protect VMware environments through integration with VADP. Over time the company will add support for databases and Hyper-V. The protection software will run on the Cohesity cluster, so no additional hardware appliances will be needed.
Once data is consolidated onto the Cohesity platform, in-place analytics and programmability kick in. Along with global indexing and instant search, the platform provides real-time reports on storage utilization, as well as the ability for customers to inject custom queries tailored to their specific data workflows, e.g., searching for sensitive data. In-place analytics are supported via HDFS and built-in MapReduce, making more efficient use of the large repository of data housed in secondary storage.
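As a loose illustration of what a MapReduce-style "search for sensitive data" query looks like, the sketch below scans file contents for strings shaped like US Social Security numbers. It is a generic, in-memory stand-in and not Cohesity's query interface, which the announcement does not describe. The function names, the regular expression and the dictionary of files are all assumptions made for this example; a real job would read from the cluster rather than from a Python dict.

```python
import re
from collections import Counter
from functools import reduce

# Assumed pattern: digits grouped 3-2-4, as in a US Social Security number.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def map_file(path, text):
    """Map step: count SSN-like strings in one file, keyed by its path."""
    return Counter({path: len(SSN_LIKE.findall(text))})


def reduce_counts(left, right):
    """Reduce step: merge per-file counts.

    Counter addition keeps only positive counts, so files with no
    matches drop out of the final result.
    """
    return left + right


def find_sensitive(files):
    """Run map over every (path, text) pair, then fold the results together."""
    mapped = (map_file(path, text) for path, text in files.items())
    return dict(reduce(reduce_counts, mapped, Counter()))


if __name__ == "__main__":
    sample = {
        "a.txt": "ssn 123-45-6789 and 987-65-4321",
        "b.txt": "nothing here",
        "c.log": "id 111-22-3333",
    }
    print(find_sensitive(sample))
```

The map step touches each file independently, which is why this style of query lends itself to running where the data already sits instead of copying it to a separate analytics system.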
Secondary storage is already a mess, and it is being asked to do more than ever. Data protection and recovery windows are narrowing, and the addition of processes like test/dev or big data analytics may be the straw that breaks the back of most secondary storage infrastructures. Cohesity can provide immediate assistance by consolidating secondary storage hardware, and it could then change the game altogether by integrating data protection software.