One of the original promises of a shared storage system was to eliminate storage silos by consolidating all of an organization’s data assets onto a single system. But data centers have more storage systems than ever, each creating a discrete silo of storage that needs to be independently managed and monitored. Many medium to large size data centers, and most enterprises, have four or more systems from different vendors dedicated to specific workloads. Are these storage systems a necessary evil required to meet the performance, capacity and data availability expectations of the organization, or is there an alternative that can provide the benefits of consolidation without its compromises?
Why Are There More Silos Than Ever?
The increase in storage silos is the direct result of meeting the increased performance, capacity, data protection and availability expectations of the enterprise. It is compounded by the dramatic increase in the number of applications that the typical organization now counts as critical, as well as the rise of server and desktop virtualization.
Many environments now have specific storage systems for each business and mission-critical application in the environment. They also have specific storage systems for desktop virtualization, and most have multiple storage systems for the virtual server environment. Finally, most data centers have multiple storage systems dedicated to the data protection process, a storage silo that is seldom even factored into a consolidation conversation.
Silos Are Justified by Tiers
Applications and the data that they create can be classified into one of four tiers. Tier 0 consists of mission-critical applications that need high performance and high reliability. Performance, or lack thereof, for these applications directly impacts company revenues, reputation or both. Even in the largest data centers there is typically a small, finite number of these applications, often fewer than 1% of the application population. In addition, most tier 0 applications don’t have a high capacity demand.
Tier 1 consists of business critical applications that also need high performance and reliability, but while a temporary drop in either will affect user productivity, it will not typically have an impact on company revenues or reputation. Like tier 0, most tier 1 applications don’t have a high capacity demand. However, there are often more tier 1 applications in the environment so the aggregated capacity of this tier is significant.
Tier 2 is made up of mainstream applications and unstructured data sets like office productivity data and analytics data that will occasionally need performance, but not continuously. The overwhelming majority of applications in the environment are tier 2 and they can have a significant total capacity demand, especially when office productivity data and analytics data are factored in.
Tier 3 is archive data that has virtually no performance requirement, but has a significant capacity requirement. While it also has a lower requirement for availability, the data that is stored in tier 3 must be durable and accessible.
The dramatic difference in the performance, capacity, data protection and data availability profiles of these data sets has led to a call for implementing a specific storage system per tier, and then per environment. This explains why a data center might have multiple storage systems for each environment it supports, and why there are more storage silos than ever.
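The four tiers described above can be expressed as a simple classification rule. The following is a minimal sketch in Python, not a definitive method: the Workload attributes (affects_revenue, affects_productivity, is_archive) are hypothetical names chosen for illustration, and a real assessment would weigh many more factors.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    MISSION_CRITICAL = 0   # performance directly affects revenue or reputation
    BUSINESS_CRITICAL = 1  # performance affects user productivity
    MAINSTREAM = 2         # needs performance only occasionally
    ARCHIVE = 3            # capacity and durability matter, performance does not


@dataclass
class Workload:
    name: str
    affects_revenue: bool       # hypothetical attribute, for illustration only
    affects_productivity: bool  # hypothetical attribute, for illustration only
    is_archive: bool            # hypothetical attribute, for illustration only


def classify(workload: Workload) -> Tier:
    """Map a workload to one of the four tiers described in the text."""
    if workload.is_archive:
        return Tier.ARCHIVE
    if workload.affects_revenue:
        return Tier.MISSION_CRITICAL
    if workload.affects_productivity:
        return Tier.BUSINESS_CRITICAL
    return Tier.MAINSTREAM


if __name__ == "__main__":
    for w in (
        Workload("order-entry database", True, True, False),
        Workload("internal email", False, True, False),
        Workload("home directories", False, False, False),
        Workload("closed project files", False, False, True),
    ):
        print(f"{w.name}: tier {int(classify(w))}")
```

Grouping workloads this way makes the silo problem visible: if each tier gets its own storage system in each environment, the number of systems grows with both.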
The Problems with Storage Silos
Storage systems provide a variety of services, ranging from snapshots and replication to automated tiering or caching. Each storage system implements these services differently and requires that the administrators be trained on each one. The result is typically an administrator per storage system. A single storage system for the environment allows one administrator to manage all storage, reducing costs significantly.
A second problem is workload distribution. With multiple storage silos, one storage system can fill up while another is relatively empty. Alternatively, one can be taxed with high demand workloads while the other is basically left idle. Investing in a single storage system allows for all the performance and capacity investment to be available to all applications at the same time.
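The stranded-capacity effect can be illustrated with a small calculation. This is a sketch with made-up numbers: the function name and the capacity figures are assumptions for illustration, not measurements from any real environment.

```python
def placement_options(silos, request_tb):
    """Check whether a new volume fits in any single silo, and whether it
    would fit if the same free space were pooled in one system.

    silos: list of (capacity_tb, used_tb) tuples, one per storage system.
    Returns (fits_in_one_silo, fits_in_pool).
    """
    free_per_silo = [capacity - used for capacity, used in silos]
    fits_in_one_silo = any(free >= request_tb for free in free_per_silo)
    fits_in_pool = sum(free_per_silo) >= request_tb
    return fits_in_one_silo, fits_in_pool


if __name__ == "__main__":
    # Hypothetical example: three 100 TB systems with uneven utilization.
    silos = [(100, 95), (100, 40), (100, 70)]
    siloed, pooled = placement_options(silos, request_tb=80)
    print("fits in a single silo:", siloed)
    print("fits in a pooled system:", pooled)
```

In this hypothetical case there are 95 TB free in total, yet no individual system can hold an 80 TB volume. The same reasoning applies to performance headroom.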
A third problem is data protection. For applications with strict recovery time and recovery point objectives, data centers may use replication to copy data to a secondary storage system in near real-time. But most vendors require that the second storage system be identical to the primary or at least of the same brand. Since these same vendors can’t provide a consolidation solution for primary storage, the organization ends up with multiple secondary systems that act as replication targets.
Silos of storage even impact less critical systems, which may be protected via backup software. Most backup solutions can interface with the snapshots of storage systems so that the snapshot can be backed up instead of a production volume. But these integrations are often manual, meaning each one has to be created, monitored and maintained separately for each storage silo.
Even the storage hardware used to store replicated data and backup copies creates yet another storage silo to be managed. In fact, secondary storage often has multiple silos of its own including a storage system to be used as a replication target, another for backup and still another for archive.
Send in the Band-Aids
The storage industry has come up with multiple workarounds to help IT professionals keep from drowning in data when managing all of these silos. One of the most common “solutions” is an all-flash array (AFA). AFA vendors make two claims. The first is that the cost of flash-based storage, especially when combined with deduplication and compression, has reached parity with hard-disk-based storage, so flash can meet the capacity demands of tier 2 and tier 3 applications. The second is that AFAs have the performance to meet the demands of tier 0 and tier 1 applications, and can deliver that performance consistently compared to a hybrid array that tries to balance the use of HDDs and flash.
In order for AFAs to be an effective consolidation solution these claims must be accurate. Is flash really cheaper than hard disk drives, and is there no way for a hybrid system to deliver consistent performance?
Debunking the All-Flash Myth
For AFA vendors to substantiate their claims of price parity, they are counting on flash storage prices to decrease while hard disk prices stay flat. This is simply not the case. Hard disk drive capacities are on the increase thanks to helium and Shingled Magnetic Recording (SMR) technologies, which tends to lower the cost per terabyte of disk as well. Also, thanks to these technologies, the reliability of hard drives is increasing.
All-flash vendors also claim consistent performance, which is true since there is only flash in the storage system. But when all-flash vendors lay the “not consistent” performance label on hybrid systems, they are assuming a very small flash cache area. Some hybrid array vendors can now provide very sizable flash cache areas. The larger cache area essentially eliminates cache misses, and with them the unpredictable performance claim.
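The effect of cache size on hybrid performance comes down to a weighted average: average read latency is the hit ratio times the flash latency plus the miss ratio times the disk latency. The sketch below works through that arithmetic. The latency figures (0.2 ms for flash, 8 ms for disk) and the hit ratios are assumed values for illustration only, not vendor measurements.

```python
def average_read_latency_ms(hit_ratio, flash_ms, disk_ms):
    """Expected read latency of a hybrid array as a weighted average of
    flash (cache hit) and disk (cache miss) latencies."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * flash_ms + (1.0 - hit_ratio) * disk_ms


if __name__ == "__main__":
    FLASH_MS = 0.2  # assumed flash read latency
    DISK_MS = 8.0   # assumed hard disk read latency
    for hit_ratio in (0.80, 0.90, 0.99, 0.999):
        latency = average_read_latency_ms(hit_ratio, FLASH_MS, DISK_MS)
        print(f"hit ratio {hit_ratio:.1%}: {latency:.3f} ms average")
```

Under these assumptions, average latency approaches the flash figure as the hit ratio approaches 100%, which is why a larger cache narrows the gap between a hybrid system and an all-flash one.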
Requirements for True Storage Consolidation
The justification for a single storage system that supports all of an organization’s workloads remains as compelling today as it was 10 years ago. If accomplished it can save the organization money and time while increasing efficiency. But for a storage system to support all the workloads in the data center, it has to meet certain requirements.
1 – Performance AND Capacity
If the data center consolidates to a single storage system, that storage system needs to be able to support both flash (performance) and disk (capacity) storage. Flash is of course needed to meet the random I/O demands of high performance database and virtualized workloads. But disk is needed to provide the cost effective capacity needed to store the unstructured data in the environment.
2 – Scale Right
The second requirement is that the storage system be able to scale. Discussion of storage scaling often leads to a scale-out architecture, where the storage system is built from storage nodes that are clustered together. While scale-out can be an important capability, the system also needs to scale up within each node so that the performance and capacity of each node are fully used before additional nodes are added.
3 – Intelligent Data Placement
The third requirement is that data be intelligently placed between the flash and hard disk tiers. This can be done by caching or tiering the most active data to flash storage, while storing less active data on the hard disk tier. Additionally, the system should provide the ability to granularly control this caching so the mission critical applications can be assured of flash performance and not have to risk a cache miss.
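One way to picture granular cache control is a least-recently-used (LRU) cache in which selected volumes can be pinned so their blocks are never evicted. The following is a toy sketch of that idea only. The class name, method names and eviction policy are assumptions made for illustration and do not describe how any particular storage product places data.

```python
from collections import OrderedDict


class PinnableCache:
    """Toy LRU flash cache in which blocks of pinned volumes are never evicted."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.pinned = set()          # volumes guaranteed to stay in cache
        self.blocks = OrderedDict()  # (volume, block) keys, least recent first

    def pin(self, volume):
        self.pinned.add(volume)

    def access(self, volume, block):
        """Return True on a cache hit. On a miss, cache the block and return False."""
        key = (volume, block)
        if key in self.blocks:
            self.blocks.move_to_end(key)  # mark as most recently used
            return True
        self.blocks[key] = None
        self._evict_if_needed()
        return False

    def _evict_if_needed(self):
        while len(self.blocks) > self.capacity:
            # Evict the least recently used block that is not pinned.
            victim = next((k for k in self.blocks if k[0] not in self.pinned), None)
            if victim is None:
                break  # only pinned blocks remain; this sketch allows overcommit
            del self.blocks[victim]


if __name__ == "__main__":
    cache = PinnableCache(capacity_blocks=2)
    cache.pin("erp")             # hypothetical mission-critical volume
    cache.access("erp", 1)
    cache.access("files", 1)
    cache.access("files", 2)     # cache is full; an unpinned block is evicted
    print("erp block still cached:", cache.access("erp", 1))
    print("files block 1 still cached:", cache.access("files", 1))
```

In this sketch the pinned volume keeps its place in flash regardless of how much other traffic passes through the cache, which is the guarantee the requirement describes for mission-critical applications.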
4 – Integrated Data Protection
Truly consolidating to a single storage system requires that the storage system be able to protect itself. Ideally this is done with snapshot replication, where data is copied to a secondary storage system in near real time, so that a failure of the production system has little impact on applications. The second system should be configurable with less flash capacity so that it is a more cost-effective alternative to the primary storage system. Finally, for the ultimate in protection, another copy should be replicated off-site to a third storage system for protection from disaster.
While this protection scheme does actually leverage three systems, if a single vendor can provide them then the management of these systems is identical and is far superior to the alternative of 3 or 4 different primary storage systems and 2 or 3 secondary storage systems.
5 – Analytics
The final requirement is analytics that can provide proactive monitoring of the system. The analytics should report on flash effectiveness, capacity consumption and protection status. They should also provide recommendations as to when to add flash or hard disk capacity to the environment. Ideally, the vendor should allow customers to opt in to sending diagnostic information, creating cloud-like analytics that improve overall support capabilities.
The quest for a single storage system is a never-ending challenge, especially in this era of the rapidly evolving data center. The agile data center has a wide variety of workloads. If a single storage system is going to meet the performance, capacity and data availability demands of this new data center, that system will need to have a mixture of flash and hard disk, plus intelligent software to manage these distinctly different storage devices.
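A capacity recommendation of the kind described above can be as simple as projecting recent growth forward. The sketch below assumes linear growth between the first and last daily samples. The function name, the sample data and the linear model are all assumptions for illustration, and a production analytics service would likely use a more robust forecast.

```python
def days_until_full(used_tb_by_day, capacity_tb):
    """Estimate days until capacity is exhausted, assuming usage keeps growing
    at the average daily rate seen between the first and last samples.

    Returns None when there are too few samples or usage is not growing.
    """
    if len(used_tb_by_day) < 2:
        return None
    elapsed_days = len(used_tb_by_day) - 1
    growth_per_day = (used_tb_by_day[-1] - used_tb_by_day[0]) / elapsed_days
    if growth_per_day <= 0:
        return None
    return (capacity_tb - used_tb_by_day[-1]) / growth_per_day


if __name__ == "__main__":
    # Hypothetical daily usage samples, in TB, for a 100 TB system.
    samples = [50.0, 51.0, 52.0, 53.0]
    remaining = days_until_full(samples, capacity_tb=100.0)
    if remaining is not None and remaining < 90:
        print(f"About {remaining:.0f} days of capacity left; consider adding disk.")
```

The 90-day threshold in the usage example is arbitrary. The point is that the system, rather than the administrator, watches the trend and raises the recommendation.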
Sponsored By Nimble Storage
Nimble Storage (NYSE: NMBL) is redefining the storage market with its Adaptive Flash platform. Nimble’s flash storage solutions enable the consolidation of all workloads and eliminate storage silos by providing enterprises with significant improvements in application performance and storage capacity. At the same time Nimble delivers superior data protection, while simplifying business operations and lowering costs. At the core of the Adaptive Flash platform is the patented Cache Accelerated Sequential Layout (CASL) architecture and InfoSight, an automated cloud-based management and support system that maintains storage system peak health. More than 5,500 enterprises, governments, and service providers have deployed Nimble’s flash storage solutions across 49 countries. For more information about Nimble Storage, visit www.nimblestorage.com and follow them on Twitter: @nimblestorage.