Initiatives like server virtualization, cloud infrastructure-as-a-service, and real-time analytics are allowing IT to meet today’s ever-increasing business demands. These initiatives are designed to bring agility to the data center, yet they stumble when they have to interact with siloed storage infrastructure. Software Defined Storage (SDS) was supposed to be the answer. But for the most part, SDS is really just storage software without the reliance on dedicated hardware, and it is limited to specific storage containers. Most SDS solutions cannot extend a container across vendors, formats (file, block, object) or protocols. Even hyper-converged systems don’t help: they attempt to solve this problem by moving all enterprise data into a bigger, proprietary container at the server layer.
The Legacy Storage Problem
In the mid-2000s, conventional wisdom held that the storage architecture should be consolidated into a single networked system that could meet the performance demands of every enterprise use case in the organization. In reality, the opposite has occurred. In 2015, storage is more fragmented than ever. It’s not uncommon to find separate storage systems for each workload: one for virtual server infrastructure, another for virtual desktops, a different silo for a clustered database, and still another for unstructured data.
As an example, flash storage in servers is used to deliver very high performance to mission-critical applications without network latency, but these systems require a shared-nothing architecture for reliability. At the opposite end of the spectrum is public cloud storage, which offers nearly limitless capacity but at the cost of visibility and control. Leveraging both of these storage “extremes” appropriately requires a mixed model in which data moves between on-premises and off-premises storage. How that data will be moved, and how to manage this fragmentation of data assets, is an important missing piece of the puzzle.
The reason for this storage fragmentation is that each of these use cases asks for different things from the storage infrastructure. Some need incredibly high performance no matter the cost, others need modest performance at a reasonable cost, and still others need almost no performance but long-term retention at the lowest possible cost. Vendors have emerged to provide unique storage systems that address these use cases, but unfortunately they are all operationally incompatible with each other, making it impossible to align application workloads to specific storage features.
The agile data center needs service-driven data orchestration: the ability to move discrete data objects between these storage tiers in an automated fashion. This fine-grained data movement lets each of the above use cases meet the application Service Level Objectives (SLOs) set by either the application owner or the storage manager. SLOs allow data centers to buy just the performance and capacity they need instead of continuing to spend on the costly overprovisioning that makes storage such an expensive part of the IT budget.
For example, the production virtual server environment is typically given access to high-performance storage. But not every component of a virtual server needs to be on high-performance storage. Most research indicates that as much as 80% of data, even data considered “mission critical”, does not require high-performance storage 100% of the time. The challenge is making sure this data is on high-performance storage when it is truly mission critical. IT needs this insight so it can set policies that ensure data is on high-performance storage when it counts. Agility also requires being able to move data seamlessly at a granular level, so as not to disrupt applications.
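To make the SLO idea concrete, here is a minimal sketch in Python of how a cost-aware placement decision might look. The tier names, latency figures, and prices are entirely hypothetical, and no real product API is implied.

```python
# Hypothetical sketch of SLO-driven placement. All tiers, latencies,
# and prices below are illustrative assumptions, not real products.
from dataclasses import dataclass

@dataclass
class SLO:
    max_latency_ms: float    # performance objective
    max_cost_per_gb: float   # cost ceiling

@dataclass
class Tier:
    name: str
    latency_ms: float
    cost_per_gb: float

TIERS = [
    Tier("server-flash", 0.1, 2.50),
    Tier("all-flash-array", 0.5, 1.20),
    Tier("disk-array", 8.0, 0.25),
    Tier("cloud-archive", 100.0, 0.02),
]

def place(slo: SLO) -> Tier:
    """Pick the cheapest tier that still meets the latency objective."""
    candidates = [t for t in TIERS
                  if t.latency_ms <= slo.max_latency_ms
                  and t.cost_per_gb <= slo.max_cost_per_gb]
    if not candidates:
        raise ValueError("no tier satisfies this SLO")
    return min(candidates, key=lambda t: t.cost_per_gb)

# A latency-sensitive database log vs. a cold backup set:
print(place(SLO(max_latency_ms=1.0, max_cost_per_gb=3.0)).name)    # all-flash-array
print(place(SLO(max_latency_ms=200.0, max_cost_per_gb=0.10)).name) # cloud-archive
```

The point of the sketch is the selection rule: the cheapest tier that still meets the objective, rather than the fastest tier available, which is what ends the habit of overprovisioning.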
The Software-Defined Storage Problem
Storage virtualization, and later SDS, were supposed to resolve these issues by unifying storage hardware and data services. The problem is that instead of unifying data services, they replaced them. SDS either turns the current storage investment into dumb RAID arrays or replaces it entirely with storage aggregated inside the server infrastructure, the so-called “hyper-converged” architecture. Hyper-converged architectures create a bigger storage container from smaller ones to enable tiering and centralized management, but they still create a proprietary storage architecture that is unaware of the shared storage the enterprise has already purchased.
SDS solutions were a step in the right direction. They did allow data to be managed from a single interface, and a few allowed data to be moved seamlessly between various types of storage hardware. The problem is that most SDS solutions stop at this point, falling short of what data centers really need.
Taking the Next Step With SDS
Even when an SDS solution allows data to be moved between storage arrays, that movement affects the entire volume. While some file-based (NAS) solutions can support more granular movement of data, it is often manual, requiring staff to evaluate what needs to be moved and then move it, a process that is both time-consuming and error-prone.
The problem with this lack of granularity is that most volumes contain a mixture of data sets, each with varying performance needs. For SDS to deliver on its value proposition, it needs to provide granular, automated data movement based on application service levels. This kind of movement is required because some parts of an application are best stored on high-performance flash storage, while other data sets that can’t take advantage of flash performance should be stored on the least expensive tier possible.
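What sub-volume, service-level-driven movement means in practice can be sketched as follows: each object in a volume carries its own service level, so a single volume can fan out across several tiers. All paths, service names, and tiers below are illustrative assumptions.

```python
# Illustrative sketch: objects inside one volume are placed by their
# individual service levels, not by the volume as a whole.
# Paths, service names, and tier names are hypothetical.
SERVICE_TO_TIER = {
    "gold": "all-flash-array",
    "silver": "hybrid-array",
    "bronze": "capacity-disk",
}

volume = [
    {"path": "/db/redo.log",    "service": "gold",   "tier": "capacity-disk"},
    {"path": "/db/archive.dat", "service": "bronze", "tier": "capacity-disk"},
    {"path": "/vm/swap.vmdk",   "service": "silver", "tier": "all-flash-array"},
]

def plan_moves(objects):
    """Return (path, from_tier, to_tier) for every object that is not
    on the tier its service level calls for."""
    moves = []
    for obj in objects:
        target = SERVICE_TO_TIER[obj["service"]]
        if obj["tier"] != target:
            moves.append((obj["path"], obj["tier"], target))
    return moves

for move in plan_moves(volume):
    print(move)
# ('/db/redo.log', 'capacity-disk', 'all-flash-array')
# ('/vm/swap.vmdk', 'all-flash-array', 'hybrid-array')
```

Note that only two of the three objects move, and in opposite directions: one is promoted to flash, one demoted off it, while the volume itself is never migrated wholesale.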
The first step is for SDS to deliver granular, sub-volume control of data placement. In other words, data should be freed from the confines of a volume dedicated to a single storage system and be allowed to move between volumes and storage systems on a regular basis.
The second step is a robust set of global data services that are equally granular. Storage services should be available to any application that needs them, but only as needed. For example, there is little reason to waste storage capacity and performance taking snapshots of temp or swap files. Likewise, deduplication and compression should be movable to storage hardware that has processing power to spare for the task.
The optimal solution needs to augment and complement existing data services, adding capabilities that don’t yet exist and spanning storage hardware.
Is it Time for Software Defined Data?
Data virtualization is the next step in the evolution of storage. Instead of replacing what is already working well – like data services and shared storage arrays – data virtualization adds new capabilities and enables them to be applied across storage hardware. Data virtualization provides data mobility and complements existing storage, allowing data to be seamlessly distributed at a sub-volume level across multiple storage systems. This means the part of a data set that can benefit from all-flash performance can reside on either server-side flash storage or an all-flash array, while the parts that would not benefit from flash performance can be stored on cost-efficient hard disk storage, or even cloud storage.
The other capability that data virtualization brings to the storage infrastructure is automation. While some storage systems can move data to different tiers within the same system, they all perform this movement reactively.
In these scenarios, with traditional SDS or storage virtualization solutions, data often sits on a hard disk array until it has been accessed enough times to be promoted to flash-based storage. Application performance has to suffer until that data can be analyzed, deemed flash-worthy, and promoted to the flash tier. The storage system is reacting to usage. As a default behavior this is typically acceptable, but the analysis needs to happen in real time, not as a batch process at the end of the day.
There are also many times when the need for high-performance data access can be predicted: for example, when a company reaches a seasonal busy period or needs to do end-of-quarter reporting. Data virtualization allows this data to be scheduled and moved to a faster storage tier proactively, before a single slow access occurs.
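The difference between reactive and proactive promotion comes down to the trigger. A minimal sketch, with an assumed access threshold and an assumed lead time before a known busy window:

```python
# Sketch of reactive vs. proactive tier promotion.
# The threshold and lead time are illustrative assumptions.
from datetime import date, timedelta

PROMOTE_THRESHOLD = 100  # accesses before reactive promotion kicks in

def reactive_should_promote(access_count: int) -> bool:
    """Reactive: promote only after enough slow accesses have occurred."""
    return access_count >= PROMOTE_THRESHOLD

def proactive_should_promote(today: date, busy_start: date,
                             lead_days: int = 3) -> bool:
    """Proactive: promote within `lead_days` of a scheduled busy period,
    so the data is already on flash before the first access."""
    return timedelta(0) <= busy_start - today <= timedelta(days=lead_days)

quarter_close = date(2015, 3, 31)
print(reactive_should_promote(12))                                 # False: still "cold"
print(proactive_should_promote(date(2015, 3, 29), quarter_close))  # True: promote now
```

The reactive rule cannot fire until performance has already suffered; the proactive rule fires on the calendar, which is the scheduling capability described above.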
A user-defined, policy-based control plane is a critical element of a data virtualization solution. It allows customers to focus on managing the data instead of the underlying storage. The policy engine also allows the automation to be overridden in specific situations, so that data can be positioned in advance of a predictable spike.
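A control plane of this kind could be modeled, in very simplified form, as user-defined rules layered over a catch-all default. The schema, field names, and paths below are assumptions for illustration, not any vendor's actual policy language.

```python
# Minimal sketch of a policy-based control plane: a default "auto"
# policy handles everything, and higher-priority user-defined rules
# override it for specific data. All fields here are hypothetical.
import fnmatch

POLICIES = [
    {"match": "*",          "tier": "auto",            "priority": 0},
    {"match": "/finance/*", "tier": "all-flash-array", "priority": 10},
]

def resolve(path: str) -> str:
    """Return the tier for `path`: the highest-priority matching policy wins."""
    matching = [p for p in POLICIES if fnmatch.fnmatch(path, p["match"])]
    return max(matching, key=lambda p: p["priority"])["tier"]

print(resolve("/finance/q1-report.db"))  # all-flash-array (user override)
print(resolve("/home/user/notes.txt"))   # auto (default automation)
```

The design point is that the default automation keeps working everywhere the user has said nothing, so policies describe the exceptions, not the whole estate.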
A key attribute of data virtualization is that it’s transparent to applications. There should be no downtime for migration, nor extensive planning for failover and recovery. Most importantly, this all happens automatically until the user is ready to customize its behavior through policy.
The Impact of Data Virtualization
Data virtualization allows the data center to maximize its agility through automation. Data can be moved from storage platform to storage platform, at the data level, without IT intervention, based on predictable usage patterns while still responding to unexpected peaks.
In addition, data virtualization allows for best-of-breed storage hardware selection. This can be a mixture of dedicated all-flash arrays for high performance, high-capacity hard disk arrays for long-term storage and retention, and moderately performing disk arrays for data in the middle. Each of these arrays could come from a different storage manufacturer, allowing IT to take full advantage of the unique data services each system offers.
Since the first storage network was implemented over 20 years ago, data has been trapped in the storage tier on which it was first stored. Migrating that data from one platform to another has often been a painful, time-consuming process. While some SDS solutions improve this situation, they often don’t provide true data mobility: they are not granular enough to deliver true agility and not automated enough to provide advanced data placement. Data virtualization takes storage abstraction to a new level by providing an automated way of moving data according to the business environment instead of reacting to it.
Sponsored By Primary Data
About Primary Data
IT’s problem isn’t multiple types of storage; companies accumulate different storage platforms to support the various types of data they manage. The problem is the inability to move data between these platforms. Primary Data was founded to address this issue through data virtualization. By automating the placement of data across platforms, Primary Data can end the problem of stale data sitting on the wrong storage system and the costly remediation that follows. Primary Data’s solution is currently in development, and the company expects to have a product available this year.