Organizations are expected to respond in real time to the needs of their customers. “Real time”, however, means much more than reacting faster to queries. It means tailoring the customer’s interaction with the organization so that they are presented with personalized information and pertinent options. Increasingly, people are using their mobile devices to access information on demand, and this is creating a fundamental change in the workloads for compute and storage systems.
To achieve this real-time personalization, applications are leveraging big data techniques: complex analytics that often involve large data sets and enormous numbers of data points. As a result, IT is now at the core of enabling this real-time data center, and the storage system is the key component of that enablement.
When designing architectures that deliver this real-time experience, IT planners need three basic elements: compute power, memory and storage. The use of compute and memory is fairly automatic, as the application, via the operating system, uses these resources in an efficient manner.
Storage, on the other hand, is a different story. These environments are often too demanding to run exclusively on hard disk devices, but too large to run exclusively on flash. The storage system needs to offer both, with the ability to move data between these tiers. However, that data movement should itself be a real-time process in order to support the real-time nature of the environment.
The All-Flash Challenge
When it comes to on-premises storage there are several methods to deliver this multi-tiered experience. Vendors are providing all-flash arrays and hybrid arrays (mixed flash and hard disk drives). All-flash arrays are appealing because they eliminate the need to manage storage by data value. There is only one tier.
The goal for vendors of all-flash arrays is to reach price parity with hard disk based solutions. In fact, most all-flash vendors claim they have indeed reached that parity. But the fine print shows that these vendors are making this claim by comparing flash to “performance” drives: 15K RPM, low capacity hard drives. They’re not comparing flash to the low cost 4, 8 and 10TB hard disks that are much more commonly used, and much less expensive. They also often include the benefits of data compression when pricing flash devices, but usually not when pricing rotating disk devices.
The reality is that most data in the data center is not being actively accessed, and storing that data on flash-based storage could be considered a waste of premium real estate. To date, all-flash arrays don’t have the ability to form a complementary relationship with hard disk storage and migrate data between systems. The IT planner who wants to accomplish this is forced to go down a software-defined storage (SDS) path and build a system on their own, something that most are unwilling to do.
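The parity argument above comes down to simple arithmetic, which the following minimal sketch illustrates. The function name and every price and ratio in it are hypothetical placeholders chosen for round numbers, not market data; the point is only that the comparison changes depending on which drive class is used as the baseline and whether data reduction is credited to both sides.

```python
def effective_cost_per_tb(raw_cost_per_tb: float, data_reduction_ratio: float = 1.0) -> float:
    """Cost per TB of stored data after data reduction (e.g. 4.0 means 4:1)."""
    if raw_cost_per_tb < 0 or data_reduction_ratio <= 0:
        raise ValueError("cost must be >= 0 and reduction ratio must be > 0")
    return raw_cost_per_tb / data_reduction_ratio


# All figures below are illustrative placeholders in arbitrary cost units.
flash_with_reduction = effective_cost_per_tb(400.0, 4.0)   # flash, credited with 4:1 reduction
performance_hdd = effective_cost_per_tb(120.0)             # 15K RPM class, no reduction credited
capacity_hdd = effective_cost_per_tb(30.0)                 # high capacity class, no reduction credited
capacity_hdd_reduced = effective_cost_per_tb(30.0, 2.0)    # same drive if reduction were also applied

comparison = {
    "flash (with reduction)": flash_with_reduction,
    "performance HDD": performance_hdd,
    "capacity HDD": capacity_hdd,
    "capacity HDD (with reduction)": capacity_hdd_reduced,
}
for name, cost in comparison.items():
    print(f"{name}: {cost:.1f} per TB stored")
```

With these placeholder inputs, flash looks cheaper than the performance drive but remains well above the capacity drive, and the gap widens once the same data reduction is applied to disk.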
The First Generation Hybrid Challenge
The other option is to use hybrid arrays, systems with both flash and hard disk storage. Unlike an SDS solution, hybrids deliver the flash, hard disk capacity and storage services as a single unit, a model that IT seems more comfortable with. Hybrid systems seem like the ideal method to meet the performance expectations of the real-time application environment, the capacity demands of the big data that feeds that environment and the budget realities of the data center. But the first generation of these systems has several problems.
Many first-generation hybrid arrays write all data to the hard disk storage area first. This is considered the “safe” approach, since there is still a concern over flash storage durability. The challenge with this approach, however, is that all data, once on the hard disk, has to go through a qualification process, typically based on access frequency, to make sure that it’s worthy of promotion to flash. At a minimum this means that an application has to suffer at least one slow access to data before it has the potential to be moved into the flash tier. In most cases it takes multiple hard disk accesses before data is promoted.
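A minimal sketch of this kind of access-frequency qualification is shown below. The class name, the `promote_after` threshold and the block identifiers are all hypothetical and are not taken from any vendor's implementation; the sketch only illustrates why every block pays for at least one slow read, and usually several, before it can be served from flash.

```python
from collections import Counter


class FrequencyPromoter:
    """Toy model: data lands on hard disk and is promoted after N disk reads."""

    def __init__(self, promote_after: int = 3):
        if promote_after < 1:
            raise ValueError("promote_after must be at least 1")
        self.promote_after = promote_after  # hypothetical qualification threshold
        self.hdd_reads = Counter()          # disk read count per block
        self.on_flash = set()               # blocks already promoted

    def read(self, block_id: str) -> str:
        """Return the tier that served this read."""
        if block_id in self.on_flash:
            return "flash"
        # This read is served from disk; it also counts toward qualification.
        self.hdd_reads[block_id] += 1
        if self.hdd_reads[block_id] >= self.promote_after:
            self.on_flash.add(block_id)
            del self.hdd_reads[block_id]
        return "hdd"


promoter = FrequencyPromoter(promote_after=3)
served_from = [promoter.read("customer-profile") for _ in range(5)]
print(served_from)
```

In this model the first three reads of the block are all served from hard disk, and only later reads benefit from flash.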
Some hybrid storage systems don’t actually analyze this data until after normal working hours. Again, this is considered a safe approach because the movement of data between tiers consumes storage I/O, compute and bandwidth. But the impact on an application trying to deliver a customized, real-time experience is dramatic, since the data it needs is often on hard disk, not flash. This would be somewhat like a newspaper using a ‘batch delivery’ scheme where every other day you get the current paper and yesterday’s paper too.
The way these hybrid systems perform data movement analytics creates their biggest drawback: a lack of predictable performance. Since data is first written to the hard disk tier and then batch processed for promotion after hours, it may be on the wrong tier for as long as a day. In addition, many of these systems are sold with far less flash storage than what’s considered ideal because of cost concerns. The combination of slow data promotion and too small a flash area leads to small performance gains. The only winners in the way these generation-one hybrid systems are designed are the all-flash vendors, who are seen as the only viable option when predictable performance is the concern.
These challenges (durability, data movement and cost) can all be overcome with proper storage system design. Flash storage has proven far more reliable than originally assumed, processing power to dedicate to data movement is plentiful and flash costs have come down significantly, assuming that the storage system vendor passes those cost savings on to the customer.
The Real-time Hybrid Answer
A real-time hybrid array is similar to a legacy hybrid array in that it leverages flash storage for the most active data set and high capacity hard disk drives for the least active data. But there are a couple of key differences. First, these generation-two hybrid systems offer the flash tier at a much more affordable price point than does the legacy hybrid system. This means the data center can afford to purchase more flash from the outset, so that more of the active data set fits in flash and cache misses have less overall impact.
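One way to reason about the effect of a larger flash tier is a simple weighted average of the two tiers' latencies. The sketch below assumes a two-tier model in which every read is served either from flash or from hard disk; the function name and the latency values are hypothetical placeholders, not measurements of any product.

```python
def average_read_latency(flash_hit_ratio: float, flash_latency: float, hdd_latency: float) -> float:
    """Expected latency per read when a fraction of reads is served from flash."""
    if not 0.0 <= flash_hit_ratio <= 1.0:
        raise ValueError("flash_hit_ratio must be between 0 and 1")
    return flash_hit_ratio * flash_latency + (1.0 - flash_hit_ratio) * hdd_latency


# Placeholder latencies in arbitrary time units: flash = 1, hard disk = 9.
for hit_ratio in (0.5, 0.75, 1.0):
    latency = average_read_latency(hit_ratio, flash_latency=1.0, hdd_latency=9.0)
    print(f"hit ratio {hit_ratio:.2f}: average latency {latency:.2f}")
```

Because the hard disk term dominates, each increase in the flash hit ratio removes a disproportionate share of the average latency, which is why sizing the flash tier generously matters.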
Second, these systems either write new or changed data to flash when it hits the array, or at least have the ability to quickly promote newly modified data to flash. They don’t wait for system idle time to run data analysis and data movement as a batch process.
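The write-to-flash-first behavior can be sketched as follows. This is a toy model under stated assumptions, not a description of any vendor's design: the class name is hypothetical, flash capacity is counted in blocks, and the coldest block is chosen by least-recent use and demoted to hard disk immediately whenever flash overflows, rather than in an after-hours batch.

```python
from collections import OrderedDict


class WriteToFlashTier:
    """Toy two-tier store: writes land on flash, cold blocks are demoted at once."""

    def __init__(self, flash_capacity: int):
        if flash_capacity < 1:
            raise ValueError("flash_capacity must be at least 1")
        self.flash_capacity = flash_capacity
        self.flash = OrderedDict()  # block_id -> data, least recently used first
        self.hdd = {}               # block_id -> data

    def write(self, block_id: str, data: str) -> None:
        self.hdd.pop(block_id, None)      # any older copy on disk is superseded
        self.flash[block_id] = data
        self.flash.move_to_end(block_id)  # mark as most recently used
        self._demote_if_full()

    def read(self, block_id: str):
        """Return (data, tier). Raises KeyError for an unknown block."""
        if block_id in self.flash:
            self.flash.move_to_end(block_id)
            return self.flash[block_id], "flash"
        # Simplification: disk reads are not promoted in this sketch.
        return self.hdd[block_id], "hdd"

    def _demote_if_full(self) -> None:
        while len(self.flash) > self.flash_capacity:
            cold_id, cold_data = self.flash.popitem(last=False)
            self.hdd[cold_id] = cold_data


tier = WriteToFlashTier(flash_capacity=2)
tier.write("a", "A")
tier.write("b", "B")
tier.read("a")        # touching "a" makes "b" the coldest block
tier.write("c", "C")  # flash is full, so "b" is demoted immediately
print(tier.read("b"), tier.read("c"))
```

Newly written data is always readable from flash on its first access, and the cost of that guarantee is the continuous demotion work, which is the processing overhead the next paragraph addresses.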
But there is a risk to this type of real-time data handling: the potential performance impact of constantly analyzing and moving data. To overcome this concern, generation-two hybrid vendors should provide additional processing power, via dedicated compute or even custom ASICs (Application-Specific Integrated Circuits), dedicated to specific functions within the system.
The cost for this additional processing power is minimal, but its payoff can be tremendous. Data can be analyzed for activity and moved between tiers of storage continuously, so that the storage system can respond in real time to workloads that are themselves changing in real time.
The data center is, or at least should be, a key enabler for the organization. No longer does it just store and protect a company’s data assets. Through the use of the right compute, memory and storage infrastructures, the ‘real-time data center’ can enable the organization to attract more customers, retain them longer and make their internal users more productive as well. Compute and memory are already plentiful and fairly automated in how they respond to this demand, but storage needs to strike a careful balance between ‘data guardian’ and ‘organizational enabler’. Real-time hybrid arrays are able to deliver this balance and deserve strong consideration.
Sponsored By Dot Hill
Dot Hill’s AssuredSAN arrays with RealStor are hybrid storage systems that provide the real-time data movement that data centers need today. Each controller has two processors dedicated to providing predictable performance. One is a custom ASIC that provides I/O processing, RAID calculations and data movement. The other performs the analytic functions in real time to deliver data analysis, data management and performance statistics.