Heterogeneous Storage Analytics Difficult to Gather

Posted on May 24, 2016 by wcurtispreston

Knowledge is power. But knowledge in a heterogeneous storage environment can be difficult to come by. Modern data centers often have a mix of software-defined storage, traditional proprietary storage, all-flash or hybrid storage arrays, and even cloud storage – all connected via some kind of storage network. The first challenge this environment brings is simply managing and collecting statistics from these disparate devices. The second challenge is being able to gather valuable information from the data once you have it.

Is it possible to migrate a stale workload from an all-flash array to a less expensive array, or to move a starved workload the other way? If migrations in both directions were possible, you could obtain tremendous business value, significantly reduce storage costs by removing wasted resources, and increase performance of underperforming applications by moving them to the correct storage resource.

The ultimate goal is to know the actual capacity and performance needs of each application and to seamlessly migrate those workloads to the most appropriate storage device. Unfortunately, for many businesses, that goal is unattainable.

The first reason is that most environments run a variety of storage systems. An environment may be running a few traditional monolithic arrays without flash, a few with a significant amount of flash, and some newer all-flash arrays. It may also be experimenting with, or even migrating to, modular software-defined storage or hyperconverged systems from various manufacturers. Finally, it may be using multiple storage protocols, including Fibre Channel, iSCSI, FCoE, NFS, and SMB, or perhaps a cloud-based storage service such as Amazon’s S3.

Since these various storage systems will likely come from different companies, it is very unlikely that a single tool will be able to communicate across that environment or gather statistics from all of them. This means customers wishing to get capacity utilization and performance statistics from each system will probably need to install and learn many different reporting tools, each of which gives a different level of detail in a different format.

The next challenge is aggregating all of the data being collected. As mentioned previously, each tool will likely output data in a different format, such as CSV, HTML, XML, or even JSON. In addition, a statistics-gathering tool may write its data into a database that customers must query to get the data they want.
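As a rough illustration of that aggregation step, the sketch below reads three invented exports (one CSV, one JSON, one XML) into a single list of records with the same keys. The field names, the array labels, and the sample data are all assumptions made up for this example; real reporting tools will each have their own layout, and each will need its own small parser.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample exports. Real tools will use their own field names.
CSV_EXPORT = "volume,iops\nvol-a,1200\nvol-b,300\n"
JSON_EXPORT = '[{"name": "vol-c", "io_per_sec": 4500}]'
XML_EXPORT = '<stats><vol id="vol-d" iops="80"/></stats>'


def from_csv(text, source):
    """Parse a CSV export with 'volume' and 'iops' columns (assumed layout)."""
    return [
        {"source": source, "volume": row["volume"], "iops": float(row["iops"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_json(text, source):
    """Parse a JSON export: a list of objects with 'name' and 'io_per_sec'."""
    return [
        {"source": source, "volume": item["name"], "iops": float(item["io_per_sec"])}
        for item in json.loads(text)
    ]


def from_xml(text, source):
    """Parse an XML export with <vol id="..." iops="..."/> elements."""
    root = ET.fromstring(text)
    return [
        {"source": source, "volume": el.get("id"), "iops": float(el.get("iops"))}
        for el in root.iter("vol")
    ]


def collect():
    """Merge all three exports into one list that shares a single schema."""
    records = []
    records += from_csv(CSV_EXPORT, "array-1")
    records += from_json(JSON_EXPORT, "array-2")
    records += from_xml(XML_EXPORT, "array-3")
    return records


if __name__ == "__main__":
    for record in collect():
        print(record)
```

Even in this toy form, every source needs its own parser, which is the maintenance burden described above.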

Once the various formats are worked out and all of the data is collected, the data will need to be normalized. Different tools may report things differently. Some storage arrays may report only on IOPS, while others may focus on throughput. Some may report statistics by disk drive or SSD, others by volume, object, array, or port. Some may segregate data by storage type (e.g. HDD or SSD), others may not.
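To make the normalization problem concrete, here is a minimal sketch that converts between IOPS and throughput and rolls per-drive numbers up to the volume level. It leans on one large assumption: that an average I/O size is known or can be guessed (8 KiB is used as a placeholder default). Throughput is then roughly IOPS multiplied by the average I/O size. The record keys and function names are hypothetical, and any value derived this way is an estimate, which the sketch flags.

```python
def normalize(record, default_io_kib=8.0):
    """Return a record carrying both IOPS and MiB/s.

    Whichever metric is missing is estimated from the other using an
    average I/O size in KiB. That size is an assumption, so estimated
    records are flagged rather than treated as measurements.
    """
    io_kib = record.get("avg_io_kib", default_io_kib)
    iops = record.get("iops")
    mib_s = record.get("mib_s")
    if iops is None and mib_s is None:
        raise ValueError("record has neither 'iops' nor 'mib_s'")
    estimated = iops is None or mib_s is None
    if mib_s is None:
        mib_s = iops * io_kib / 1024.0   # KiB/s converted to MiB/s
    if iops is None:
        iops = mib_s * 1024.0 / io_kib
    return {
        "volume": record["volume"],
        "iops": iops,
        "mib_s": mib_s,
        "estimated": estimated,
    }


def roll_up_by_volume(records):
    """Sum normalized per-drive (or per-port) records into per-volume totals."""
    totals = {}
    for rec in records:
        entry = totals.setdefault(rec["volume"], {"iops": 0.0, "mib_s": 0.0})
        entry["iops"] += rec["iops"]
        entry["mib_s"] += rec["mib_s"]
    return totals


if __name__ == "__main__":
    raw = [
        {"volume": "vol-a", "iops": 2048},    # array that reports IOPS only
        {"volume": "vol-a", "mib_s": 4.0},    # array that reports throughput only
        {"volume": "vol-b", "iops": 512, "mib_s": 2.0, "avg_io_kib": 4.0},
    ]
    print(roll_up_by_volume([normalize(r) for r in raw]))
```

The weakness is the average I/O size itself: if it is wrong for a given workload, the estimated column is wrong too, which is one reason hand-built normalization is fragile.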

Once you perform the arduous task of working out all of these differences, the typical process moving forward is the manual collection, transformation, and dissemination of this information. Unfortunately, most environments will not have an automated process for this step. Even if they did, the information would be outdated as soon as a new version of any component was installed.

Whether the process is automated or manual, the next challenge is the most difficult – doing anything with the data. Migrating data between storage systems is no easy process; this is the reason professional services exist. Even if one is able to successfully migrate a given workload from one storage system to another, switching the application to the new storage system will likely require downtime. This usually means that even if storage administrators can handle the migration in the background, the downtime required for moving the application postpones the migration project to a date that often never comes.

One solution to these various challenges would be to virtualize all storage resources behind a system that sees everything. The first problem this solves is the collection of the data. Since the virtualization system sees all I/O operations, one no longer needs to rely on the underlying storage systems for reporting. Instead, storage administrators can look at the virtualization system’s reports and see everything in one place. Viewing data across the environment requires no normalization, since it is all coming from one place.

In addition, because everything is behind a virtualization system, moving data between devices becomes relatively easy. The reporting system can be used to identify under- or over-performing applications, and the virtualization system can be used to move them to the appropriate storage system. The administrator can even program the system to do this automatically.
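What such an automated rule might look like can be sketched in a few lines. This is not any vendor's policy engine; the Workload fields, the two tier names, and both thresholds are invented for illustration. The idea is simply that a nearly idle workload on flash gets a recommendation to move to disk, and a workload on disk with poor latency gets a recommendation to move to flash.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    tier: str              # "flash" or "disk" (hypothetical tier names)
    avg_iops: float        # average IOPS over the reporting window
    p95_latency_ms: float  # 95th percentile latency over the same window


def recommend_tier(w, idle_iops=100.0, slow_latency_ms=20.0):
    """Suggest a tier for one workload. Both thresholds are placeholders."""
    if w.tier == "flash" and w.avg_iops < idle_iops:
        return "disk"    # stale workload occupying expensive storage
    if w.tier == "disk" and w.p95_latency_ms > slow_latency_ms:
        return "flash"   # under-served workload on slow storage
    return w.tier


def plan_moves(workloads):
    """List (name, current tier, target tier) for workloads that should move."""
    moves = []
    for w in workloads:
        target = recommend_tier(w)
        if target != w.tier:
            moves.append((w.name, w.tier, target))
    return moves


if __name__ == "__main__":
    fleet = [
        Workload("archive", "flash", avg_iops=12.0, p95_latency_ms=1.0),
        Workload("oltp", "disk", avg_iops=5000.0, p95_latency_ms=35.0),
        Workload("web", "flash", avg_iops=2500.0, p95_latency_ms=2.0),
    ]
    for move in plan_moves(fleet):
        print(move)
```

A production policy would also need to consider capacity on the target, migration cost, and how long a condition has persisted before acting; the sketch ignores all of that.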

The business value of automatically collecting storage I/O information and acting on it is tremendous. Companies can be confident their storage resources are properly utilized. Applications needing less performance can be moved out to older or slower storage systems, and those needing more performance can be moved to newer flash-based systems. Applications can also be configured to use both faster and slower systems at the same time, so that the “hot” data within an application is constantly and automatically moved to the faster tier of storage, leaving the “colder” data on the slower storage system.
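The hot and cold split within a single application can be illustrated with a simple ranking. The sketch below assumes the data is tracked in fixed-size extents with an access count per extent over some sampling window, and that the fast tier has room for a fixed number of extents; those assumptions, and all of the names, are hypothetical. The most frequently accessed extents go to the fast tier and everything else stays on the slow tier.

```python
def place_extents(access_counts, fast_capacity):
    """Assign each extent to the "fast" or "slow" tier.

    access_counts: dict mapping extent id to accesses in the sampling window.
    fast_capacity: how many extents fit on the fast tier (assumed fixed size).
    """
    # Hottest first; ties are broken by extent id so the result is stable.
    ranked = sorted(access_counts, key=lambda e: (-access_counts[e], e))
    hot = set(ranked[:fast_capacity])
    return {e: ("fast" if e in hot else "slow") for e in access_counts}


def fast_hit_ratio(access_counts, placement):
    """Fraction of observed accesses that would land on the fast tier."""
    total = sum(access_counts.values())
    if total == 0:
        return 0.0
    fast = sum(n for e, n in access_counts.items() if placement[e] == "fast")
    return fast / total


if __name__ == "__main__":
    counts = {"e1": 900, "e2": 5, "e3": 300, "e4": 0}
    placement = place_extents(counts, fast_capacity=2)
    print(placement)
    print(fast_hit_ratio(counts, placement))
```

In this invented example, half of the extents on the fast tier would absorb nearly all of the observed accesses, which is the effect that makes a small amount of flash go a long way.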

This allows companies to fully leverage slower systems that otherwise might go unused for fear of under-serving an application. It also tells businesses which applications actually need faster storage, allowing them to purchase just enough of the more expensive storage to make those applications happy.

StorageSwiss Take

While it is possible to collect data from heterogeneous storage systems, doing so without massive amounts of internal coding is highly unlikely. Even if you collect such data, it is quite difficult to act on the information in a timely manner because of the downtime that moving an application requires. Virtualizing the storage, on the other hand, makes it significantly easier to analyze the performance of the environment and to automate any actions one might want to take based on that information.

Sponsored by FalconStor

About wcurtispreston

W. Curtis Preston (aka Mr. Backup) is an expert in backup & recovery systems, a space he has been working in since 1993. He has written three books on the subject: Backup & Recovery, Using SANs and NAS, and Unix Backup & Recovery. Mr. Preston is a writer and has spoken at hundreds of seminars and conferences around the world. Preston’s mission is to arm today’s IT managers with truly unbiased information about today’s storage industry and its products.

Tagged with: All-Flash, Cloud, FalconStor, Flash, HDD, Hybrid, Hyperconverged, SSD, Virtualization
Posted in Article