Almost since their inception, storage systems have kept log files that track everything going on within the system. Over the years the type and granularity of the data captured have increased, and today these systems are essentially big data generators in their own right. The problem is that most storage systems don’t provide any meaningful analysis of the data being captured, and the analysis that is reported is isolated to a single system.
Imagine if, rather than just providing a log file of what has happened to the storage system, these systems instead organized this data in a meaningful way so that administrators could act on it to fine-tune storage performance, capacity utilization, or protection levels. Furthermore, imagine if a single storage system’s log files were shared with all the other storage systems that the vendor has in the field, and that data was analyzed in a big data fashion.
Imagine no more. Companies like Nimble Storage and Cloudian are giving their customers the ability to opt in to a consolidation of storage system metadata. This is just data about data: it contains no customer intellectual property, and it is only consolidated at the trusted source. It is very similar to the opt-in reporting that users participate in with their word processing or email clients, where reports are sent back to the software developers, who use the consolidated information to create a better, more stable product. Storage vendors should do the same thing, and in the case of Nimble and Cloudian they are. Cloudian describes their implementation in their blog post “Cloudian HyperStore Rings in Era of Smart Data”.
Clearly the technology to create a smarter storage system exists. Transferring thousands of log files from hundreds of customers is relatively easy in our highly connected world, and technologies like Hadoop supply the missing ingredient: the back-end capability for the storage vendor to digest and analyze this data at scale.
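To make the idea concrete, here is a minimal sketch of the kind of Hadoop-style map/reduce pass a vendor might run over consolidated log metadata. The record fields (`customer_id`, `component`, `event`) and the counting task are illustrative assumptions, not any vendor’s actual schema:

```python
from collections import Counter
from itertools import chain

def map_record(record):
    # Mapper: emit (component, 1) for every error event; skip everything else.
    if record["event"] == "error":
        yield (record["component"], 1)

def reduce_counts(pairs):
    # Reducer: sum the emitted counts per component key.
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# A tiny stand-in for logs consolidated from multiple customers.
logs = [
    {"customer_id": "a", "component": "drive", "event": "error"},
    {"customer_id": "a", "component": "nic", "event": "ok"},
    {"customer_id": "b", "component": "drive", "event": "error"},
]
error_counts = reduce_counts(chain.from_iterable(map_record(r) for r in logs))
print(error_counts)  # {'drive': 2}
```

In a real deployment this mapper and reducer would run across a Hadoop cluster over millions of records; the shape of the computation, however, is exactly this simple.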
The Value of Smart Storage Systems
The value of this type of consolidated processing of storage analytics is multi-fold. First, storage vendors can use this information to help diagnose a specific problem in one organization’s configuration by comparing it to hundreds or thousands of storage systems at other customers. In fact, the more storage systems the vendor has in the field, the more accurate its diagnoses can become.
Second, storage system problems could be proactively diagnosed for performance tuning or problem avoidance. A supplier like Cloudian, which allows its software to be deployed on commodity hardware as well as on turnkey appliances, could benefit particularly from this capability. Its analytics could detect, in a commodity configuration, an elevated failure rate for a particular hard drive model from a certain manufacturer within a specific date range. It could then cross-reference this drive model across its customer base and warn the affected customers, who could proactively replace those drives.
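The drive-failure scenario above can be sketched in a few lines. The function, field names, and the 5% threshold below are all assumptions for illustration; real telemetry would also carry firmware revisions and manufacture-date ranges:

```python
from collections import defaultdict

def flag_suspect_drives(failure_reports, installed_base, threshold=0.05):
    """Flag drive models whose fleet-wide failure rate exceeds `threshold`.

    failure_reports: list of (customer, manufacturer, model) tuples.
    installed_base:  {(manufacturer, model): total drives deployed fleet-wide}.
    """
    failures = defaultdict(int)
    affected = defaultdict(set)
    for customer, manufacturer, model in failure_reports:
        key = (manufacturer, model)
        failures[key] += 1
        affected[key].add(customer)

    suspects = {}
    for key, count in failures.items():
        rate = count / installed_base[key]
        if rate > threshold:
            # Record the failure rate and which customers to warn.
            suspects[key] = {"rate": rate, "customers": sorted(affected[key])}
    return suspects

# Example: the hypothetical "X100" fails at 10% of its installed base
# and gets flagged; the "Z9" fails at 0.2% and does not.
reports = [("acme", "DriveCo", "X100"), ("beta", "DriveCo", "X100"),
           ("acme", "OtherCo", "Z9")]
base = {("DriveCo", "X100"): 20, ("OtherCo", "Z9"): 500}
suspects = flag_suspect_drives(reports, base)
```

The key point is the cross-customer view: no single customer with two failed drives could distinguish bad luck from a bad batch, but the vendor, seeing the whole installed base, can.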
The problem could also be broader than a specific drive: it could be network cards, certain motherboards, or CPU and RAM configurations.
Third, the analytics could help guide customers to better performance. For example, the analytics could capture file size information and node configuration, compare them to other customers’ configurations, and recommend optimizations.
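A minimal sketch of that comparison might look like the following. The metric names (`avg_file_size_kb`, `node_count`), the fleet-median baseline, and the recommendation texts are all hypothetical:

```python
import statistics

def recommend(customer, fleet):
    """Compare one customer's profile to the fleet median and suggest tuning."""
    tips = []
    median_size = statistics.median(c["avg_file_size_kb"] for c in fleet)
    median_nodes = statistics.median(c["node_count"] for c in fleet)
    # Flag workloads dominated by files far smaller than the fleet norm.
    if customer["avg_file_size_kb"] < median_size / 2:
        tips.append("many small files: consider tuning chunk/object size")
    # Flag deployments running on fewer nodes than comparable customers.
    if customer["node_count"] < median_nodes:
        tips.append("fewer nodes than similar deployments: consider scaling out")
    return tips

fleet = [{"avg_file_size_kb": 1024, "node_count": 8},
         {"avg_file_size_kb": 2048, "node_count": 12},
         {"avg_file_size_kb": 4096, "node_count": 16}]
customer = {"avg_file_size_kb": 256, "node_count": 4}
tips = recommend(customer, fleet)
```

Real recommendation engines would weigh many more dimensions, but the pattern is the same: the fleet supplies the baseline that makes one customer’s numbers meaningful.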
Conclusion
There are three constants when it comes to data center storage: data is growing, user expectations for performance and availability are increasing, and IT staff and budgets are flat. While the move to commodity hardware makes storing all this data more affordable, it does not necessarily improve reliability or ease the IT management burden. Analytics leverage the knowledge of the group so that storage systems, vendors, and, as a result, storage administrators gain the insight they need to increase reliability and proactively manage storage.
The key element, logging, has been available on storage systems for decades. The next step is for storage vendors to practice what they preach: consolidate this metadata from their customers and leverage big data analytics to extract actionable information they can provide back to those customers, who can then build a more reliable, higher-performing architecture. Vendors should also be able to use these analytics to deliver continuous improvement to their storage systems in each subsequent release.