It has long been acknowledged in the storage industry that two-thirds or more of the data sitting on primary storage arrays is inactive. Yet storage managers hesitate to archive more of it, fearing that an urgent request for the data will arrive after it has been tiered to slower-performing media. Keeping that inactive data on primary storage is an expensive burden, and the burden only grows as data volumes, access demands and retention requirements explode.
To better balance storage capacity and performance requirements, many IT shops are embracing hybrid cloud architectures. However, most overlook a component that is central to optimizing this equation: metadata.
Metadata provides descriptive information about data, such as its reference, statistical and structural attributes. For instance, metadata is queried to determine when a file was last opened or closed, and by whom. To accelerate storage performance while allowing more inactive data to be tiered to low-cost storage, storage planners should consider abstracting metadata from the primary storage array and managing it centrally, on a separate system.
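As a concrete illustration of the kind of metadata query described above, the Python sketch below walks a directory tree and flags files whose last-access time (`st_atime`) is older than a chosen threshold. It reads only file metadata through `os.stat` and never opens file contents. The function name `inactive_files` and its parameters are illustrative assumptions, not part of any product. Access times are only as reliable as the file system's mount options (for example, `noatime` or `relatime` on Linux), and "by whom" information generally comes from audit logs rather than from `stat`.

```python
import os
import time
from pathlib import Path
from typing import List, Optional


def inactive_files(root: str, max_idle_days: float, now: Optional[float] = None) -> List[Path]:
    """List files under `root` whose last-access time is older than `max_idle_days`.

    Only file metadata is consulted (os.stat); file contents are never opened.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_idle_days * 86_400  # 86,400 seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = os.stat(path)  # metadata-only lookup
            except OSError:
                continue  # file vanished mid-walk or is a broken symlink
            if st.st_atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

For example, `inactive_files("/mnt/projects", 30)` (a hypothetical mount point) would return the files on that share that nobody has read in the last 30 days, which are the natural candidates for tiering.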
Metadata is fundamental to accessing, updating and managing data. As a result, metadata operations account for anywhere from 70% to 90% of the requests served by a typical network attached storage (NAS) primary array. That load substantially impacts I/O performance, because storage arrays are optimized to move data in and out of the array, not to answer queries about the state of that data. In a hybrid cloud model, the additional latency inherent in public cloud services compounds the metadata bottleneck, particularly because the entire file typically must be retrieved to service a metadata request.
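A quick back-of-the-envelope calculation shows why this share matters. Assuming, purely for illustration, a workload of 100,000 requests of which 80% are metadata operations (inside the 70% to 90% range cited above), the sketch below computes how many requests still reach the primary array when some fraction of the metadata traffic is answered by another system. The function name and the workload figures are hypothetical.

```python
def requests_reaching_array(total_requests: int, metadata_share: float, offloaded_share: float) -> float:
    """Requests still served by the primary array.

    metadata_share:  fraction of all requests that are metadata operations
    offloaded_share: fraction of those metadata requests answered elsewhere
    """
    metadata_requests = total_requests * metadata_share
    data_requests = total_requests - metadata_requests
    return data_requests + metadata_requests * (1 - offloaded_share)


# Hypothetical workload: 100,000 requests, 80% of them metadata operations
for offloaded in (0.0, 0.5, 1.0):
    remaining = requests_reaching_array(100_000, 0.8, offloaded)
    print(f"{offloaded:.0%} of metadata offloaded -> {remaining:,.0f} requests hit the array")
```

Under these assumptions, offloading all metadata traffic cuts the array's request load from 100,000 to 20,000, and even offloading half of it brings the load down to 60,000.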
Keeping an up-to-date copy of all metadata on a dedicated appliance that intercepts and answers all metadata requests can greatly accelerate storage performance. Besides removing that I/O load from the primary storage array, maintaining a consolidated, current map of all metadata allows data stores to be presented as local and instantly accessible, regardless of their physical location. As a result, an external metadata controller also enables storage managers to offload a larger portion of their inactive data to less expensive storage resources, including public cloud-based archive storage services such as Amazon S3 and on-premises object storage deployments.
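The routing idea can be sketched in a few lines of Python. This is a toy model under loud assumptions, not Infinite io's design or any product's API: the `MetadataController` class, the `FileMeta` record and the tier names `"primary"` and `"archive"` are all hypothetical, the storage tiers are plain in-memory dictionaries, and a real controller would sit in the NFS or SMB request path and keep its catalog consistent with the underlying arrays. What it does show is the property described above: `stat` is answered from the controller's own catalog without touching any tier, and a file keeps the same path and metadata after its contents move to the archive tier.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FileMeta:
    size: int
    mtime: float
    tier: str  # which storage tier currently holds the file's contents


class MetadataController:
    """Toy metadata service: answers metadata queries from its own catalog and
    forwards only data reads to the tier that currently holds each file."""

    def __init__(self, tiers: Dict[str, Dict[str, bytes]]):
        self.tiers = tiers                      # tier name -> in-memory object store
        self.catalog: Dict[str, FileMeta] = {}  # path -> metadata kept by the controller
        self.data_reads = 0                     # client reads that had to reach a tier

    def write(self, path: str, data: bytes, tier: str, now: float) -> None:
        self.tiers[tier][path] = data
        self.catalog[path] = FileMeta(size=len(data), mtime=now, tier=tier)

    def stat(self, path: str) -> FileMeta:
        # Metadata request: served from the catalog, no tier is contacted.
        return self.catalog[path]

    def read(self, path: str) -> bytes:
        # Data request: forwarded to whichever tier holds the contents.
        meta = self.catalog[path]
        self.data_reads += 1
        return self.tiers[meta.tier][path]

    def move(self, path: str, dest_tier: str) -> None:
        # Tier the contents; the path and catalog entry stay the same for clients.
        meta = self.catalog[path]
        if meta.tier != dest_tier:
            self.tiers[dest_tier][path] = self.tiers[meta.tier].pop(path)
            meta.tier = dest_tier


ctl = MetadataController({"primary": {}, "archive": {}})
ctl.write("/projects/report.bin", b"quarterly data", tier="primary", now=1_700_000_000.0)
ctl.move("/projects/report.bin", "archive")
print(ctl.stat("/projects/report.bin"))  # answered without touching either tier
print(ctl.data_reads)                    # still 0: no data read has been forwarded
```

Only an actual `read` of the file reaches the archive tier; every `stat` in between is served by the controller, which is what lets inactive data live on slower, cheaper storage without slowing down routine metadata lookups.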
Additional insight into building a smarter metadata strategy that overcomes the “80% inactive” storage pain point and accelerates performance is available on demand in Storage Switzerland’s webinar with Infinite io, “The Hybrid Cloud Data Gravity Problem and How to Fix It.”