Metadata is an often overlooked but leading storage performance bottleneck. Metadata maps how files are stored on disks, and it records file attributes such as author and date of last access. As such, it is crucial to data access and management. For example, one of the most common metadata-intensive operations is garbage collection, which automatically reclaims space that is occupied but no longer needed by the application. Every input/output (I/O) operation that the storage drive handles triggers many background metadata operations. As a result, metadata operations significantly affect the storage system’s ability to deliver the input/output operations per second (IOPS) of which its drives are capable.
Metadata operations affect performance more on solid-state drives (SSDs) than on hard disk drives (HDDs), because SSD latency is so low that metadata overhead accounts for a larger share of each operation. Another reason is that most SSDs are built on NOT-AND (NAND) flash, which requires frequent garbage collection. Whereas HDDs can overwrite data blocks in place when they need to be updated, NAND SSDs cannot. Instead, updated data must be written to a new location, and the old data must later be erased through a garbage collection process. Additionally, the mapping from the data block’s logical address (the address the application uses) to its physical location must be updated to point to the new location. Furthermore, metadata tends to turn over more frequently on SSDs because enterprises commonly employ deduplication technologies, which generate additional metadata such as change log files, to optimize capacity utilization.
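The out-of-place update cycle described above can be illustrated with a minimal Python sketch. It is a toy model, not any vendor's firmware: the class `ToyFTL`, its fields, and the `metadata_ops` counter are hypothetical names chosen for illustration, and it reclaims space one page at a time, whereas real NAND is erased in larger blocks.

```python
class ToyFTL:
    """Toy flash translation layer: out-of-place writes plus garbage collection.

    Hypothetical illustration only; real NAND erases whole blocks, not pages.
    """

    def __init__(self, num_pages):
        self.pages = [None] * num_pages      # physical pages; None means erased
        self.l2p = {}                        # metadata: logical address -> physical page
        self.stale = set()                   # physical pages holding outdated data
        self.free = list(range(num_pages))   # erased pages available for new writes
        self.metadata_ops = 0                # mapping updates and reclaim bookkeeping

    def write(self, logical, data):
        """Write data for a logical address without overwriting in place."""
        if not self.free:
            self.collect_garbage()
        if not self.free:
            raise RuntimeError("device full: no stale pages to reclaim")
        new_page = self.free.pop(0)
        self.pages[new_page] = data
        old_page = self.l2p.get(logical)
        if old_page is not None:
            self.stale.add(old_page)         # the previous copy is now garbage
        self.l2p[logical] = new_page         # remap logical address to the new page
        self.metadata_ops += 1

    def read(self, logical):
        """Follow the mapping table to the current physical copy."""
        return self.pages[self.l2p[logical]]

    def collect_garbage(self):
        """Erase stale pages and return them to the free list."""
        for page in sorted(self.stale):
            self.pages[page] = None
            self.free.append(page)
            self.metadata_ops += 1
        self.stale.clear()


ftl = ToyFTL(num_pages=4)
for version in ("v1", "v2", "v3", "v4", "v5"):
    ftl.write(0, version)                    # five updates to one logical address
print(ftl.read(0), ftl.metadata_ops)
```

Even in this simplified model, repeatedly updating a single logical address generates more metadata operations than data writes once garbage collection kicks in, which is the overhead the article describes.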
To optimize their investments as they migrate to SSDs, storage professionals should be careful to choose a drive architecture that was designed to minimize the impact of metadata operations on performance. Gunna Marripudi, a product manager with Western Digital, recently joined George Crump, Storage Switzerland’s founder and lead analyst, for a Lightboard video discussion of how his company has architected its IntelliFlash SSD-based arrays to alleviate metadata bottlenecks.
The IntelliFlash solution separates metadata processing from processing of the data itself. A component of the SSD is dedicated to aggregating and sorting metadata, which is then tiered and cached based on how often it is accessed. As a result, users can quickly access the files that they need. At the same time, more advanced data services are accelerated, because they run on a section of the SSD that is not also processing metadata.
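As a rough illustration of tiering metadata by access, the following Python sketch keeps recently used metadata entries in a small fast tier and demotes the rest to a slower one. It is not a description of how IntelliFlash is implemented: the class `TieredMetadataCache`, its two tiers, and the use of recency as a stand-in for access level are all assumptions made for the example.

```python
from collections import OrderedDict


class TieredMetadataCache:
    """Hypothetical two-tier metadata store with least-recently-used demotion."""

    def __init__(self, hot_capacity):
        self.hot = OrderedDict()   # fast tier, ordered from least to most recently used
        self.cold = {}             # slow tier for metadata that is rarely touched
        self.hot_capacity = hot_capacity
        self.hot_hits = 0
        self.cold_hits = 0

    def put(self, key, meta):
        """Store or update an entry; new and updated entries start in the fast tier."""
        self.cold.pop(key, None)
        self.hot[key] = meta
        self.hot.move_to_end(key)
        self._demote_overflow()

    def get(self, key):
        """Look up an entry, promoting it to the fast tier if it was cold."""
        if key in self.hot:
            self.hot_hits += 1
            self.hot.move_to_end(key)
            return self.hot[key]
        meta = self.cold.pop(key)  # raises KeyError for unknown keys
        self.cold_hits += 1
        self.hot[key] = meta
        self._demote_overflow()
        return meta

    def _demote_overflow(self):
        """Move the least recently used entries to the slow tier when over capacity."""
        while len(self.hot) > self.hot_capacity:
            old_key, old_meta = self.hot.popitem(last=False)
            self.cold[old_key] = old_meta


cache = TieredMetadataCache(hot_capacity=2)
for name in ("a", "b", "c"):
    cache.put(name, {"owner": "example"})
cache.get("a")   # served from the slow tier, then promoted
cache.get("c")   # served from the fast tier
print(list(cache.hot), list(cache.cold))
```

The design choice worth noting is that lookups for frequently used metadata never touch the slow tier, so the cost of a metadata operation depends on how hot the entry is rather than on the total amount of metadata stored.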