Hybrid cloud storage is ideal for most organizations seeking to reduce the cost of their storage infrastructure and to leverage computing resources in the cloud. The obstacle to fully enabling a hybrid cloud storage strategy is the latency between on-premises and the cloud. That latency is most apparent in metadata operations, which account for up to 90% of all IO traffic. Slow metadata operations are usually what prompt users to complain about performance, because these are the operations that make them wait. Accelerating metadata performance is critical to gaining organizational acceptance of a hybrid cloud storage strategy.
The metadata performance problem is not limited to public cloud storage. On-premises object storage systems and even high-performance network attached storage (NAS) systems are also susceptible, depending on their utilization and the workloads they support.
The metadata problem is well known, and vendors of all types are trying to resolve it. Chapter Two covered the majority of approaches to working around the metadata problem, as well as their shortcomings. Most of these approaches address metadata as an afterthought or through brute force. Instead, vendors need to take a metadata-first approach by creating a metadata controller that manages and accelerates metadata regardless of which device stores it.
Designing a Metadata Controller
A metadata controller should abstract metadata from the actual data and serve it directly from the network. Separating the metadata from the data enables the data to exist anywhere within the storage infrastructure (on-premises NAS, on-premises object storage or the cloud) without requiring users to know its location. Users access the metadata via the controller, which routes them to the actual storage location. Serving metadata from the network means that all metadata queries (again, up to 90% of all IO) are turned around in the network without requiring the requesting user to wait for the storage target to respond, which is especially important for data stored in the cloud.
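To make the separation concrete, here is a minimal Python sketch of the idea, not any vendor's implementation. The names `MetadataRecord`, `MetadataController`, `stat`, `read` and the location labels are all hypothetical. Metadata queries are answered from the controller's own index, and only a file read is routed to the backend that holds the data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class MetadataRecord:
    """Attributes the controller can answer on its own (hypothetical schema)."""
    path: str
    size: int
    mtime: float
    location: str  # assumed labels, e.g. "nas", "object-store", "cloud"


class MetadataController:
    """Illustrative only: keeps metadata separate from the data it describes."""

    def __init__(self) -> None:
        self._index: Dict[str, MetadataRecord] = {}
        self.backend_calls = 0  # counts trips to a storage target

    def register(self, record: MetadataRecord) -> None:
        self._index[record.path] = record

    def stat(self, path: str) -> Optional[MetadataRecord]:
        # Metadata query: answered from the index, no backend round trip.
        return self._index.get(path)

    def read(self, path: str, backends: Dict[str, Callable[[str], bytes]]) -> bytes:
        # Data read: look up where the file lives, then route to that backend.
        record = self._index[path]
        self.backend_calls += 1
        return backends[record.location](path)
```

In this sketch a caller never needs to know whether a file sits on a NAS or in the cloud. It asks the controller, and only `read` incurs the cost of reaching the storage target.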
The first step, then, is to create a method for the metadata controller to capture the metadata for storage on the controller. There are two essential aspects to metadata capture. The first is the initial load, in which the metadata controller crawls the file systems much as an archive solution might. The second aspect, metadata updates, is what makes the metadata controller different from traditional data management solutions like an archive or tiering appliance. Instead of repeatedly crawling the file system, the metadata controller uses deep packet inspection of accessed and newly created data. As a result, the metadata controller has a continually updated copy of the metadata without having to repeatedly crawl the organization’s network.
With a metadata controller in place, all metadata access is in the network. Metadata IO improves not only when querying hybrid cloud storage but also when querying on-premises storage. Access to data appears local to users even if the data is in the cloud. Cloud connections are used more efficiently because the metadata controller only consumes bandwidth when it makes sense, such as when a metadata operation is followed by a file read that pulls data from the cloud.
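The two capture phases can be sketched as follows. This is a simplified illustration under stated assumptions: `initial_load`, `FileEvent` and `apply_event` are hypothetical names, and the events stand in for whatever a real controller would decode from network traffic. Deep packet inspection itself is not shown.

```python
import os
from dataclasses import dataclass
from typing import Dict


@dataclass
class Entry:
    size: int
    mtime: float


def initial_load(root: str) -> Dict[str, Entry]:
    """One-time crawl: walk the tree and record basic attributes per file."""
    index: Dict[str, Entry] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            index[full] = Entry(st.st_size, st.st_mtime)
    return index


@dataclass
class FileEvent:
    """Hypothetical stand-in for an operation observed on the wire."""
    op: str  # "create", "write" or "delete"
    path: str
    size: int = 0
    mtime: float = 0.0


def apply_event(index: Dict[str, Entry], event: FileEvent) -> None:
    """Incremental update: adjust the index without re-crawling anything."""
    if event.op == "delete":
        index.pop(event.path, None)
    elif event.op in ("create", "write"):
        index[event.path] = Entry(event.size, event.mtime)
    else:
        raise ValueError(f"unknown operation: {event.op}")
```

The crawl runs once. After that, each observed operation touches a single index entry, which is why the cost of keeping metadata current no longer scales with the size of the file system.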
The Metadata Controller Use Cases
The metadata controller has two primary use cases. The first is to enable organizations to leverage cloud storage as a repository for older (cold) data that requires retention. Policies can be set to automatically and transparently move data from on-premises file systems to low-cost cloud storage. This avoids the negatives typically associated with a data management solution: regular file system crawls, replacement of existing file systems with an overlay file system, and stub file or symbolic link vulnerability.
The second use case is to accelerate metadata performance regardless of storage location. Many NAS systems in high-performance use cases, All-Flash NAS for example, are bottlenecked by metadata IO. The metadata controller intercepts this IO so the NAS doesn’t have to deal with it, allowing the NAS to focus on delivering data and improving the performance of the existing All-Flash NAS. Offloading metadata can also help a low-end or midrange NAS deliver performance closer to that of an All-Flash NAS system, extending the life of the asset.
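As an illustration of what such a policy might look like, the sketch below selects on-premises files that have not been accessed within a configurable number of days. The `FileMeta` fields, the "on-prem" label and the `select_cold` function are assumptions made for this example, not a documented product feature. Because the decision uses only metadata the controller already holds, no file system crawl is needed to evaluate it.

```python
from dataclasses import dataclass
from typing import Iterable, List

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class FileMeta:
    path: str
    atime: float    # last access time, seconds since the epoch
    location: str   # assumed labels: "on-prem" or "cloud"


def select_cold(files: Iterable[FileMeta], now: float, max_idle_days: int) -> List[FileMeta]:
    """Return on-premises files idle for longer than max_idle_days."""
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    return [f for f in files if f.location == "on-prem" and f.atime < cutoff]
```

A scheduler could pass the result to whatever mechanism moves the data, then update each record's location so later reads are routed to the cloud copy.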
In the final column of this series, we’ll introduce you to a vendor that is taking the metadata controller approach, InfiniteIO. In the meantime, listen to our webinar that details the challenges of creating a hybrid cloud storage strategy and provides practical solutions for overcoming them.
By attending this webinar, you’ll gain:
- A clear understanding of why data gravity and the lack of metadata management break hybrid cloud
- Insight into why most cloud solutions do not provide predictable performance in a hybrid cloud due to their lack of metadata awareness
- Guidance on how to finally fix the latency and operational problems and create the ideal hybrid cloud infrastructure
Listen to the Storage Switzerland and InfiniteIO webinar, “The Hybrid Cloud Data Gravity Problem and How to Fix it.”
Register Now for a Free eBook:
All registrants for the webinar also receive a free copy of Storage Switzerland’s latest eBook, “Why Data Gravity is Breaking Hybrid Cloud Storage.” This ten-page eBook explains the hybrid cloud metadata problem and provides advice on how to fix it. Register now to receive the eBook and to be notified when the webinar is available for on-demand playback.