Metadata is data about data. For files, metadata includes information like the last access date and last modified date. An archive or backup solution may use metadata to track the location of a file, or it may use a stub file that points to that location. Where the solution stores its metadata affects both the user experience, such as responsiveness, and the total cost of ownership.
Why Cloud?
Organizations want to store information in the cloud to reduce the amount of on-premises infrastructure they have to support. The cloud offers a “pay as you go” model, free of data migrations and storage refreshes. Those capabilities appeal especially to the backup and archive use cases because their storage capacities can be five to ten times the size of production storage. On-premises, that much capacity leads to a frequent need to refresh storage and migrate data.
Cloud storage, though, is not without its challenges. When IT is asked to restore data from either a backup or archive, a metadata query needs to take place. At a minimum, the administrator is searching for a file name. More sophisticated solutions may allow searching for a specific version or content within files. If the solution stores part of the metadata in the cloud, there is latency involved in completing the search. The more sophisticated the search, the more likely it is for cloud latency to impact the recovery request.
The cloud’s latency also impacts the recovery performance of the actual data. As a result, many solutions use a hybrid cloud model, storing the most active data on-premises and older data in the cloud. This approach creates another metadata problem. If the backup or archive solution stores metadata with the data, then a query needs to span both on-premises and public cloud storage. The split search degrades search performance, even if the required data is already on-premises. Another challenge is that these queries incur egress fees, which raise the cost of public cloud storage and make it less appealing.
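To make the split-search penalty concrete, here is a minimal sketch of the arithmetic. Everything in it is an assumption for illustration: the `MetadataStore` type, the `search_cost` function, and especially the latency and fee figures, which are not measurements or real prices. It also assumes the search visits each store in turn.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataStore:
    name: str
    round_trip_ms: float      # assumed latency of one metadata request
    egress_usd_per_gb: float  # assumed fee for reading data out; 0 on-premises


def search_cost(stores, round_trips, gb_read_per_store):
    """Return (latency_ms, egress_usd) for a search that visits every store in turn."""
    latency_ms = sum(s.round_trip_ms * round_trips for s in stores)
    egress_usd = sum(s.egress_usd_per_gb * gb_read_per_store for s in stores)
    return latency_ms, egress_usd


# Illustrative numbers only, not measurements or real prices.
ON_PREM = MetadataStore("on-premises", round_trip_ms=1.0, egress_usd_per_gb=0.0)
CLOUD = MetadataStore("cloud", round_trip_ms=50.0, egress_usd_per_gb=0.10)

# Metadata stored with the data: the query has to visit both locations.
split = search_cost([ON_PREM, CLOUD], round_trips=20, gb_read_per_store=2.0)

# All metadata available on-premises: the query never leaves the data center.
local = search_cost([ON_PREM], round_trips=20, gb_read_per_store=2.0)
```

Under these assumed figures, the cloud leg dominates the split search even though the on-premises half answers quickly, and only the split search accrues an egress charge.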
Cloud backup and archive solutions need to separate metadata from the actual data so they can keep a copy of all the metadata both on-premises and in the cloud. With a complete copy of the metadata on-premises, queries are answered at local speed regardless of where the actual data resides. That encourages using both on-premises and cloud storage since there is no penalty for querying cloud-based data. It also eliminates egress fees for searches, since no data needs to leave the cloud until it is actually restored.
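One way to picture this separation is a small on-premises catalog that records where each file version lives. The sketch below uses Python's built-in `sqlite3` module. The `files` table, the `open_catalog`, `record_file`, and `find_files` functions, and the location strings are all hypothetical names chosen for illustration, not any vendor's design.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) the local metadata catalog. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               name     TEXT    NOT NULL,
               version  INTEGER NOT NULL,
               modified TEXT    NOT NULL,
               tier     TEXT    NOT NULL CHECK (tier IN ('on_prem', 'cloud')),
               location TEXT    NOT NULL,
               PRIMARY KEY (name, version)
           )"""
    )
    return conn


def record_file(conn, name, version, modified, tier, location):
    """Store metadata for one file version; the file data itself lives elsewhere."""
    conn.execute(
        "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
        (name, version, modified, tier, location),
    )
    conn.commit()


def find_files(conn, name_pattern):
    """Search by file name using only the local catalog; no cloud request is made."""
    cursor = conn.execute(
        "SELECT name, version, tier, location FROM files "
        "WHERE name LIKE ? ORDER BY name, version",
        (name_pattern,),
    )
    return cursor.fetchall()


catalog = open_catalog()
# Hypothetical entries: an older version tiered to the cloud, a newer one kept local.
record_file(catalog, "q3-report.docx", 1, "2018-07-01", "cloud",
            "cloud://archive-bucket/q3-report.docx.v1")
record_file(catalog, "q3-report.docx", 2, "2018-09-15", "on_prem",
            "/backup/pool1/q3-report.docx.v2")

matches = find_files(catalog, "%report%")
```

Here `matches` lists both versions, including the one whose data sits in the cloud, yet the lookup touched only the local catalog. A restore would contact the cloud only when the administrator picks the cloud-resident version.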
Another advantage of separating metadata from the actual data is that, because metadata is far smaller than the data it describes, the organization can afford to place it on a higher-performing storage tier like flash, which further accelerates query response times. The improved performance also enables the vendor to enrich the metadata by including context search and data classification in it.
Using cloud storage for backup and archive data can reduce infrastructure costs, but organizations can’t risk lower or inconsistent performance in the process. Separating metadata and keeping a copy of it both on-premises and in the cloud not only gives the organization faster responses to queries but also allows vendors to offer richer metadata that adds further value.
Rich metadata, stored in the right location, is especially important when an organization is protecting or archiving file data. In our upcoming webinar, Storage Switzerland and Aparavi will discuss how to leverage cloud storage as part of the backup process and why metadata location is so critical.
Register now for the event and get an exclusive copy of our white paper “It’s Not If Backup Software Is Using Cloud Storage, It’s How”.