Data growth requires a response. For some organizations, that response is upgrading to a more scalable, higher-performing NAS or an on-premises object store. For others, it is moving some data sets to public cloud storage.
Historically, these migrations have been a manual copy or move of data from the origin to the new destination. The problem is that these manual movements are time-consuming and error-prone. There is a good chance that not all the data is moved, that the wrong data is moved, or that once the data is moved, links back to the applications needing that data are not updated.
Recently, data management solutions have emerged to automate this data movement process, making both the movement of and access to this data more seamless. Data management solutions have three primary objectives: to identify and organize data, to move data to the most appropriate storage system, and to provide access back to that data.
Providing access back to the archived data is what causes the biggest challenges. The techniques vendors use to make recall requests transparent can make the data management solution look more like a data jail. Once an organization commits to one of these solutions, it is locked into that solution for the foreseeable future. Also, these solutions tend to take a lowest-common-denominator approach, sacrificing each storage platform's unique features for homogeneity.
The Battle of Transparent Recall
In most cases, to facilitate transparent recall, these solutions leave stub files in the original location that link to where the data has been moved. Alternatively, vendors create a centralized metadata controller that routes users to the location of the file. While this approach eliminates the need for stub files, it creates a potential central bottleneck that impacts performance and scalability. These solutions also typically layer a unique file system over the storage system, replacing the capabilities of the underlying platform.
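To make the stub-file dependence concrete, here is a minimal sketch in Python of how such a construct might work; the JSON stub format and its fields are hypothetical illustrations, not any vendor's actual implementation.

```python
import json
from pathlib import Path

# Hypothetical stub format: the original file is replaced with a small
# JSON pointer to where the real data now lives.
def write_stub(original: Path, archived_to: str) -> None:
    size = original.stat().st_size
    stub = {"stub_version": 1, "archived_to": archived_to, "original_size": size}
    original.write_text(json.dumps(stub))  # data replaced by a pointer

def recall(original: Path) -> str:
    """Resolve a stub back to the archive location of the real data."""
    return json.loads(original.read_text())["archived_to"]
```

Any application that opens the file directly sees the JSON pointer rather than its data; only software that understands this particular stub format can resolve it, which is exactly the dependence described here.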
Both situations create a total dependence on the data management solution; neither stub files nor metadata controllers are universal across vendor solutions. While a vendor may suggest that users can manually navigate to the file, doing so removes the need for the solution in the first place.
Great effort is spent trying to create a transparent recall environment. By implementing a data management solution with transparent recall, the organization is potentially putting its data in jail. IT needs to decide if its organization really needs that level of convenience. It is important to remember that most data movements either archive a dormant data set to a more cost-effective storage area or migrate active data to a newer storage system, and in some cases both.
IT Should Still Be in Control of Data Management
Clearly, because of the rapid growth of data, IT can't afford to move data manually, nor can it take the risks associated with that movement. But that does not mean it needs to give up control entirely.
The reality is that in both the archive and the upgrade use cases, data movement is a one-time occurrence, and in both cases IT is well aware of the activity. IT needs a solution that will help identify files based on given criteria, move those files, and provide a way to find those files if a user requests them.
Archiving with Freedom
The main attraction of archiving data is saving the organization money. The goal is to move inactive data from primary storage to a secondary archive, freeing up capacity on the primary store. The result should be less frequent upgrades of primary storage, at least for capacity-driven upgrades.
One of the realities archive vendors tend to miss when discussing ROI is that the primary storage is already bought and paid for. Archiving all of it just to have an empty primary storage system doesn't make any sense. Instead, IT really only needs to free up enough capacity on the primary store to avoid buying more storage.
Instead of archiving the entire 90% of data that hasn't been accessed in the last 90 days, archive just enough to meet the current capacity demand. This means the organization can archive the oldest 10% of data, which probably hasn't been accessed in years. Archiving data this old also means the chances of that data ever being recalled decrease considerably, which minimizes the need for a stub file or metadata management structure.
The value of archiving data without the need for a stub file or metadata controller is that the data is now stored in the native format of the target file system, free from any data management construct. It also means the organization can fully exploit the capabilities of the archive target: if that target is an object store, the organization can take advantage of advanced metadata tagging, and if the target is the public cloud, it can leverage cloud compute to run processes against that data.
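As an illustration of that metadata-tagging advantage, here is a minimal sketch using Amazon S3 through boto3; the bucket name, object key, local path, and tag values are assumptions made for the example.

```python
import boto3

s3 = boto3.client("s3")

# Upload an archived file in its native format, attaching searchable
# object metadata -- something a proprietary stub construct would hide.
s3.upload_file(
    Filename="/archive/staging/q3_results.csv",  # hypothetical local path
    Bucket="example-archive-bucket",             # hypothetical bucket
    Key="finance/2015/q3_results.csv",
    ExtraArgs={
        "Metadata": {
            "department": "finance",
            "last-accessed": "2015-09-30",
            "source-system": "primary-nas-01",
        }
    },
)
```

Because the object sits in the store in its native form, any tool that can talk to the object store, including cloud compute services, can search on those tags or process the data directly.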
To accomplish this type of archive, IT needs to be able to analyze the current data set, identify data by various parameters (like the oldest 10%), and then give a command to move that data based on those parameters. The solution should also help IT find and give access to data if a user requests it.
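A minimal sketch of that workflow in Python, assuming the file system records reliable last-access times (some systems mount with noatime) and using hypothetical mount points: scan, select the oldest 10%, move it, and keep a simple manifest so requested files can be found later.

```python
import csv
import shutil
from pathlib import Path

SOURCE = Path("/mnt/primary")   # hypothetical primary NAS mount
ARCHIVE = Path("/mnt/archive")  # hypothetical archive target
MANIFEST = ARCHIVE / "manifest.csv"

# 1. Analyze: gather every file with its last-access time.
files = [(p, p.stat().st_atime) for p in SOURCE.rglob("*") if p.is_file()]

# 2. Identify: sort oldest-first and take the oldest 10%.
files.sort(key=lambda entry: entry[1])
to_archive = files[: len(files) // 10]

# 3. Move, recording a manifest so IT can locate files if a user asks.
ARCHIVE.mkdir(parents=True, exist_ok=True)
with MANIFEST.open("a", newline="") as f:
    writer = csv.writer(f)
    for path, atime in to_archive:
        dest = ARCHIVE / path.relative_to(SOURCE)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(dest))
        writer.writerow([str(path), str(dest), atime])
```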
Upgrading with Freedom
The other data management use case, upgrading to a new storage system, also benefits when IT is in control of the data management process. The upgrade use case is another excellent opportunity to identify data and move only certain data types. For example, if the organization decides to upgrade to an all-flash array, it doesn't make sense to move data that hasn't been accessed in years to that upgraded array. Idle data performs the same on a hard disk-based storage system as it does on a flash array.
Instead, IT should identify the active and near-active data sets on the current storage system and move just that data to the new system. The old data could stay in place on the current NAS if the organization intends to keep it, or it could be archived to less expensive storage either in the data center or in the cloud. Again, these two data sets are moved once. The need to move them again is rare until the new primary storage system fills up. Once that happens, IT either begins the archive process above or starts another round of upgrade migrations.
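A sketch of how that split might be decided in Python; the thresholds (90 days for active, one year for near-active) are illustrative assumptions, not recommendations.

```python
import time
from pathlib import Path

DAY = 86400  # seconds

def classify(path: Path, now: float | None = None) -> str:
    """Bucket a file by how long it has sat idle since last access."""
    now = now or time.time()
    idle_days = (now - path.stat().st_atime) / DAY
    if idle_days <= 90:    # assumed threshold for active data
        return "active"
    if idle_days <= 365:   # assumed threshold for near-active data
        return "near-active"
    return "cold"          # left in place or archived, not migrated
```

Only files classified as active or near-active would go on the migration list for the new array; everything else follows the archive path described earlier.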
Conclusion
The number of recalls that come from an archive is largely dependent on how aggressively the organization moves data to the archive. If data is moved after only a few months of inactivity, the chance of a recall is quite high, and those organizations might consider a transparent option. But most organizations have data that is years old on their storage systems. If they start by migrating the oldest 10% of data, they can forestall a hardware upgrade without having to deal with the potential complexities and vendor lock-in of transparent recall.
With this type of data management solution in place, upgrades are really a form of archive. Active and near-active data are moved to the new system, and old data is either left in place (the old system becomes the archive) or moved to a cost-effective storage tier.
Sponsored by Data Dynamics