Data is growing, and most of that growth is unstructured data (machine data, files, images, etc.). That’s not surprising to any IT professional, but what is unnerving is the rate of that growth and the fact that it is going to accelerate. An increasing amount of this unstructured data is critical to the organization, which must either manage it for legal or regulatory reasons or keep it so it can be monetized at a later date. The problem is that most data protection solutions have not evolved to better manage the retention of unstructured data, and most data management solutions have never really focused on data protection.
What’s Missing From Unstructured Data Protection?
Organizations have a fundamental problem when it comes to protecting unstructured data: in most cases they have to protect millions of discrete files. There are two approaches, and each has its downsides.
The first approach is the legacy method of walking the file system, finding the files that have changed and then backing those files up. With this method the files are backed up as discrete entities that do not depend on any other backup. The problem is that when there are millions and millions of files, the file system walk by itself takes a very long time, potentially hours, before the backup can even begin. In fact, many organizations find it is faster to perform a brute force full backup rather than wait for the file system to be traversed.
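To make the cost of the walk concrete, here is a minimal Python sketch of this kind of incremental scan. It is an illustration only, not any particular product's implementation; the function name `find_changed_files` and the use of modification time as the change signal are assumptions made for the example.

```python
import os


def find_changed_files(root, last_backup_time):
    """Walk the tree under root and return files modified after last_backup_time.

    last_backup_time is a POSIX timestamp (seconds since the epoch).
    Every file is stat()'ed, so the cost grows with the total file count,
    not with the number of files that actually changed.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                # File vanished or is unreadable; skip it in this sketch.
                continue
            if mtime > last_backup_time:
                changed.append(path)
    return sorted(changed)
```

Even if only a handful of files changed, the loop still has to visit and stat every file in the tree, which is why the walk dominates backup time on very large file systems.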
The second approach is the more modern concept of image-based backups. This backup technique does not track individual files; instead it looks for blocks of data that have changed. Those changed blocks are then replicated to backup storage. The problem with this approach is that access to specific files requires “mounting” the backup volume and pulling the required files from it. Also, the image is typically in a proprietary format and is not searchable. There is also a limit to the number of incremental backups that can be taken before the original image has to be refreshed.
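The block-level idea can be sketched in a few lines of Python. This is a simplified illustration: it detects changed blocks by comparing per-block hashes, whereas real image-based products usually rely on a change-tracking driver or hypervisor API. The function names and the 4 KB default block size are assumptions made for the example.

```python
import hashlib


def block_hashes(data, block_size=4096):
    """Return a SHA-256 digest for each fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(previous_hashes, data, block_size=4096):
    """Return (changed, current_hashes).

    changed maps block index -> block bytes for every block whose hash
    differs from the previous backup, or that did not exist before.
    """
    current = block_hashes(data, block_size)
    changed = {}
    for index, digest in enumerate(current):
        if index >= len(previous_hashes) or previous_hashes[index] != digest:
            start = index * block_size
            changed[index] = data[start:start + block_size]
    return changed, current
```

Note that the result is keyed by block index, not by file name. That is exactly why recovering a single file means reassembling and mounting the image first.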
In both cases the organization is faced with a capacity and retention issue that is going to get a lot worse as unstructured data continues to grow. First, of course, they need a production storage system that can store all this data. Second, they need a data protection storage device that not only stores all of this data, but also all of the versions of the data as it changes. Secondary storage could become a huge expenditure for the organization; as unstructured data continues to grow, it becomes the “gift that keeps on taking.”
What Organizations Need
To manage and protect their unstructured data growth, organizations need a solution that can protect files as discrete entities but still perform that protection rapidly. They also need the ability to prune the on-premises store so their data centers do not get overwhelmed by secondary storage hardware, and the ability to search and find the data being protected and retained.
Aparavi is a software-based data protection solution designed specifically to protect the rapidly growing unstructured data set. It does this by installing lightweight client software on the systems in the environment that need protecting. Typically, these systems will be Windows file servers and Linux NFS servers. The software client monitors the servers for files being created or changed. It then performs a checkpoint on those files as frequently as every 15 minutes. A checkpoint is essentially a local (on the server) copy of that data. It serves as protection against software-based failures and cyber attacks.
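The monitor-then-checkpoint pattern described here can be sketched as follows. This is not Aparavi's code or API; the `ChangeTracker` class and its methods are hypothetical, and the source of change events (for example, an operating system file notification facility) is assumed to exist outside the sketch and to call `record()`.

```python
import os
import shutil


class ChangeTracker:
    """Collects paths reported as created or changed between checkpoints."""

    def __init__(self, root):
        self.root = root
        self._pending = set()

    def record(self, relative_path):
        """Called by the (hypothetical) file monitor for each create/change event."""
        self._pending.add(relative_path)

    def checkpoint(self, checkpoint_dir):
        """Copy every pending file into checkpoint_dir and reset the pending set.

        Work is proportional to the number of changed files, not the total
        number of files, because no file system walk is needed.
        """
        copied = []
        for relative_path in sorted(self._pending):
            source = os.path.join(self.root, relative_path)
            if not os.path.isfile(source):
                # Deleted after the event was recorded; nothing to copy.
                continue
            target = os.path.join(checkpoint_dir, relative_path)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(source, target)
            copied.append(relative_path)
        self._pending.clear()
        return copied
```

In this sketch a scheduler would call `checkpoint()` on an interval such as the 15 minutes mentioned above. Because changes are recorded as they happen, each checkpoint touches only the files that changed.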
The next component is the Aparavi software appliance, which also installs on-premises. It receives what Aparavi calls snapshots from the protected systems. These snapshots are less frequent than the checkpoints and serve as protection against a local hardware failure.
The final component is what Aparavi calls the retention function. At this point Aparavi copies the changed files to a cloud storage provider like Amazon, Google, IBM Bluemix, or Microsoft Azure. Aparavi also supports local S3-compatible object storage, such as systems from Scality and Cloudian. The archive function eventually removes the data from the on-premises software appliance, keeping the investment in on-premises protection storage as small as possible.
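The pruning step can be illustrated with a small Python sketch. It is a hypothetical model, not Aparavi's actual policy engine: the `StoredFile` record, its fields, and the idea of a fixed local retention window in days are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class StoredFile:
    name: str
    size_bytes: int
    age_days: int        # days since the copy landed on the appliance
    in_cloud: bool       # True once the cloud copy has been confirmed


def select_for_pruning(files, keep_local_days):
    """Return the files that can be removed from the on-premises appliance.

    A file qualifies only if it is older than keep_local_days AND a cloud
    copy exists, so pruning never removes the last remaining copy.
    """
    return [f for f in files if f.in_cloud and f.age_days > keep_local_days]


def reclaimed_bytes(files, keep_local_days):
    """Total on-premises capacity freed by pruning."""
    return sum(f.size_bytes for f in select_for_pruning(files, keep_local_days))
```

The key design point in the sketch is the `in_cloud` check: local capacity is reclaimed only after the data is safely held in the retention tier.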
Aparavi provides organizations with a unique way to protect, retain, and access their unstructured data while minimizing the cost of the on-premises storage investment. The solution provides search and retention management capabilities and uses an open format that is accessible to third parties. For organizations looking for a new way to manage and protect unstructured data, Aparavi provides a fresh take on an old problem that deserves a hard, new look.