Backup and archive have always sat at opposite ends of the data management spectrum. Conventional wisdom holds that the two should never meet, and cries of “backup is not archive” fill the air. In reality, the “backup is not archive” mantra is based on the limitations of old technology and does not account for the capabilities of modern protection software and hardware.
What Happens In Archive
Archive is typically described as a process that identifies old data, classifies it, and then moves it to some low-cost storage device with terrible recall performance. Nothing could be further from the truth. First, archive seldom actually “moves” anything. The first step in an archive process is to COPY data, not move it. The archive software waits for a data set to meet certain criteria, typically not being accessed for a predefined period, and then makes a copy of that data to the secondary storage device. At some user-defined point, the data is deleted from the original location, which frees up primary storage and leaves the copy available only on secondary storage. The archive software is expected to provide some way to quickly find these files, whether by name, group, or tag.
Aside from the waiting, backup software does the same thing. It copies data once per night to secondary storage. As we discussed in the last chapter, protection software designed for unstructured data will often make copies of data, at the point of creation or as it changes, to secondary storage. Legacy backup software can also find files by name or by backup job.
Legacy data protection solutions fall short when trying to also serve as an archive solution because archiving is an afterthought. The way a legacy solution backs up data, either via a backup job or via an image, is at odds with the way archive needs to work with discrete files. As a result, organizations that want to manage data are forced to implement a separate archive system, which requires an additional, separate scan of unstructured data and often a separate storage architecture.
For an unstructured data protection solution to also handle unstructured data archiving, it must solve something legacy solutions are particularly bad at: managing large backup indexes. Part of the solution is simply to use an indexing architecture designed for long-term scale, which most backup solutions lack. Another part is for the software to intelligently collapse the number of copies/versions of data it retains. Over time, keeping every single version of a file becomes unnecessary; in most cases, the organization needs only the final copy. Deleting unnecessary prior file versions from secondary storage could also significantly reduce the size of the solution’s database.
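The version-collapsing idea can be shown with a small sketch. The index record shape here is hypothetical, and a real implementation would also have to honor retention policies and legal holds before marking anything deletable.

```python
from collections import defaultdict

def collapse_versions(versions, keep_latest=1):
    """Given (file_id, timestamp, version_id) records from a backup index,
    return the version_ids that can be dropped, keeping only the newest
    keep_latest versions of each file.

    A sketch of the "collapse" step the text describes: over time, most
    organizations need only the final copy of each file.
    """
    by_file = defaultdict(list)
    for file_id, ts, vid in versions:
        by_file[file_id].append((ts, vid))

    deletable = []
    for recs in by_file.values():
        recs.sort(reverse=True)  # newest timestamp first
        deletable.extend(vid for _, vid in recs[keep_latest:])
    return deletable
```

Shrinking the set of tracked versions this way is what keeps the index, and therefore the solution’s database, from growing without bound.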
The last remaining need is for the backup solution to add a data grooming feature. Here it could have an advantage over traditional archive: it could ensure that no grooming takes place unless there are X copies on protection storage and Y copies available off-site (or in the cloud). Since most archive products have no integration with the backup solution, they are unaware of a file’s protection status.
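The grooming safeguard amounts to a simple gate. This sketch assumes a hypothetical per-file record of verified copy counts; the field names and thresholds are illustrative, not any vendor’s API.

```python
def safe_to_groom(file_record, min_local_copies=2, min_offsite_copies=1):
    """Allow grooming only when enough verified copies exist: X copies on
    protection storage and Y copies off-site or in the cloud.

    file_record is a hypothetical index entry, e.g.:
        {"local_copies": 2, "offsite_copies": 1}
    """
    return (file_record["local_copies"] >= min_local_copies
            and file_record["offsite_copies"] >= min_offsite_copies)
```

This is the check a standalone archive product cannot perform, because it has no visibility into the backup solution’s copy inventory.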
For a data center considering this integrated strategy, data grooming is not a “must have now” feature. As long as the protection vendor has built a foundation to add it later, it will likely be months, if not a year, before the organization needs to start grooming data from production storage.
IT professionals have resisted archiving for decades, choosing instead to keep buying more and more primary storage, even though most of the data on that storage is inactive. Those same IT professionals HAVE bought data protection solutions from day one and continue to buy them. Perhaps, instead of forcing IT to buy a separate solution, it’s time to integrate archive into the protection process. Backup is, after all, the first step in any archive process. An unstructured data protection solution with the right core capabilities designed into the architecture from the beginning could serve as an excellent foundation for implementing an archive strategy that works.
Sponsored by Aparavi