In our last few columns, Storage Switzerland has described a data protection solution that allows primary storage to use snapshot technology to protect itself. The basic design requires three storage systems, including at least one on-premises and one at a disaster recovery location. This is an ideal way to recover from a disaster in which the latest, or nearly latest, copy of data has to be made available quickly so the organization can return to operation. In those columns we also designed around most of this approach's shortcomings. Assuming, for a moment, that we have achieved a near-perfect data protection strategy, what is the role of backup in this design?
Problem One: Timing
For a data center to move to a self-protecting primary storage design, the organization must be able to afford it. As stated above, the design requires three storage systems. That means the systems have to come from smaller, more price-competitive vendors, or the organization has to be willing to adopt a software-defined storage (SDS) solution. The problem is one of timing: the organization may not be ready to move to a new storage platform or to jump into SDS with both feet. It may be better off enhancing its backup strategy than replacing its primary storage infrastructure.
The good news is that backup software can respond to the challenge. Many solutions can capture data more frequently, which reduces Recovery Point Objectives (RPO). Many can also let applications access data directly on backup storage, which avoids transferring the data back across the network and reduces Recovery Time Objectives (RTO).
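A quick way to see both effects is to put numbers on them. This Python sketch assumes that the worst-case RPO equals the interval between backups and that a conventional restore is limited by sustained network throughput; the function names and figures are hypothetical. Accessing data in place on backup storage avoids most of the transfer time that `network_restore_time` estimates, which is why it shortens RTO.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval: timedelta) -> timedelta:
    """A failure just before the next backup loses everything since the last one."""
    return backup_interval


def network_restore_time(dataset_gb: float, throughput_gb_per_hour: float) -> timedelta:
    """Time to copy a data set back across the network at a sustained throughput."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return timedelta(hours=dataset_gb / throughput_gb_per_hour)


# Illustrative figures only: nightly vs. hourly backups, 2 TB restored at 500 GB/h.
print(worst_case_rpo(timedelta(hours=24)), "vs", worst_case_rpo(timedelta(hours=1)))
print(network_restore_time(2000, 500))  # 4:00:00
```

Moving from nightly to hourly capture cuts the worst-case loss window from a day to an hour, without touching primary storage.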
Problem Two: Finding
As I discussed in my recent column “Not All Snapshots are the Same,” redirect-on-write snapshot technology can store potentially thousands of snapshots without impacting performance. The problem is finding the one file you need across all those snapshots. Backup applications maintain sophisticated databases that track each version of a file and where that version is stored. In the self-protected primary storage design there are three storage systems, each potentially with a different snapshot schedule, so the version you need could sit on any of them.
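To illustrate what such a file-tracking database does, here is a minimal Python sketch of a version catalog that spans several storage systems. The class names, paths, system labels and snapshot IDs are all hypothetical; real backup catalogs store far more metadata and persist it in a database, but the core lookup, "newest version of this file as of a given date, and where it lives," looks like this.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(order=True)
class FileVersion:
    # Only captured_at takes part in ordering, so versions sort by time.
    captured_at: datetime
    system: str = field(compare=False)       # storage system holding this copy
    snapshot_id: str = field(compare=False)  # snapshot (or backup) containing it


class VersionCatalog:
    """Hypothetical minimal catalog: file path -> versions sorted by capture time."""

    def __init__(self) -> None:
        self._versions: dict[str, list[FileVersion]] = {}

    def record(self, path: str, version: FileVersion) -> None:
        # insort keeps each path's version list in chronological order.
        bisect.insort(self._versions.setdefault(path, []), version)

    def as_of(self, path: str, when: datetime) -> FileVersion | None:
        """Return the newest version captured at or before `when`, or None."""
        versions = self._versions.get(path, [])
        times = [v.captured_at for v in versions]
        i = bisect.bisect_right(times, when)
        return versions[i - 1] if i else None


# Usage with made-up paths, systems and snapshot IDs.
catalog = VersionCatalog()
catalog.record("/finance/q1.xlsx", FileVersion(datetime(2015, 3, 31), "dr-site", "snap-0331"))
catalog.record("/finance/q1.xlsx", FileVersion(datetime(2015, 1, 31), "primary", "snap-0131"))
hit = catalog.as_of("/finance/q1.xlsx", datetime(2015, 2, 15))
print(hit.system, hit.snapshot_id)  # primary snap-0131
```

Because versions are kept sorted, a point-in-time lookup is a binary search rather than a walk through every snapshot on every system.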
For a “normal” recovery, where you need the latest version of a file, tracking down the right data shouldn’t be a problem. But what if you need to recover a data set that is months or even years old? Without any organization or indexing, that can mean searching snapshot after snapshot on each system. That is one of the reasons we suggest periodically backing up a copy of the snapshots, so that their contents are recorded in the backup software’s file-tracking database.
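One way to decide which snapshots to hand to the backup software is a simple selection rule. The sketch below assumes a hypothetical once-a-month policy that forwards the latest snapshot from each calendar month, so the backup catalog ends up indexing one point in time per month; the function name and schedule are illustrative, not part of the design described here.

```python
from __future__ import annotations

from datetime import datetime


def monthly_backup_candidates(snapshot_times: list[datetime]) -> list[datetime]:
    """Pick the latest snapshot in each calendar month (hypothetical policy)."""
    latest: dict[tuple[int, int], datetime] = {}
    for ts in snapshot_times:
        key = (ts.year, ts.month)
        # Keep only the newest snapshot seen so far for this month.
        if key not in latest or ts > latest[key]:
            latest[key] = ts
    return sorted(latest.values())


snaps = [
    datetime(2015, 1, 5),
    datetime(2015, 1, 28),
    datetime(2015, 2, 3),
    datetime(2015, 2, 27, 23, 0),
]
print(monthly_backup_candidates(snaps))
```

A coarser or finer schedule only changes the grouping key; the point is that older history gets into a searchable catalog without backing up every snapshot.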
Problem Three: Paying
The third problem with the self-protected primary storage design is paying for it. Disk capacity is certainly cheaper than it has ever been; in fact, let’s pretend for a second that it is free. As we discussed in our article “Even if Disk were Free You’d Still Want Tape,” there is still the cost of powering, cooling and housing that capacity. The real limits on ‘limitless’ snapshot technology are that ongoing cost and, frankly, how much of that history is actually needed.
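The ongoing cost can be estimated with simple arithmetic: yearly energy is the equipment’s wattage times 8,760 hours, scaled by a facility overhead factor such as PUE (power usage effectiveness) to account for cooling, times the electricity price. This hypothetical Python helper ignores floor space and hardware refresh, and every input you pass it is an assumption about your own facility.

```python
HOURS_PER_YEAR = 24 * 365


def annual_power_cost(watts: float, pue: float, price_per_kwh: float) -> float:
    """Yearly electricity cost of equipment that runs around the clock.

    PUE scales the IT load up to include cooling and other facility overhead.
    All inputs are illustrative assumptions; floor space and hardware refresh
    are not modeled.
    """
    if watts < 0 or pue < 1 or price_per_kwh < 0:
        raise ValueError("expected watts >= 0, pue >= 1, price >= 0")
    kwh_per_year = watts * HOURS_PER_YEAR / 1000
    return kwh_per_year * pue * price_per_kwh


# Hypothetical 1 kW disk shelf, PUE 2.0, $0.10 per kWh.
print(round(annual_power_cost(1000, 2.0, 0.10), 2))  # 1752.0
```

The result recurs every year the capacity stays online, which is why even "free" disk is not free to keep.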
While we believe data should be retained for a long time in case it is needed, it should be retained cost-effectively. Also, most requests for data that is months or years old don’t need to be fulfilled in seconds; if it takes a few hours to get that data back, that is generally not an issue. Backup software that stores backups on tape is ideal for this, as we discussed in our article “The role of Tape in Primary Storage Data Protection.”
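The retention split argued for here can be expressed as a simple policy: data young enough to still be covered by snapshots is recovered from them, and anything older comes from tape-based backup, where a retrieval of a few hours is acceptable. This is a sketch only; the function name and the 30-day snapshot window are hypothetical defaults, not recommendations.

```python
from datetime import timedelta


def recovery_source(age: timedelta, snapshot_window: timedelta = timedelta(days=30)) -> str:
    """Choose where to recover data from based on how old it is.

    The 30-day snapshot window is a hypothetical default, not a recommendation.
    """
    if age < timedelta(0):
        raise ValueError("age cannot be negative")
    if age <= snapshot_window:
        return "snapshot"  # recent data: still on primary/DR snapshots, restored quickly
    return "tape"          # older data: tape-based backup, retrieval may take hours


print(recovery_source(timedelta(days=2)))    # snapshot
print(recovery_source(timedelta(days=365)))  # tape
```

Shrinking the snapshot window reduces the disk capacity that has to be powered and cooled, at the price of slower recovery for data that falls outside it.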
Should backup software still have a role in a data center that has implemented our self-protected primary storage design? Absolutely. First, not all data centers can move to this design. Second, even when they do, the backup solution brings value in finding and storing older data sets.
There are primary storage solutions that are building search capabilities into their products, and there are secondary solutions, often called “copy data” solutions, that either replace primary storage snapshot technology or augment it with better search and long-term storage. In fact, we believe that over time most data protection solutions will evolve to use snapshots for data capture.