Recently I sat down with George Crump, founder of Storage Switzerland, and Scott Baker, Director of Enterprise Data Protection from HP’s Data Protection Division to discuss what the next generation data backup solutions might be.
Data is growing not only in terms of capacity, but also in terms of value. The data protection process needs to evolve not only to meet the recovery demands of the enterprise; it also needs to become more intelligent so that the value of data can be successfully mined. No other process in the data center has a better opportunity to know the data in the enterprise than the backup process.
Charlie: George, when we say that backup solutions have the best opportunity to mine data for information, what does that mean?
George: We’ve looked at ways to classify, organize and optimize data for a variety of different reasons and mine it for future value. One of the big questions has been “how do I get all of my data to the thing that’s going to tell me what my data is?” If you think about it, the data protection process has that happening already.
We have to get data to the data protection process, so it’s protected. The data protection process is the ideal place to do this data analysis. The challenge is very few vendors have the scale and “robustness” in their data protection or backup software architecture to be able to actually mine that data.
Charlie: Scott, how can backup best take advantage of this tremendous opportunity to understand so much of the enterprise data set?
Scott: Well, I think simply put the backup process needs to become more intelligent. I believe there are two forms of intelligence, though, that we need to separate and understand. The first is analyzing how the backup infrastructure is being used, in an effort to optimize the recovery process and enable IT leaders to be a lot more strategic in how they plan for the protection and retention of organizational data.
The second form of intelligence is in analyzing the data that is being backed up as a form of discovery – to establish its referential value, or to determine its probability of reuse, so that the data itself can be used appropriately and pulled back into operational status when necessary. The goal is to incorporate data-driven decision making in the organization itself while creating a data protection infrastructure that is just as agile as the rest of the data center.
Charlie: George, armed with this intelligence what can the backup process tell enterprises about their data?
George: What you can get out of this analysis comes in a couple of phases, but I think the immediate and most pressing one is that we can improve the accuracy of backup by analyzing this information. For example, we can now set priorities on specific jobs or groups of jobs, and we can analyze that data to make sure it’s being protected properly. We can also identify data that isn’t being protected properly.
It gives us a whole wealth of information and really, at the end of the day you look at these processes almost as triage. The thing that is most immediate and most pressing is getting the backup job complete and making sure we can recover the right data at the right time. That’s what this intelligence is going to enable us to do.
Charlie: Scott, what else?
Scott: Building on what George just said, I think that the use of these operational analytics in the backup process, really is going to drive three key benefits to the organization.
The first is hindsight, really being able to understand what has happened by drilling into correlated information that’s essentially stored in a “backup data warehouse,” if you will. We refer to this as back-casting, where the goal is to combine correlation with causation so that you can standardize on what works effectively within the organization, and then more quickly remediate issues as you discover them.
That leads into the second key benefit, which is insight, really understanding how one part of the backup process relates to another by using these analytics to drive real-time visualizations of the overall infrastructure and using visual cues to establish a backup and recovery operations center, like a network operations center. We’re really giving you an opportunity to see what is actually happening in real time and make changes as quickly as possible.
The last key benefit – probably the most exciting and most important – is corporate foresight. That is the ability for this intelligence to be mined in such a way that you can run modeling or what-if scenarios to really understand what would happen under a given set of circumstances. That’s especially valuable around the backup and recovery process, where you often have to react to problems, whether it’s data loss or infrastructure needs – things that tend to occur outside the planned events of the day. So those are the three key benefits that I would really point out.
Charlie: Beyond intelligence, what else do we see happening in the data protection market, George?
George: I think there are two things that work together, and they drive to what this intelligence is all about. The first is that backup has always been the Rodney Dangerfield of the IT world. It was kind of the job that got assigned to the new guy. People are now beginning to understand that there is too much value in this data, that you have to really protect it. There’s case after case of people going back and using old information to create new products, new services, or offer new capabilities. This data now has to be protected so the first copy can come back as fast as possible and the long-term copy can be retained with high data integrity.
So what we’re seeing as a result of that is a convergence in the market, where backup, archive, and replication are no longer separate processes. They are all intermixed now and need to be able to, as Scott said, provide forecasting. The software needs to be able to say “OK, this is my data set that needs to come back now, and this is my data set that I need to keep for a very long time. I might never need to restore it, but if I do, it’s got to work,” and it also needs the ability to understand the correlation between those two data sets.
Scott: Gosh, it’s really hard to follow a Rodney Dangerfield reference; I wish I could impersonate him a little better than I can. But I think George is spot on with his reference to convergence in backup and recovery. When you combine that with archive, to be able to do discovery, along with replication to really address the availability piece of this, that’s when you have the total package.
Convergence allows an organization to standardize its data protection process with a unified approach that will better support the notion of automated tiering of protected information based on the probability that it’s going to be reused in the future.
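The tiering idea Scott describes can be illustrated with a small sketch. This is purely hypothetical: the tier names, thresholds, and the notion of a precomputed reuse probability are illustrative assumptions, not any vendor’s actual policy.

```python
# Hypothetical sketch: map an estimated probability of reuse to a storage
# tier. Tier names and threshold values are illustrative assumptions.

def choose_tier(reuse_probability: float) -> str:
    """Pick a storage tier for a protected data set."""
    if not 0.0 <= reuse_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if reuse_probability >= 0.5:
        return "fast-disk"       # likely to be pulled back into operational use
    if reuse_probability >= 0.1:
        return "capacity-disk"   # occasional restores expected
    return "tape-or-cloud"       # long-term retention, rare restores

print(choose_tier(0.8))   # fast-disk
print(choose_tier(0.02))  # tape-or-cloud
```

In a real converged platform, the reuse probability itself would come from the backup analytics Scott mentions earlier, such as access history and restore frequency.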
I don’t mean to contradict what was just said, but this is the point at which that converged solution has to be able to communicate with targets that are no longer traditional backup and archive devices. They have to broaden out to include any kind of storage – JBOD, direct attach, network attach, SAN attach. They also need to include any location – the core, the edge, an on-prem cloud solution, a hosted solution, a public cloud – and, more importantly, any kind of workload. So we’re definitely seeing that convergence at the top layer to address backup, recovery, and archive, and then replication for availability in the long run.
Charlie: Scott, how do you see backup hardware changing?
Scott: I believe that the future of backup and recovery is really going to be more software focused than hardware focused. But from a hardware perspective, I think we’re going to see these products go the route of virtualization. That data protection software layer is going to be used to create a form of abstraction over the various backup targets, using standard points of integration or programming interfaces, to take full advantage of the hardware capabilities while also creating a very agnostic backup and recovery layer to support the movement of data to any form of media – disk or tape – in any location: core, edge, colo, whatever you may have. We’re going to see that separation between the software layer and the hardware layer make the mobility of the information much easier to align with the service level agreements that have been negotiated.
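The abstraction Scott describes can be sketched as a single backup-engine interface over interchangeable targets. This is a minimal illustration, not HP’s actual design; the class and method names are hypothetical.

```python
# Hypothetical sketch of a media-agnostic backup layer: the engine writes
# through one interface and never cares which medium sits behind it.
from abc import ABC, abstractmethod

class BackupTarget(ABC):
    """Any storage target: disk, tape, object store, public cloud, etc."""
    @abstractmethod
    def write(self, dataset: str, payload: bytes) -> str:
        """Store the payload; return a locator for later recovery."""

class DiskTarget(BackupTarget):
    def __init__(self):
        self.store = {}
    def write(self, dataset, payload):
        locator = f"disk://{dataset}"
        self.store[locator] = payload
        return locator

class CloudTarget(BackupTarget):
    def __init__(self):
        self.store = {}
    def write(self, dataset, payload):
        locator = f"cloud://{dataset}"
        self.store[locator] = payload
        return locator

def protect(dataset: str, payload: bytes, target: BackupTarget) -> str:
    # The engine calls the same interface regardless of the medium,
    # so data can move between targets to meet a service level agreement.
    return target.write(dataset, payload)

print(protect("payroll", b"...", DiskTarget()))   # disk://payroll
print(protect("payroll", b"...", CloudTarget()))  # cloud://payroll
```

Swapping targets changes only which class is instantiated, which is the point of the separation between the software and hardware layers.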
Charlie: George, anything you’d like to add?
George: As you start to look at that separation and that abstraction, I think Scott’s dead on there. Interestingly, what’s happening in hardware right now on primary storage, with terms like “software defined storage” and “storage virtualization,” is that the quality and the capabilities of the hardware itself can now stand on their own two feet.
In a software defined world you’re not buying substandard hardware just to get cool software features. Because you can now select the exact software you want, the hardware has to stand on its own two feet and solve specific problems. That’s where the ability to do things, as Scott mentioned, like ingest data at a very high rate becomes important, the ability to scale becomes important, and availability becomes increasingly important. Now we’re counting on this thing for more than just backups, so all of those things start to tie in together. I think what we’re seeing as a result of software abstraction is that companies that can innovate in hardware still see a lot of traction in the market, because they can now be judged as a standalone entity.
Charlie: Scott, obviously HP is a big player in this market. Can you give us a quick overview of your solutions and strategy?
Scott: Sure thing. You know, our conversation to this point in the interview has really focused on where HP is investing in backup and recovery in the long run. Last year we announced our adaptive backup and recovery framework, and we have already had two product releases into it. Our framework is simple: it’s all about protect, analyze and visualize. With our data protection engine, which many readers will know as Data Protector, we deliver application-aware, zero-downtime backup and instant recovery for more than 11 mission-critical applications on more than 14 of the most commonly deployed operating systems, both physically and virtually, with support for four of the leading hypervisors.
Equally important in those transitions, HP is really looking at trying to create that agnostic backup and recovery layer. We are going to focus deeply on the integrations into the HP storage line. However, we will continue to work in heterogeneous environments by moving to a more extensible snapshot management framework. With that framework we can dynamically add support for third-party products as they get released or as our customers require that support.
The other question is how to analyze this backup data. At HP we introduced HP Backup Navigator. It’s one of those products that’s continually analyzing the backup and recovery process and building up that backup warehouse, if you will. We use 75 key performance indicators. The product ships with 90 standard interactive reports that provide users with the trending, gap analysis, and what-if scenario modeling that I mentioned before. This sets the stage for future evolution around automation – kind of like VMware’s Distributed Resource Scheduler, but for backup. That’s pretty exciting for me.
Finally, we also created a visualization layer so that we could extend data protection awareness into tools such as Microsoft System Center Operations Manager. This really drives real-time monitoring into the analysis of the backup and recovery infrastructure. We can use visual cues to allow administrators to always be aware of the real-time health and status of their Data Protector environment. If a problem does arise, they can use one-click assisted remediation to solve common issues. For problems that require more manual intervention, there’s an entire knowledge base built in, right at their fingertips, to describe the problem, the symptoms that led up to it, and recommendations on how to resolve it before they have to pick up the phone and call the HP support line.
This is the future of backup and recovery from HP’s perspective, and really what we’re driving at. Perhaps next year when we talk again we’ll have the ability not just to protect, analyze, and visualize, but also to optimize, as we really drive toward that automation piece of data protection.
Charlie: To listen to the audio for this podcast click here. Also we did a webinar featuring George and Scott. Click here to watch that on-demand webinar right now.