The data center’s biggest challenge is not the capacity growth of the primary data set; it is the growth of the secondary data set. The secondary data set consists of the copies the organization creates from primary data to serve a variety of functions, such as data protection, feeding data to reporting or analytics software, and test and development. To support these tasks, secondary storage might need 10 to 20 times the capacity of primary storage.
Copy Data Management (CDM) is a new market that aims to address the challenges of managing all of this secondary data. If these solutions work as advertised and IT fully embraces them, they have the potential not only to reduce the amount of capacity required by secondary storage systems but also to simplify and automate the creation of the copies that IT managers administer manually today. The manual process is both time-consuming and potentially risky, as IT needs to interact with production data.
The potential for this market did not escape the attention of storage giant EMC. While we heard many messages at EMC World 2016, the declaration that EMC is in the Copy Data Management business came through loud and clear. EMC knows its users need help managing and curtailing the growth of copy data, and it clearly believes this is a legitimate and potentially lucrative market. While the strategy is still unfolding, it seems EMC will take a two-pronged approach to copy data. First, it is positioning the data services capabilities of its storage systems (deduplication, compression and writable snapshots) as a form of copy data management. Second, it is creating a copy data manager that coordinates the copy data functions of those storage systems.
Copy Data Refresher
CDM solutions create a single secondary copy of primary data and then provision images of that copy to the various tasks that need it, such as data protection, reporting or analytics, and test and development. CDM solutions curb capacity growth by providing writable snapshots of that secondary copy to these tasks instead of full copies. This enables IT to buy less capacity for its secondary data set and spend more on secondary storage performance, which is critical for analytics and test/development. CDM is the future of data protection. It provides better and more frequent data protection copies, while allowing IT to use the data protection process for more than just an insurance policy.
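To make the capacity argument concrete, here is a rough back-of-the-envelope sketch. It compares keeping a full copy for every task against keeping one secondary copy plus a writable snapshot per task. The function names, the 100 TB primary data set, the ten copies and the 5% change rate are all illustrative assumptions, not figures from any vendor, and the model ignores deduplication, compression and snapshot metadata overhead.

```python
def full_copy_capacity_tb(primary_tb, copies):
    """Capacity needed when every task gets its own full copy of primary data."""
    return primary_tb * copies


def cdm_capacity_tb(primary_tb, copies, change_rate):
    """Capacity needed with one shared secondary copy plus writable snapshots.

    Assumption: each snapshot only consumes space for the fraction of the
    data set (change_rate) that its task actually modifies.
    """
    golden_copy = primary_tb
    snapshot_deltas = copies * primary_tb * change_rate
    return golden_copy + snapshot_deltas


if __name__ == "__main__":
    primary_tb = 100      # hypothetical primary data set size
    copies = 10           # hypothetical number of task copies
    change_rate = 0.05    # hypothetical: each task changes 5% of the data

    print("Full copies:", full_copy_capacity_tb(primary_tb, copies), "TB")
    print("CDM approach:", cdm_capacity_tb(primary_tb, copies, change_rate), "TB")
```

Under these assumptions the savings depend almost entirely on the change rate: the more each task rewrites its image, the closer the snapshot approach drifts toward the cost of full copies.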
CDM solutions come in several options. You can read about two of them in Storage Switzerland’s article “Copy Data Management: In-Place vs. Rip and Replace”.
The first copy data management option to come to market took a brute-force approach, replacing the existing storage technology and software. This off-host approach to copy data management provides more universal support, but it does so by standing up an entire silo of storage and replacing current snapshot and data protection processes.
The second option of CDM is an in-place solution that leverages the customer’s current storage solution and adds copy data intelligence to it. The value here is that (assuming it supports the customer’s storage system) it retains and leverages current hardware and processes.
The in-place solutions can come in two forms. The first is where the storage hardware vendor provides copy data management services, either through a software upgrade or through a new storage system altogether. Modern storage software can keep an almost unlimited number of snapshots, which are akin to zero-impact copies. The storage software can also present snapshots as writable images that grow as changes are made.
While these solutions have little impact on the various data protection processes in place, they may make copies differently. The bigger challenge is that most of these solutions stop after presenting writable snapshots. The tools to interface with, search, and automate the provisioning and assignment of these snapshots may be limited. Most of the storage hardware suppliers that claim to provide hardware-based copy data management are essentially trying to make a feature into a product.
A third category of copy data management is emerging: a software-only tool that essentially manages copy data. It identifies data copies in the environment, categorizes those copies so customers can determine which ones they need, and can automate the presentation of those copies to other applications or services. It does not change the way copies are made, but beyond managing data copies it does not provide any space efficiency capabilities of its own.
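The idea of a writable snapshot that grows only as changes are made can be illustrated with a small copy-on-write sketch. This is a simplified model under stated assumptions, not how any particular array implements snapshots: the class names `GoldenCopy` and `WritableSnapshot` are hypothetical, and the data set is modeled as a short list of blocks held in memory.

```python
class GoldenCopy:
    """A read-only secondary copy, modeled as a fixed sequence of blocks."""

    def __init__(self, blocks):
        self._blocks = tuple(blocks)  # tuple: the base copy is never modified

    def read(self, index):
        return self._blocks[index]

    def __len__(self):
        return len(self._blocks)


class WritableSnapshot:
    """A writable image layered on top of a shared golden copy."""

    def __init__(self, base):
        self._base = base
        self._changed = {}  # block index -> new contents, for changed blocks only

    def read(self, index):
        # Serve the private block if this image changed it, else the shared one.
        if index in self._changed:
            return self._changed[index]
        return self._base.read(index)

    def write(self, index, data):
        if not 0 <= index < len(self._base):
            raise IndexError("block index out of range")
        self._changed[index] = data  # the base copy is left untouched

    def consumed_blocks(self):
        """Blocks of extra capacity this image uses beyond the shared copy."""
        return len(self._changed)


if __name__ == "__main__":
    golden = GoldenCopy([b"a", b"b", b"c", b"d"])
    test_dev = WritableSnapshot(golden)
    analytics = WritableSnapshot(golden)

    test_dev.write(1, b"X")
    print(test_dev.read(1), analytics.read(1), test_dev.consumed_blocks())
```

Each image starts out consuming no additional blocks, and a write by one task is invisible to every other task sharing the same base copy.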
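As a rough illustration of what identifying and categorizing copies could look like, the sketch below groups volumes whose contents are byte-for-byte identical. Everything here is a hypothetical simplification: the function names and volume names are invented for the example, and a real copy data manager would more likely rely on snapshot lineage and array metadata than on hashing entire volumes.

```python
import hashlib
from collections import defaultdict


def fingerprint(data):
    """Return a content fingerprint (SHA-256 hex digest) for a volume's bytes."""
    return hashlib.sha256(data).hexdigest()


def group_copies(volumes):
    """Group volume names whose contents are identical.

    `volumes` maps a volume name to its contents as bytes. Only groups with
    more than one member are returned, since a lone volume is not a copy.
    """
    groups = defaultdict(list)
    for name, data in volumes.items():
        groups[fingerprint(data)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    # Hypothetical environment: three identical database images and one scratch volume.
    volumes = {
        "prod-db": b"orders",
        "test-db": b"orders",
        "report-db": b"orders",
        "scratch": b"tmp",
    }
    for group in group_copies(volumes):
        print("Copies of the same data:", ", ".join(group))
```

A catalog like this only reports which copies exist. As the paragraph above notes, it does nothing by itself to reduce the space those copies consume.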
EMC’s two-pronged approach consists of, first, leveraging the data services capabilities within its storage hardware and, second, creating a software solution that manages copy data. EMC has multiple storage systems in its portfolio, and it seems EMC will expand each of these to provide a more robust copy data offering. The combination of snapshot technology that does not impact performance, can present writable images and can leverage in-array features like deduplication and compression should help organizations curtail copy data. The downsides are that these services are only available on that specific storage platform and that the ability to search and automate copy data is limited.
EMC’s Enterprise Copy Data Management product is the second prong of its approach. This product is software only and will coordinate copies across EMC’s storage portfolio as well as other copy generators, including utilities built into applications. In EMC’s case it also provides automation to manage copy retention and presentation. It does not in and of itself provide data efficiency; instead, it assumes you will write copies to a device like a Data Domain appliance.
Copy Data Alternatives
Leveraging the current attributes of the storage hardware (writable snapshots) is the right approach, but it has to be paired with the appropriate level of data efficiency and orchestration. These “in-place” solutions leverage the current snapshot capabilities of the storage hardware instead of replacing them. To that they add the ability to manage, search and present copy data as necessary to other services, including backup, Test-Dev, DevOps and Analytics processing.
EMC’s entrance is expanding the copy data market, and CDM trailblazers should welcome it. EMC brings a new level of awareness to the realities of the copy data problem but leaves plenty of room for those trailblazers to differentiate and leverage their first-mover advantage. Which system is best for an IT professional will depend largely on which solution has the least impact on the existing environment, provides data efficiency for the copy data and provides orchestration to automate provisioning of copies to the applications that need them.