Data protection, including backup and disaster recovery, has always been expensive and a hassle. It was enough of a hassle that some organizations left their applications unprotected, judging that the cost and effort of purchasing, deploying and managing a data protection infrastructure outweighed the risk of data loss. Today the pendulum has swung the other way: data protection is a given. No amount of data loss is acceptable, and lines of business and employees simply assume that their applications and data are protected.
There is now a clear need for production storage systems that can protect themselves. The problem is that, historically, they were not designed to do so. Production storage systems typically use replication and snapshots to provide rapid recovery to a very recent point in time; they were not designed for long-term retention. Keeping several months' or years' worth of snapshots on production-grade storage media quickly becomes very expensive from a capacity standpoint, especially because data reduction helps less here than elsewhere: even compression and deduplication offer limited returns when most secondary copies consist of unique changed blocks. Retaining a large volume of snapshots would also bog down the storage system's performance.
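To make the capacity math concrete, here is a rough back-of-envelope sketch. The 10 TB volume size, 3% daily change rate and retention windows are hypothetical assumptions, not figures from any real deployment.

```python
# Back-of-envelope estimate of snapshot retention capacity.
# All figures below are illustrative assumptions, not measurements.

VOLUME_TB = 10.0          # hypothetical production volume size
DAILY_CHANGE_RATE = 0.03  # hypothetical 3% of blocks changed per day

def snapshot_overhead_tb(days_retained: int) -> float:
    """Extra capacity consumed by keeping one snapshot per day.

    Copy-on-write snapshots only store changed blocks, so each day
    of retention adds roughly change_rate * volume_size of capacity.
    """
    return VOLUME_TB * DAILY_CHANGE_RATE * days_retained

for days in (7, 30, 365):
    extra = snapshot_overhead_tb(days)
    print(f"{days:>3} days retained: ~{extra:.1f} TB extra "
          f"({extra / VOLUME_TB:.1f}x the volume itself)")
```

Even under these simple assumptions, a week of daily snapshots adds only about 2 TB, but a year of them consumes roughly ten times the primary volume's capacity, all of it on expensive production-grade media.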
Meanwhile, most production-grade data protection technologies were not designed for long-term manageability. Snapshots, for example, were not designed to be searchable, so specific data cannot be easily identified to comply with an eDiscovery or "right to be forgotten" request. Another problem in the era of ransomware is that these technologies provide no "air gap" between the production copy and the secondary copy. The two typically reside together on the same system, so an attacker who reaches one can easily reach both.
The cloud can address many of these backup and data protection requirements. It is becoming more common for enterprises to deploy on-premises storage as a cache for their most performance-intensive data and to tier copies (including snapshots) to a lower-cost, cloud-delivered object store. Cloud compute cycles can also be used for more cost-effective replication. This approach minimizes the production-grade data center infrastructure (not just storage) required to support backup and disaster recovery. The cloud adds real value, but it must be used intelligently to avoid pitfalls such as latency, egress fees and cache misses.
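As an illustration of what tiering might look like in practice, here is a minimal sketch (not from the article or any specific vendor) that uploads a local snapshot export to an S3 object store under a colder storage class. The bucket name, key prefix and file path are hypothetical placeholders.

```python
# Minimal sketch: tier a local snapshot export to cloud object storage.
import boto3

s3 = boto3.client("s3")

def tier_snapshot_to_cloud(local_path: str, bucket: str, key: str) -> None:
    """Upload a snapshot export to S3 under an infrequent-access
    storage class, trading retrieval latency for lower $/GB."""
    s3.upload_file(
        local_path,
        bucket,
        key,
        ExtraArgs={"StorageClass": "STANDARD_IA"},
    )

tier_snapshot_to_cloud(
    "/exports/vol01-2024-01-15.snap",  # hypothetical snapshot export
    "example-backup-bucket",           # hypothetical bucket name
    "snapshots/vol01/2024-01-15.snap",
)
```

Note the trade-offs the paragraph warns about: reading back from STANDARD_IA (or colder classes such as GLACIER) incurs retrieval latency and per-GB egress charges, so frequent cache misses on restored data can quickly erode the savings.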
Interested in learning more? Access our on-demand webinar with ClearSky Data, "How to Design Self-Protecting Storage and Gain Backup Independence."