A recent Storage Switzerland webinar, “How to Design Self-Protecting Production Storage and Gain Backup Independence,” introduced the concept of self-protecting primary storage. But what about self-archiving primary storage? The value of self-protecting primary storage is that it improves the quality of data protection and delivers better protection service levels while reducing backup infrastructure costs. The value of self-archiving primary storage is that it explicitly reduces physical storage costs, potentially providing much larger cost savings than self-protecting primary storage.
Self-Archiving is Not Hybrid Storage
Self-archiving storage is more than just another term for hybrid storage, which pairs a fast flash tier with a large, high-capacity hard disk tier. The problem with counting on a hybrid system to also act as an archive is that it stores active and retained data within the same physical system. A hybrid system also still consumes a large amount of data center floor space.
If an organization attempts to use a hybrid system as an archive, the system still needs to be refreshed every three to five years, the typical refresh cycle for an on-premises storage system. Using the hybrid system to store a decade or more of data makes the refresh even more difficult because both the capacity and the number of files increase the migration time. Finally, the file systems that run on most hybrid storage systems were not designed to maintain potentially decades' worth of files. These systems often run out of metadata space and force a refresh even before they reach their actual capacity limits.
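To illustrate why both capacity and file count drive migration time, here is a rough back-of-the-envelope sketch. The model (bulk transfer time plus a fixed cost per file) and every parameter name are assumptions made for illustration; real migrations depend on the tools and systems involved.

```python
def estimate_migration_hours(capacity_tb: float,
                             file_count: int,
                             throughput_mb_s: float,
                             per_file_overhead_ms: float) -> float:
    """Estimate migration time as bulk transfer time plus per-file overhead.

    All inputs are hypothetical planning figures, not measurements.
    """
    # Time to move the raw bytes (1 TB treated as 1,000,000 MB).
    transfer_seconds = capacity_tb * 1_000_000 / throughput_mb_s
    # Time spent on per-file work such as metadata lookups and file creation.
    overhead_seconds = file_count * per_file_overhead_ms / 1000
    return (transfer_seconds + overhead_seconds) / 3600
```

Under this model, two systems holding the same capacity can have very different refresh windows if one of them stores far more files, which is the situation a decade of retained data tends to create.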
Requirements for a Self-Archiving Primary Storage System
The first requirement is for the self-archiving primary storage to meet the performance expectations of production workloads. Saving money on storage costs is a worthy goal, but meeting the demands of applications and users is typically the higher priority. Flash is the storage medium for production workloads. All active data, and even near-active data, should be on flash storage, which enables IT to meet the performance demands of the enterprise without compromise.
The reality, though, is that most data is not active or near-active. Storage Switzerland finds that organizations do not access over 80% of the data they store. That inactive or cold data needs to be accessible, but it does not need the performance of flash storage. There is a balance to strike, however. The data can’t be on storage so slow that users complain about response time when they do eventually access it. The latency of the cloud makes it a weak candidate for recently archived data, since retrieval may be slow enough for users to notice.
The second requirement is for a middle-tier storage area that is less expensive than the flash-only first tier and has lower latency than a direct-to-cloud connection. The enterprise has two options. The first option is to place an object storage system on-premises and then use an archiving software solution to migrate data older than one year to it. The problem with this solution is that it is not a self-archiving primary storage solution. This option introduces third-party hardware as well as third-party software that identifies and moves old data, both of which need to be managed and monitored. This approach also only works with file-based data. Finally, it also consumes data center floor space, which for many data centers is the biggest problem.
The alternative option is to use the cloud as the location where the primary storage data sits and then push that data out to the data center as it becomes active. This architecture uses an on-premises appliance that acts as a cache large enough to store the most active data. Instead of having older data solely in the cloud, however, the solution should leverage a middle tier that is closer to the primary data center, within a few hundred milliseconds, so that users don’t notice a response time difference. The public cloud is used as a disaster recovery copy and as a long-term archive for data that users haven’t accessed in more than three years.
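The placement logic of this three-tier architecture can be sketched as a simple age-based policy. This is a minimal illustration, not any vendor's implementation: the three-year archive threshold comes from the description above, while the 90-day window for "most active" data and all of the names are assumptions.

```python
from datetime import datetime, timedelta
from enum import Enum


class Tier(Enum):
    ON_PREM_CACHE = "on-premises flash cache"
    MIDDLE_TIER = "nearby middle tier"
    PUBLIC_CLOUD = "public cloud archive"


# Assumption: the article does not define "most active"; 90 days is a placeholder.
ACTIVE_WINDOW = timedelta(days=90)
# From the article: data not accessed in more than three years moves to the public cloud.
ARCHIVE_AFTER = timedelta(days=3 * 365)


def choose_tier(last_access: datetime, now: datetime) -> Tier:
    """Pick a tier for a piece of data based on how long ago it was last accessed."""
    age = now - last_access
    if age <= ACTIVE_WINDOW:
        return Tier.ON_PREM_CACHE
    if age <= ARCHIVE_AFTER:
        return Tier.MIDDLE_TIER
    return Tier.PUBLIC_CLOUD
```

A real system would likely weigh more than last-access time (for example, cache capacity or access frequency), but the sketch shows how the tiers divide data by activity.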
Data within this storage architecture can then be snapshotted at different intervals at each tier (on-premises, middle tier, and public cloud) for long-term data protection and data retention. Since these snapshots are not occurring on the active tier, they do not impact performance. Storing snapshots on the lower tier also reduces the cost of maintaining them for years.
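One way to picture per-tier snapshot schedules is as a small policy table. The intervals and retention periods below are purely hypothetical, chosen only to show how a lower tier can hold far longer retention with fewer snapshots.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SnapshotPolicy:
    interval: timedelta   # how often a snapshot is taken
    retention: timedelta  # how long each snapshot is kept

    def retained_count(self) -> int:
        """Number of snapshots held at steady state."""
        return self.retention // self.interval


# Hypothetical values for illustration; the article does not specify any.
POLICIES = {
    "on-premises": SnapshotPolicy(timedelta(hours=1), timedelta(days=7)),
    "middle tier": SnapshotPolicy(timedelta(days=1), timedelta(days=365)),
    "public cloud": SnapshotPolicy(timedelta(days=30), timedelta(days=3650)),
}

for tier, policy in POLICIES.items():
    print(f"{tier}: {policy.retained_count()} snapshots retained")
```

With these placeholder values, the public cloud tier covers ten years of history with fewer retained snapshots than the on-premises tier needs for one week.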
The process to create a self-archiving primary storage system and a self-protecting primary storage system is the same. The middle tier that is milliseconds away, in terms of latency, is critical to the successful adoption of the solution. Also critical is the software that drives the data movement and retention processes. To learn more about self-archiving and self-protecting storage, watch our on-demand webinar “How to Design Self-Protecting Production Storage and Gain Backup Independence.”