Most primary storage systems and software provide some form of data protection, typically protection from media failure (usually RAID), snapshots, and clones. All-flash arrays (AFAs) seem to provide a better-than-average level of data protection. The perceived improvement, though, comes from the performance and reliability of the hardware itself, not from more efficient software.
The Problem with RAID and All-Flash
Protection from media failure is at the heart of most primary storage solutions. These systems typically use a form of RAID to make sure data is still accessible after a drive or two fails. RAID presents some challenges, especially in all-flash arrays. The first is write efficiency. Flash media is still sensitive to the number of writes each drive can absorb. If one drive in the array or volume group receives the majority of the parity IO, then that drive, which is key to protecting the organization from data loss during a media failure, can wear out prematurely.
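To make the wear concern concrete, here is a minimal sketch that counts parity writes per drive under two placements: a dedicated parity drive (the RAID 4 layout) and parity rotated across all drives (the RAID 5 layout). The function name, drive count, and stripe count are illustrative assumptions, and the model only covers full-stripe writes, so treat it as a rough picture rather than a description of any particular array.

```python
from collections import Counter

def parity_writes(num_drives: int, num_stripes: int, rotate: bool) -> Counter:
    """Count parity-block writes per drive, one parity write per stripe.

    rotate=False models a dedicated parity drive (RAID 4 style).
    rotate=True models parity rotated across drives (RAID 5 style).
    """
    writes = Counter({drive: 0 for drive in range(num_drives)})
    for stripe in range(num_stripes):
        # Dedicated: always the last drive. Rotated: cycle through all drives.
        parity_drive = stripe % num_drives if rotate else num_drives - 1
        writes[parity_drive] += 1
    return writes

# Hypothetical 5-drive group receiving 1,000 full-stripe writes.
dedicated = parity_writes(num_drives=5, num_stripes=1000, rotate=False)
rotated = parity_writes(num_drives=5, num_stripes=1000, rotate=True)
print("dedicated parity:", dict(dedicated))
print("rotated parity:  ", dict(rotated))
```

In the dedicated case one drive absorbs every parity write, while rotation spreads the same load evenly across the group.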
A second concern is that most arrays limit how many drives can fail before the organization is exposed to potential data loss. Most systems leverage RAID 5 or RAID 6 and only protect against one or two drive failures. Organizations that need a higher level of protection resort to RAID 1 (mirroring), which doubles the consumption of flash capacity and significantly increases the price of the system.
The final RAID concern is the performance impact during a rebuild. While all-flash systems return to a protected state faster than hard disk-based systems, performance still suffers while the rebuild is occurring. The entire system is affected by the massive amount of read and write IO. Reconstructing a full drive also stresses the target drive, and the rebuild can go no faster than that drive can write.
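Because the rebuild is bounded by the target drive, a lower bound on rebuild time is simply drive capacity divided by the write throughput the target can sustain. The sketch below works that out; the function name, the 7.68 TB capacity, and the throughput figures are hypothetical values chosen for illustration, not measurements from any product.

```python
def min_rebuild_hours(drive_capacity_tb: float, target_write_mb_per_s: float) -> float:
    """Lower bound on rebuild time when the replacement drive is the bottleneck."""
    capacity_mb = drive_capacity_tb * 1_000_000  # decimal terabytes to megabytes
    seconds = capacity_mb / target_write_mb_per_s
    return seconds / 3600

# Hypothetical drive rebuilt at full speed, then throttled to half speed
# to leave headroom for production IO.
for rate in (1000, 500):
    hours = min_rebuild_hours(drive_capacity_tb=7.68, target_write_mb_per_s=rate)
    print(f"{rate} MB/s -> at least {hours:.1f} hours")
```

Halving the write rate available to the rebuild doubles the time the system spends in a degraded state, which is the trade-off between rebuild speed and production performance.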
All-flash software needs to rethink parity-based protection so that it gives storage architects greater flexibility. If the system has enough drives, the software should let architects make it resilient to any number of drive failures before data loss occurs. It should also write parity data efficiently so that writes are optimized and flash durability is maintained.
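The trade-off between protection level and capacity can be sketched with simple arithmetic: a layout with k data units and m redundancy units keeps k / (k + m) of raw capacity usable and is guaranteed to survive m failures. The `Layout` class, the stripe widths, and the 5+3 scheme below are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layout:
    name: str
    data_units: int        # units holding user data
    redundancy_units: int  # parity units or mirror copies

    @property
    def usable_fraction(self) -> float:
        """Share of raw capacity left for user data."""
        return self.data_units / (self.data_units + self.redundancy_units)

    @property
    def failures_tolerated(self) -> int:
        """Guaranteed failures survived within one stripe or mirror pair."""
        return self.redundancy_units

layouts = [
    Layout("RAID 5 (7+1)", 7, 1),
    Layout("RAID 6 (6+2)", 6, 2),
    Layout("RAID 1 mirror (1+1)", 1, 1),
    Layout("hypothetical 5+3 parity", 5, 3),
]
for layout in layouts:
    print(f"{layout.name:26} usable {layout.usable_fraction:6.1%}  "
          f"tolerates {layout.failures_tolerated} failure(s)")
```

Under these assumptions, a wider parity scheme can tolerate more failures than RAID 6 while still leaving more usable capacity than mirroring.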
Snapshots and All-Flash Software
Most all-flash storage vendors claim unlimited snapshots without performance impact. They make this claim assuming perfect conditions: snapshots are taken in order and remain read-only. Snapshots kept in this ideal state are less valuable to the organization and less valuable for data protection. Many systems, for example, show performance problems if an administrator removes a snapshot or set of snapshots from the middle of a snapshot chain. The AFA may also exhibit performance problems if a snapshot is set to read-write and used for test/dev or even as a golden master for other applications.
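One way to see why removing a snapshot from the middle of a chain costs more than removing the newest one is to model the chain as a list of deltas, each holding only the blocks changed since the previous snapshot. This is a simplified model with hypothetical function names, not how any specific array implements snapshots.

```python
def view(chain, index):
    """Reconstruct the block map visible at snapshot `index`."""
    blocks = {}
    for delta in chain[: index + 1]:
        blocks.update(delta)  # later deltas override earlier ones
    return blocks

def delete_snapshot(chain, index):
    """Remove chain[index]; return (new_chain, blocks_merged)."""
    removed = chain[index]
    new_chain = [dict(d) for i, d in enumerate(chain) if i != index]
    merged = 0
    if index < len(chain) - 1:
        successor = new_chain[index]  # the delta that followed the removed one
        for block, data in removed.items():
            if block not in successor:
                # Later snapshots still depend on this block, so it must move.
                successor[block] = data
                merged += 1
    return new_chain, merged

chain = [
    {0: "a0", 1: "b0", 2: "c0"},  # oldest snapshot (full baseline)
    {1: "b1", 2: "c1"},
    {2: "c2"},
    {0: "a3"},                    # newest snapshot
]
middle_chain, middle_cost = delete_snapshot(chain, 1)
tail_chain, tail_cost = delete_snapshot(chain, 3)
print("blocks merged deleting middle snapshot:", middle_cost)
print("blocks merged deleting newest snapshot:", tail_cost)
```

In this model, deleting the newest snapshot is a pure metadata drop, while deleting a middle one forces the software to move every block that later snapshots still reference.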
All-flash software also needs to rethink snapshot features. If snapshots are genuinely going to be part of the data protection process, then they should be read-only to start. The storage software needs to enable IT to take snapshots as frequently as possible (every minute), and it needs to manage all of these potential versions for the organization.
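A snapshot every minute produces 1,440 versions per volume per day, so managing them implies some retention policy. The sketch below applies one hypothetical tiered policy (keep everything from the last hour, one per hour for the last day, one per day beyond that); the tiers and the `thin` function are assumptions for illustration, not a recommendation from any vendor.

```python
from datetime import datetime, timedelta

def thin(snapshots, now):
    """Return the snapshot timestamps to keep under a hypothetical tiered policy."""
    keep = []
    seen_hours = set()
    seen_days = set()
    for ts in sorted(snapshots, reverse=True):  # newest first
        age = now - ts
        if age <= timedelta(hours=1):
            keep.append(ts)  # keep every snapshot from the last hour
        elif age <= timedelta(days=1):
            hour = ts.replace(minute=0, second=0, microsecond=0)
            if hour not in seen_hours:  # newest snapshot in each clock hour
                seen_hours.add(hour)
                keep.append(ts)
        elif ts.date() not in seen_days:  # newest snapshot on each older day
            seen_days.add(ts.date())
            keep.append(ts)
    return keep

# One snapshot per minute for seven days, ending at a fixed reference time.
now = datetime(2024, 1, 8, 12, 0)
snapshots = [now - timedelta(minutes=i) for i in range(7 * 24 * 60)]
kept = thin(snapshots, now)
print(f"{len(snapshots)} snapshots taken, {len(kept)} retained")
```

Note that a policy like this deletes snapshots from the middle of the chain by design, which is exactly the operation the software has to handle without a performance penalty.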
Affordable, Efficient Replication
One of the challenges of using primary storage for data protection is that snapshots are vulnerable to corruption or failure of the primary volume. Ideally, the organization replicates these snapshots to a second on-premises system and then to a third system off-site. The three-system design plus a separate off-site backup makes for a near-perfect data protection solution. The problem is that, with traditional storage arrays, buying three all-flash systems for complete data protection breaks most IT budgets.
Software-defined storage (SDS) should be the perfect answer, but most SDS solutions also require very high-end hardware. IT planners need to look for SDS solutions that can drive near-raw performance from the flash media without requiring the latest server hardware and the most expensive Intel CPUs. They should also look for solutions that can mix media across the three systems, where the primary array may be all-flash, the secondary system is a hybrid (flash and hard disk), and the DR system is made up of hard disk drives. If the same SDS environment manages all three storage types, administering these systems is easy and costs stay at a minimum.
StorageSwiss Take
Protection from media failure, flexible snapshots that truly don’t impact performance, clones, and efficient replication to another storage system are all functions of metadata. The more efficiently the all-flash storage software manages that metadata as these different features interact with it, the better these features and the flash system itself will perform and scale.