The first three chapters presented the flash to flash to cloud architecture as a design that enables an organization to overcome most of its primary storage challenges and to improve its data protection and long-term data retention strategies. For data centers looking to implement this design, the next step is to select a vendor or vendors that can deliver the product. There are specific requirements that participating vendors need to meet, and this chapter outlines them.
Requirement 1 – Limit Storage Software
While one storage system could technically deliver all the capabilities of the flash to flash to cloud design, doing so will more than likely lead to compromises. In practice, creating the desired architecture will probably require multiple systems with different feature sets. However, IT planners need to be careful to limit, as much as possible, the number of systems and storage software packages they need to manage on a day-to-day basis.
Ideally the architecture can be established with one vendor providing the primary and secondary storage, and another providing the cloud or object storage. There is also the potential for the on-premises cloud/object storage to replicate data to a public cloud provider.
Requirement 2 – Intelligent Data Movement on Primary Storage
The second requirement is for the primary storage to have the ability to move data intelligently between two media types. For many data centers, those types will both be flash based, consisting of a small high performance NVMe flash tier and a larger high capacity SAS based flash tier. The data movement between these two tiers will be frequent, occurring almost every minute of every day. For that reason, the data copy should be internal to the storage system, which should also perform the analysis of what data segments should go where. External communications across a network will add too much latency.
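As a rough illustration of the kind of placement analysis described above, the following Python sketch ranks data segments by how often they are accessed and assigns the hottest ones to the NVMe tier. The function name `place_segments`, the `nvme_slots` parameter, and the sample access log are all hypothetical; a real storage system would make this decision internally using its own heuristics, which this sketch does not attempt to reproduce.

```python
from collections import Counter


def place_segments(access_counts, nvme_slots):
    """Split segment IDs into (nvme, capacity) sets.

    The most frequently accessed segments go to the small NVMe tier;
    everything else stays on the larger high capacity flash tier.
    Ties are broken by segment ID so the result is deterministic.
    """
    ranked = sorted(access_counts, key=lambda seg: (-access_counts[seg], seg))
    nvme = set(ranked[:nvme_slots])
    capacity = set(ranked[nvme_slots:])
    return nvme, capacity


# Hypothetical access log: each entry is the ID of a segment that was read.
accesses = ["a", "b", "a", "c", "a", "b", "d"]
counts = Counter(accesses)

# Assume the NVMe tier only has room for two segments.
nvme, capacity = place_segments(counts, nvme_slots=2)
print("NVMe tier:", sorted(nvme))
print("Capacity tier:", sorted(capacity))
```

Because the ranking changes as access patterns change, an analysis like this would need to run continuously, which is why it belongs inside the storage system rather than on the far side of a network.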
Requirement 3 – Hard Disk Based Disaster Recovery
All primary storage systems should have a disaster recovery copy in a remote location, and the ability to replicate data asynchronously from one storage system to another is increasingly common. As organizations move into the all-flash era, they are finding that most vendors also require them to have an all-flash system at the DR site. While there may be some data centers where all-flash to all-flash replication is a requirement, the vast majority cannot justify an all-flash system sitting idle, waiting for a disaster that may never occur.
At the same time, replicating to a hard disk only system at the DR site may be problematic as well. IT planners need to keep in mind that once they move to all-flash at the primary data center, users will come to expect all-flash performance all the time. Additionally, application developers will develop their applications counting on all-flash.
The combination of these two factors means that the DR site storage system should also be a hybrid system, but with a mixture of flash and hard disk drives instead of flash and flash. The flash tier allows applications that need flash performance to execute as they always have. The hard disk tier allows DR data to be stored cost effectively.
Requirement 4 – Object Storage
As discussed in chapter 3, the Flash to Flash to Cloud approach fulfills much of the data protection requirement, but it does not fulfill the need to retain data. For proper retention, the organization should use the same storage architecture that most cloud providers use: object storage. Object storage solutions are scale-out in design and leverage commodity hardware, making them a very cost effective method for storing petabytes of data for decades. Object storage systems can also verify the integrity of data continuously to ensure that data written today will be readable years from now.
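To make the idea of continuous integrity verification concrete, here is a minimal Python sketch that records a SHA-256 checksum when an object is written and later re-checks every object against it. The `ObjectStore` class and its `put` and `scrub` methods are hypothetical names used only for illustration, and the in-memory dictionary stands in for real disks. Actual object storage products implement this in their own ways and typically also repair damaged data from a redundant copy, which this sketch does not do.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


class ObjectStore:
    """Toy in-memory object store (illustrative assumption, not a product API)."""

    def __init__(self):
        self._objects = {}  # key -> (data, checksum recorded at write time)

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = (data, checksum(data))

    def scrub(self):
        """Return the keys whose data no longer matches its recorded checksum."""
        return [
            key
            for key, (data, recorded) in self._objects.items()
            if checksum(data) != recorded
        ]


store = ObjectStore()
store.put("report-2015", b"quarterly numbers")
store.put("logs-2016", b"audit trail")
clean = store.scrub()  # nothing has changed yet

# Simulate silent corruption: the data changes but the recorded checksum does not.
_, recorded = store._objects["logs-2016"]
store._objects["logs-2016"] = (b"audit trai1", recorded)
damaged = store.scrub()
print("Damaged objects:", damaged)
```

Running a scrub like this on a schedule is what allows corruption to be caught while a good copy still exists, rather than years later when the data is finally read.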
Requirement 5 – Data Classification
The final requirement, the need to classify data, potentially introduces a third vendor, but this solution does not need to be an elaborate data management product. The flash to flash to cloud architecture does not move data from primary storage to object storage until a year or two after its last access. That means that a tool that can simply identify old files is all that is necessary. IT can then manually move data based on age if it chooses.
Manually moving data to object storage does mean that if the data is required in the future, it will need a manual recovery. But the probability of inactive data, which is anything more than a year or two old, being recalled is relatively low. In most cases, only compelling events like a legal discovery request or the decision to run analytics or reports against old data would trigger a recall of this type of data. In both cases, IT has time to meet the request, and the actual recovery time required is relatively low since the data is stored on hard disk media.
Certainly, IT can take the next step and implement a solution that automatically moves data and automatically recalls it, but for many organizations, manual movement is a simpler and more cost effective alternative.
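A tool that simply identifies old files can be very small. The Python sketch below walks a directory tree and lists files whose last access time is older than a chosen number of days. The function name `find_inactive_files`, the 730-day threshold, and the temporary demo files are assumptions made for illustration. Note also that it relies on the file system's access time, which some systems update lazily or not at all, so a real deployment might need to use modification time or another signal instead.

```python
import os
import tempfile
import time

SECONDS_PER_DAY = 86400


def find_inactive_files(root, max_idle_days, now=None):
    """Return a sorted list of files under root not accessed in max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    inactive = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                last_access = os.stat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if last_access < cutoff:
                inactive.append(path)
    return sorted(inactive)


# Demo on throwaway files: one is back-dated three years, one is new.
with tempfile.TemporaryDirectory() as root:
    old_path = os.path.join(root, "old.dat")
    new_path = os.path.join(root, "new.dat")
    for path in (old_path, new_path):
        with open(path, "w") as handle:
            handle.write("x")
    three_years_ago = time.time() - 3 * 365 * SECONDS_PER_DAY
    os.utime(old_path, (three_years_ago, three_years_ago))

    candidates = find_inactive_files(root, max_idle_days=730)
    candidate_names = [os.path.basename(p) for p in candidates]

print("Candidates for object storage:", candidate_names)
```

The resulting list is only a report. IT would still review it and move the files manually, which matches the manual approach described above.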
The Flash to Flash to Cloud design is an ideal architecture for organizations looking to address primary storage performance problems, increased data protection expectations and the requirement to retain data for long periods. The design takes advantage of the affordability of flash and of improvements in storage software so that primary storage can meet a wide variety of use cases, while recognizing that it is difficult for one storage system to do it all. The architecture enables primary storage to partner with object storage to create an environment that meets both short-term and long-term data goals.