The primary goal of a data management strategy is to reduce storage costs. Achieving that goal requires that, at some point, data moves from primary storage to less expensive secondary storage. The movement of data between these two points, and perhaps to more points as it ages further, is often what breaks a data management strategy, because once data is moved, users are impacted. Fixing the data movement problem is critical to both the short-term and long-term success of a data management strategy.
The Problems with Data Movement
The first problem with data movement is that the traditional data management architecture requires copying data from one storage system to another. That copy passes through one or more intermediary servers, which are connected over a network to both the primary and the secondary storage system. The process is time consuming and prone to error.
The second problem is one of access. While it's true that most data that has not been accessed in the last 90 days will never be accessed again, some of it will, especially data that was created within the last six to nine months. Users want access to that data to be seamless, both in how they find it and in how quickly they can open it.
The problem, especially in the all-flash era, is that the performance gap between primary storage and secondary storage is too great. Users are accustomed to accessing their files from a high performance all-flash array, but once data management is in place they must access older data from a high capacity NAS or object storage system across a slower network. In many cases that access is not even direct; data must first be copied back to its original location before it is accessible.
Flash to Flash to Cloud and Data Movement
A flash to flash to cloud architecture solves both problems. The heart of the solution is the flash to flash component. A single storage system contains both high performance NVMe flash and high capacity SAS flash, and intelligence within the storage system automatically moves data from the high performance flash to the high capacity flash. The movement is completely transparent to users and applications, and requires no changes to their workflows.
Data movement is also internal to the storage system; it copies or moves data directly from NVMe flash to high capacity flash. As a result, transfers are rapid and highly unlikely to be noticed by users. The difference in recall performance is also almost unnoticeable, since both tiers are flash-based and provide excellent read performance.
Rapid data movement between tiers and rapid recall make sense for data that was last accessed between 90 days and one year ago. This is the data most likely to be recalled. Once data has gone more than a year without being accessed, the likelihood of someone accessing it again is very low. At this point, it makes sense for the organization to move that data to cloud or object storage. The movement to cloud storage further drives down cost and allows the organization to take advantage of object storage's excellent data retention capabilities.
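The age ranges above can be expressed as a simple placement rule. The following Python is only a sketch of that idea, not how any particular storage system implements tiering: the function name, the tier labels, and the use of exactly 90 and 365 days as cutoffs are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Assumed cutoffs, taken from the age ranges discussed above.
# A real system would likely make these configurable.
CAPACITY_FLASH_AFTER = timedelta(days=90)
CLOUD_AFTER = timedelta(days=365)


def choose_tier(last_access: datetime, now: datetime) -> str:
    """Return a hypothetical tier label based on time since last access."""
    idle = now - last_access
    if idle > CLOUD_AFTER:
        # Not accessed in more than a year: candidate for cloud/object storage.
        return "cloud"
    if idle >= CAPACITY_FLASH_AFTER:
        # Idle for 90 days to one year: keep on high capacity flash,
        # where a recall is still fast.
        return "capacity_flash"
    # Recently active data stays on high performance NVMe flash.
    return "nvme"
```

Under these assumptions, a file last opened 200 days ago would land on the high capacity flash tier, while one untouched for 400 days would be a candidate for the cloud tier.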
The movement to the object storage tier can be manual, or it can be automated with one of the various data management solutions on the market today. The organization will need to decide how important transparent recall is. While the feature is compelling, it does come with some overhead, and IT needs to decide whether that overhead is worth it for files that users may never access again. Storage Switzerland finds that when most organizations need data that is over a year old, the request is a known event driven by a discovery request or a need to run analytics on a specific set of data.
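For the manual option, the first step is finding which files have gone more than a year without access. The sketch below shows one way to build that candidate list with the Python standard library. The function name and parameters are hypothetical, and it assumes the file system records access times reliably, which is not the case on volumes mounted with options such as noatime.

```python
import os
import time
from typing import Iterator, Optional

SECONDS_PER_DAY = 86400


def find_cold_files(root: str, min_idle_days: int = 365,
                    now: Optional[float] = None) -> Iterator[str]:
    """Yield paths under root whose last access is older than min_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.stat(path).st_atime
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            if atime < cutoff:
                yield path
```

The resulting list could then be reviewed by IT and handed to whatever tool the organization uses to copy data to its cloud or object store.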
Conclusion
The flash to flash to cloud architecture is a cost-optimized alternative to going flash-only in the data center. It enables internal data management for recently active data, which means rapid response for the requester, and it makes the use of cloud or object storage as the long-term repository practical. All data movement can be done manually without requiring the organization to implement a completely new file-system infrastructure.
In our next blog, we will discuss how flash to flash to cloud helps solve the data protection problem in addition to the data management problem.