Either as part of a normal storage refresh cycle or when consolidating storage assets after a merger, almost every data center will eventually need to migrate data from one storage system to another. The problem is that data migrations are often performed reactively rather than executed as part of a well-planned process. Ideally, data migration should be a core, future-proofed infrastructure element. Once this foundational element is in place, data can be migrated from any storage system to any other while maintaining data integrity and without impacting application availability.
SAN Migrations Past
In the past, a large block of downtime was scheduled following a storage purchase and/or storage consolidation. The data migration process was often conducted after business hours or over the weekend and typically involved a backup of the old system and a recovery to the new system. Obviously, the servers and/or applications involved in the migration had to be down during this time. It was also a “one-way” transition. Once the application was running on the new storage system, there was seldom any simple way to revert to the old system if something went wrong.
The modern data center has to deal with much larger data sets that, even with advances in networking, are often too large to transfer in an overnight or even weekend maintenance window. In addition, an increasing number of data centers no longer have a scheduled downtime window. Applications run 24×7 and simply can’t come down for the length of time required to do a full data transfer. Finally, if something does go wrong with the data transfer process or with the storage system itself, there is no time to troubleshoot and correct the problem. Simply put, a “bail out” strategy has to be available.
The Key Requirements of SAN Migrations
In the modern data center, SAN migrations need to be done while applications are still running, so that no downtime is needed other than a few minutes for cut-over activities. SAN migrations also need to be able to verify that the data on the old SAN and the new SAN is 100% identical. As mentioned above, they should also allow for a rapid switch-back if something goes wrong with the new storage system.
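One way to picture the verification requirement is a chunk-by-chunk digest comparison of the source and target volumes. The following is a minimal sketch, not any vendor's method: the function names, the 4 MiB chunk size, and the idea of reading each volume as a file or device path are all assumptions made for illustration, and it presumes both volumes are quiesced or snapshotted while they are read.

```python
import hashlib
from itertools import zip_longest

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical chunk size (4 MiB)


def chunk_digests(path, chunk_size=CHUNK_SIZE):
    """Yield (offset, SHA-256 hex digest) for each chunk read from path."""
    with open(path, "rb") as f:
        offset = 0
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield offset, hashlib.sha256(chunk).hexdigest()
            offset += len(chunk)


def find_mismatches(source_path, target_path, chunk_size=CHUNK_SIZE):
    """Return the offsets of chunks that differ between source and target.

    An empty list means the two images are byte-for-byte identical.
    A chunk that exists on only one side is reported as a mismatch.
    """
    mismatches = []
    pairs = zip_longest(
        chunk_digests(source_path, chunk_size),
        chunk_digests(target_path, chunk_size),
    )
    for src, dst in pairs:
        if src != dst:
            offset = src[0] if src is not None else dst[0]
            mismatches.append(offset)
    return mismatches
```

Reporting offsets rather than a single pass/fail result means that only the differing regions need to be re-copied and re-checked.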
Finally, to be part of a foundational capability of the data center, the migration function needs to be independent of the storage system it operates on. After all, it is unknown what storage system will be brought in for future storage refreshes or be part of future mergers and acquisitions. The only real “known” is that these migrations will be needed in the future.
The Conventional Migration Problem
In an attempt to meet these requirements, migration utilities have evolved from the legacy shutdown and restore methods. New methods include specific copy utilities built into the operating system, generic replication software or storage system specific migration utilities.
Operating System Migration Tools
Operating system utilities have the advantage of being very cost effective (often free). The process involves mounting a volume from both the new storage system and the old storage system on a host and then copying the data from one to the other. However, operating system utilities typically cannot perform these tasks in a non-disruptive fashion. While some can do a file-by-file sync, they typically require application downtime to ensure a clean copy of the data gets transmitted. These utilities also consume most of the CPU and memory of the host when doing the copy.
Operating system utilities also use storage networking resources inefficiently. A big challenge for these solutions is that the copies run “north and south”. Each file needs to be copied up from the original storage system to the connected server and then copied again down to the new storage system. In other words, the biggest bottleneck impeding rapid migration, the storage network, is used twice. Furthermore, while all this is occurring, host CPU and memory resources are actively being consumed.
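The cost of crossing the storage network twice can be put into rough numbers. The sketch below is a back-of-the-envelope estimate only: the function name, the 70% link efficiency, and the assumption that both passes compete for the same link are illustrative choices, not measurements from any real environment.

```python
def transfer_hours(data_tb, link_gbps, network_passes=1, efficiency=0.7):
    """Estimate the hours needed to move data_tb terabytes (decimal TB).

    link_gbps      -- nominal link speed in gigabits per second
    network_passes -- how many times each byte crosses the link
                      (2 models a host-based "north and south" copy,
                      assuming both passes share the same link)
    efficiency     -- assumed fraction of nominal bandwidth actually usable
    """
    total_bits = data_tb * 1e12 * 8 * network_passes
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return total_bits / usable_bits_per_second / 3600


# Illustrative inputs only: 100 TB over an 8 Gbps link.
single_pass = transfer_hours(100, 8)
double_pass = transfer_hours(100, 8, network_passes=2)
print(f"one pass: {single_pass:.1f} h, two passes: {double_pass:.1f} h")
```

Under these assumptions, doubling the number of passes doubles the estimated copy time, which can be the difference between fitting inside a weekend window and overrunning it.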
These migrations have to be done on a per volume basis, usually by the actual host that is assigned to these volumes. This means that multiple migration jobs have to be done per host and there could potentially be dozens of migrations running simultaneously. Additionally, all of these jobs need to be individually monitored for successful completion. Operating system utilities are notorious for providing limited details about the copy process and usually only provide cryptic error messages when something goes wrong. This increases the risk of failure and naturally requires increased management time by the IT staff.
Finally, few of these utilities provide any sort of data verification or a “fail back” process once data has been slightly modified on the new system.
Generic Replication Software
Generic replication software is a step in the right direction, but it does not fully create a foundational migration service that IT can rely on. A typical implementation involves copying data at a block level north from the original storage volume to the connected host, and then via the IP network (a whole separate highway) to another host that is connected to the new SAN. Lastly, that host has to send the data south to its assigned storage volume.
While replication can be done in real-time with minimal application downtime, the software consumes the CPU and memory resources of both hosts. Replication software is further burdened by having to not only copy data across the SAN (twice), but also copy it once across an IP network. While this type of replication is fine for disaster recovery, it is not well suited for a migration where a large amount of data needs to be moved all at once.
The good news is that once the initial copy is complete, only the changed blocks need to be synchronized. This should have minimal impact on network I/O throughput and can be completed in near real-time. It also places little overhead on the new storage system, since these copies are done asynchronously over an IP network. The downside is that the IP network adds latency, and the new storage system is never placed under real-world production stress before cut-over. Replication does allow for a reasonable fail back capability if something goes wrong with the new system.
The main issue is that each host connected to the SAN needs to have this software installed on it. Each host then individually needs to move data north from the SAN, across the IP network to a second host, and then south to the new storage system. The management overhead involved in monitoring this process can be severe, especially if the server count (physical or virtual) is high.
A final challenge for host based replication software is that these applications are very operating system dependent. They count on a very detailed understanding of the operating system in order to do their block copies. As a result, most developers tend to specialize in one particular operating system, so not only are there many jobs to manage, there are often multiple tools from multiple vendors (one for each operating system).
Utilizing replication as a migration strategy is less than ideal because of the multiple complexities involved: it requires a large initial seeding of the data, there is no methodology for stress testing the new system, multiple software applications need to be deployed to support various operating systems, and each replication process has to be individually managed.
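The changed-block synchronization described above can be sketched in a few lines. This is a simplified, single-threaded illustration and not how any particular replication product works: `TrackedVolume`, `initial_seed`, and `sync_changes` are hypothetical names, and volumes are modeled as in-memory lists of fixed-size blocks.

```python
BLOCK_SIZE = 4096  # hypothetical block size in bytes


class TrackedVolume:
    """A toy source volume that remembers which blocks were written."""

    def __init__(self, num_blocks, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.dirty = set()  # indexes of blocks changed since the last sync

    def write(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = data
        self.dirty.add(index)


def initial_seed(source, target_blocks):
    """Full copy of every block: the expensive, one-time seeding step."""
    source.dirty.clear()  # start tracking from a clean slate
    for index, block in enumerate(source.blocks):
        target_blocks[index] = block


def sync_changes(source, target_blocks):
    """Copy only the blocks written since the last seed or sync."""
    changed = sorted(source.dirty)
    for index in changed:
        target_blocks[index] = source.blocks[index]
    source.dirty.clear()
    return changed


source = TrackedVolume(num_blocks=8)
target = [None] * 8
initial_seed(source, target)
source.write(3, b"\x01" * BLOCK_SIZE)
print(sync_changes(source, target))  # only block 3 is re-sent
```

After the seed, the amount of data moved per sync depends on the write rate of the application rather than on the size of the volume, which is why the ongoing synchronization is light compared with the initial copy.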
SAN Specific Migration Software
Finally, there is SAN specific storage migration software, often provided by the incumbent storage manufacturer. This is a step in the right direction in that the copies are typically done east and west across the SAN, instead of the north and south communication described above. Typically there is more bandwidth across east-west connections and the copies take place from storage system to storage system, so no host resources are consumed. This also means that a single copy job can migrate data between the two systems.
The big challenge for SAN specific migration software, however, is that it is usually single purpose and does not provide a long term migration foundation. In other words, each time a new storage system is purchased, new migration software needs to be acquired as well. This, of course, also requires learning a new migration application. Additionally, SAN specific software typically supports only a limited set of storage hardware that it can migrate from.
The single purpose nature of SAN specific migration software combined with the lack of testing of the new system and with its limited amount of source storage system support also makes it less than ideal for a long term, foundational migration strategy.
Storage Virtualization
Storage virtualization, while often thought of as more of a foundational storage strategy change, may be the ideal migration solution. It provides the foundational migration strategy that the data center needs while at the same time opening the door, when the IT team is ready, to the other operational and cost saving benefits of the virtualization of storage assets.
A storage virtualization solution typically installs as an appliance within the storage infrastructure. It then has the ability to copy data from the original storage system to a new storage system, in an east-west manner, without involving any server hosts in the process. This can be done volume by volume or the entire array can be copied as needed. This leads to a rapid data copy and almost no application downtime.
Another benefit is that after the initial copy is made, both storage systems can be written to in synchronous fashion for a period of time to stress test the new storage system under a production workload. This can be done as long as needed to gain 100% confidence in the new storage system. Then when the time is right, production can cut over fully to the new system. However, instead of deactivating the old system, it can be set to be the secondary array, where data being written to the new storage system is now mirrored to the old storage system in either synchronous or asynchronous fashion. Providing a fail-back mechanism is an ideal way to repurpose old hardware while improving data resiliency.
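The sequence described above (initial copy, a mirrored stress-test period, cut-over, and an optional fail-back) can be pictured as a small state machine. The sketch below is purely illustrative and is not the behavior of any specific virtualization product: the `Phase` and `MigrationSession` names are hypothetical, arrays are modeled as dictionaries of blocks, and mirroring after cut-over is shown in its synchronous form only.

```python
from enum import Enum


class Phase(Enum):
    COPYING = "copying"          # old array is primary, initial copy running
    MIRRORING = "mirroring"      # old array is primary, writes go to both
    CUT_OVER = "cut_over"        # new array is primary, old array is the mirror
    FAILED_BACK = "failed_back"  # production returned to the old array


class MigrationSession:
    def __init__(self, old_array, new_array):
        self.old = old_array  # dict: block number -> data
        self.new = new_array
        self.phase = Phase.COPYING

    def finish_initial_copy(self):
        self.new.clear()
        self.new.update(self.old)
        self.phase = Phase.MIRRORING

    def primary(self):
        return self.new if self.phase is Phase.CUT_OVER else self.old

    def write(self, block, data):
        if self.phase in (Phase.MIRRORING, Phase.CUT_OVER):
            # Both arrays receive every write, so either can take over.
            self.old[block] = data
            self.new[block] = data
        else:
            # Before the copy finishes, or after a fail-back, only the
            # old array is trusted.
            self.old[block] = data

    def read(self, block):
        return self.primary()[block]

    def cut_over(self):
        if self.phase is not Phase.MIRRORING:
            raise RuntimeError("cut-over requires a completed, mirrored copy")
        self.phase = Phase.CUT_OVER

    def fail_back(self):
        if self.phase is not Phase.CUT_OVER:
            raise RuntimeError("fail-back only applies after cut-over")
        self.phase = Phase.FAILED_BACK
```

Because the old array keeps receiving every write after cut-over, failing back in this model is just a change of which array serves reads, with no data to restore.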
Finally, and most importantly, storage virtualization is storage agnostic, but not all implementations are equal. It can migrate from any SAN-based storage system to any other; but beware, as some storage systems require a migration process to preserve existing data. A complete storage virtualization system can see and access existing data on any SAN-based storage system, which removes the need to take the storage system offline when migrating between storage platforms and so avoids unexpected downtime or a loss of productivity. This means that storage virtualization can be the foundational migration component that each data center needs.
The obvious benefit of selecting storage virtualization as a migration strategy is that it also opens up the opportunity to leverage some of the other capabilities of storage virtualization when and if the IT manager decides to. For example, storage virtualization allows for all storage management and storage services to be provided from a single interface and single set of tools.
Ironically, when storage virtualization is fully embraced, it actually may eliminate the need for migration altogether. It allows storage systems to be added seamlessly, and volumes can be moved between those storage systems based on performance and capacity demands instead of product life cycles.
FalconStor is a client of Storage Switzerland