Is it possible to scale out a scale-up storage solution even though it wasn’t built to do that? It is, and the real question is how you’re going to do it. Most methods of scaling (both out and up) add complexity. So how do you scale a scale-up system in a scale-out way without adding complexity?
First, it’s important to understand why scale is crucial. One reason is organizational simplicity. Adding storage islands – the usual answer to scaling – is the opposite of simple. It is difficult to find and protect the data you are looking for when it can live in many different places. It is also inefficient: one silo of storage ends up running at its performance and/or capacity maximums while others sit empty and idle. Another reason is cost. The more systems you have to manage, the higher the management cost, and in most cases, the higher the actual acquisition cost.
When your storage requirements increase, you will eventually run out of either performance or storage capacity in a given storage system. If the system wasn’t built to scale, your choices are relatively simple: buy a bigger one or buy another one. Customers usually try to avoid buying a bigger one, as it typically increases waste when the outgrown appliance is either thrown away, or at the very least traded in at a reduced value. In terms of product acquisition cost, repeatedly scaling your system by replacing it with a bigger one is much more expensive than the alternatives. Add to this the typical downtime associated with a migration, and you understand why most people just buy another system.
Buying another system seems easier because you can address the problem at hand and not have to worry about predicting future needs. And most data centers choose this path. This is why many environments end up with more than one storage array.
The problem with buying another system is that you have now created another storage island. If you kept the original storage and just put another system next to it, you now have at least two places where data can reside. If this is your typical practice, over time you will have many storage islands that make it hard to protect and find data.
One proposed solution for scaling non-scalable systems is to place all of your storage behind a storage virtualization controller that aggregates it into virtual pools built from your existing arrays. For example, if you outgrow a 100 TB disk array and buy a second 100 TB array, you can virtualize the two as one 200 TB disk array. In reality, most data centers don’t mix and match arrays this way. They instead use virtualization to provide a common management interface, but they seldom stripe data across multiple arrays.
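The pooling idea above can be made concrete with a short sketch. This is a minimal illustration of capacity aggregation, not any vendor's implementation; the class names and the "most free space" placement strategy are assumptions for the example (real controllers may stripe across arrays instead).

```python
# Illustrative sketch: a virtualization layer presenting two 100 TB
# arrays as a single 200 TB pool. Names and placement policy are
# assumptions, not a real product's behavior.

class Array:
    def __init__(self, name, capacity_tb):
        self.name = name
        self.capacity_tb = capacity_tb
        self.used_tb = 0

    @property
    def free_tb(self):
        return self.capacity_tb - self.used_tb


class VirtualPool:
    """Presents several backend arrays as one logical pool."""

    def __init__(self, arrays):
        self.arrays = arrays

    @property
    def capacity_tb(self):
        return sum(a.capacity_tb for a in self.arrays)

    def allocate(self, size_tb):
        # Place the volume on the backend with the most free space.
        target = max(self.arrays, key=lambda a: a.free_tb)
        if target.free_tb < size_tb:
            raise RuntimeError("pool exhausted")
        target.used_tb += size_tb
        return target.name


pool = VirtualPool([Array("array-1", 100), Array("array-2", 100)])
print(pool.capacity_tb)   # 200
print(pool.allocate(60))  # array-1 (ties go to the first array)
print(pool.allocate(60))  # array-2 (it now has the most free space)
```

Note that the pool hides which physical array a volume lands on, which is exactly what makes per-volume (rather than per-data-set) granularity the natural unit for this kind of virtualization.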
The storage virtualization method is preferable to the other alternatives we discussed, but it is not without its challenges. First, all I/O must pass through both the virtualization controller and the original storage controllers, which inevitably adds latency. Second, while it does provide a common storage management interface, it also replaces existing interfaces that users may like and understand. Third, storage virtualization is not granular to the data level; it only understands volumes, so moving specific data sets within a volume is typically not possible.
Another approach is a virtualization system where the control path and the data path are separate. If one can separate these two paths, you can have the advantages of storage virtualization without the disadvantage of the extra latency. When a user requests a directory, file, or LUN, the storage virtualization system can create a direct path back to the original storage without increasing latency. It also enables the organization to continue to leverage the storage management tools that are in place on their various storage systems. Finally, it is granular: policies are set on the actual data instead of on its volumes. For these reasons we call this approach Data Virtualization. It focuses on decoupling data from the hardware so it can be moved across any storage from any vendor.
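The control-path/data-path split described above can be sketched as follows. This is a hypothetical illustration under assumed names and interfaces: a metadata service answers "where does this data live?" once, and all subsequent I/O goes straight to the original storage system, bypassing the virtualization layer.

```python
# Hypothetical sketch of separating the control path (name lookup)
# from the data path (direct I/O to the backend). All class names and
# identifiers here are illustrative assumptions.

class MetadataService:
    """Control path only: resolves names to backend locations."""

    def __init__(self):
        self.catalog = {}  # path -> (backend, internal_id)

    def register(self, path, backend, internal_id):
        self.catalog[path] = (backend, internal_id)

    def locate(self, path):
        return self.catalog[path]


class Backend:
    """Data path: the original storage system, accessed directly."""

    def __init__(self, name):
        self.name = name
        self.objects = {}

    def write(self, internal_id, data):
        self.objects[internal_id] = data

    def read(self, internal_id):
        return self.objects[internal_id]


# Client flow: one control-path lookup, then direct data-path I/O.
meta = MetadataService()
nas = Backend("nas-1")
nas.write("obj-42", b"payload")
meta.register("/projects/report.doc", nas, "obj-42")

backend, oid = meta.locate("/projects/report.doc")  # control path
data = backend.read(oid)                            # data path, direct
```

Because the metadata service is consulted only at lookup time, steady-state reads and writes incur no extra hop, which is the latency advantage the text describes.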
The Impact of Data Virtualization
With data virtualization in place, IT can add new systems at will. The data virtualization software will analyze the new system and determine which of the data it manages is best suited to that storage system. With IT approval, but without IT’s involvement, the software will move the most appropriate data to the new platform. For example, if the new system is a high-performance all-flash array, it moves data designated as needing the highest level of performance to it. If the system is a high-capacity storage system or a cloud storage target, it moves data that has not been accessed for a period of time to those systems. This frees up existing storage and lets each new system serve the purpose it was acquired for.
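A placement policy like the one just described can be expressed in a few lines. The tier names, record fields, and the 180-day inactivity threshold below are assumptions for illustration, not values from any actual product.

```python
# Illustrative sketch of policy-driven data placement: data marked as
# high-performance goes to a new all-flash tier, data untouched for
# 180 days goes to a cloud/capacity tier, and everything else stays
# put. Thresholds and field names are assumptions.

from datetime import datetime, timedelta

COLD_AFTER = timedelta(days=180)

def choose_tier(record, now):
    """Return the tier a data set should live on under this policy."""
    if record["performance_class"] == "high":
        return "all-flash"
    if now - record["last_access"] > COLD_AFTER:
        return "cloud-archive"
    return "existing"

now = datetime(2024, 1, 1)
datasets = [
    {"name": "oltp-db",  "performance_class": "high",
     "last_access": now - timedelta(days=1)},
    {"name": "old-logs", "performance_class": "normal",
     "last_access": now - timedelta(days=400)},
    {"name": "shared",   "performance_class": "normal",
     "last_access": now - timedelta(days=10)},
]

for d in datasets:
    print(d["name"], "->", choose_tier(d, now))
# oltp-db -> all-flash
# old-logs -> cloud-archive
# shared -> existing
```

Because the policy operates on individual data sets rather than whole volumes, this is the per-data granularity that distinguishes data virtualization from traditional volume-level storage virtualization.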
Scaling is always necessary; the only question is how you’re going to do it. Since waste and complexity are never a good thing, the typical methods of scaling are far from ideal. Storage virtualization is a better approach, but its traditional implementation adds latency. The idea of separating the control path and the data path, which enables data virtualization without increasing latency, is intriguing and worth examination.