Scale Out or Scale Up? 6 Key Considerations for the Flash Array Buyer

In theory, scale out storage appeals because the data center can start small and add capacity and performance as needed. But do these theoretical advantages apply to the use cases in which All-Flash storage is most commonly deployed: databases and virtualization? In this article, we will analyze which scaling approach is best for flash storage.

Scale out storage has become one of those checkbox terms in the storage industry. After all, who wouldn't want a storage system that can scale to infinity? The problem is that scale out storage systems are more expensive to build, implement and maintain. There are many use cases for scale out storage; it is most ideal for situations where meeting a high capacity demand takes precedence over a performance demand. Current scale out storage architectures, however, may not be right for performance-centric environments like All-Flash.

Defining Scale Up vs. Scale Out

In scale up architectures, all the performance and capacity potential of the storage system is provided in a single controller unit, typically upfront. Current scale out architectures provide performance and capacity as storage nodes (servers with internal capacity) are added to the infrastructure. Each architecture has its ideal use case, depending on performance and capacity demands. As stated above, the appeal of a scale out storage system is that performance and capacity can be added incrementally as needed.

1. Starting Small?

One of the theoretical advantages of a scale out storage system is that IT can start small and then add storage capacity and performance at the same time. In reality, this is seldom the case. Scale out storage systems count on the cluster of nodes for availability as well as capacity and performance. This typically means an initial purchase of at least three nodes, or the vendor needs to build high availability into each node, raising the initial entry cost of the solution.

The problem is that All-Flash systems are most commonly implemented in database and virtualization environments. While there are capacity needs in these environments, they are typically not extreme, unlike archive and backup, where scale out systems tend to make more sense. It is also important to remember that many All-Flash systems come with some form of data efficiency, such as deduplication or compression. This means less physical storage capacity actually needs to be purchased for these environments.

The result is that in scale out storage systems, the initial nodes required to form a quorum may far exceed the capacity needs of the environment in which they are placed. This means wasted capacity, which, given the premium price of flash storage, is particularly troublesome. It also means that the cost of the initial implementation may be similar to the cost of a scale up storage system.
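
To put rough numbers on the problem, the sketch below sizes a hypothetical environment in Python. All of the figures (a 3:1 data reduction ratio, 10TB of raw flash per node, a three-node quorum) are illustrative assumptions, not vendor specifications.

```python
# Back-of-the-envelope sizing: how much raw flash does the workload actually
# need once data reduction is counted, and how much does a minimum scale-out
# quorum force you to buy? All figures are illustrative assumptions.

def raw_capacity_needed(logical_tb, data_reduction_ratio):
    """Raw flash required after deduplication/compression is factored in."""
    return logical_tb / data_reduction_ratio

def quorum_capacity(node_raw_tb, min_nodes=3):
    """Smallest raw capacity a quorum-based scale-out cluster can start at."""
    return node_raw_tb * min_nodes

logical_need_tb = 20   # assumed logical data held by the databases/VMs
reduction = 3.0        # assumed 3:1 combined deduplication + compression
per_node_tb = 10       # assumed raw flash per scale-out node

needed = raw_capacity_needed(logical_need_tb, reduction)   # ~6.7 TB
forced = quorum_capacity(per_node_tb)                       # 30 TB minimum

print(f"Raw flash actually needed: {needed:.1f} TB")
print(f"Minimum purchase with a three-node quorum: {forced:.0f} TB")
print(f"Stranded capacity on day one: {forced - needed:.1f} TB")
```

Even with these modest assumptions, the quorum forces the purchase of several times the flash the environment actually needs.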

By comparison, a scale up All-Flash Storage system is designed to provide full high availability and performance in a single product. Capacity can start as small as is actually needed by the environment and be added to the storage system without having to buy and connect additional nodes.

2. Do You Want To Scale Performance and Capacity?

Performance and capacity operate on different vectors and are not necessarily linked together. Most environments that can take advantage of All-Flash storage will typically run out of performance long before they run out of capacity. In a scale out architecture, this means additional nodes, complete with flash capacity, will need to be purchased in order to scale performance. Once again, this more than likely wastes capacity.
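
One way to see the stranded capacity is to model node count as being driven by whichever demand, IOPS or terabytes, is larger. The per-node and workload figures below are purely hypothetical.

```python
# Illustrative model of performance-driven expansion in a scale-out cluster.
# Per-node specs and workload demands are assumptions for the example only.
import math

node_iops = 100_000      # assumed IOPS delivered per node
node_tb = 10             # assumed raw flash per node

workload_iops = 450_000  # performance the applications need
workload_tb = 12         # capacity the applications need

nodes_for_perf = math.ceil(workload_iops / node_iops)  # 5 nodes for IOPS
nodes_for_cap = math.ceil(workload_tb / node_tb)       # 2 nodes for capacity
nodes_bought = max(nodes_for_perf, nodes_for_cap)      # buy for the larger demand

stranded_tb = nodes_bought * node_tb - workload_tb
print(f"Nodes purchased: {nodes_bought}")
print(f"Flash purchased but not needed: {stranded_tb} TB")
```

When performance is the driver, every extra node brings flash capacity along with it whether the workload needs it or not.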

In a scale up architecture, all the performance is delivered with the unit upfront, while capacity is added to the system as needed. While performance can't necessarily be scaled, it is delivered in its entirety up front and is essentially a fixed cost with no surprises.

Another side effect of scale out storage is that the nodes typically need to be homogeneous. Each node needs to have a similar processor chipset and must leverage the exact same size SSDs. A scale up system, by contrast, can intermix SSDs of different sizes and even different types as new flash technology becomes available.

3. Is Scale Up Performance Really An Issue?

While scale out advocates often cite scale up's lack of performance scaling, the reality is that the overwhelming majority of applications can't push current scale up flash-based systems. Additionally, some scale up systems allow a periodic controller unit upgrade, so as processing technology continues to advance, the head can be upgraded to offer more performance to the existing storage shelves. As a result, there actually is some performance scaling capability in scale up systems.

Finally, some scale up vendors have the ability to add a scale out design to their architecture if the need ever becomes relevant. It is hard to imagine that processing technology would fall behind storage I/O performance, but if it were to happen, this is the ideal way to scale: scale up completely first, then start scaling out if performance demands exceed the capabilities of the current processors.

4. Is Linear Performance A Reality?

Current scale out storage systems are actually very sophisticated clustering applications. In fact, they can be just as complex to design as a clustered database application. While some of this complexity can be hidden from the storage administrator, there is a limit to how much can be hidden. There is also a performance complexity introduced with scale out systems: internode communication.

The nodes within a scale out storage system need to stay in sync with each other to make sure the right nodes have the right data and the right nodes are accessing the right data. This is called internode communication and it typically requires a dedicated backend network. This communication, and the requirement of a network communications protocol, introduces latency. In hard disk based scale out architectures, this latency is not typically noticeable. In a scale out All-Flash storage system that does not incur HDD latency, network communication latency may very well be noticeable. As a result, the concept of linear performance growth may not play out when it comes time to scale.
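
A back-of-the-envelope latency budget illustrates why. The figures below (a roughly 100 microsecond internode round trip, a ~5ms hard disk access and a ~200 microsecond flash read) are ballpark assumptions for the sake of the example, not measured values.

```python
# Rough latency budget: why internode traffic hides behind disk but not flash.
# The latency figures below are ballpark assumptions, not measured values.

def hop_share(media_latency_us, hop_latency_us):
    """Internode hop as a percentage of the total I/O service time."""
    return hop_latency_us / (media_latency_us + hop_latency_us) * 100

hop_us = 100    # assumed one internode round trip on the backend network
hdd_us = 5000   # ~5 ms seek plus rotation for a hard disk read
ssd_us = 200    # ~200 microseconds for a flash read

print(f"HDD scale-out: hop is {hop_share(hdd_us, hop_us):.0f}% of each I/O")
print(f"Flash scale-out: hop is {hop_share(ssd_us, hop_us):.0f}% of each I/O")
```

Behind a hard disk the hop is a rounding error; behind flash it can account for roughly a third of the total I/O time.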

Even if, in theory, internode latency could be effectively hidden so that scale out storage could enjoy its promised performance advantage, doing so would incur a potentially significant cost disadvantage. Hiding this latency would require a high-speed backend network, such as InfiniBand adapters and switches, something most mainstream data centers have no experience with and which only serves to increase management complexity. It would also require more powerful processors that raise the cost per node, potentially beyond the cost of the scale up storage controller.

5. Is Scale Out Cheaper?

In storage there are two hard costs to be concerned with. The first is the initial purchase cost. In theory, this should favor a scale out storage system since it can start small. But again, current scale out designs need to have an initial cluster created, or they need to deliver high availability in each node. Counting on the cluster for HA requires the purchase of potentially more performance and capacity than the customer needs, because more nodes are needed initially. Building HA into each node adds expense per node, probably making the total equivalent to the scale up storage system.

A case could be made that a storage node could be delivered less expensively than a scale-up controller unit. This would require choosing the first option: nodes delivered with no per-node HA that rely on a quorum for availability. Again, buying multiple nodes eliminates that advantage, and it leads to node sprawl because nodes have to be added to address performance issues, not capacity issues.

At a minimum, the initial cost difference between the scale up and scale out implementation types may be a wash. When implementation time, or time to data, is factored into the equation, scale up systems have a clear advantage. It simply takes longer to install more pieces and get those pieces working together.
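
The day-one math can be sketched the same way. Every price below is an invented placeholder used only to show the structure of the comparison, not a real list price.

```python
# Toy day-one cost comparison of the two approaches.
# All prices are invented placeholders, not real list prices.

def scale_up_cost(controller_price, shelf_price, shelves_needed):
    """Dual-controller scale-up array: one HA head plus capacity shelves."""
    return controller_price + shelf_price * shelves_needed

def scale_out_cost(node_price, nodes_needed, min_quorum=3):
    """Quorum-based scale-out cluster: never fewer nodes than the quorum."""
    return node_price * max(nodes_needed, min_quorum)

up = scale_up_cost(controller_price=80_000, shelf_price=25_000, shelves_needed=1)
out = scale_out_cost(node_price=35_000, nodes_needed=2)  # capacity only needs 2

print(f"Scale-up entry price:  ${up:,}")
print(f"Scale-out entry price: ${out:,}")
```

With numbers like these, the quorum requirement pushes the scale out entry price into the same ballpark as the dual-controller scale up array, which is the "wash" described above.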

The second cost, incremental cost, is an area where scale out storage should have an advantage. But again the limits of current scale out designs tell a different story. The only way a scale out All-Flash system would have a cost advantage is if the need for expansion is being driven by performance instead of capacity. But as mentioned earlier, the overwhelming majority of flash vendors and customers report that they can’t exceed the performance of a single box. So any scenario that would justify a scale out deployment will probably not happen in most data centers.

6. Is Scale Out Really Simpler?

Another theoretical advantage to scale out is how simple it is to expand. “Like adding Lego blocks” is the common analogy. But current scale out systems don’t actually “snap” together. They are a series of individual servers with clustering software that must be carefully networked together for maximum performance and availability. This combination makes initial implementation more complex and it makes ongoing upgrades something that needs to be carefully planned.

Scale up architectures are actually relatively simple. All the capabilities, at least from a performance perspective, are delivered upfront. There is nothing to “click” in. Capacity can be added incrementally either by inserting drives into the existing shelf or adding shelves to the existing storage controller. While adding shelves also requires planning, the capacity per shelf is high and as long as the scale up All-Flash array can do non-disruptive upgrades, no down time should result.

Conclusion

Scale out storage is one of those technologies that looks great on a whiteboard but, at least in the All-Flash instance, does not play out well in the reality of the data center. All-Flash is essentially too fast for current implementations of the scale out architecture. Most customers don't yet need the performance that scale out systems provide. Furthermore, achieving that performance requires a significant investment in the system's infrastructure, which, for many, puts its cost out of reach as well.

Scale up storage, while having the disadvantage of buying all the performance capabilities up front, has the dual advantage of more incremental capacity expansion and a less complex backend infrastructure. And leveraging data-in-place storage controller upgrades can easily eliminate the lack of performance scalability.

Pure Storage is a client of Storage Switzerland

Twelve years ago George Crump founded Storage Switzerland with one simple goal: to educate IT professionals about all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 25 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS and SAN, virtualization, cloud and enterprise flash. Prior to founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.

Comments on "Scale Out or Scale Up? 6 Key Considerations for the Flash Array Buyer"
  1. While not a fully balanced view, this article makes some interesting points about scale-out vs. monolithic scale-up controller-based storage systems.

    Just to briefly comment on each point:

    1/5/6: For small capacity environments that don’t need to scale over time, fixed controller-based systems likely can be cheaper and simpler. However, for any environment where you either have meaningful scale (50TB or more) or will need to scale over time (even if you start fairly small), a good scale-out architecture will be significantly less expensive and simpler.
    With a scale-up architecture, you either need to move to a faster controller for more performance (which usually involves data migration and then a wasted unused controller), or add entirely new storage systems (potentially migrating data, and creating more and more islands of storage to manage over time).
    A good scale-out architecture allows you to scale UP and DOWN by adding and removing nodes with no data migration and no increase in management burden.

    2. The argument that you’ll run out of performance before capacity in an all-flash environment used for mainstream Tier-1 workloads flies directly in the face of marketing from Pure and others that state their approach is to give you more performance than you’d ever really need. The bottom line is that most workloads on an all-flash array will run out of capacity before performance, but over time it’s important to be able to scale both.
    A scale-out architecture can offer a wide range of capacity & performance points, particularly if it has the ability to mix heterogeneous nodes with different drives and performance levels.
    Relying on Moore’s law and yearly controller upgrades may be a good strategy for a flash vendor, but it’s a PITA for customers who want to leverage their investments for the maximum amount of time.

    3. This point just argues the exact opposite of #2: most environments can't push all flash systems, so why do you need to scale performance? It balances this by saying you can always just "upgrade" the controller to a faster model, without mentioning the data migration and added expense of out-of-cycle controller upgrades. No one really does this – when they buy scale-up, they buy the controller they expect to need for the next 3-5 years… and are locked in as a result.
    Lastly, the idea that scale-out can easily be "added" to scale-up architectures does not respect the amount of work that goes into building a true scale-out architecture. Look at the near-decade that NetApp has taken to add "clustering" (a form of scale-out) to ONTAP and you'll see it's not at all trivial.

    4. I’ll answer this simply: Yes. Don’t take FUD, test it yourself. Good scale-out architectures really do offer linear performance and capacity scale.

    One key point that is unaddressed involves data protection and performance under failure. Controller-based shared-disk systems utilize redundant components for HA (redundant controllers, power supplies, etc.); however, despite claims of no single points of failure, they generally share a weak point in the shared disk shelf. That disk shelf, and the backplane within it, represents a key point of failure without full redundancy. Historically, high-end disk-based systems used dual-ported FC or SAS drives to allow independent backplane connections to each drive, reducing (if not completely eliminating) a single point of failure. Flash arrays that use SATA-based flash drives can't do that.
    A shared-nothing scale-out system doesn’t have this limitation (because no disks are shared between nodes), and is able to truly offer no single points of failure. In addition, a good scale-out architecture can self-heal without the requirement for “extra” redundant components, removing the fire drills associated with storage component failures.
    At small scales, these differences may not matter much to customers – disk shelf failures are likely fairly rare, and a 4-hour data unavailability for parts replacement won’t kill most customers, but at large scale and in environments where 5+ 9’s of availability are needed, shared-disk flash systems represent an added risk.

    There are places in the market for both scale-up and scale-out flash systems. In general I'd agree with the premise that for small environments that don't need to scale, where simplicity and a low starting price are the top priority, scale-up is fine. But as soon as you add the element of scale, the element of growth, or the element of long-term TCO, scale-out is clearly the way to go.

  2. George Crump says:

    Dave,

    First, thank you for taking the time to read the article and for your quality comment. Time does not allow me to go into all the points in detail right now, but let me see what I can do in the short time I have.

    Clearly this article was written to bring balance to a conversation that I felt had gotten out of whack recently. It seemed like the whole industry was sliding down the scale-out slope; the purpose of this article was to point out some of the potential bumps on that slide.

    I am not anti-scale-out or even pro-scale-up. I am for whatever makes the most sense for the customer given their reality. I do think that scale-out, implemented correctly, can be a good thing. My concern is that flash lowers the margin for error in a scale-out architecture; you don't have the latency of spinning hard drives to hide your cluster management behind. In other words, you have to design it better than ever. There are some architectures that I think are clearly trying to deliver on that degree of difficulty and others that are not. The NetApp use case that you point out actually proves my point.

    The point about running out of capacity versus performance might have been better stated. But in general my point was that for most mainstream data centers running some type of virtualization (server, desktop or both) and probably a few mission- or business-critical database applications, the performance and capacity of a dual controller storage head is generally more than adequate in the All-Flash use case. They would never need to "scale-out". If you don't need it, why do it? I am assuming that this mainstream data center will see a nice return on deduplication/compression, but if that assumption is not correct then I would have to assume that this is no longer a mainstream workload.

    From a performance standpoint, as the guy who did the 2 Million IOPS test, clearly I understand that there are some applications that need very high performance. But most don’t. In our work we find that 200k in total IOPS is about the norm and 500k is the upper end…today. That will change but then so will both of these technologies. When the customer gets to the point that they need X million IOPS they should re-evaluate at that point.

    Thanks,

    George

    • There may be some ‘bumps’ along the way (particularly with faux-scale-out architectures that add complexity without all the benefits of native scale-out), but I think the world would be a lot better off if every storage system could natively scale-out (whether people need it or not).

      Too much of the IT world is unpredictable today to be locked into a fixed platform for 3+ years.

      While it could be argued that scale-out only matters if you are at scale, it’s clear that that’s the direction that the world is headed – either you are at scale yourself, or you buy your infrastructure from someone who is (that doesn’t have to be cloud, but in many cases will be).

  3. George Crump says:

    And that is part of the problem, isn't it? Many scale-out architectures are "faux" and placing them in an All-Flash environment exposes that. We'll have to agree to disagree on the "all architectures should scale out" point. If a single unit meets your needs for the next five years, you're probably going to be OK. The technology will change massively in that time frame.

    I am also not convinced that everyone will be renting ALL their infrastructure in the next five years; I'd have to be convinced that it will even be a majority. I can see specific needs in every organization, just not the whole organization. Cloud will and should be an arrow that you have in your IT quiver, but it should be just one of a multitude of arrows.

  4. I certainly don’t believe that all businesses will be renting their infrastructure in the next five years – large enterprises have plenty of reasons to want to run their own systems, and the resources to do it well.
    Most smaller companies do not, and will be better served working with an experienced service provider who can give them a wider range of capabilities at a lower cost, while focusing their IT personnel on creating business value rather than purchasing, deploying, and managing infrastructure.

    I would argue that any company where a “single unit” can meet all their needs for the next 5 years falls in that second category, and probably shouldn’t be buying a storage array at all…. much less a single all flash array.

    As for differentiating between true scale-out and faux scale-out, perhaps that should be your next article 🙂

  5. Jim Sangster says:

    I think the views expressed in this blog, as well as the replies, are quite enlightening. I agree that a lot of debate has been going on in the industry at large (flash or no flash) about the merits of scale-out vs. scale-up. The XtremIO launch added more fuel to the fire; much of the argument for its "goodness" was based only on the premise that scale-out = good. To me, that falls short. It isn't as simple as arguing the merits of scale-out vs. scale-up alone, although both the author and Dave's replies make very valid points.

    As Dave points out, there are good scale-out designs, and some faux scale-out designs. There are good scale-up designs, and some not-so-good scale-up designs. I don't think you can really get to the root of the value based on the theoretical merits of the differing architectures. An analogy would be like arguing over the merits of cars by comparing manual vs. automatic transmissions (I know the analogy isn't perfect). To me it is much more relevant to discuss the merits of the particular product design, or compare the cars in total. How can the system perform and scale over time? What are the availability and resiliency capabilities? What is the overall cost of the system? How easy is it to install, integrate into your existing environment, and manage over time?

    Pure Storage overcomes many of the traditional limitations of a scale-up design, and Dave’s systems also overcome many of the traditional limitations of a scale-out design. I’ll leave the comparison of these products (or any others) as an exercise for the reader, and I’d encourage you to examine them on their own merit…not on their architecture alone. Put them to the test in your own datacenter.

  6. […] David Wright, CEO of SolidFire, wrote an excellent blog responding to a column I wrote comparing the benefits of scale-up and scale-out All-Flash storage systems. However, his blog might […]

  7. […] Crump at Storage Swiss recently wrote a blog comparing the merits of scale-up versus scale-out architectures in all-flash array designs. The […]
