Designing a SAN for the Cloud Service Provider
Cloud Service Providers exist in unique market conditions. They are at once competitors to and complements of traditional data center IT. Every service they provide is under scrutiny, and those services must perform better and be more reliable, all at a competitive cost. In many cases, the storage infrastructure is a key inhibitor to achieving this high level of competitiveness. Cloud service providers need a storage system designed for their specific and unique needs.
The Cloud Advantage
Data centers, both small and large, are turning to cloud service providers (CSPs) for a large-scale reduction in capital expenditures (CAPEX) and operational expenditures (OPEX). Alternatively, data centers are, in some cases, turning into cloud providers in their own right. The ability to “spin-up” and “spin-down” servers and applications as needed provides an incredible opportunity for the traditional data center. This flexibility gives CSP customers the advantage of only paying for data center resources when they actually use them.
From an OPEX perspective, CSPs can provide their customers with additional IT resources, and skilled resources at that. Customers are able to leverage the CSP’s personnel and expertise when needed, which helps them eliminate or at least lessen the very expensive process of finding and retaining qualified IT personnel.
However, CSP customers have approached cloud options with scrutiny and high expectations. They expect the provider to deliver on service level agreements (SLAs) that often exceed their own internal agreements and to do so at a competitive price.
From a compute perspective, CSPs have been able to balance these often competing SLA requirements through their widespread implementation of virtualization. Server virtualization allows the CSP to scale their server resources in a cost effective, granular fashion that keeps pace with the high SLA expectations of their customers. In short, the CSP defines their own cloud to provide a cost effective, flexible way to meet SLAs. The biggest challenge remaining is making sure that the storage infrastructure can complement the compute cloud’s cost effectiveness and flexibility.
The Storage Challenge
One of the biggest challenges to CSPs standing up to this SLA scrutiny is the storage system. CSPs need a storage system designed specifically for their needs. It has to meet the performance/cost SLA against a variety of workloads. Some of the provider’s services will require that storage be very performance focused (online applications), while other services will be capacity focused (backup and archive). As detailed in the first article in this series, legacy scale-up storage systems and even legacy scale-out systems are not ideally suited to the CSP’s unique pay-as-you-need-it business model, or to its requirement to support highly variable workloads that mix performance and capacity.
Cloud Storage Challenges
As described above, CSPs count on virtualization to allow the compute side of their infrastructures to respond to the SLAs and cost demands of their customers. The typical CSP adds virtual machines (VMs) as they add accounts. When the number of VMs on a host reaches critical mass, they start up another host. In some cases the CSP may add unique physical servers for each account but the goal is still to maximize VM density.
This granular scaling allows the CSP to only acquire physical hardware as they add accounts to justify that investment. For CSPs to be profitable they need the critical mass of VMs to be as large a number as possible. In other words, they need very dense VM-to-physical server ratios. CSPs need to stay laser focused on revenue generated per rack.
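The relationship between VM density, host count and revenue per rack can be sketched with a short calculation. This is a minimal illustration only: the density figures, the 16 hosts per rack and the per-VM price are hypothetical assumptions, not numbers from any provider.

```python
import math


def hosts_needed(vm_count: int, vms_per_host: int) -> int:
    """Physical hosts required to place vm_count VMs at a given density."""
    if vms_per_host <= 0:
        raise ValueError("vms_per_host must be positive")
    return math.ceil(vm_count / vms_per_host)


def revenue_per_rack(vms_per_host: int, hosts_per_rack: int,
                     monthly_price_per_vm: float) -> float:
    """Monthly revenue of a fully populated rack (all inputs are assumptions)."""
    return vms_per_host * hosts_per_rack * monthly_price_per_vm


if __name__ == "__main__":
    # Hypothetical figures: 1,000 VMs, 16 hosts per rack, 50.0 per VM per month.
    for density in (20, 40):
        print(density,
              hosts_needed(1000, density),
              revenue_per_rack(density, 16, 50.0))
```

Under these assumed numbers, doubling the density halves the hosts needed for the same accounts and doubles what each rack can earn, which is why density matters so much to the CSP.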
Maintaining this density is a growing problem for traditional data centers as well, because it leads to massively random storage I/O demands caused by the wide variety of potential applications running in those VMs all making different types of requests from storage.
At the CSP the problem is compounded. Because the CSP doesn’t own the workloads, those workloads can come and go as the customers please. The CSP has limited visibility into the data types within its customers’ VMs and has limited advance warning when a spike is about to occur.
VM variability causes significant problems in the storage infrastructure. Servers must wait for their requests to be processed through the storage network and the storage system. The compute engine of the storage system, the controller, is also placed under duress responding to these random requests from hundreds of physical servers with thousands of VMs. For the storage infrastructure this is the worst-case scenario, and the storage system must be able to perform under it.
The problem for the CSP is that purchasing this level of performance or capacity, to protect against this worst-case scenario, can be expensive with legacy storage systems. Although scale-up systems can incrementally add capacity, they often require that performance, storage processors and network I/O be purchased upfront. Legacy scale-out systems allow an incremental upgrade path similar to how the CSP adds compute resources, but they are a separate stack that needs to be managed. Their resources are also not used evenly; something goes to waste, typically processing power.
Cloud Defined Storage
A better solution may be to leverage the compute resources that the CSP is already purchasing when it adds physical hosts. This means utilizing their investment in the virtual compute infrastructure to also virtualize storage software and hardware. As the CSP’s compute needs grow, their storage capabilities grow right along with them. In other words, the growth of the CSP’s cloud defines their storage performance and capacity growth. This is exactly how large cloud providers like Amazon and Google handle their storage infrastructures.
Cloud Defined Storage is more than just moving the storage software into the physical host. Ideally the storage devices themselves would be moved into the physical host as well. These are simply the hard disks and/or SSDs that come with the physical server cost. This greatly reduces storage acquisition costs by eliminating the need to buy separate storage controller hardware or storage networking and by keeping the CSP from being locked into single-vendor storage devices.
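As a rough sketch of how pooled storage could grow with host count, the calculation below sums the drives and I/O capability that each host contributes. The HostSpec type, the per-host figures and the assumption that data is kept in a fixed number of copies are all hypothetical, and summing IOPS is a deliberate simplification that ignores network and replication overhead.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class HostSpec:
    """Hypothetical description of the storage one physical host contributes."""
    raw_capacity_tb: float
    iops: int


def pool_totals(hosts: Iterable[HostSpec], copies: int = 2) -> Tuple[float, int]:
    """Return (usable capacity in TB, aggregate IOPS) for a pool of hosts.

    Usable capacity is raw capacity divided by the number of data copies.
    Aggregate IOPS is a naive sum, used here only to show the scaling trend.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    host_list = list(hosts)
    raw_tb = sum(h.raw_capacity_tb for h in host_list)
    total_iops = sum(h.iops for h in host_list)
    return raw_tb / copies, total_iops


if __name__ == "__main__":
    host = HostSpec(raw_capacity_tb=8.0, iops=20000)  # assumed per-host values
    for count in (4, 8, 16):
        print(count, pool_totals([host] * count))
```

Each added host raises both totals, so storage capacity and performance track the same growth curve as compute.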
Cloud Defined Performance
A cloud defined storage strategy can also lead to a better performing environment, for two reasons. First, storage performance and capacity automatically scale as the number of hosts in the environment scales. Hosts are added when the environment demands it based on account load, automatically striking the appropriate balance. Second, more of the data access is local, so there is less concern over network performance. Shared storage requirements are automatically serviced by the existing network that interconnects the hosts, which is typically a flatter, simpler network.
Cloud Defined Flexibility
Each physical server can be configured by the type of VM that it’s likely to support or it can be configured for a mixed workload. For performance-centric configurations that are providing an online application service, the hosts can be configured with high performance drives or solid state disks. Hosts that will be providing more of a cloud storage service, like backup or archive, can be configured with high capacity drives.
Cloud Defined Reliability
Access to data is a key SLA measurement that providers are held to. A cloud defined storage architecture becomes more durable as more hosts are added to the environment, helping to support those SLAs. Additionally, the level of reliability can be turned up or down based on the account. This allows the CSP to easily provide “Gold, Silver or Bronze” levels of services based on the needs of the customer.
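One way to picture per-account reliability levels is as a number of data copies spread across hosts. The sketch below is illustrative only: the mapping of Gold, Silver and Bronze to three, two and one copies is an assumption, as is the rule of placing copies on the least-used hosts. The article does not specify how any particular product implements its tiers.

```python
from typing import Dict, List

# Hypothetical mapping of service tier to number of data copies.
REPLICAS_BY_TIER: Dict[str, int] = {"gold": 3, "silver": 2, "bronze": 1}


def place_replicas(tier: str, used_tb_by_host: Dict[str, float]) -> List[str]:
    """Pick distinct hosts for each copy, preferring the least-used hosts."""
    copies = REPLICAS_BY_TIER[tier.lower()]
    if copies > len(used_tb_by_host):
        raise ValueError("not enough hosts for the requested tier")
    # Sort host names by current usage, lowest first, and take one per copy.
    ranked = sorted(used_tb_by_host, key=used_tb_by_host.get)
    return ranked[:copies]


if __name__ == "__main__":
    usage = {"h1": 5.0, "h2": 1.0, "h3": 3.0, "h4": 2.0}  # assumed usage in TB
    for tier in ("Gold", "Silver", "Bronze"):
        print(tier, place_replicas(tier, usage))
```

In a scheme like this, adding hosts gives the placement logic more independent locations to choose from, which is one way to read the claim that durability improves as the environment grows.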
Cloud Defined Simplicity
Finally, cloud defined storage removes the complexity of implementing and managing a storage network and separate storage system. Since the cloud based storage system leverages the same infrastructure as the CSP’s cloud, it can be managed by the same personnel and does not need an additional storage team.
Since capacity and performance expand in concert with the addition of new hosts, which comes from the expansion of the business, the need to plan storage as a separate practice is eliminated. Certainly, additional resource consumption needs to be accounted for when configuring a new host but that becomes a natural step in the configuration process.
If the CSP can scale storage in lockstep with their compute environment, many storage-related problems can be solved. Letting their cloud define the storage system provides maximum performance, maximum flexibility and a much more favorable cost model. It also ensures that the CSP does not have to hire storage specialists and create a dedicated storage team as the traditional enterprise data center does. Instead they can stay focused on providing services to their customers.
OnApp is a client of Storage Switzerland