About a decade ago, VMware introduced the vision of a software-defined data center (SDDC). Unlike data centers of yesteryear, a software-defined data center was supposed to leverage intelligent software and commodity hardware to create a flexible data center that could meet performance expectations, de-complicate operations, provide a pathway for planned growth and address the cost realities of the modernizing data center. Unfortunately, the SDDC remains just that – a vision; an unreached goal for most enterprise data centers. The management of commodity servers through software like hypervisors and container orchestration tools continues to mature and software-defined networking is gaining in adoption. Yet storage remains the key roadblock for enterprises looking for an SDDC future, as the first generation of software-defined storage technologies fell short of the core requirements and ran out of gas on the journey to the SDDC.
Part of the storage challenge is that organizations still need to support legacy applications like Oracle and MS-SQL while they bring in new applications that use Cassandra, Couchbase, MongoDB, Spark, Splunk or Hadoop. Since it may be impossible to entirely replace the legacy application environment, the organization needs a storage architecture that supports both modern and legacy applications and operating environments. And since the advent of the public cloud, applications are increasingly looking to harness rich data like images, graphics, sensor data and video as core assets to drive the user experience and analytics. Most organizations find that they lack a universal piece of storage software that can meet the requirements of the entire data center.
The lack of a universal storage software solution continues to fuel the hardware-defined solutions that have dominated the data center for so long. Even modern applications often end up attaching to older legacy storage systems. While each of the modern application environments has its own unique built-in storage architecture, as they scale they require a higher-performance, more scalable and more resilient architecture. As a result, each environment has more than its fair share of third-party vendors offering an alternative solution, and IT is forced to manage dozens of independent storage systems, each dedicated to a single task.
Is the Public Cloud the Answer?
The major public cloud providers offer a compelling alternative to the storage free-for-all going on inside the data center. They typically have three strict tiers of storage and provide a lot of behind-the-scenes automation to make setting up the environment easy. Armed with these three storage tiers, a cloud provider supports not only all the workloads a single organization has but all the workloads of thousands of organizations.
Getting the cloud environment ready for the organization’s applications and data requires little upfront cost. However, as the organization moves applications and data to the cloud provider, the periodic billing for storage consumption adds up quickly. While creating, using and getting rid of compute resources is relatively easy, storage resources are different. It is not possible to dispose of storage resources quickly, if at all. Data’s permanence means that the organization needs to pay for its storage no matter the state. Since storage capacity is seldom “returned”, it is a constant periodic expense, which only increases as the organization’s capacity demands increase. The industry is now seeing data return from the cloud to the data center, as cloud customers experience their first taste of “renter’s remorse”.
Even if enterprises can justify the cloud expense and are willing to trade off the cost to lessen the headache of managing storage themselves, security remains a big concern for cloud-based storage. While it is possible to architect both on-premises and cloud storage to be very secure, mistakes do occur. The problem with a cloud-based solution is the level of exposure. A mistake in the cloud is potentially visible to a much larger group than a mistake internal to the organization.
Given the data capacity requirements, which continue to increase, plus the permanency of data, keeping data on-premises makes more sense if the organization can find a software-defined solution that empowers the SDDC instead of being a roadblock to it.
The Requirements of Storage in the Software Defined Data Center
Most enterprises believe the public cloud is the epitome of the software defined data center. While the big three cloud providers have a reputation for developing their own storage solutions, most of the other large cloud providers leverage existing solutions. In both cases, what separates these providers from the traditional data center attempting to be more cloud-like is the dedication to automation and orchestration.
Most cloud providers focus their IT staff on writing and maintaining scripts that execute behind the scenes in response to requests from the provider’s customers or in response to changing conditions in the storage infrastructure like the addition of new capacity, new customers, applications, and capabilities.
The problem is that traditional data centers looking to be more cloud-like, as well as smaller regional cloud providers, can’t attract or afford to hire teams of IT administrators to write and maintain such scripts. They need a storage system designed from the ground up to meet the needs of the evolving data center, which means it has to meet the demands of both legacy and modern applications.
Requirement 1: Enterprise Performance and Capabilities
One of the challenges for the data center in transition is that, for the foreseeable future, it will need to support traditional applications like Oracle and MS-SQL and common environments like VMware, while also supporting the growing experimentation with containers. These environments need high performance and enterprise-class features. The problem is that most storage solutions aiming to enable the modern data center were unable to achieve enterprise-class performance, sacrificed features that could further jeopardize clock speed, and ultimately pivoted to serve point problems. One need look no further than the roster of object storage vendors to find representative examples.
The storage architecture for the SDDC needs to support both modern applications and traditional applications that may never transition to the modern compute architectures. These systems need to deliver hundreds of thousands of IOPS and features like encryption, thin provisioning and deduplication. They also need to support the various traditional and modern storage protocols like block, file, and object.
If the SDDC storage architecture can’t deliver these traditional levels of performance, enterprise features and support for a broad range of protocols, then it is almost a non-starter for the traditional enterprise. In the beginning and for several years into its evolution toward modernization, the traditional application and operation environments will have greater priority.
Requirement 2: Automation, Orchestration and Autonomous Actions
Public cloud providers are the leaders in the move to the SDDC, because efficient operations are central to their unit economics and essential to scaling their business fortunes. The more efficient they are and the more seamlessly their customers can interact with the cloud, the more money the provider will make. There are three factors to creating a storage architecture that supports the desire for the SDDC to be more cloud-like.
The first is automation. Storage systems with automation can automatically move data between different storage types, storage nodes or storage locations. The next step is orchestration. Orchestration ensures that a series of actions, some related and some unrelated, occur when requested. Orchestration is a preprogrammed series of events triggered by a predictable request or condition.
Autonomous actions are actions the storage architecture takes after it performs its own analysis of the environment. It can learn these steps by using machine learning to watch how the administrators deal with new situations, like the addition of a node to the storage architecture or a hot spot caused by too many applications using the same storage volume. Autonomous actions are critical because they enable the traditional data center to compete with the teams of script creators that the major cloud providers employ. In short, automation may sound simple, but actually achieving a level of automation beyond the most basic of administrative tasks is what differentiates true automation from simple task processing.
Requirement 3: Scale Out and Continuously Aware
The third requirement is for the storage architecture to be scale-out, meaning expansion occurs by simply adding a node to the storage cluster, and for the architecture to be continuously self-aware. It’s worth noting that automation must return to this simple act: once that node is added, the system should automatically take advantage of the new capacity and capabilities of this node, lest we head back to the data center of manual data moves.
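To make the idea of a cluster automatically absorbing a new node concrete, here is a minimal sketch in Python. It uses rendezvous (highest-random-weight) hashing, which is a stand-in technique chosen for illustration and not something the article attributes to any product. The names `owner` and `plan_rebalance`, and the node and volume labels, are hypothetical.

```python
import hashlib


def owner(key: str, nodes: list[str]) -> str:
    """Return the node with the highest hash score for this key."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)


def plan_rebalance(keys: list[str], old_nodes: list[str], new_node: str) -> dict:
    """Map each key that should move to its (old_owner, new_owner) pair."""
    new_nodes = old_nodes + [new_node]
    moves = {}
    for key in keys:
        before = owner(key, old_nodes)
        after = owner(key, new_nodes)
        if before != after:
            moves[key] = (before, after)
    return moves


if __name__ == "__main__":
    volumes = [f"vol-{i}" for i in range(1000)]  # hypothetical volume names
    moves = plan_rebalance(volumes, ["n1", "n2", "n3"], "n4")
    print(f"{len(moves)} of {len(volumes)} volumes would move to the new node")
```

With this scheme, a volume only moves if the new node outscores its current owner, so every planned move lands on the new node and the rest of the data stays where it is. A real system would also weigh capacity, media type and current load, which this sketch ignores.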
A traditional data center with concerns only for traditional applications like Oracle and MS-SQL or environments like VMware may only need a scale-up architecture. However, a data center with aspirations of being more cloud-like and introducing modern applications into the organization will need a scale-out architecture, or it will end up with dozens of storage silos that span both the on-premises data center and the cloud.
Continuous availability is important for both traditional and modern applications in the SDDC. However, the storage architecture more than likely leverages commodity hardware to keep costs down. Greater capacity and greater dependence on a single storage architecture means that more applications and services are at risk if there is a storage system failure.
These reasons mean that the storage architecture needs to provide continuous access to data, and provide that data at the same performance level as it did prior to the failure. The storage architecture needs to have a design that enables it to survive storage media failure, storage node failure or an entire site failure.
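One common way to survive node and site failures is to spread copies of the data across failure domains. The sketch below is a simplified illustration under that assumption, not a description of any specific product. `Node`, `place_replicas` and `survives` are hypothetical names, and the sketch checks availability only, not the requirement that performance stay the same after a failure.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass(frozen=True)
class Node:
    name: str
    site: str


def place_replicas(nodes: list[Node], copies: int) -> list[Node]:
    """Choose `copies` nodes, using every site once before reusing any site."""
    by_site: dict[str, list[Node]] = {}
    for node in nodes:
        by_site.setdefault(node.site, []).append(node)
    # Interleave sites: first node of each site, then second node of each, ...
    interleaved = [
        node
        for group in zip_longest(*by_site.values())
        for node in group
        if node is not None
    ]
    if copies > len(interleaved):
        raise ValueError("not enough nodes for the requested replica count")
    return interleaved[:copies]


def survives(replicas: list[Node], failed_site=None, failed_node=None) -> bool:
    """True if at least one replica is outside the failed site or node."""
    return any(
        r.site != failed_site and r.name != failed_node for r in replicas
    )


if __name__ == "__main__":
    cluster = [Node("a1", "east"), Node("a2", "east"), Node("b1", "west")]
    replicas = place_replicas(cluster, copies=2)
    print([r.name for r in replicas], survives(replicas, failed_site="east"))
```

Placing the second copy on a different site is what lets the data outlive a whole-site outage. Two copies on the same site would protect only against media or node failure.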
Requirement 4: Storage Hardware Aware But Hardware-Free
The fourth requirement is for the storage architecture to be aware of the physical hardware the customer uses within the architecture. Hardware awareness includes understanding and capitalizing on the flash and hard disk drive makeup of the storage media within the node as well as the capabilities of the nodes. While this may seem obvious, many systems limit their knowledge to the node level, rather than driving down to the composition of that node and “micro-managing” data down to the individual memory or storage media. Hardware aware means that the architecture needs to re-route IO based on demand balanced against the capabilities of the system, node and individual media type, and re-balance data placement to optimize for the next action (write, read or replicate) based on system knowledge, to give the application the best chance to operate with the desired throughput.
On certain occasions, software-defined vendors have taken shortcuts to achieve these requirements by limiting the choice of hardware they support. At the most basic level, some circumvent the goal of freedom of hardware choice, limiting the customer’s hardware options to one. As a result, customers have to select a specific node to start with and cannot deviate from that node as their environment scales. Vendors typically take these hardware limitation shortcuts because their storage software can’t understand differences in individual nodes.
For the storage infrastructure to empower the SDDC, customers need to be able to dynamically choose the server node profile as they need it. They also need to be able to use their hardware provider of choice and to change that hardware profile or even the provider.
Requirement 5: Data Center Aware
The fifth requirement is that the storage architecture needs to move beyond being simply data aware and be data center aware. This means that the architecture needs to identify the needs of applications, as well as changes external to the storage system, such as network traffic changes, that may impact storage performance. The storage system should then take corrective measures by load balancing the applications across geo-locations, racks, nodes and even the media in those nodes. For instance, why shouldn’t the storage software determine which data should go on a hard disk drive vs. a SAS flash drive vs. an advanced NVMe drive or even RAM? Lastly, the storage software needs to be cloud-aware, moving an application to a different location based on a high number of users accessing it from that site.
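As a rough illustration of the media question raised above, the following Python sketch picks a tier from how often a piece of data is accessed. The tier names mirror the media the paragraph lists, but the thresholds and the `choose_tier` function are made-up assumptions for illustration. A real system would draw on many more signals, such as latency targets, capacity and cost.

```python
# Tiers ordered fastest to slowest. Each threshold is the minimum number of
# accesses per hour for that tier; the numbers are illustrative assumptions.
TIERS = [
    ("ram", 10_000),
    ("nvme", 1_000),
    ("sas_flash", 100),
    ("hdd", 0),
]


def choose_tier(accesses_per_hour: int) -> str:
    """Return the fastest tier whose threshold the access rate meets."""
    for tier, minimum in TIERS:
        if accesses_per_hour >= minimum:
            return tier
    raise ValueError("access rate must be non-negative")


if __name__ == "__main__":
    for rate in (50_000, 5_000, 100, 0):
        print(rate, "->", choose_tier(rate))
```

Keeping the policy in a small ordered table means the thresholds can be tuned, or learned, without changing the placement logic itself.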
Organizations have been chasing the elusive software defined data center for a decade now, and the key stumbling block is the storage architecture. The first round of software-defined storage technologies took the organization only so far by disaggregating the storage control plane from the storage hardware plane. The next generation needs to go further by providing not only the performance that data centers need but also the automation, scalability, availability and data center awareness they need to succeed. Most importantly, the SDDC architecture needs to provide automation, orchestration and autonomous actions to deliver a self-service and seamless experience to users.
Sponsored by Datera