Scale-out storage gives data centers the ability to build infrastructures that scale to meet the capacity demands of hyper-scale environments, and flash lets them meet those environments' performance demands. The missing link is making sure there is enough compute available not only to manage the storage resources, but also to run applications and services.
The Storage Compute Challenge
Compute provides the power for storage to move data to and from the actual storage media, as well as to manage the data on that media. Generally speaking, there is more than enough compute available to perform these functions. The challenge is making sure compute is used efficiently, so it is properly allocated between applications, services and storage IO.
The problem is that most scale-out architectures, whether they are dedicated to storage or share CPU with applications and services, are hard-set in the amount of CPU power they will use per node. In a dedicated architecture, that means all of a node's CPU is claimed by storage even when storage does not need it, so a lot of CPU resources go to waste.
In a hyperconverged architecture, it means the storage process will grab a hard-set amount of compute resources. It will not release those resources when it does not need them, nor will it grab more CPU resources, even when they are available, when storage IO demands it. This hard-set configuration makes it very difficult for a hyperconverged system to offer any meaningful quality-of-service guarantees to the applications and services it runs.
Introducing Project Longhorn
Nowhere is the storage compute challenge in hyperconverged architectures more obvious than in container environments like Docker. These environments can spin up thousands of new instances of a service or an application in an instant, and the impact on the storage infrastructure is severe: in a moment it can go from everything operating optimally to being unable to deliver the IO performance these instances need.
If the organization designs the architecture so that CPU resources are over-assigned to storage, it will waste those resources most of the time.
Project Longhorn is a block storage system. It essentially containerizes each block storage volume into a dedicated controller, and runs that controller on the same host where the storage is consumed. Allocating more volumes allocates more controllers, automatically scaling storage as more instances create more volumes. Compare this to legacy architectures, where one controller group has to support all the available volumes – and keep in mind that in the containerized world, thousands of volumes are not uncommon.
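To make the controller-per-volume model concrete, here is a minimal sketch of the idea in Python. The class and field names are illustrative assumptions, not Longhorn's actual code or API: the point is simply that creating a volume also creates a dedicated controller co-located on the consuming host, so controller count scales with volume count rather than being fixed per node.

```python
# Hypothetical sketch of a controller-per-volume model (names are
# illustrative, not Longhorn's real API). Each new volume gets its own
# dedicated controller on the host that consumes the storage.

from dataclasses import dataclass, field

@dataclass
class Controller:
    volume_name: str
    host: str  # the controller runs on the host consuming the volume

@dataclass
class Host:
    name: str
    controllers: list = field(default_factory=list)

    def create_volume(self, volume_name: str) -> Controller:
        # Allocating a volume allocates a dedicated controller with it.
        ctrl = Controller(volume_name=volume_name, host=self.name)
        self.controllers.append(ctrl)
        return ctrl

host = Host("node-1")
for i in range(3):
    host.create_volume(f"vol-{i}")

# One controller per volume, all co-located with the consumer.
assert len(host.controllers) == 3
assert all(c.host == "node-1" for c in host.controllers)
```

Contrast this with a legacy design, where a single shared controller group would have to service all three volumes – and, at container scale, thousands.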
The controller-to-volume methodology also makes the service more resilient and portable. If a physical host fails, a new controller can be created on another host to pick up the volume. If an instance needs to move to another physical host, the volume and its processing can move with it.
Project Longhorn allows the pooling of both local and networked storage resources. Each volume, upon creation, can have its capacity, IOPS and protection-level (synchronous replica) requirements set.
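A per-volume specification along these lines might look like the following sketch. The field names here are assumptions for illustration, not Longhorn's actual schema; they simply capture the three requirements the article lists – capacity, IOPS and synchronous replica count – being fixed at creation time.

```python
# Illustrative per-volume spec (field names are assumptions, not
# Longhorn's actual schema). Capacity, IOPS and the number of
# synchronous replicas are set when the volume is created.

from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: requirements are set at creation
class VolumeSpec:
    name: str
    capacity_gib: int   # capacity requirement
    iops_limit: int     # performance requirement
    replicas: int       # synchronous replicas for protection

spec = VolumeSpec(name="app-data", capacity_gib=100,
                  iops_limit=5000, replicas=2)
assert spec.replicas == 2
```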
Modernized Data Protection
One of the more interesting aspects of Project Longhorn is how seriously it takes data protection. Volumes can be set up so that the primary storage is internal to the host the volume runs on, but synchronously replicated to a shared storage array. Multiple replicas can be created based on the criticality of the volume and the need to move instances to other systems. Each replica gets its own container to host the data processing, so there is almost no performance penalty for multiple replicas.
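The synchronous-replication behavior can be sketched as follows. This is a toy model, not Longhorn's implementation: the essential property is that a write is acknowledged only after every replica has confirmed it, and each replica does its own processing, mirroring the one-container-per-replica design.

```python
# Toy model of synchronous replication (not Longhorn's actual code):
# a write is acknowledged only when all replicas confirm it.

class Replica:
    def __init__(self, name: str):
        self.name = name
        self.blocks = {}  # offset -> data

    def write(self, offset: int, data: bytes) -> bool:
        self.blocks[offset] = data
        return True  # confirm the write

class Volume:
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, offset: int, data: bytes) -> bool:
        # Synchronous: every replica must confirm before the ack.
        return all(r.write(offset, data) for r in self.replicas)

# Primary on the local host, one replica on a shared array.
vol = Volume([Replica("local"), Replica("shared-array")])
assert vol.write(0, b"payload")
assert all(r.blocks[0] == b"payload" for r in vol.replicas)
```

Because each replica runs independently, adding a replica adds another confirming writer rather than loading down a single shared controller.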
Project Longhorn can take up to 254 snapshots per volume. It can also back up those snapshots to an NFS or S3 repository, so Project Longhorn essentially has backup built in, both locally and to the cloud. It can schedule snapshots to execute on a recurring basis, and allows for setting policies that determine how long to keep each snapshot.
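A recurring schedule with a retention policy could work along these lines. The class and parameter names are hypothetical; the 254-snapshot cap is the per-volume limit stated above, and the policy simply prunes the oldest snapshots beyond the configured keep count.

```python
# Sketch of recurring snapshots with a retention policy (names are
# illustrative, not Longhorn's API). Longhorn caps snapshots at 254
# per volume; retention prunes the oldest beyond the keep count.

from collections import deque

MAX_SNAPSHOTS = 254  # per-volume limit

class SnapshotSchedule:
    def __init__(self, keep: int):
        self.keep = min(keep, MAX_SNAPSHOTS)
        self.snapshots = deque()

    def take_snapshot(self, label: str) -> None:
        self.snapshots.append(label)
        # Enforce retention: drop the oldest beyond `keep`.
        while len(self.snapshots) > self.keep:
            self.snapshots.popleft()

sched = SnapshotSchedule(keep=3)
for hour in range(5):
    sched.take_snapshot(f"hourly-{hour}")

# Only the three most recent snapshots survive the policy.
assert list(sched.snapshots) == ["hourly-2", "hourly-3", "hourly-4"]
```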
Access Not Just for Docker
Volumes can be accessed either as Linux kernel block devices or via iSCSI. Kernel access is ideal for volumes supporting Docker containers, while iSCSI is appropriate for KVM and VMware volumes. That means this single storage architecture could support the legacy virtualized data center as well as the modernized data center pushing the envelope with production containers.
Project Longhorn can be fully automated, so that when performance-demanding instances are started, a corresponding amount of performance is allocated to the storage architecture.
What to Do with Project Longhorn?
Now the challenge is what to do with it. For experimenters and tinkerers, the code is ready to go. But Project Longhorn, in its current state, is not ready for the enterprise – and that may not really be the intent. The next step is for a company or two to pick up the code and create a new storage system out of it, one able to support containers, VMware and KVM all in the same system. It could be an enterprise-packaged version of just the software, adding the few missing features.
The system could also be turnkey, including both software and hardware. We’ve seen storage companies jump-start their development by using ZFS as the underlying file system, then adding features and support. There is no reason a company couldn’t do the same thing with Project Longhorn.
There is no doubt storage architectures can scale to meet the performance and capacity demands of the modern data center. But at what cost? Most storage systems have to over-allocate storage controller resources to make sure they can keep up with volume growth. At the same time, most storage software developers have not done a great job of supporting the realities of multi-core processors.
Project Longhorn is lighting the path to a more efficiently scalable architecture that meets performance guarantees while keeping costs under control. What is most striking is the simplicity of the design: solve a big problem, scaling storage, by reducing the number of variables – in Project Longhorn’s case, one storage controller per storage volume.