IT is under pressure to move to a self-service environment, where users and applications “order up” IT on an “as needed” basis. Behind the scenes, the IT infrastructure is supposed to respond and adapt to those orders as they come in. Compute and networking resources tend to conform well to this new agile world because their use is transient. Storage, however, is the most difficult aspect of the architecture to make agile. Data is not transient; it has gravity. Even worse, data is traditionally bound to the storage (arrays, HCI appliances, whatever), the stacks (VMware, OpenStack, whatever), and the sites (on-prem, off-prem, public cloud). Data gravity plus infrastructure gravity makes a real mess of any self-service aspirations.
Data is Growing
The fact that data is growing should come as no surprise to any IT professional. What may take them by surprise is the rate and nature of that growth. While users are certainly contributing to the growth of data, the primary source for most organizations is machines. This data can come from IoT devices, sensors and equipment logs. Moreover, the value of this growing data does not come from single, isolated data sources, but from combining data across multiple sources (and usually multiple data formats and sites) in integrative analytics workflows.
Data growth creates two challenges for the storage infrastructure. The first is obviously the capacity required to store that information. Second, the storage software has to handle the number of files and objects these devices create, with consistent performance at any scale, because business value is increasingly driven by consistent low-latency results.
Data is Cross-Workload
In the legacy data center, the application or device creating data was typically the only thing to access it, and users’ access was channeled through that static silo. But now data created by a database application may be accessed by a variety of systems, including a data warehouse, analytics processes and test/dev environments. Each of these systems wants to work either with live production data or, more likely, with a recent copy of that data. At the same time, the users of these systems may want to access data not created by the database system and correlate information between the two (or three or four or N different sources). For example, they may want to combine the transactional data from their traditional database or data warehouse with their IoT telemetry data from the cloud and their web data from their managed services provider. Finally, these systems may create their own data that other applications may need to access.
Cross-workload data access creates an access challenge for the storage infrastructure and for any self-service objectives for its users. IT needs to give these secondary processes secure access to the data they need. It also needs to provide space-efficient copies of this data to these processes, so that production data is protected but storage capacity requirements don’t become worse than they already are. And again, consistent low latency and performance SLAs are essential.
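To make "space-efficient copies" a little more concrete, here is a minimal copy-on-write sketch in Python. It is an illustration only: the `Volume` class and its method names are hypothetical, not any vendor's API, and real storage systems implement this idea at the block or file-system layer. The point it shows is that a clone handed to a test/dev or analytics process stores only the blocks that process changes, while production data stays untouched.

```python
class Volume:
    """Toy volume with copy-on-write clones (hypothetical, for illustration)."""

    def __init__(self, base=None):
        self._base = base   # frozen, shared layer that is never modified
        self._delta = {}    # blocks written to this volume since the last clone

    def read(self, index):
        # Prefer this volume's own blocks, then fall back to the shared layer.
        if index in self._delta:
            return self._delta[index]
        if self._base is not None:
            return self._base.read(index)
        return None

    def write(self, index, data):
        # Writes only ever touch this volume's private blocks.
        self._delta[index] = data

    def clone(self):
        # Freeze the current state into a shared layer, then let both the
        # original volume and the new clone write on top of it separately.
        frozen = Volume(self._base)
        frozen._delta = self._delta
        self._base = frozen
        self._delta = {}
        return Volume(frozen)

    def private_blocks(self):
        # Number of blocks this volume stores beyond the shared layer.
        return len(self._delta)


production = Volume()
for i in range(1000):
    production.write(i, f"row-{i}")

test_copy = production.clone()       # no blocks are copied here
test_copy.write(0, "masked")         # the clone diverges by one block
production.write(1, "updated")       # production keeps changing independently
```

In this sketch the clone holds a single private block even though it presents all 1,000, and neither side sees the other's later writes. That is the property the paragraph above asks for: production is protected and capacity grows only with what actually changes.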
Workloads are More Variable
Modern workloads are also much more variable than in the past. They range from legacy scale-up applications to modern distributed systems. In some cases the applications are creating and storing structured data; in others they are analyzing unstructured data. Even within a data type, how data is accessed may vary significantly, ranging from random I/O to sequential I/O. Also, some workloads need to store data and preserve it cost-effectively for a very long time, while others need high-performance response times for a short period.
The challenge for the IT storage infrastructure is that some systems are better at high-transaction random I/O, while others are designed for long-term storage. While there are systems that try to consolidate these use cases, they tend to be more expensive, at least from a capital expenditure perspective. More than likely, IT will buy multiple use-case-specific systems. The result is that storage systems are proliferating.
All-Flash Has Let Us Down
The simple solution to all of this workload variability is to put the large majority of data on flash storage. There are all-flash arrays available for the three broad categories of data that organizations need to store: databases, virtual environments and large unstructured datasets.
The first problem is that, once again, we have three storage systems to manage, which, while better than a dozen, still increases operational complexity.
The second problem is that most all-flash arrays are sold only as turnkey hardware/software appliances (either arrays or HCI) from major vendors. Even though most of the storage vendors’ investment today is in software, there is a considerable markup on the hardware, making flash systems expensive from a capital perspective, especially when buying three of them.
Finally, most all-flash systems have limited or no connectivity to the cloud. The data is still stuck in the appliance, pulled down by the infrastructure gravity. As modern IT becomes more hybrid in nature, seamless connectivity across sites and clouds is a critical requirement. The organization may simply want to use the cloud for disaster recovery, but it may also want to use cloud compute to burst workloads when demand is high. And the rising demand for integrated analytics across different data sources, both on-prem and in the cloud, means the “hybrid cloud” is becoming much more than static tiering. Lack of connectivity (or, better yet, of on-demand access) does not make cloud movement impossible, but it makes it far more complicated than it should be.
In order to modernize and become more self-service for business-critical workloads like analytics, BizDevOps, and hybrid clouds, data centers need a storage infrastructure that can provide complete storage services to legacy and modern workloads, across increasingly diverse stacks and sites of data sources. If that system is delivered purely as software, freeing IT to bring its own hardware, it would keep costs down and allow standardization on flash media, which should help eliminate the storage-per-use-case model with which IT is currently struggling. And the simplest and most powerful data self-service would come from a file system, with its application intelligence and single global namespace advantages.
Sponsored by Elastifile