Artificial intelligence (AI) stands to dramatically impact business and IT operations alike, but in these early days, it is difficult for IT professionals to distinguish vendor buzz from reality. This is especially true in the storage market. Vendors of all stripes, from developers of premium-performance NVMe systems to data protection software makers and cloud service providers, are talking about enabling the business to get the most out of its data while drastically reducing storage-related costs and complexity. In this article, we will discuss the two ways in which AI should factor into the storage roadmap: serving AI applications for lines of business, and facilitating more streamlined and predictive operations.
Storage for AI Applications
AI applications are emerging, promising to uncover new insights that are a foundational competitive advantage for the business. For example, AI can help the business to react more quickly to market dynamics, to discover new revenue opportunities, to deliver differentiated end customer experiences, and to help employees become more productive. While most organizations are still in the process of figuring out where and how they should integrate AI into their business processes, these applications will increasingly be deployed at scale in production. Storage managers should begin to plan for the resulting impact on capacity, connectivity and performance requirements.
AI applications impose new demands on the storage infrastructure. They are driven by the “three Vs” (volume, variety and velocity) commonly accepted as the defining principles of big data-oriented applications: they require massive amounts of data, heterogeneous in file type and in how it is created, accessed and stored, to be collected and processed in real time. Accurate AI models require a holistic vantage point into all associated data, and they require that data to be processed very quickly to keep up with changing dynamics.
From a storage infrastructure perspective, these applications necessitate a centralized global namespace and multi-protocol support, so that data can be tiered (ideally automatically, via an intelligent management platform) across performance-oriented flash and lower-cost, more scalable, capacity-oriented object storage. Most AI applications require premium tiers of performance across storage (typically via solid-state disks and the non-volatile memory express, or NVMe, access protocol) and compute (typically via graphics processing units, or GPUs), but it is cost-prohibitive to store and process all data in this manner. Tiering and a global namespace are also important because central processing units (CPUs), GPUs and object stores might exist in the cloud for agility, scalability and cost reasons. Additionally, organizations might want to integrate a turnkey solution at an edge environment to capture and process, in real time, data that is generated at that location. At the same time, all of this data must remain readily accessible to the AI application to ensure the quality of insights.
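The tiering logic described above can be sketched in a few lines. This is a minimal, hypothetical policy for illustration only: the dataset names, the 30-day hot window and the tier labels are invented assumptions, not features of any specific product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class DataSet:
    name: str
    last_accessed: datetime
    size_gb: float

# Assumed policy: recently touched data stays on NVMe flash; data idle
# for 30+ days migrates to capacity-oriented object storage.
HOT_WINDOW = timedelta(days=30)

def placement(ds: DataSet, now: datetime) -> str:
    """Return the tier a dataset should live on under this simple policy."""
    return "nvme-flash" if now - ds.last_accessed < HOT_WINDOW else "object-store"

now = datetime(2024, 1, 31)
training_set = DataSet("model-training-images", datetime(2024, 1, 30), 512.0)
archive_set = DataSet("2022-sensor-archive", datetime(2023, 6, 1), 4096.0)

print(placement(training_set, now))  # nvme-flash
print(placement(archive_set, now))   # object-store
```

A production platform would of course weigh access frequency, cost per gigabyte and SLA constraints rather than a single age threshold, but the decision shape is the same.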
Storage managers should consider closely how storage solutions are designed to optimize input/output (I/O) traffic for minimal latency and maximum data throughput. Traditional direct-attached and shared storage architectures may require tradeoffs in areas such as performance, resource utilization and complexity. New solutions, including computational storage (the approach of adding compute resources directly on the storage drive), are emerging in response. The ability to run multiple I/O operations in parallel is an important requirement to bear in mind to avoid bottlenecks. The greatly increased queue depths and numbers of queues inherent in NVMe make it an attractive option in this regard, if implemented correctly.
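The effect of keeping many I/O operations outstanding can be illustrated with a simple sketch. This is not how an NVMe driver works internally; real high-performance paths use mechanisms such as io_uring or native NVMe queues, and the thread pool here merely stands in for outstanding I/O depth. The file, chunk size and queue depth are illustrative assumptions.

```python
import concurrent.futures
import os
import tempfile

def read_chunk(path, offset, size):
    """Read one chunk of the file; returns the number of bytes read."""
    with open(path, "rb") as f:
        f.seek(offset)
        return len(f.read(size))

def parallel_read(path, chunk_size=1 << 20, queue_depth=32):
    """Issue a read for every chunk concurrently instead of one at a time."""
    total = os.path.getsize(path)
    offsets = range(0, total, chunk_size)
    with concurrent.futures.ThreadPoolExecutor(max_workers=queue_depth) as pool:
        futures = [pool.submit(read_chunk, path, off, chunk_size) for off in offsets]
        return sum(f.result() for f in futures)

# Demonstrate on a scratch file; a real workload would target NVMe devices.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(5 * (1 << 20)))  # 5 MB of sample data

total_read = parallel_read(tmp.name, chunk_size=1 << 20, queue_depth=8)
print(total_read)  # 5242880
os.unlink(tmp.name)
```

The point of the sketch is the shape of the access pattern: many requests in flight at once, which is exactly what NVMe's deep, numerous queues are designed to absorb.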
Finally, and no less importantly, data reduction and efficiency services must be maintained to optimize utilization of storage capacity, but they must be executed in a way that minimizes host compute overhead, which negatively impacts the AI application’s performance. The system should have the ability to turn deduplication on and off depending on the value that it adds relative to the resulting performance impact. For example, Internet of Things (IoT) data is typically all unique, so trying to deduplicate it does not add value. Data quality and governance services are also important to ensure that the AI application is providing quality insights and that data is being stored and accessed in a way that complies with data privacy regulations.
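The decision of whether deduplication is worth its cost can be estimated before it is enabled. A common approach, sketched here under simplifying assumptions (fixed-size blocks, in-memory data), is to hash blocks and measure how many are unique: a ratio near 1.0 means dedup would burn host CPU for little capacity savings, which is the typical profile of unique IoT data.

```python
import hashlib
import os

def unique_block_ratio(data, block_size=4096):
    """Fraction of fixed-size blocks that are unique (1.0 = no dedup value)."""
    hashes = {
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    }
    n_blocks = -(-len(data) // block_size)  # ceiling division
    return len(hashes) / n_blocks

repetitive = b"A" * 4096 * 100     # e.g. zero-filled images dedup extremely well
iot_like = os.urandom(4096 * 100)  # unique sensor-style data dedups poorly

print(unique_block_ratio(repetitive))  # 0.01 -> dedup is worthwhile
print(unique_block_ratio(iot_like))    # ~1.0 -> skip dedup for this data
```

Real systems sample rather than hash everything, but the same measurement drives the on/off decision described above.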
AIOps: How Real is It, and What is the Impact?
The flip side to this equation is the application of AI to improve storage operations and to better plan for capacity upgrades and refresh cycles.
“AIOps,” or the infusion of AI for IT operations, promises to radically simplify day-to-day management tasks associated with the entirety of the IT stack – on-premises server, storage and network systems, cloud services, and applications alike. It applies big data analytics and machine learning (ML) to telemetry and other log and performance data that is generated by systems, cloud services and applications to automate routine tasks, such as storage capacity provisioning, and to identify and resolve issues faster and with greater accuracy. Automation is a prerequisite to AIOps, but to become truly cognitive, a solution must take the next step and apply ML to become predictive (for example, predicting when a storage system will run out of capacity based on historical usage patterns).
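The capacity-exhaustion prediction mentioned above can be reduced to a small worked example. This is a deliberately minimal sketch: a least-squares linear fit over invented usage samples and an assumed 100 TB system limit, standing in for the far richer models an AIOps platform would train on real telemetry.

```python
def days_until_full(samples, capacity_tb):
    """Fit a linear trend to (day, used_tb) samples and project exhaustion.

    Returns the day index at which usage is predicted to reach capacity,
    or None if usage is flat or shrinking.
    """
    n = len(samples)
    sx = sum(d for d, _ in samples)
    sy = sum(u for _, u in samples)
    sxx = sum(d * d for d, _ in samples)
    sxy = sum(d * u for d, u in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    if slope <= 0:
        return None  # no exhaustion predicted
    return (capacity_tb - intercept) / slope

# Illustrative history: usage grows ~0.2 TB/day from a 40 TB baseline.
history = [(0, 40.0), (30, 46.0), (60, 52.0), (90, 58.0)]
print(days_until_full(history, 100.0))  # 300.0 (days from day zero)
```

Even this toy model shows why historical usage data is the fuel: without a clean, continuous record of consumption, there is no trend to extrapolate.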
As applications become more data-driven, the storage environment is an increasingly decisive factor in the AIOps equation. Storage environments are becoming more heterogeneous to balance aggressive cost and performance requirements; applications must be able to access and write data in real time to avoid performance bottlenecks, and at the same time, the quantities of data that must be stored continue to grow exponentially. In addition, data protection and retention requirements are becoming fragmented, for example per application and per region. Meanwhile, because data has become so foundational to business operations and competitiveness, storage managers are asked to free up as much time as possible from day-to-day management duties in order to support strategic initiatives that directly generate revenue.
AIOps stands to enable storage managers to more easily oversee infrastructures that have become heterogeneous, dynamic and globally dispersed by default, freeing the IT staff to focus on more complex and serious issues. It also stands to accelerate the time it takes to identify and remedy issues, thus helping to improve business continuity. An AIOps platform can continuously monitor the storage environment in a way that humans cannot. Because it understands dependencies, it can also quickly conduct extensive root-cause analysis, not just across the storage but across the broader IT stack. Furthermore, it stands to better optimize resource utilization and to better plan for new infrastructure purchases. All of these capabilities may benefit from tapping into data across the vendor’s customer base as well. For example, it may be possible to predict a system outage based on similar parameters observed at another customer’s site.
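The continuous-monitoring idea can be made concrete with a toy anomaly check. This sketch flags a latency sample that sits more than three standard deviations above a learned baseline; the latency figures, the window and the threshold are all invented for illustration, and a real AIOps platform would use far more sophisticated models.

```python
import statistics

def is_anomalous(sample_ms, baseline_ms, threshold=3.0):
    """Flag a latency sample more than `threshold` std devs above baseline."""
    mean = statistics.fmean(baseline_ms)
    stdev = statistics.pstdev(baseline_ms)
    return stdev > 0 and (sample_ms - mean) / stdev > threshold

# Assumed baseline of normal I/O latencies in milliseconds.
baseline = [1.1, 1.0, 1.2, 0.9, 1.1, 1.0, 1.2, 1.1, 1.0]

print(is_anomalous(25.0, baseline))  # True: worth a root-cause investigation
print(is_anomalous(1.15, baseline))  # False: within normal variation
```

A human cannot run this comparison continuously across thousands of volumes; a platform can, which is precisely the advantage described above.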
The success of AIOps platforms depends entirely on quality algorithms and access to comprehensive, quality data, which together determine the actions the platform will take. Storage managers should be aware that storing and governing the data fueling AIOps, and maintaining the algorithms, can require a tremendous amount of time and investment. Most notably, the AIOps platform needs to be taught business requirements, such as application service level agreements (SLAs), and it needs to be capable of evolving this understanding as application and business parameters change. This understanding is how the platform prioritizes the handling of issues and security vulnerabilities (the latter being another key potential use case for AIOps), and how it formulates predictive insights. Many tools, including system monitoring and helpdesk ticketing, provide AIOps data and insights; storage managers require a tool that factors the storage environment into this stack in a way that is meaningful to, and consumable by, humans.
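The SLA-driven prioritization described above amounts to a weighting problem. The sketch below is purely hypothetical: the SLA tiers, weights, issue names and severity scores are invented to show the shape of the logic, not taken from any product.

```python
# Assumed mapping from SLA tier to business criticality weight.
SLA_WEIGHT = {"mission-critical": 3, "business": 2, "best-effort": 1}

issues = [
    {"id": "latency-spike", "app_sla": "business", "severity": 2},
    {"id": "unpatched-vulnerability", "app_sla": "mission-critical", "severity": 3},
    {"id": "disk-wear-warning", "app_sla": "best-effort", "severity": 1},
]

def priority(issue):
    """Rank issues by the criticality of the SLA they threaten times severity."""
    return SLA_WEIGHT[issue["app_sla"]] * issue["severity"]

ranked = sorted(issues, key=priority, reverse=True)
print([i["id"] for i in ranked])
# ['unpatched-vulnerability', 'latency-spike', 'disk-wear-warning']
```

The key observation is that the weights encode business knowledge: unless the platform is taught which applications are mission-critical, every issue looks the same.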
Storage managers have an opportunity to take a strategic seat at the table when it comes to advising the business on how to integrate new AI applications and AIOps technologies for competitive advantage, greater cost efficiency and better application reliability. When navigating the slew of solutions that are becoming available, it is important for storage managers to be closely in tune with forward-looking business requirements. AI stands to return significant value, but it must be implemented carefully to ensure return on investment, as designing and implementing a successful approach requires investment in expensive IT staff and technologies.