Previously, Storage Switzerland has discussed the value of computational storage – adding compute power directly to storage media so that data can be processed in place – in serving modern workloads, as well as its impact on the design of future data centers. In this blog we will dive more deeply into key use cases for computational storage.
Hyperscale Data Centers with Parallel Workloads
Hyperscale data centers were pioneered by the world's largest cloud service providers, including Amazon, Google and Facebook. They operate at the scale of thousands of physical servers and millions of virtual machines to deliver services to millions or billions of users – creating the need to execute a wide variety of workloads in parallel. Moreover, hyperscale architectures are becoming popular hosts for artificial intelligence (AI) and machine learning (ML) applications that require operations such as real-time, complex and parallel indexing and pattern matching.
To achieve these ends, hyperscale data centers must optimize physical space and resource utilization while minimizing power and cooling requirements. Even moving data between devices within the same rack can substantially impact the time and resources required to process that data. Processing data within the storage itself via computational storage reduces the network resources required by minimizing the amount of data that must be transferred, while allowing the host CPU to be scaled across a larger number of workloads and storage devices. It also facilitates multi-threaded execution across devices, which accelerates applications while keeping the power envelope in check.
Real-Time Analytics
Real-time analytics applications are becoming popular tools that help businesses obtain forward-looking insights and competitive advantage. For example, a retail company might want to analyze point-of-sale data in real time for fraud detection. The problem is that these applications must scan massive amounts of data to identify the subset of information that is relevant to the query before executing the analytics request. Moving this volume of data out of the storage system, across the network, and into host memory incurs transfer and latency penalties that real-time analytics applications can ill afford.
To mitigate this performance penalty, organizations could build a more robust network – but this is expensive and complicated. They might instead choose direct-attached rather than shared storage, but this approach underutilizes storage capacity and CPU resources while creating data silos that can inhibit business insights. With computational storage, data that is clearly not relevant to the analytics query can be disqualified by the storage media itself – vastly reducing the volume of data that must be moved to the host CPU. Performance is greatly accelerated, and resource utilization is improved.
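The saving from disqualifying data in the drive can be sketched with a purely illustrative simulation. This is not a real device API – every name below is hypothetical – but it models the contrast between moving every record to the host for filtering and returning only the matching subset across the interconnect.

```python
# Illustrative sketch only (not an NGD Systems or NVMe API): models how
# filtering inside the storage device shrinks the data volume that must
# cross the interconnect to the host CPU.

RECORD_SIZE = 64  # assumed bytes per point-of-sale record

def host_side_scan(records, predicate):
    """Conventional path: every record is moved to the host, then filtered."""
    bytes_moved = len(records) * RECORD_SIZE
    matches = [r for r in records if predicate(r)]
    return matches, bytes_moved

def in_storage_scan(records, predicate):
    """Computational-storage path: the drive filters in place and
    returns only the matching subset to the host."""
    matches = [r for r in records if predicate(r)]
    bytes_moved = len(matches) * RECORD_SIZE
    return matches, bytes_moved

if __name__ == "__main__":
    # One million synthetic transactions; roughly 1% exceed the threshold.
    records = [{"txn_id": i, "amount": i % 100} for i in range(1_000_000)]
    suspicious = lambda r: r["amount"] >= 99

    hits_host, moved_host = host_side_scan(records, suspicious)
    hits_dev, moved_dev = in_storage_scan(records, suspicious)

    assert hits_host == hits_dev  # the query result is identical either way
    print(f"host-side filter moved  {moved_host:>12,} bytes")
    print(f"in-storage filter moved {moved_dev:>12,} bytes")
```

With a 1% match rate, the in-storage path moves roughly one hundredth of the bytes – the same proportional saving the fraud-detection scenario above depends on.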
The Intelligent Edge
Organizations are collecting, and must analyze, more and more data at the edge to support business operations and competitive advantage. In these environments, floorspace and power are at a premium, making infrastructure density paramount. As the previous examples illustrate, computational storage delivers more processing power per host CPU without scaling out the storage environment, resulting in faster processing from better-utilized, power-efficient infrastructure. Meanwhile, minimizing both the volume of data that must be moved between infrastructure resources and the frequency of those moves reduces strain on edge networks.
Content Delivery Networks
A content delivery network (CDN) is a group of interconnected but geographically distributed servers that accelerates content delivery to users. With the ongoing data boom, the advent of richer content such as video, and the shift to more distributed enterprises, CDNs are being used to minimize network latency. A CDN must handle a large volume of transactions as quickly and cost-effectively as possible to remain profitable. Access control and encryption are key roadblocks to this goal, as they strain valuable CPU resources. Computational storage allows these functions to occur within the storage media itself, reducing infrastructure costs and helping to accelerate transaction processing.
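A toy simulation can make the offload concrete. Everything here is hypothetical – the classes stand in for a host CPU and a drive with an on-board encryption engine, and a hash stands in for a real cipher – but it shows the accounting: when the drive encrypts, the host spends no additional cycles on those bytes.

```python
# Hypothetical sketch: offloading per-object encryption from the host CPU
# to computational storage. Class and method names are illustrative, and
# SHA-256 is a stand-in for a real encryption engine.
import hashlib

class HostCPU:
    def __init__(self):
        self.bytes_processed = 0  # tally of host-side crypto work

    def encrypt(self, blob: bytes) -> bytes:
        self.bytes_processed += len(blob)  # the host pays for every byte
        return hashlib.sha256(blob).digest()

class ComputationalDrive:
    """Drive whose on-board engine transforms data in place, so the
    host never touches the plaintext bytes."""
    def __init__(self):
        self.objects = {}

    def store_encrypted(self, key: str, blob: bytes) -> None:
        self.objects[key] = hashlib.sha256(blob).digest()  # done on-device

host = HostCPU()
drive = ComputationalDrive()
content = [b"video-segment" * 1024 for _ in range(100)]

# Conventional CDN node: the host encrypts each object, then writes it.
for i, blob in enumerate(content):
    drive.objects[f"host-{i}"] = host.encrypt(blob)

# Computational-storage node: the drive encrypts; host cycles stay free
# for serving requests.
before = host.bytes_processed
for i, blob in enumerate(content):
    drive.store_encrypted(f"dev-{i}", blob)
assert host.bytes_processed == before  # no additional host CPU spent
```

The freed host cycles are what the CDN can redirect toward handling more transactions per node.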
Modern businesses rely on data-driven applications for real-time decision making and collaboration. Against this backdrop, storage responsiveness is king, as it can make or break an application's ability to process very large quantities of data, typically in parallel, to facilitate business outcomes. Computational storage stands to solve the key bottleneck in this equation: internal transfer speeds of solid-state media are now so fast that they overwhelm interconnects, including the Peripheral Component Interconnect Express (PCIe) bus. This blog has highlighted four key use cases for computational storage: hyperscale data centers, real-time analytics, the intelligent edge, and CDNs. As the set of compute- and storage-I/O-bound applications continues to grow, Storage Switzerland expects the use cases for computational storage to expand as well.
Sponsored by NGD Systems