The first step in providing an answer to any question is to narrow down the choices. Analytics applications, both modern and legacy, are no different. In response to a query, they must scan massive amounts of data to select the subset of data that needs to be analyzed. The first step to finding a needle in a haystack is to reduce the size of the haystack before looking for the needle.
For many applications, the first step is disqualifying data that is obviously not part of the answer, then looking more closely at the data sets that might be. Moving those data sets out of the storage system, across the network, and into host main memory creates an architectural challenge.
The time and latency penalties incurred by these transfers lead many organizations to conclude that their only solution is to store data locally. Direct-attached storage (DAS), however, creates resource inefficiency problems. For instance, several studies have indicated that hyperscale and rack-scale server environments, common in analytics processing, use less than one-third of their available storage capacity.
One option is to build a more robust network and reduce the performance penalty of shared storage. The challenge is that a more robust network is expensive and complicated to build; the hope is that the efficiency gains make up for the expense and complexity.
There is, however, another option. What if storage devices could run applications in place, on the SSD where the data resides, and perform inference on the data set without the data ever leaving the SSD? This concept is called computational storage: analytics run on the SSD, and only the relevant data is returned, without burdening host or network resources. An example computational storage use case is running MapR locally on the SSD without a host. Another is logging IP addresses in traffic patterns natively. Computational storage allows storage to do significantly more than hold data. It allows real-time processing, analytics, edge computing, and other use cases to run as close to the data as possible.
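The data-movement savings described above can be sketched in a few lines. This is a hedged illustration only: the function names and the record format are invented for the example, and it simulates the two models in plain Python rather than using any real computational-storage API.

```python
# Illustrative sketch: host-side filtering vs. in-situ (computational storage)
# filtering. All names here are hypothetical, not a vendor API.

# A mock data set "on the SSD": traffic records with an IP and a byte count.
RECORDS = [{"ip": f"10.0.0.{i % 256}", "bytes": i * 100} for i in range(1000)]

def host_side_query(device_records, predicate):
    """Conventional model: every record crosses the bus/network first,
    then the host discards what it does not need."""
    transferred = list(device_records)      # entire data set moves to the host
    matches = [r for r in transferred if predicate(r)]
    return matches, len(transferred)        # result + records moved

def in_situ_query(device_records, predicate):
    """Computational-storage model: the predicate runs on the drive,
    so only matching records are ever transferred."""
    matches = [r for r in device_records if predicate(r)]  # runs "on device"
    return matches, len(matches)            # result + records moved

hot = lambda r: r["bytes"] > 90_000        # the query's filter predicate
host_result, host_moved = host_side_query(RECORDS, hot)
situ_result, situ_moved = in_situ_query(RECORDS, hot)

assert host_result == situ_result          # identical answers...
print(host_moved, situ_moved)              # ...but 1000 vs. 99 records moved
```

The answer is the same either way; what changes is how much of the haystack crosses the network before the needle is found.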
Introducing NGD Systems
NGD Systems delivers SSD storage solutions with embedded processing capabilities, which it calls “In-Situ Processing”. With embedded application processors residing in the same compute chain as the data, users can perform data analytics directly where the data resides, without the latency overhead of data or network movement. Use cases aren’t limited to AI, machine learning, and big data, though. NGD expects customers to use its computational storage for search, grep, and filtering, as well as encryption and key management. Additionally, there is vast potential in the emerging IoT markets, where edge-based and fog computing is required.
The biggest challenge facing modern data centers is scale: not whether they can scale, but how organizations build data centers big enough to contain, and supply power to, that scale. The answer to the scale challenge is efficiency, historically a weak spot for IT. NGD’s solution creates a ripple effect of efficiency by moving application execution into storage. By adding computing power to storage devices without increasing the physical storage footprint or power envelope, the NGD solution reduces application response times, network bandwidth requirements, and host server computing requirements. The result is smarter applications, more effective bandwidth utilization, and smaller data centers that perform better.