How to Build a Multi-Cloud Data Pipeline for AI, ML and DL
Artificial intelligence (AI), machine learning (ML) and deep learning (DL) are experiencing rapid adoption in a number of use cases, including autonomous vehicles, pharmaceutical drug research, predictive maintenance, process optimization and targeted marketing. The problem is that these workloads require levels of performance, multi-cloud support, scalability and cost-efficiency that legacy storage architectures and traditional object storage architectures are not designed to deliver. Object storage is positioned to meet the requirements of scalability and cost-efficiency but often falls short in terms of performance and cloud readiness. Unlike many of its competitors, though, SwiftStack claims it can also meet the performance and cloud challenges, citing its relationship with NVIDIA as a key proof point.
Who is SwiftStack?
SwiftStack provides on-premises, software-defined file and S3-compatible object storage that is designed for throughput and scalable storage capacity. In the summer of 2018, it launched its 1space hybrid cloud data management software (access the StorageSwiss Briefing Note related to that launch here, as well as our video here). 1space creates a global namespace across cloud, edge data center and core data center storage infrastructures for data lifecycle management, protection, and migration. Now, SwiftStack is launching reference architectures for AI, ML and DL with its partners NVIDIA and Cisco.
SwiftStack Launches its Multi-Cloud AI and ML Data Management Solution
The new solutions provide data storage as well as full lifecycle data management across AI, ML and DL workflows, which include data ingestion, data enrichment, model training, inferencing, and data retention.
According to SwiftStack, its new solutions can ingest hundreds of gigabytes (GBs) of data per second due to an architectural focus on high write throughput and concurrency. This helps to keep expensive graphics processing unit (GPU) cycles from sitting idle. SwiftStack provides broad application compatibility by complying with Portable Operating System Interface (POSIX) standards over the NFS and SMB storage access protocols, and by supporting the S3 and Swift APIs for cloud-native workloads.
An additional value-add is that SwiftStack has created middleware that provides rich metadata tagging upon data ingestion. This metadata tagging opens up a number of capabilities for enterprises: it enables policy-based lifecycle management (e.g. cloud tiering and retention parameters), it contextualizes data for workflows like supervised learning, and it allows particular data sets to be searched for, identified and isolated.
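To make the idea concrete, here is a minimal, self-contained Python sketch of tagging at ingest, tag-based search, and a simple age-based tiering policy. It is an illustration only and not SwiftStack's middleware or API: the names `StoredObject`, `ingest`, `search` and `tier_candidates`, the tag keys, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class StoredObject:
    """Stand-in for a stored object plus the metadata attached at ingest."""
    key: str
    ingested: date
    metadata: dict = field(default_factory=dict)


def ingest(key, ingested, **tags):
    """Attach tags at ingest time (hypothetical helper, not a SwiftStack API)."""
    return StoredObject(key=key, ingested=ingested, metadata=dict(tags))


def search(objects, **wanted):
    """Return the objects whose metadata matches every requested tag."""
    return [o for o in objects
            if all(o.metadata.get(k) == v for k, v in wanted.items())]


def tier_candidates(objects, today, max_age_days):
    """Policy check: objects ingested more than max_age_days ago may be tiered."""
    cutoff = today - timedelta(days=max_age_days)
    return [o for o in objects if o.ingested < cutoff]


# Hypothetical sample catalog
catalog = [
    ingest("cam/0001.jpg", date(2019, 1, 5), project="vehicles", label="pedestrian"),
    ingest("cam/0002.jpg", date(2019, 3, 1), project="vehicles", label="cyclist"),
    ingest("lab/a17.csv", date(2018, 11, 20), project="pharma", label="assay"),
]

# Isolate a labeled subset, e.g. as input to supervised learning
labeled = search(catalog, project="vehicles", label="pedestrian")

# Find objects older than 60 days as candidates for cloud tiering
old = tier_candidates(catalog, today=date(2019, 3, 15), max_age_days=60)

print([o.key for o in labeled])
print([o.key for o in old])
```

In a real deployment the tags would live in the object store as object metadata rather than in an in-memory list, and the policy engine would act on the matches instead of only listing them.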
SwiftStack designed its architecture for massive read bandwidth, as well as for independent scalability of compute and storage, which helps with training of neural networks and other models. Its scalability and ability to support parallel processing also help from this standpoint. SwiftStack’s S3 integration allows for support of popular frameworks including TensorFlow. Additionally, cloud-bursting is possible via the integration with 1space, allowing the enterprise to access cost-efficient and scalable compute cycles for training.
StorageSwiss Take
SwiftStack aims to bring value to AI, ML and DL workloads in two key areas: accelerating time-to-insights and cutting costs. At the core of its ability to deliver on both of these criteria is SwiftStack’s ability to maximize both read and write throughput, which enables it to deliver sufficient levels of performance to these demanding applications with a hard disk drive (HDD)-based architecture. Storage Switzerland recognizes that, much of the time, “extreme” levels of storage performance are not actually needed; what is most important is ensuring that GPUs are saturated. This helps to optimize the price-to-performance ratio.
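The saturation argument comes down to simple arithmetic: storage only needs to deliver as much aggregate throughput as the GPUs can consume. The short Python sketch below works through that calculation. Every figure in it is a hypothetical placeholder, not a SwiftStack, NVIDIA or drive-vendor specification, and the function `drives_needed` is an illustrative name.

```python
import math


def drives_needed(num_gpus, gb_per_sec_per_gpu, gb_per_sec_per_drive):
    """Minimum drive count whose combined throughput covers aggregate GPU demand."""
    demand = num_gpus * gb_per_sec_per_gpu  # aggregate GB/s the GPUs can consume
    return math.ceil(demand / gb_per_sec_per_drive)


# All values are assumptions chosen for illustration only
n = drives_needed(num_gpus=8, gb_per_sec_per_gpu=2.0, gb_per_sec_per_drive=0.25)
print(n)  # 8 * 2.0 = 16 GB/s of demand; 16 / 0.25 = 64 drives
```

This deliberately ignores network limits, protocol overhead, data protection overhead and caching, so it gives a rough lower bound rather than a sizing guide. The point it illustrates is that throughput beyond the GPUs' aggregate demand adds cost without adding useful performance.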
Another component of the SwiftStack value proposition is its metadata tagging, which greatly simplifies data governance and at the same time makes data more consumable by lines of business so they can extract as much value as possible as quickly as possible. In fact, SwiftStack claims it can cut workflows down from days to hours. In addition to making the most of infrastructure components, SwiftStack is also helping enterprises to make the most of their data scientists.
Finally, SwiftStack’s ability to create a centralized global namespace, alongside its broad API and storage protocol support, facilitates collapsing storage silos. Applications and users can have unified access to the same set of data across regions, as well as across edge, core and cloud infrastructures. These capabilities, coupled with the emphasis on high throughput, will go far, particularly when it comes to supporting inferencing workflows, which must occur across all of these locations.