Solving the Data Management Challenges Created by Growth in Splunk Usage
One of the biggest challenges in generating business intelligence from large, distributed data platforms like Splunk is meeting very large capacity and very fast search performance requirements without breaking the budget. Splunk’s new SmartStore capability is an important step toward this objective, because it decouples storage from compute so that each can scale independently. It also enables creating a middle, “warm” cache of storage. Together, these capabilities allow the majority of data (80-90%) to be stored on lower-cost, more capacity-oriented storage. This drives down the cost of compute and storage alike without impacting search performance. To take advantage of this new capability, the storage infrastructure needs to be correctly optimized. In this blog post, we will explore the merits of SwiftStack as an ideal SmartStore storage infrastructure.
SwiftStack provides on-premises, software-defined S3-compatible storage that includes 1space hybrid cloud data management software.
SwiftStack is an on-premises storage solution, as opposed to a storage tier delivered as a cloud service. It is also a fully distributed architecture, meaning it enables public cloud-like levels of scalability to support the large amounts of required capacity. This makes it ideal for SmartStore, because organizations may need to store petabytes of data and retain that data for years to support their Splunk implementation. “Renting” this much storage capacity, which grows rapidly every day, can quickly become prohibitively expensive in the cloud model.
SwiftStack’s architecture is also easy to scale; new systems can be automatically recognized and added to the cluster. Additionally, hardware profiles, for example for high-capacity or CPU-rich systems, can be created for automated setup. This scalability is non-disruptive to the applications, and expansions may be as incremental as needed.
Because SwiftStack is open source and software-defined, customers have the freedom to intermix and change hardware platforms, for an infrastructure that is tailored to their evolving application requirements. It is also flexible enough to integrate new or more cost-effective hardware as it is introduced. SwiftStack’s support for Splunk SmartStore is an ideal example of this flexibility in action. SmartStore didn’t exist a year ago, yet SwiftStack customers can now easily integrate their Splunk environments into their SwiftStack architecture.
When data is on-premises, the customer is responsible for its availability. Further, with Splunk SmartStore, the warm data is protected by the underlying storage system, not by Splunk. SwiftStack was architected for high data durability, in large part because nodes may be deployed across multiple racks and even multiple geographic locations. Standard configurations offer between 9 and 14 nines of availability.
SwiftStack creates a global namespace for data called “1space”. It can replicate data between systems and automatically distribute it across them, non-disruptively, based on user-defined policies. Data can be placed to maximize availability, to ensure compliance with regional regulations, and for cost efficiency according to performance and data access requirements. Redundancy also enables maintenance and upgrades of systems on a rolling basis, without impacting applications. There is no need for third-party software or manual intervention. SwiftStack’s 1space capability also enables migrations of workloads to and from the public cloud, for instance when elastic cloud compute cycles or application services are needed.
In addition to a durable architecture, SwiftStack also provides erasure coding (EC) to facilitate data availability. EC segments data objects into fragments that contain redundant data pieces, and then stores those fragments across storage nodes that exist in multiple locations or across various storage media types. In the event of media failure, the data is still accessible. In the background, missing segments are recreated so that protection levels return to compliance.
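To make the fragment-and-rebuild idea concrete, here is a minimal Python sketch. It uses a single XOR parity fragment purely for brevity, so it can recover only one lost fragment; production erasure codes typically use multiple parity fragments and different math. The function names (encode, rebuild, decode) are hypothetical and this is not SwiftStack's implementation.

```python
from __future__ import annotations

from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-size fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")  # pad so all fragments match in size
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def rebuild(fragments: list[bytes | None]) -> list[bytes]:
    """Recreate at most one missing fragment (marked None) from the survivors."""
    fragments = list(fragments)  # work on a copy
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost fragment")
    if missing:
        survivors = [f for f in fragments if f is not None]
        # XOR of all surviving fragments equals the missing one.
        fragments[missing[0]] = reduce(xor_bytes, survivors)
    return fragments


def decode(fragments: list[bytes], original_len: int) -> bytes:
    """Join the data fragments (all but the parity) and strip the padding."""
    return b"".join(fragments[:-1])[:original_len]
```

In this toy version, each fragment would live on a different node or drive. Losing any one of them leaves the object readable, and rebuild plays the role of the background process that restores the protection level.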
EC provides additional redundancy without the overhead of storing multiple full copies of the data set. However, EC can come at the expense of system performance, because the process of segmenting and reconstructing data is computationally intensive. This is especially true in Splunk SmartStore implementations, which are typically multi-location at minimum, and frequently are multi-region. SwiftStack’s EC capabilities overcome this challenge. They can span multiple sites and regions for increased durability, and they were written to minimize throughput and latency overhead. 1space erasure-codes a complete copy of the data in each region, which reduces the need to move fragments across regions.
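The capacity trade-off between full copies and erasure coding comes down to simple arithmetic, sketched below in Python. The 3-copy and 8+4 figures in the usage note are illustrative assumptions, not SwiftStack defaults, and the function names are hypothetical.

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of user data with full-copy replication."""
    return float(copies)


def ec_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw bytes stored per byte of user data with a k+m erasure code."""
    return (data_fragments + parity_fragments) / data_fragments


def raw_capacity_tb(usable_tb: float, overhead: float) -> float:
    """Raw capacity needed to hold usable_tb of user data at a given overhead."""
    return usable_tb * overhead
```

For example, under these assumptions 1,000 TB of indexed data needs 3,000 TB of raw capacity with three full copies (overhead 3.0), but 1,500 TB with an 8+4 scheme (overhead 1.5), while the 8+4 layout can still lose up to four fragments of an object.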
Splunk requires not only high-capacity, low-cost storage but also fast search. Search performance cannot be sacrificed for the sake of economics. In fact, the intent of the “warm” storage tier in SmartStore is to accelerate search performance by quickly serving large amounts of data to the indexing node. SwiftStack’s distributed architecture enables delivery of data from all available nodes in the storage cluster to the indexer node, in a parallel fashion. Throughput increases as systems are added. Additionally, SmartStore’s ability to intelligently cache data and place data on the right storage tier according to access patterns also accelerates search performance.
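The parallel delivery pattern can be pictured with a toy Python sketch. The storage "nodes" here are in-memory dictionaries and every name is hypothetical; a real client would issue S3 GET requests to the cluster instead, so this illustrates only the fan-out and reassembly, not actual throughput.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def place_chunks(data: bytes, node_names: list[str], chunk_size: int) -> dict[str, dict[int, bytes]]:
    """Spread fixed-size chunks of data round-robin across simulated nodes."""
    nodes: dict[str, dict[int, bytes]] = {name: {} for name in node_names}
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for idx, chunk in enumerate(chunks):
        nodes[node_names[idx % len(node_names)]][idx] = chunk
    return nodes


def fetch_from_node(chunks: dict[int, bytes]) -> dict[int, bytes]:
    """Stand-in for a network read: return every chunk held by one node."""
    return dict(chunks)


def parallel_read(nodes: dict[str, dict[int, bytes]]) -> bytes:
    """Read from all nodes concurrently, then reassemble chunks in index order."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        results = list(pool.map(fetch_from_node, nodes.values()))
    merged: dict[int, bytes] = {}
    for part in results:
        merged.update(part)
    return b"".join(merged[i] for i in sorted(merged))
```

Because each node is read by its own worker, adding nodes adds concurrent readers, which is the intuition behind throughput growing with the cluster.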
Conclusion
Splunk SmartStore enables the creation of a distinct storage tier for long-term indexed data and uses a cache manager to ensure optimal data placement. It stands to significantly improve the efficiency of compute and storage resources, but in order to do so it requires an optimized storage infrastructure. SwiftStack should be evaluated as an ideal infrastructure for Splunk SmartStore. It has a software-defined and scalable, distributed architecture, a unique approach to erasure coding, and it can serve large amounts of data to the indexer node in parallel for fast search performance.
Interested in learning more about SwiftStack and Splunk SmartStore? Our on-demand webinar, “Rearchitecting Storage for the Next Wave of Splunk Data Growth,” has additional color.