The need to solve data problems at scale is arguably the most prominent storage driver today, whether in hardware, software or cloud services. In this regard, Splunk is quickly becoming a textbook use case. Splunk is an application that analyzes machine data to create business intelligence, for example around application performance, customer adoption trends, IT operations and security. Splunk has established itself as a way to move the needle for competitive advantage, facilitating better anomaly detection, better business decision making, and improved operational efficiency. As a result, customers are beginning to want to analyze more data, whether newly created data or older data for richer insights. According to Splunk, it now has 18,000 users worldwide. The average user ingests 250GB of data per day, but that figure can quickly grow to 10TB per day as a company capitalizes on all of its available data sources. In fact, some enterprises are ingesting 30-50TB per day, according to Splunk storage provider SwiftStack.
The problem with this growth is that, historically, Splunk implementations have not been designed to scale compute and storage independently. As capacity requirements grow, the enterprise must also invest in compute that it may not need. Additionally, Splunk infrastructures are designed to deliver Tier 0 performance for fast time to insight, and storing all of this data on solid state drives (SSDs) is cost prohibitive. Deploying a primary storage system that integrates both SSDs and hard disk drives (HDDs) is a step in the right direction, but these architectures typically do not scale out, and they tend to lock the customer into a specific vendor or technology.
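To make the coupling problem concrete, consider a rough sizing sketch. Every figure below (retention window, per-indexer ingest and storage limits) is a hypothetical assumption chosen for illustration, not Splunk or SwiftStack guidance:

```python
import math

# Hypothetical sizing figures, for illustration only.
DAILY_INGEST_TB = 10         # assumed daily ingest volume
RETENTION_DAYS = 90          # assumed searchable retention window
PER_INDEXER_INGEST_TB = 0.3  # assumed daily indexing throughput per indexer
PER_INDEXER_USABLE_TB = 20   # assumed usable local storage per indexer

# Indexers needed to keep up with ingest (the compute-bound requirement).
for_ingest = math.ceil(DAILY_INGEST_TB / PER_INDEXER_INGEST_TB)

# Indexers needed just to hold the retained data (the capacity-bound requirement).
retained_tb = DAILY_INGEST_TB * RETENTION_DAYS
for_capacity = math.ceil(retained_tb / PER_INDEXER_USABLE_TB)

print(f"Indexers required for ingest:     {for_ingest}")    # 34
print(f"Indexers required for capacity:   {for_capacity}")  # 45
print(f"Cluster size (larger of the two): {max(for_ingest, for_capacity)}")
```

In this sketch the cluster must be sized for capacity rather than for ingest, so eleven of the forty-five indexers exist only to carry disks. Their CPU and memory are stranded spend, and the gap widens with every day of retention added.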
Splunk introduced its new SmartStore capability to lay the foundation for more efficient Splunk implementations at scale, by decoupling compute from storage and enabling a middle-ground, “warm” tier of more capacity-oriented object storage. To achieve this end, SmartStore requires an optimized underlying storage infrastructure that can deliver scalable, lower-cost capacity without sacrificing search performance. The infrastructure should be modular in design to support this scalability, and data durability and disaster recovery should be supported both in the system’s architecture and through integration with public cloud services.
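For reference, the decoupling itself is configured in Splunk’s indexes.conf: a remote volume is declared against an S3-compatible object store, and each index points its warm data at that volume. The stanza below is a minimal sketch; the bucket name, endpoint, and credentials are placeholders, and a production deployment would also tune cache and eviction settings:

```
# indexes.conf -- minimal SmartStore sketch; all values are placeholders

[volume:remote_store]
storageType = remote
# Placeholder bucket on a placeholder S3-compatible endpoint
path = s3://smartstore-bucket
remote.s3.endpoint = https://s3.example.com
remote.s3.access_key = <access_key>
remote.s3.secret_key = <secret_key>

[main]
# Route this index's warm buckets to the remote object store;
# indexers keep only a local cache of recently searched data.
remotePath = volume:remote_store/$_index_name
```

Because warm data now lives in the object store, adding capacity means growing the object tier, while indexers are added only when ingest or search load demands it.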
For more detail on Splunk SmartStore and how to design a storage infrastructure to support it, sign up for Storage Switzerland’s on-demand webinar, “Rearchitecting Storage for the Next Wave of Splunk Data Growth,” with SwiftStack.