Previously, Storage Switzerland wrote about the need to scale compute and storage independently when serving distributed Splunk data sets for business intelligence. In this blog, we investigate Splunk’s recently launched SmartStore capability and how it addresses this requirement while creating a middle “warm” tier of storage that saves customers money without impacting search performance.
Before Splunk SmartStore
It’s no secret that the growing demand for business intelligence requires more data to be ingested and used than ever before, and requires that data to be retained for longer periods of time. Against this backdrop, Splunk continues to become a more important business tool, but legacy data center infrastructures that tightly couple compute and storage limit enterprises’ ability to scale their Splunk implementations cost-effectively.
Most enterprises are running their Splunk implementations on one of two types of storage infrastructures:
- A hot tier of flash storage that provides fast indexing capabilities, but that is very expensive from a capacity perspective.
- A colder tier of network-attached storage (NAS) that provides less expensive capacity than a premium flash solution, but that is still more expensive than the object storage alternatives becoming common for capacity-oriented workloads. Additionally, a cold tier is unwieldy to search, and its performance degrades as capacity expands because of the file system limitations of the typical NAS system.
Not only is the performance gap between these two storage tiers very large, but it is also impossible to scale compute power and storage capacity independently. This leaves IT professionals with a poor choice: add unneeded storage capacity in order to gain the compute power that accelerates response time, or add unneeded compute power in order to gain capacity. Either way, expensive resources sit underutilized in order to meet indexing or data retention requirements. What is needed is an architecture that decouples compute and storage to create a middle-ground “warm” storage tier.
Enter Splunk SmartStore
Introduced in Splunk Enterprise 7.2, the SmartStore data management capability was designed to enable more efficient and scalable Splunk implementations by decoupling storage from compute. This allows IT professionals to size indexers for the Splunk ingest load and, separately, to size storage capacity for retention requirements.
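To make the sizing argument concrete, here is a minimal back-of-the-envelope sketch. The function names and every number in it (daily ingest, per-indexer ingest rate, per-node disk capacity) are hypothetical assumptions chosen for illustration, not Splunk sizing guidance, and the arithmetic ignores compression, replication and cache space.

```python
import math

def size_coupled(daily_ingest_gb, retention_days, ingest_per_node_gb, disk_per_node_gb):
    """Nodes needed when each node bundles compute and storage.

    The cluster must satisfy BOTH ingest and retention, so the node
    count is driven by whichever requirement is larger.
    """
    nodes_for_ingest = math.ceil(daily_ingest_gb / ingest_per_node_gb)
    nodes_for_capacity = math.ceil(daily_ingest_gb * retention_days / disk_per_node_gb)
    return max(nodes_for_ingest, nodes_for_capacity)

def size_decoupled(daily_ingest_gb, retention_days, ingest_per_node_gb):
    """Indexers sized for ingest only; retention goes to a separate store."""
    indexers = math.ceil(daily_ingest_gb / ingest_per_node_gb)
    object_store_gb = daily_ingest_gb * retention_days
    return indexers, object_store_gb

# Hypothetical workload: 1 TB/day ingest, one year of retention.
coupled_nodes = size_coupled(1000, 365, ingest_per_node_gb=250, disk_per_node_gb=20000)
indexers, store_gb = size_decoupled(1000, 365, ingest_per_node_gb=250)
print(coupled_nodes, indexers, store_gb)
```

Under these assumed numbers, retention rather than ingest dictates the node count in the coupled design, which is the over-provisioning of compute described earlier. In the decoupled design, the indexer count follows ingest alone and capacity is purchased separately.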
SmartStore intelligently caches data on the fastest available media, according to the data’s likelihood of being searched. It creates two data caches: a “hot” cache for data that is being actively indexed, and a “warm” cache for data that needs to be readily accessible for queries. These caches are sized according to typical ingest and query patterns, and they may be adjusted as business needs fluctuate. Data is automatically migrated across these tiers and ultimately to an S3-compatible object store for retention. Such an object store is intrinsically low-cost, scalable and searchable. SwiftStack is an example of an object store that has been verified for compatibility with Splunk SmartStore.
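The caching behavior described above can be sketched as a toy model. This is an illustration only, not Splunk’s implementation: the `WarmCache` class, its method names, and the least-recently-used eviction rule are assumptions made for the example, and a plain dictionary stands in for the S3-compatible object store.

```python
from collections import OrderedDict

class WarmCache:
    """Toy model of a local cache in front of a remote object store."""

    def __init__(self, capacity, object_store):
        self.capacity = capacity          # max buckets held locally (hypothetical unit)
        self.object_store = object_store  # dict standing in for the object store
        self.cache = OrderedDict()        # ordered oldest-used -> newest-used
        self.remote_fetches = 0

    def roll_to_warm(self, bucket_id, data):
        # Data leaving the hot tier is written to the object store,
        # and a copy stays in the local cache for upcoming searches.
        self.object_store[bucket_id] = data
        self._put(bucket_id, data)

    def search(self, bucket_id):
        if bucket_id in self.cache:
            # Cache hit: mark the bucket as recently used.
            self.cache.move_to_end(bucket_id)
            return self.cache[bucket_id]
        # Cache miss: recall the bucket from the object store.
        data = self.object_store[bucket_id]
        self.remote_fetches += 1
        self._put(bucket_id, data)
        return data

    def _put(self, bucket_id, data):
        self.cache[bucket_id] = data
        self.cache.move_to_end(bucket_id)
        # Evict least recently used buckets once over capacity;
        # the object store still holds the full copy.
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

store = {}
cache = WarmCache(capacity=2, object_store=store)
for bucket in ("b1", "b2", "b3"):
    cache.roll_to_warm(bucket, f"events-{bucket}")

cache.search("b1")  # evicted earlier, so it is recalled from the store
cache.search("b3")  # still cached, so no remote fetch
print(list(cache.cache), cache.remote_fetches)
```

In this model every bucket remains in the object store regardless of what the cache holds, so growing or shrinking the cache changes search latency but never retention.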
SmartStore retains metadata for all indexed files so that they can be recalled quickly, whether from the SmartStore cache or from the object store, which avoids an impact on query performance. SmartStore-enabled indexes can coexist alongside existing indexes in a dual-mode configuration to increase operational flexibility, and Splunk also provides a method to convert indexes to SmartStore. Splunk administrators will find that upgrading or replacing indexer hardware is simplified because, with SmartStore, the majority of Splunk data resides on the warm storage tier and not on the indexer’s internal drives. Furthermore, SmartStore’s intelligent tiering helps to reduce the number of indexers required, which in turn reduces cost and complexity. Splunk cites infrastructure cost savings of 60%-75% for a moderate-size environment when SmartStore is activated.
Impact of SmartStore
SmartStore promises to facilitate a right-sized, and as a result more cost-effective, storage infrastructure for Splunk. Our next blog will discuss further how the storage infrastructure itself should be architected to take advantage of the new capability.
In the meantime, register for Storage Switzerland’s on-demand webinar with SwiftStack and Splunk, “Rearchitecting Storage for the Next Wave of Splunk Data Growth,” and receive our latest eBook, “Doing More With Splunk Data…For Less.”