Splunk is software for searching, monitoring and analyzing machine-generated data. While there is a lot of talk about “big data initiatives”, this is the big data that organizations have right now. Use cases include predictive analytics for IT operations, security information and event management (including fraud detection) and web click tracking. Essentially, organizations want to monetize the raw data they are already collecting. For Splunk to really deliver, though, it needs a storage infrastructure that can keep pace with its compute power.
A Splunk architecture typically has three layers. The first is the “Searcher” tier, which hosts the cluster master (or controller) and serves as the front end where users generate search requests. The third layer is the “Forwarder” tier, made up of any system that can forward data into the Splunk cluster. The middle tier, the “Indexer”, is where the storage happens; here, both capacity management and I/O performance are critical.
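The Forwarder-to-Indexer relationship is usually defined in a forwarder’s outputs.conf. A minimal sketch is below; the host names are illustrative examples, and 9997 is simply the conventional Splunk receiving port:

```ini
# outputs.conf on a forwarder -- illustrative only; host names are examples
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
# The forwarder load-balances its data stream across the listed indexers
server = idx1.example.com:9997, idx2.example.com:9997
```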
Understanding Splunk Storage Management
Splunk manages storage by placing data into what it calls buckets. A bucket is essentially a directory. When data first arrives at the Indexer tier, it is written to a hot bucket. Generally, Splunk is directed to store hot buckets on an all-flash array. Each hot bucket is assigned a user-defined size limit and age limit; once it reaches either of these limits, the hot bucket is “rolled” to warm, which is another directory typically located on a separate, reasonably fast hard disk-based storage system. After another set of user-defined limits is reached, the warm bucket is “rolled” to cold, which is typically either a high-capacity NAS system or an object store. There is also an option to move data, again after a user-defined set of parameters is met, to frozen, which is often either tape or the cloud.
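These limits live in Splunk’s indexes.conf. A minimal sketch of one index stanza follows; the paths and numeric values are illustrative examples, not recommendations:

```ini
# indexes.conf -- illustrative stanza for one index;
# paths and values are examples, not recommendations
[web_clicks]
# Hot and warm buckets live under homePath (e.g., on an all-flash array)
homePath   = /flash/splunk/web_clicks/db
# Cold buckets roll to slower, higher-capacity storage
coldPath   = /nas/splunk/web_clicks/colddb
# Required path for data restored ("thawed") from frozen
thawedPath = /nas/splunk/web_clicks/thaweddb

# Size and age limits that trigger the hot-to-warm roll
maxDataSize    = auto_high_volume   # roughly 10 GB per bucket on 64-bit systems
maxHotSpanSecs = 86400              # roll hot buckets after 24 hours

# Number of warm buckets to keep before the oldest rolls to cold
maxWarmDBCount = 300

# Age at which cold buckets are frozen (deleted, or archived if a
# coldToFrozenDir is set)
frozenTimePeriodInSecs = 15552000   # 180 days
coldToFrozenDir = /archive/splunk/web_clicks
```

Note that homePath covers both hot and warm buckets, which is why the hot-to-warm roll is cheap, while the warm-to-cold roll can mean copying data to a different storage system.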
The Challenge with Splunk Storage Architectures
Splunk does an admirable job of moving data between storage tiers. The challenge is that all of this data management consumes compute that would otherwise be applied to analyzing data. There is also the time involved in moving data back and forth between these bucket types. If the default architecture is used (an all-flash array for hot data, a performance hard disk system for warm and a capacity hard disk system for cold), the time it takes to move data between systems can be significant.
A solution might be to centralize all the tiers onto a single storage system and let that system move data as needed. A hybrid flash array that can integrate with Splunk would work well in this situation. It would off-load data management from Splunk, and while data would still move between tiers, that movement would happen within the system instead of having to traverse a network.
Most big data projects include a laborious step: figuring out how to collect data. Splunk, by contrast, provides analysis on data that the organization has already been capturing for years, maybe decades. But it does present a few storage challenges that IT professionals need to be aware of. Join us in our upcoming webinar, “How To Design High Performance, Cost Effective Splunk Storage”, to learn more about Splunk storage basics, the challenges Splunk creates, the flaws behind typical Splunk storage designs and ideas on how to overcome them.