Don’t get caught unaware of the storage challenges of your future Splunk project. If nothing else, at least be aware of the amount of storage that a Splunk project can generate and prepare yourself for the cost of acquiring and managing that much storage.
Splunk projects, while offering incredibly valuable data, often cause their owners to wade into the deep end of the storage management pool. Managing a few terabytes of storage is one thing; managing dozens or hundreds of terabytes is an entirely different process.
Splunk is designed to analyze machine-generated data, and research suggests there will be plenty of it: one study projects that by 2020, over 40 percent of all data will be machine-generated. Combine that with another data point: for every terabyte of machine-generated data Splunk analyzes, it needs roughly 23 terabytes of storage. Since generating several terabytes of machine data is easy, it is just as easy for a Splunk deployment to consume hundreds of terabytes of storage.
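To see how quickly the numbers compound, here is a back-of-the-envelope sketch using the 23:1 storage-to-ingest ratio cited above. The helper name, parameters, and example ingest/retention figures are illustrative assumptions, not part of any Splunk API or sizing guide.

```python
# Rough Splunk storage estimate based on the article's 23:1 ratio.
# All names and figures here are illustrative, not official sizing guidance.

RATIO = 23  # TB of storage needed per TB of machine data analyzed (per the article)

def estimated_storage_tb(daily_ingest_tb: float, retention_days: int) -> float:
    """Estimate total raw storage for a Splunk deployment (hypothetical helper)."""
    return daily_ingest_tb * retention_days * RATIO

# Even a modest 0.5 TB/day ingest kept for 90 days lands in the
# hundreds-of-terabytes range:
print(estimated_storage_tb(0.5, 90))  # 1035.0 TB
```

The point of the sketch is simply that retention multiplies ingest, and the 23:1 ratio multiplies both, so "a few terabytes a day" turns into enterprise-scale storage very quickly.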
Again, the data Splunk creates may prove invaluable to the company using it. But if the company is unprepared for the acquisition and management costs of that much storage, it will find itself a very unhappy Splunk customer. Another study found that over 50 percent of a typical Splunk project's cost comes from buying and managing the storage for that project.
Storing dozens to hundreds of terabytes of raw data requires enterprise-grade storage that uses modern data protection techniques, so that the loss of one or more drives, or even an entire site, does not destroy the valuable data that Splunk generates.
There are myriad companies that will sell you products to meet these demands. The challenge is that most of them come with the traditional burdens of managing storage at that scale: you will need monitoring software, backup software, and some mechanism to get the data offsite for disaster recovery purposes.
Modern companies may look at this and say, "We'll just send it to the cloud!" Doing so offloads many of the data management challenges and ensures the data lives in multiple locations; however, it creates other challenges. Directly integrating Splunk with cloud providers is the first; the latency and bandwidth of the Internet connection are the next to consider.
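To make the bandwidth concern concrete, a quick illustrative calculation shows how long it takes just to move data over a WAN link. The function, link speed, and efficiency figure below are assumptions for illustration, not measurements of any particular provider.

```python
# Illustrative transfer-time math for sending Splunk data to the cloud.
# Link speed and efficiency are assumed values, not vendor measurements.

def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Hours to move data_tb over a link, assuming only `efficiency` of the
    nominal bandwidth is usable throughput (hypothetical helper)."""
    bits = data_tb * 8 * 1e12                      # decimal TB -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600

# Shipping 1 TB/day of machine data over a 1 Gbps link at 70% efficiency:
print(round(transfer_hours(1, 1), 1))  # roughly 3.2 hours of transfer per day
```

Multiply that by multi-terabyte daily ingest rates and the Internet link itself becomes a real bottleneck, before latency-sensitive Splunk searches are even considered.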
StorageSwiss Take
Look before you leap. Consider the storage costs of your Splunk project before embarking on it. If you're already knee-deep in such a project, consider the risks to it if you're not using modern self-protecting storage to hold your data. To learn more about the storage challenges Splunk creates, join Storage Switzerland and ClearSky Data for our on-demand webinar "Which Storage Architecture is Best for Splunk Analytics?"