If you ask whether your data should be on-premises, in the cloud, or both, the answer often depends on whom you ask. Cloud vendors think you should store all of your data in the cloud. Traditional storage vendors think you should store all of your data on-premises. There are also products that let you do both. An independent consultant would probably give you the age-old answer, “It depends.” In my opinion, that is the only correct answer. Let’s look at each choice to see why.
When to Use On-Premises Storage
One reason to store data on-premises is cost: it can be less expensive than storing it in the cloud. This is realistic only if you can achieve high storage utilization, and that is definitely not a given. Eliminating waste is the key to storing data less expensively than the cloud. That means using commodity-based storage designs similar to those used by cloud providers, not buying hardware until you absolutely need it, and making sure that all storage systems are fully utilized before buying more capacity. Generally speaking, all three of these ideas imply scale-out storage.
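To make the utilization point concrete, here is a minimal Python sketch. It assumes that on-premises cost scales with provisioned capacity while cloud storage is billed roughly per terabyte actually stored. That is a simplification that ignores egress, request fees, power, space, and staff. The function names and all prices are hypothetical placeholders, not real quotes.

```python
# Minimal sketch: how utilization drives the effective cost of on-premises storage.
# All prices are hypothetical placeholders; substitute your own numbers.

def cost_per_used_tb(raw_cost_per_tb: float, utilization: float) -> float:
    """Effective cost of each provisioned terabyte that actually holds data."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return raw_cost_per_tb / utilization


def break_even_utilization(raw_cost_per_tb: float, cloud_cost_per_tb: float) -> float:
    """Utilization above which on-premises beats a cloud price for the same period.

    Assumes the cloud bills only for terabytes stored (a simplification).
    """
    return raw_cost_per_tb / cloud_cost_per_tb


# Hypothetical: 60 per provisioned TB on-premises vs. 100 per stored TB in the cloud.
for u in (0.9, 0.4):
    print(f"utilization {u:.0%}: {cost_per_used_tb(60.0, u):.2f} per used TB")
print(f"break-even utilization: {break_even_utilization(60.0, 100.0):.0%}")
```

With these placeholder numbers, on-premises storage costs about 66.67 per used TB at 90% utilization but 150.00 at 40%. It only beats the hypothetical cloud price when utilization is above roughly 60%.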
Another reason to store data on-premises is performance, which is ultimately bounded by the laws of physics. One challenge of using the cloud is that it sits on the other side of a WAN connection. If that latency affects the user or application experience, locality gives on-premises storage an advantage. Bandwidth can also be a concern. If you create or read a significant amount of data in a short time, getting that data to and from the cloud can be problematic: the bandwidth required can be quite expensive, and transferring large amounts of data can take a long time. If your application runs on-premises and your storage is in the cloud, you will likely see a significant reduction in performance. Local caching appliances and other workarounds can mitigate this to some extent, but they add complexity and cost.
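A back-of-the-envelope calculation shows why bandwidth matters. The Python sketch below estimates how long a bulk transfer takes over a WAN link. The `efficiency` factor (the share of line rate you actually achieve) is an assumption, and 0.8 is a placeholder rather than a measured value.

```python
# Rough WAN transfer-time estimate. The efficiency factor is an assumed
# placeholder for protocol overhead and contention, not a measurement.

def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to move data_tb (decimal terabytes) over a link_gbps link."""
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = data_tb * 1e12 * 8                     # terabytes -> bits
    effective_bps = link_gbps * 1e9 * efficiency  # usable bits per second
    return bits / effective_bps / 3600            # seconds -> hours


for size_tb, gbps in ((1, 1), (10, 1), (10, 10)):
    print(f"{size_tb} TB over {gbps} Gbps: {transfer_hours(size_tb, gbps):.1f} hours")
```

Under these assumptions, 10 TB takes roughly 28 hours over a 1 Gbps link, before counting any per-gigabyte transfer charges.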
A final reason to consider storing data on-premises is security and compliance. Some data may simply need to remain on-premises for regulatory or security reasons. It might not be the entire dataset, but only its most sensitive portions.
When to Use Cloud Storage
If you can migrate both your data and the workload that uses it to the cloud, performance may actually improve. First, the latency problem is solved, since the application now sits right next to the data. Second, once the application is in the cloud, it’s easy to add computing resources to get results faster. Need a bigger, faster server with more RAM, or simply more cores? Push a button and it’s yours for the duration of the workload. The cloud can give you a much bigger, faster host to run a short-lived application without having to pay for that server long term.
Just as computing power can be scaled immediately to meet short-term demand, so can storage. If your storage demand spikes and then drops to much lower levels, keeping all of your storage hardware fully utilized can be quite difficult. In the on-premises model, if you purchase a lot of storage to meet the performance or capacity needs of a project, you’re stuck with that storage once the project is done. This is where the elasticity of the cloud makes perfect sense. Large spikes in demand can be met at the push of a button, and when the spike is over, the extra capacity is released.
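The utilization problem with spiky demand can be sketched in a few lines of Python. The assumption here is that on-premises capacity must be sized for the peak month, while elastic cloud storage is billed for what is used each month. The demand figures are made up for illustration.

```python
# Sketch: capacity owned for the peak vs. capacity actually used over time.
# The monthly demand figures below are hypothetical.
from typing import Sequence, Tuple


def peak_vs_elastic(monthly_demand_tb: Sequence[float]) -> Tuple[float, float, float]:
    """Return (peak_tb, average_tb, utilization of capacity sized for the peak)."""
    if not monthly_demand_tb or max(monthly_demand_tb) <= 0:
        raise ValueError("need at least one month of positive demand")
    peak = max(monthly_demand_tb)  # what you must buy on-premises
    average = sum(monthly_demand_tb) / len(monthly_demand_tb)  # average elastic usage
    return peak, average, average / peak


demand = [20, 20, 100, 20, 20, 20]  # one project spike in month 3
peak, average, utilization = peak_vs_elastic(demand)
print(f"own {peak} TB, use {average:.1f} TB on average, utilization {utilization:.0%}")
```

Here a single spike forces you to own 100 TB that sits about one-third utilized over the six months. Elasticity is meant to avoid exactly this kind of waste.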
Can We Effectively Use Both On-Prem and Cloud?
Depending on the workload, organizations ought to be able to decide dynamically where their data should reside: on-premises or in the cloud. If they have a dataset they want to keep for a long time but won’t touch for a while, it might make sense to move it to the cloud. If the needs of that dataset change, they should be able to pull it back on-premises easily.
Or perhaps they have a large working set of data that will only live for a short period of time. They ought to be able to decide whether the cost of putting it in the cloud outweighs the cost of keeping it locally for performance. On one hand, there is the cost of bandwidth, the monthly bill from the cloud storage provider, and the time needed to upload a large dataset. On the other hand, local storage is immediately accessible, but if they buy additional capacity just for this project, they will be paying for it long after the project ends. As mentioned before, some data may need to remain on-premises for security or compliance reasons.
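The trade-off above can be framed as a simple cost comparison. This Python sketch is a deliberately rough model. All prices are hypothetical placeholders. On-premises capacity bought for the project is charged in full, since you are stuck with it afterward. The model ignores power, space, staff, and the upload time itself.

```python
# Rough cost comparison for a short-lived working set. All prices are
# hypothetical placeholders; power, space, staff, and upload time are ignored.
from dataclasses import dataclass


@dataclass
class ProjectStorage:
    data_tb: float          # size of the working set
    months: int             # how long the project needs the data
    cloud_tb_month: float   # cloud storage price per TB per month
    transfer_per_tb: float  # total bandwidth/transfer cost per TB for the project
    buy_per_tb: float       # on-premises purchase cost per TB


def cloud_cost(p: ProjectStorage) -> float:
    # Monthly bill for the life of the project plus moving the data.
    return p.data_tb * (p.cloud_tb_month * p.months + p.transfer_per_tb)


def on_prem_cost(p: ProjectStorage) -> float:
    # Capacity bought just for this project is paid for in full.
    return p.data_tb * p.buy_per_tb


def cheaper(p: ProjectStorage) -> str:
    return "cloud" if cloud_cost(p) < on_prem_cost(p) else "on-premises"


for months in (3, 24):
    p = ProjectStorage(data_tb=50, months=months, cloud_tb_month=20,
                       transfer_per_tb=10, buy_per_tb=150)
    print(f"{months} months: cloud {cloud_cost(p):.0f}, "
          f"on-prem {on_prem_cost(p):.0f} -> {cheaper(p)}")
```

With these made-up numbers, the cloud wins for a three-month project and buying capacity wins at two years. The crossover point matters, not the specific figures. Data that must stay on-premises for compliance reasons falls outside this calculation entirely.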
The challenge with moving workloads dynamically is that the physics of moving data to and from the cloud tends to get in the way of an “any workload anywhere” ideal. Vendors that want to give customers this flexibility will therefore need to build a data fabric that makes these moves seamless and invisible.
When choosing where an organization’s data should live, it comes down to this: there is no single perfect place for all workloads or datasets. Sometimes the cloud is better; other times on-premises is better. Organizations should have the freedom to choose either or both locations based on the needs of a particular project or workload. If choosing and changing storage locations happened dynamically and seamlessly, IT could focus more on delivering better applications and services and less on where the data needs to reside.
Sponsored by Elastifile