Unstructured data is file-based data, most commonly associated with user productivity applications such as word processing, spreadsheets and presentations. It also includes rich media such as training videos, photos and CAD/CAM drawings, plus the data created by surveillance cameras, GPS locators and other device sensors. All of these file types can be critical to the enterprise and often need to be stored for long periods of time.
The net result is millions, if not billions, of files that must be saved, protected and kept available for rapid retrieval. The size and quantity of these files are pushing traditional file-system-based solutions, such as file servers and network attached storage (NAS) devices, to their breaking points.
Consequently, by the time an SMB data center earns the designation of “Tier 2” it has typically accumulated several NAS systems to meet its unstructured data storage requirements. Unfortunately, most of these additional NASs are implemented long before the full storage capacity potential of the original NAS is ever reached.
In fact, Storage Switzerland has found that most NAS systems in Tier 2 data centers run at less than 50% utilization, essentially doubling their effective price per GB of data actually stored. This inefficient use of capacity arises because, as the number of files on the NAS increases, the metadata processing workload can grow to the point where performance suffers, so another NAS is added to protect performance well before the first one is full.
To make matters worse, there is typically limited integration between these NAS systems, which increases administration overhead. The Tier 2 data center is left to individually manage multiple NAS systems which aren’t even full. For a budget- and staff-strapped data center, buying and running more systems than necessary is simply unacceptable.
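The arithmetic behind the "doubling" claim can be sketched as follows. The purchase price and raw capacity are made-up figures chosen only for illustration; the 50% utilization level is the one cited above.

```python
def effective_cost_per_gb(purchase_price, raw_capacity_gb, utilization):
    """Cost per GB that is actually holding data, not per GB purchased."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return purchase_price / (raw_capacity_gb * utilization)

# Hypothetical NAS: $20,000 for 40,000 GB of raw capacity (assumed figures).
full = effective_cost_per_gb(20_000, 40_000, 1.0)  # $0.50 per stored GB
half = effective_cost_per_gb(20_000, 40_000, 0.5)  # $1.00 per stored GB

print(half / full)  # 2.0: half the utilization, twice the effective price
```

Whatever the actual price and capacity, running at half the utilization always doubles the cost of each GB that is really in use.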
Scale-Out NAS Problems
Scale-out NAS solutions are supposed to be the answer to this problem, but they have two issues that specifically impact the Tier 2 data center. The first is that scale-out NAS often starts out too large, because these architectures require a cluster of nodes in their initial configurations. Again, this is a mid-sized organization, not an enterprise, and its initial capacity requirements are smaller. In fact, the ability to manage file count may actually be more important than total capacity.
The second is that scale-out NAS systems are also based on traditional file system technologies and, as a result, can run into the same file count limitations that traditional scale-up NAS systems do, as described above. The Tier 2 data center needs something else, and the cloud service provider may be the place to look for that solution.
What Do Cloud Providers Do?
Most cloud providers that focus on storing data face the same file count and capacity problem, only on a larger scale than the Tier 2 data center. Because of the competitive nature of their industry, they also face similar staffing and budget pressures. Staying lean and frugal is critical when you are competing on the cost per GB of storage delivered.
Looking at how cloud providers tackle the unstructured data challenge is a good way for Tier 2 data centers to learn. Most of these cloud providers use a scalable object storage system for user data. Think of objects as files, but instead of being stored in the traditional hierarchy of folders and paths, each object is given a unique identifier that’s stored in a simple index. When the data needs to be retrieved, it’s found using this ID instead of by navigating a directory structure.
This method means object storage systems are far less impacted by file count and file size. It enables the cloud provider to run their storage systems at much higher utilization rates which leads to fewer overall systems, reduced upfront acquisition costs and lower long term management overhead.
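The ID-based lookup described above can be illustrated with a toy, in-memory sketch. The `ObjectStore` class and its `put` and `get` methods are hypothetical names invented for this example; a real object storage system adds persistence, replication and data protection on top of the same basic idea.

```python
import uuid

class ObjectStore:
    """Toy object store: one flat index from object ID to data, no folders."""

    def __init__(self):
        self._index = {}  # object ID -> object bytes

    def put(self, data):
        """Store the data and return the unique ID needed to retrieve it."""
        object_id = uuid.uuid4().hex
        self._index[object_id] = data
        return object_id

    def get(self, object_id):
        """Retrieve by ID with a single index lookup; no directory traversal."""
        return self._index[object_id]

store = ObjectStore()
video_id = store.put(b"training video bytes")
print(store.get(video_id) == b"training video bytes")  # True
```

Note that the caller has to keep `video_id` somewhere in order to get the data back, which is the application-side requirement discussed in the next section.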
The Problem With Cloud Storage For the Tier 2 Data Center
There are two roadblocks for Tier 2 data centers trying to implement an object-based storage system like those a cloud service provider (CSP) would use. First, these systems are typically scale-out in design, meaning that, as with scale-out NAS, the organization has to be prepared to start big and grow into its storage system.
Because the cost of disk capacity is continually decreasing, buying excess storage upfront is the least efficient acquisition strategy. Ideally, IT planners should look at storage as a just-in-time inventory item, filling their current systems to 90% utilization before purchasing another. While object storage can run at 90% utilization, it may take a long time for the Tier 2 data center to reach that point on its initial purchase. These companies need a storage system that can start smaller but still grow in a similar fashion to other scale-out systems.
Second, Tier 2 data center managers are seldom in direct control of the applications they support. They don’t typically have access to the source code to modify these programs so that they can store the unique ID required by the object storage architecture. If that ID can’t be stored, the data is much more difficult to access. The Tier 2 data center needs an object storage solution that can be accessed via traditional NAS protocols like CIFS and NFS without inheriting the limitations of the file systems that traditionally sit behind those protocols.
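The cost of starting big can be made concrete with a small calculation. The data volume, node size and three-node minimum below are assumed figures for illustration only; the 90% fill target is the one mentioned above.

```python
import math

def nodes_needed(data_gb, node_capacity_gb, target_utilization=0.9, min_nodes=1):
    """Nodes required to hold the data without exceeding the fill target."""
    usable_per_node = node_capacity_gb * target_utilization
    return max(min_nodes, math.ceil(data_gb / usable_per_node))

def utilization(data_gb, node_capacity_gb, nodes):
    """Fraction of purchased capacity that actually holds data."""
    return data_gb / (node_capacity_gb * nodes)

data_gb, node_gb = 12_000, 8_000  # assumed: 12 TB of files, 8 TB nodes

just_in_time = nodes_needed(data_gb, node_gb)            # buy only what is needed
cluster = nodes_needed(data_gb, node_gb, min_nodes=3)    # hypothetical 3-node minimum

print(just_in_time, f"{utilization(data_gb, node_gb, just_in_time):.0%}")  # 2 75%
print(cluster, f"{utilization(data_gb, node_gb, cluster):.0%}")            # 3 50%
```

Under these assumptions the forced three-node start leaves half of the purchased capacity idle on day one, while buying node by node keeps utilization much closer to the target.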
The Tier 2 Cloud Storage Solution
Companies like Exablox are answering the call from the Tier 2 data centers with storage systems to meet their unstructured data challenges. These designs feature systems that can start with a single node but can be expanded by adding nodes. And similar to other scale-out storage systems they’re still managed as a single entity.
In addition, while these are object storage systems, they have the NAS protocols integrated directly into them. This means that Tier 2 users and application owners don’t have to change any part of their workflows. It also means that the storage nodes can be filled to 90%+ capacity without suffering any performance degradation.
The Tier 2 data center has storage cost and efficiency challenges that are similar to those of Cloud Service Providers (CSPs) when it comes to managing unstructured data. It also needs the same types of storage systems that CSPs use to meet those challenges. Unfortunately, the lack of granular scaling (the ability to start small) and the need to rewrite applications have largely excluded the Tier 2 data center from using the object storage systems that cloud providers count on.
New storage systems, like those from Exablox, leverage object technology but still provide the flexibility to start at a lower capacity point and expand in a scale-out fashion. Because they provide standard file protocol access, they are easy to integrate, and these scale-out object storage systems can help meet the needs of the Tier 2 data center.
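One way to picture a file-protocol front end on top of object storage is a layer that remembers the object ID for each file path, so the application never has to. The sketch below is an assumption-laden illustration of that general idea, not a description of how Exablox or any other vendor implements it; `FileGateway` and its methods are hypothetical names.

```python
import uuid

class FileGateway:
    """Hypothetical path-based front end over a flat object index."""

    def __init__(self):
        self._objects = {}  # object ID -> object bytes (the object store)
        self._paths = {}    # file path -> object ID (kept by the gateway)

    def write(self, path, data):
        """Store data as a new object and point the path at its ID."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = data
        old_id = self._paths.get(path)
        self._paths[path] = object_id
        if old_id is not None:
            del self._objects[old_id]  # drop the object the path used to name

    def read(self, path):
        """Applications use ordinary paths; the ID lookup stays internal."""
        return self._objects[self._paths[path]]

gateway = FileGateway()
gateway.write("/projects/plan.docx", b"draft 1")
gateway.write("/projects/plan.docx", b"draft 2")
print(gateway.read("/projects/plan.docx"))  # b'draft 2'
```

The application keeps reading and writing by path, as it would over CIFS or NFS, while the data itself lives in the flat, ID-addressed store.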
Exablox is a client of Storage Switzerland