Being able to store terabytes or even petabytes of data is table stakes. How data is managed determines how effective a data-driven organization will be in delivering value and controlling costs. The problem is that most file systems and data management solutions hold data hostage to a single media type and/or vendor. Data is fluid; its requirements for performance, value, protection and location change over time, and the file system should allow data to traverse vendors, media types and locations.
Unstructured data is growing at an alarming pace. The growth is driven not only by an increase in user-created data, but also by an increase in machine- and sensor-created data. Data growth is nothing new; the rate of that growth is, as is the desire to leverage tools like data analytics and artificial intelligence to put data to strategic use. Perhaps more surprising are the new access and performance requirements of unstructured data.
The age-old theory that data value steadily decreases over time, until the data is never accessed again, is outdated. Today, data, especially machine- and sensor-created data, can be very active for a short while after initial creation, become dormant very quickly and then once again become active. And when a user needs it, that user wants it immediately. Some organizations can't wait hours for old data to be made available.
At the same time, once data becomes active again it may need to be processed very quickly. Where that data is stored, and on what type of media, becomes critical.
The combination of rapid creation, long periods of dormant retention and sudden bursts of activity leads organizations to adopt a variety of NAS solutions to meet these different needs. They are then often forced to manually move data between the systems or somehow modify their applications to access data across them. In the end, multiple systems without a unifying force lead to increased costs, lower performance and more complex management.
The Capacity Management Challenge
IT faces a significant challenge in making sure the right unstructured data is on the right type of storage, in the right location, at the right time. The old workaround was simply to keep all data on a primary NAS system. That workaround is no longer practical given the amount of unstructured data the system has to store; it is simply too expensive to keep all unstructured data on primary storage.
The cost of primary storage is compounded by the reality that many primary storage systems are now either all-flash or flash-assisted. While flash is coming down in price, it is still more expensive than hard disk storage, especially capacity-centric hard disk storage.
It used to be that the biggest challenge facing IT was creating a single storage pool big enough to store all of an organization's unstructured data. Thanks to advanced file systems and scale-out storage architectures, raw capacity is no longer the issue; efficient use of capacity is.
The Modern Capacity Goal
The objective for IT is twofold: keep unstructured data on the most cost-effective storage media, system and device at any given moment, and be able to move that data rapidly and non-disruptively as the requirements on a data set change.
To accomplish this goal, a scale-out NAS 2.0 system can't be limited to a single storage architecture or even a single storage location. It should leverage high-performance hardware while data is active and then move data to other storage types as it becomes less active. For example, if data needs to be retained or preserved for a period of time, moving it to a storage system with strong compliance features is ideal. If data is not active but might soon be again, moving it to a storage tier that is cost effective but easily accessible is another option. And if data could be better processed by cloud compute instead of on-premises compute, moving it to a public cloud provider like Amazon AWS should be an option.
The key, though, is that the data movement should be transparent, automatic and policy driven. Transparency is critical so applications and users can access the data without changing their workflows. Automation is important because IT is already overworked and doesn't have time to manage data movement, especially for unstructured data sets that may contain millions, if not billions, of files.
At the same time, data movement should be policy driven so that policies can override automation when it makes sense. For example, some workflows can predict when they will need a data set, or when access to one particular file means that other files within the same directory will be accessed. Policies allow for the pre-migration of data to higher-performing tiers or the aggressive movement of data off a tier.
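The interplay of age-based automation and policy override described above can be illustrated with a minimal sketch. The tier names, the age thresholds and the `FileRecord` fields here are illustrative assumptions, not any vendor's actual API or policy language:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class FileRecord:
    path: str
    last_access: datetime
    pinned_hot: bool = False  # hypothetical policy flag: pre-migrate for a known workflow

def choose_tier(record: FileRecord, now: datetime) -> str:
    """Pick a target tier: explicit policy overrides first, then age-based automation."""
    if record.pinned_hot:              # policy-driven pre-migration wins over automation
        return "flash"
    age = now - record.last_access
    if age < timedelta(days=7):        # recently active data stays on fast media
        return "flash"
    if age < timedelta(days=90):       # dormant but likely to return: cheap yet accessible
        return "disk"
    return "cloud-archive"             # long-dormant data moves to low-cost retention

now = datetime(2024, 1, 1)
hot = FileRecord("/proj/sensor.dat", now - timedelta(days=200), pinned_hot=True)
cold = FileRecord("/proj/old.log", now - timedelta(days=200))
print(choose_tier(hot, now))   # flash (policy override despite old access time)
print(choose_tier(cold, now))  # cloud-archive
```

In a real system the same decision logic would run continuously over the namespace and trigger transparent migrations, so applications keep a single, unchanged path to each file regardless of which tier it lands on.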
A scale-out NAS 2.0 system needs to leverage the storage architectures that best match the data's requirements and be able to move that data when those requirements change. Merely meeting an organization's capacity requirements is now table stakes; plenty of storage architectures and systems can do that. Scale-out NAS 2.0 should span multiple types of storage systems and locations, such as private or public cloud storage and even tape, and then provide an intelligence layer to move data between them.
Sponsored by Quantum