Scaling Unstructured Data in the Cloud Era

Unstructured data presents two scaling challenges to the data center. The first is capacity: it comes as no surprise to any IT professional that unstructured data is growing at an alarming rate. The second is performance, which may catch some in IT off guard. The need to process unstructured data quickly is a growing concern for many data centers. The problem is that these two scaling requirements don’t occur in unison, and they are not evenly distributed across an organization.

The Capacity Scaling Challenge

It is not the amount of capacity unstructured data requires that catches IT off guard; it is the impact that growth has on the data center. It is not uncommon for the unstructured data set to require multiple data center rows, spread across multiple network attached storage (NAS) systems and dedicated file servers. Collectively these systems, with their associated shelves of storage, also consume a large percentage of the data center’s power.

The growth of unstructured data creates another capacity challenge: data protection storage. All the primary unstructured data needs to be backed up regularly and then replicated (or transported) to an off-site facility as well. The data protection process is strained not only by the total amount of unstructured data but also by the number of files that need to be protected and tracked, which can overload backup systems and their catalogues.

The Performance Scaling Problem

The performance of unstructured data storage is also becoming increasingly critical. While users connected via WiFi at the local Starbucks may not need high-performance access, analytics, video processing and other workloads that process unstructured data do.

An all-flash NAS solves some of these problems, but it then creates a data management challenge. How does the organization automatically move data from a hard-disk-based NAS to an all-flash NAS? All-flash vendors will, of course, suggest putting everything on all-flash. But this is expensive overkill for most environments, especially for unstructured data sets where over 90% of the data has not been accessed in over 90 days.
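The 90-day observation suggests an age-based placement policy. The following is only a minimal sketch of that idea, not any vendor's method: the function name `split_by_last_access`, the `cold_after_days` parameter, and the assumption that a last-access timestamp is available for every file are all hypothetical.

```python
from datetime import datetime, timedelta


def split_by_last_access(files, now, cold_after_days=90):
    """Partition files into (hot, cold) lists by last-access age.

    files: dict mapping a file path to its last-access datetime
           (assumed to be available; this is an illustration only).
    now:   the reference time used to compute age.
    """
    cutoff = now - timedelta(days=cold_after_days)
    hot, cold = [], []
    for path, last_access in files.items():
        # Files touched on or after the cutoff are candidates for flash;
        # everything older is a candidate for the cheaper tier.
        (hot if last_access >= cutoff else cold).append(path)
    return sorted(hot), sorted(cold)
```

In practice the hard part is not the classification but moving the data transparently between systems, which is the management challenge described above.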

Another challenge is that not all of an organization’s locations will need the same level of performance, even if they are accessing the same data. Some will only access a data set occasionally, making performance less of an issue. Others will need the highest performance possible.

The Scaling Advantages of a Cloud Era File System

One of the big advantages of a cloud era file system is its ability to scale performance and capacity independently. The typical design is a hub-and-spoke architecture in which the cloud acts as the central storage hub and repository, and the offices, even the primary data center, are edges or spokes. At each edge where data is being processed, the organization can choose among several performance options in terms of media type and controller processing power.

The edge cache is designed to hold only the most recently accessed data, keeping the on-premises requirement small. If data is not available in the edge appliance, it is automatically fetched from the cloud. With only 5-15% of most unstructured data being active, and with intelligent caching algorithms, data rarely needs to be retrieved from the cloud and is instead served from cache with local performance. Even if the organization decides to store all data accessed within the last year to limit the number of cloud retrievals, the reduction in on-premises storage is massive.
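The cache behavior described above can be illustrated with a small least-recently-used (LRU) cache. This is a sketch under stated assumptions, not how any particular product works: the class name `EdgeCache`, the `fetch_from_cloud` callback, and counting capacity in files rather than bytes are all hypothetical simplifications, and real "intelligent caching algorithms" are likely more sophisticated than plain LRU.

```python
from collections import OrderedDict


class EdgeCache:
    """Toy LRU cache that falls back to the cloud on a miss."""

    def __init__(self, capacity, fetch_from_cloud):
        self.capacity = capacity                  # max number of cached files (assumption)
        self.fetch_from_cloud = fetch_from_cloud  # hypothetical callable: path -> data
        self._items = OrderedDict()               # ordered oldest to most recently used
        self.hits = 0
        self.misses = 0

    def read(self, path):
        if path in self._items:
            # Cache hit: serve locally and mark as most recently used.
            self._items.move_to_end(path)
            self.hits += 1
            return self._items[path]
        # Cache miss: fetch from the cloud hub, then keep a local copy.
        self.misses += 1
        data = self.fetch_from_cloud(path)
        self._items[path] = data
        if len(self._items) > self.capacity:
            # Evict the least recently used file to stay within capacity.
            self._items.popitem(last=False)
        return data
```

A dictionary can stand in for the cloud when trying this out, for example `EdgeCache(2, cloud.__getitem__)` where `cloud` maps paths to file contents. Comparing `hits` and `misses` for a given access pattern shows how a small cache can absorb most reads when only a small share of the data is active.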

The primary use for the cloud is as a storage area. Data is continuously versioned to cloud storage, which is replicated to another region in case of disaster. Cloud storage is based on object storage and can cost-effectively scale to meet any organization’s capacity and file count requirements.

StorageSwiss Take

The modern organization has new expectations of IT, and those expectations are exposing weaknesses in legacy file systems’ ability to scale. The cloud is well suited to meet these demands. However, it needs to be coupled with a cloud era file system.

Independent scaling of performance and capacity is only one of the requirements of a cloud era file system. To learn about the rest, watch our webinar, “The Four Requirements of a Cloud-Era File System”. All registrants will also get a copy of Storage Switzerland’s White Paper “What is a Cloud Era File System”.


George Crump is the Chief Product Strategist at StorONE. Prior to StorONE, George spent almost 14 years as the founder and lead analyst at Storage Switzerland, which StorONE acquired in March 2020. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Prior to founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators where he was in charge of technology testing, integration, and product selection.
