It is well known that unstructured data is growing. But one aspect of that growth may catch unsuspecting IT professionals off guard: the sheer number of individual files that storage systems must now store, and what a higher file count means for the storage infrastructure.
Machines – The File Creators
The primary reason for the growing number of files that storage systems must store and manage is machines: the devices and sensors we are integrating into our lives to help us be more productive and healthier. Machines also allow companies to capture data about their products to help with future product planning. Unlike users, machines capture data non-stop, and often each increment of capture results in a new file. Some devices capture information every second of every day, 365 days a year. Although each file a device creates may be significantly smaller than the files users create, the sheer number of these files often consumes far more capacity than user data does.
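To put "every second of every day" in perspective, the sketch below does the arithmetic for a single device and for a hypothetical fleet of 1,000 devices, assuming each capture produces exactly one new file. The fleet size and the 4 KiB average file size are illustrative assumptions, not figures from this article, and the capacity figure ignores metadata and data-protection overhead.

```python
# Illustrative arithmetic only: fleet size and file size are assumptions.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
DAYS_PER_YEAR = 365


def files_per_year(devices: int, captures_per_second: float = 1.0) -> int:
    """Files written per year, assuming each capture creates one new file."""
    return int(devices * captures_per_second * SECONDS_PER_DAY * DAYS_PER_YEAR)


def capacity_bytes(file_count: int, avg_file_size_bytes: int) -> int:
    """Raw capacity consumed, ignoring metadata and protection overhead."""
    return file_count * avg_file_size_bytes


if __name__ == "__main__":
    one_device = files_per_year(1)
    fleet = files_per_year(1_000)  # hypothetical fleet of 1,000 devices
    print(f"1 device:      {one_device:,} files/year")
    print(f"1,000 devices: {fleet:,} files/year")
    # Assumed average sensor file size of 4 KiB
    print(f"capacity at 4 KiB/file: {capacity_bytes(fleet, 4096) / 1e12:.1f} TB")
```

Under these assumptions a single device writes about 31.5 million files a year, and the hypothetical fleet writes about 31.5 billion, roughly 129 TB of raw data made up entirely of small files.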
The Ingestion Problem
Many organizations implementing machines are now asking their storage infrastructures to store millions, if not billions, of files. These files arrive at the storage system non-stop, and although the individual files may be small, the constant ingestion can place a lot of stress on the storage system. Different types of devices may also need to transmit data to different types of targets. For example, some devices may look for an NFS data store, others a Windows SMB share, and still others may transmit via a RESTful API.
The storage system that serves as the target in these high file count situations needs to handle the ingestion of all this data, and it needs to present multiple target types. While NAS systems typically support SMB/CIFS and NFS, they are less likely to support object storage protocols. Some object storage systems, however, support NFS, SMB, Amazon S3 and native object interfaces. This multi-protocol support, combined with their scale-out architecture, allows them to receive inputs from thousands of devices at the same time.
The Scale Problem
As the number of files increases, so does the metadata that tracks where those files are on disk. Because of its more complex hierarchical structure of folders and subfolders, a traditional NAS system will reach its metadata limits long before an object storage system will. This is why most environments with a NAS investment will not allow their NAS systems to exceed 40 to 50 percent of capacity, and those percentages are even lower for systems storing files from sensors or devices. Object storage systems, thanks to their streamlined handling of metadata, can run at much higher percentages of capacity. This reduces the number of systems or nodes required while increasing the capacity efficiency of each system.
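One way to picture the difference between a folder hierarchy and a flat namespace is to count lookups. The toy sketch below, which is an illustration and not a model of any particular NAS or object store, stores the same sensor file under a hypothetical path in a nested directory tree and in a flat key index. The tree needs a directory record for every folder and one entry lookup per path component, while the flat index resolves the whole key in one probe. Real file systems cache directory entries and real object stores distribute their indexes across nodes, so this says nothing about the actual performance of any product.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union


@dataclass
class FileRecord:
    """Toy stand-in for a file's metadata (just its size, for brevity)."""
    size: int


@dataclass
class Directory:
    """Toy directory node mapping entry names to subdirectories or files."""
    entries: Dict[str, Union["Directory", FileRecord]] = field(default_factory=dict)


def put_hierarchical(root: Directory, path: str, record: FileRecord) -> None:
    """Create any missing parent directories, then add the file entry."""
    *dirs, name = path.strip("/").split("/")
    node = root
    for part in dirs:
        node = node.entries.setdefault(part, Directory())
    node.entries[name] = record


def get_hierarchical(root: Directory, path: str) -> Tuple[FileRecord, int]:
    """Walk the path one component at a time; return the record and lookup count."""
    node: Union[Directory, FileRecord] = root
    lookups = 0
    for part in path.strip("/").split("/"):
        node = node.entries[part]  # one directory-entry lookup per component
        lookups += 1
    return node, lookups


def get_flat(index: Dict[str, FileRecord], key: str) -> Tuple[FileRecord, int]:
    """Flat namespace: the full key is resolved with a single index probe."""
    return index[key], 1


if __name__ == "__main__":
    key = "sensors/plant-7/2024/06/01/reading-000001.json"  # hypothetical path
    root = Directory()
    put_hierarchical(root, key, FileRecord(size=512))
    flat_index = {key: FileRecord(size=512)}

    _, h = get_hierarchical(root, key)
    _, f = get_flat(flat_index, key)
    print(f"hierarchical lookups: {h}, flat lookups: {f}")
```

For the six-component example path the tree performs six lookups and holds five folder records on top of the file record, while the flat index performs one lookup and holds one entry.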
The NAS Surprise
Many data centers don’t see file count as a problem, probably because they do not yet have sensors tracking their various components or products. But most organizations will eventually implement these devices, and many already have. Initially, a traditional NAS may be able to handle the inbound data from these sensors, but over time these systems will hit a performance wall caused primarily by file count. The problem is that once that wall is hit, migrating the data from the NAS to an object storage system will be difficult and costly. Data centers should consider object storage up front and avoid a costly migration later.
About Caringo
Caringo was founded in 2005 to change the economics of storage by designing software from the ground up to solve the issues associated with data protection, management, organization and search at massive scale. Caringo’s flagship product, Swarm, eliminates the need to migrate data into disparate solutions for long-term preservation, delivery and analysis, radically reducing total cost of ownership. Today, Caringo software is the foundation for simple, bulletproof, limitless storage solutions for the Department of Defense, the Brazilian Federal Court System, City of Austin, Telefónica, British Telecom, Ask.com, Johns Hopkins University and hundreds more worldwide. Visit www.caringo.com to learn more.
