The longer you keep data, the more sense it makes to move it off primary storage and into a different kind of system. That move makes financial sense, as discussed in the previous blog entry, and it makes sense for efficiency. The problem is that most of the inefficiencies come from the typical design of most NAS systems. Have a bunch of unstructured data that needs storing? Buy a filer and put it there. Problem solved, right? Not so fast.
An old saying goes, “I loved my first filer. I loved my second filer. I hated my fifth filer.” Typical NAS systems do not scale. Yes, there are some scale-out NAS systems, but even then you have to create some kind of hierarchy to be able to find anything. You need a filer or mount point for each department or workgroup, and under that mount point the workgroup creates subdirectories to organize its content.
Some workgroups are better at this than others. They create a hierarchy organized around things like projects, timeframes, and data types. Others are very bad at it and create subdirectories that make as much sense as obfuscated code*. Either way, time passes and people get replaced. The next person responsible for organizing the file system may not agree with the previous person’s scheme and decide to make a change. Maybe it’s better; maybe it’s worse. The result is that data older than a few months can become nearly impossible to find.
The bigger your data needs are, the worse this problem becomes. Although some systems let you search the contents of files (instead of just their file and directory names), most search tools cannot do so across a large file hierarchy. You can waste a lot of time hunting for these older files. Worse, the person often cannot find the file in question at all and ends up recreating the work. Now that person has wasted time looking, wasted time recreating the data, and wasted more space on an already overtaxed primary storage system.
Imagine a system with no file system hierarchy, where data is stored and accessed according to what’s in the file. Such a system separates the data from the metadata, making the data much easier to search when it’s needed. For example, Caringo Swarm lets customers search on the attributes of a file as well as on custom metadata they add. Objects attached to a given project, product, or process can easily be located with such a search.
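To make the idea concrete, here is a minimal sketch of metadata-driven lookup using a toy in-memory store. The class, method names, and sample metadata fields (project, type, year) are all hypothetical illustrations, not Caringo Swarm’s actual API; a real object store exposes this kind of query over HTTP against a search index.

```python
class ObjectStore:
    """Toy object store: no directories, just objects plus metadata."""

    def __init__(self):
        self._objects = {}  # object id -> (data, metadata dict)

    def put(self, obj_id, data, **metadata):
        # Store the object's bytes alongside arbitrary key/value metadata.
        self._objects[obj_id] = (data, metadata)

    def find(self, **criteria):
        # Return ids of objects whose metadata matches every criterion,
        # regardless of where (or whether) they sit in any hierarchy.
        return [oid for oid, (_, meta) in self._objects.items()
                if all(meta.get(k) == v for k, v in criteria.items())]

store = ObjectStore()
store.put("a1", b"...", project="apollo", type="report", year=2015)
store.put("b2", b"...", project="apollo", type="raw", year=2016)
store.put("c3", b"...", project="gemini", type="report", year=2015)

print(store.find(project="apollo"))          # all objects for one project
print(store.find(type="report", year=2015))  # a cross-project query
```

The point of the sketch is that queries cut across any organizational scheme: a search by project and a search by file type both work without anyone having agreed on a directory layout up front.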
Object storage systems may not be for everyone, but they certainly make sense for those who are creating and storing significant amounts of data that need storage for very long periods of time. The more data you create and store this way, the more object storage makes sense.
Sponsored By Caringo
Caringo was founded in 2005 to change the economics of storage by designing software from the ground up to solve the issues associated with data protection, management, organization and search at massive scale. Caringo’s flagship product, Swarm, eliminates the need to migrate data into disparate solutions for long-term preservation, delivery and analysis – radically reducing total cost of ownership. Today, Caringo software is the foundation for simple, bulletproof, limitless storage solutions for the Department of Defense, the Brazilian Federal Court System, City of Austin, Telefónica, British Telecom, Ask.com, Johns Hopkins University and hundreds more worldwide. Visit www.caringo.com to learn more.
*Computer code written in such a way that a typical programmer cannot look at it and figure out what it’s supposed to do.