Strictly speaking, object storage refers to a system where data is stored in discrete buckets or “objects”, in contrast to the directories and subdirectories of a traditional file system. It can be implemented in any storage architecture, but is usually found in a scale-out configuration (a cluster of storage modules) rather than a traditional scale-up configuration of disk shelves behind a single or redundant controller. Each object is assigned a unique identifier (an object ID, or OID), which is compiled with other OIDs into a flat index that’s used to access the data in each object.
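As a rough illustration of the flat index described above, here is a minimal in-memory sketch. The class name FlatObjectStore and its put/get methods are hypothetical, and using a random UUID as the OID is an assumption; real systems generate identifiers in their own ways.

```python
import uuid
from typing import Optional


class FlatObjectStore:
    """Toy in-memory object store: one flat index, no directories."""

    def __init__(self) -> None:
        self._index = {}  # flat index: OID -> object bytes

    def put(self, data: bytes) -> str:
        """Store an object and return its newly assigned OID."""
        oid = uuid.uuid4().hex  # assumption: a random UUID serves as the OID
        self._index[oid] = data
        return oid

    def get(self, oid: str, offset: int = 0, length: Optional[int] = None) -> bytes:
        """Fetch an object by OID, optionally reading from an offset."""
        data = self._index[oid]
        end = None if length is None else offset + length
        return data[offset:end]


store = FlatObjectStore()
oid = store.put(b"hello object storage")
print(store.get(oid, offset=6, length=6))  # b'object'
```

Every object is reached the same way, by its OID, no matter how many objects the index holds.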
Technically, an object can be of almost any size and could contain multiple files, or only fractional files. But in practice, most objects contain a single file. For this reason object storage systems are typically compared with NAS systems, not with traditional block-based storage arrays. A common application for object storage is as an archive for unstructured data, especially large file data such as digital content. But it can be used for more active data as well, like that stored by an online application provider.
Complicating the NAS comparison somewhat are object storage systems that include a file system. This isn’t a file system like that which would be used with a block storage array, but a file system layer that essentially performs a protocol translation, mapping a file name to the object that contains that file.
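The translation such a layer performs can be sketched as a name table sitting in front of the flat object index. This is only an illustration under simple assumptions: the FileLayer class, its method names and the integer OIDs are all hypothetical, and a real product would persist both tables and handle concurrency.

```python
class FileLayer:
    """Hypothetical translation layer: file name -> OID -> object bytes."""

    def __init__(self) -> None:
        self._objects = {}   # flat object index: OID -> bytes
        self._names = {}     # name table: file name -> OID
        self._next_oid = 0

    def write_file(self, name: str, data: bytes) -> int:
        """Store the file as a new object and point the name at it."""
        old_oid = self._names.get(name)
        if old_oid is not None:
            del self._objects[old_oid]  # drop the object being replaced
        oid = self._next_oid
        self._next_oid += 1
        self._objects[oid] = data
        self._names[name] = oid
        return oid

    def read_file(self, name: str) -> bytes:
        """Translate the file name to an OID, then fetch that object."""
        return self._objects[self._names[name]]


layer = FileLayer()
layer.write_file("/media/clip.mov", b"frame data")
print(layer.read_file("/media/clip.mov"))  # b'frame data'
```

The path here is just a key in the name table; no directory tree is created or walked.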
Also, the vast majority of object storage systems on the market are designed for the large file archive use case described above. However, object storage with a file system layer can provide an effective scale-out solution for more general purpose file serving applications as well.
Why Object Storage?
Storage for unstructured data has historically been modeled after the way humans think about and organize data: by grouping files into folders and then folders into directories. This could be seen as a computer analog to the paper-based file cabinet, with drawers and multiple file folders in each drawer. Humans can only keep track of a limited number of discrete items at once (a commonly cited figure is about seven, which is often offered as the reason phone numbers are seven digits), but computers don’t have this problem. They can keep track of a practically unlimited number of files or data objects. In fact, organizing files into a hierarchical structure can slow the process down.
One of the primary benefits of object storage is that its flat index is more easily searched by a computer than a traditional file system. With only an OID required to find data (plus an offset into the object itself), lookups are essentially a two-step process, versus the multiple steps required to walk the directory tree of a traditional file system. The immediate effect is a reduction in metadata handling by the storage system, which means faster file access, especially as the system grows very large.
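To make the difference in lookup steps concrete, the sketch below counts lookups for a nested directory tree and for a flat index. Dictionary lookups stand in for metadata operations here, which is a simplification; the functions, the sample path and the OID are hypothetical.

```python
def walk_tree(root: dict, path: str):
    """Resolve a path one component at a time; returns (data, lookups)."""
    node, lookups = root, 0
    for part in path.strip("/").split("/"):
        node = node[part]  # one metadata lookup per directory level
        lookups += 1
    return node, lookups


def flat_lookup(index: dict, oid: str, offset: int = 0):
    """Resolve an OID with one index lookup, then apply the offset."""
    return index[oid][offset:], 1


tree = {"projects": {"2024": {"video": {"raw": {"clip.mov": b"frames"}}}}}
index = {"oid-42": b"frames"}

print(walk_tree(tree, "/projects/2024/video/raw/clip.mov"))  # (b'frames', 5)
print(flat_lookup(index, "oid-42"))                          # (b'frames', 1)
```

The tree walk grows with the depth of the path, while the flat lookup stays constant.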
The Key Attributes of Object Storage
While object-based architectures reduce metadata handling in the file access process, they actually offer a way to store more metadata than does a traditional file system. Each object has the ability to hold metadata about its own data and most systems enable users to modify those metadata fields somewhat to best fit the application. This can support an extensive set of storage system features like tagging, security, data tiering, faster data access, etc.
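A minimal sketch of an object that carries its own user-editable metadata might look like the following. The StoredObject class, the field names and the find_by_tag helper are hypothetical; actual systems define their own metadata schemas and limits.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    oid: str
    data: bytes
    metadata: dict = field(default_factory=dict)  # user-defined fields


def find_by_tag(objects, key, value):
    """Return the OIDs of objects whose metadata field `key` equals `value`."""
    return [obj.oid for obj in objects if obj.metadata.get(key) == value]


objects = [
    StoredObject("oid-1", b"...", {"project": "apollo", "retention_years": 7}),
    StoredObject("oid-2", b"...", {"project": "zeus"}),
]
objects[1].metadata["tier"] = "archive"  # fields can be added after the fact

print(find_by_tag(objects, "project", "apollo"))  # ['oid-1']
print(find_by_tag(objects, "tier", "archive"))    # ['oid-2']
```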
Erasure Coding and Data Dispersion
Erasure coding is a data protection process that parses a data set into a fixed number of blocks, then adds some redundant blocks to create a more resilient superset of data. Similar to a parity calculation, the original data set can be recreated after a data loss, provided that a minimum number of these blocks are intact. Object-based architectures are ideal for erasure coding since objects are easily parsed, compared with volumes or LUNs.
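The idea can be shown with the simplest possible erasure code: k data blocks plus a single XOR parity block, which survives the loss of any one block. This is a deliberately reduced sketch, not the scheme any particular product uses; production systems often use codes such as Reed-Solomon that add several redundant blocks. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, k: int):
    """Split data into k equal blocks (zero-padded) and append one parity block."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def decode(blocks, original_length: int) -> bytes:
    """Rebuild the data when at most one block (marked None) is missing."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost block")
    blocks = list(blocks)
    if missing:
        survivors = [block for block in blocks if block is not None]
        blocks[missing[0]] = xor_blocks(survivors)  # XOR of the rest restores it
    return b"".join(blocks[:-1])[:original_length]


message = b"object storage"
blocks = encode(message, k=4)   # 4 data blocks + 1 parity block
blocks[2] = None                # simulate losing one block
print(decode(blocks, len(message)))  # b'object storage'
```

Here any four of the five blocks are enough to recreate the original data, at a capacity cost of one extra block.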
Dispersion refers to the practice of storing objects on different physical storage devices, often in different geographic locations. Since each object is self-contained, its physical location is technically immaterial to the storage controller that’s servicing storage requests, although latency between distributed devices can be a factor. This makes object storage an efficient solution for creating a resilient storage system that spans multiple data centers, as many cloud providers do.
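As a sketch of how dispersion and erasure coding interact, the example below spreads the fragments of one object across sites in round-robin order and checks whether losing any single site still leaves enough fragments to rebuild it. The site names, the fragment counts and both helper functions are hypothetical, and real placement policies are considerably more sophisticated.

```python
def disperse(fragment_ids, sites):
    """Assign fragments to sites round-robin; returns {site: [fragment ids]}."""
    placement = {site: [] for site in sites}
    for i, fragment in enumerate(fragment_ids):
        placement[sites[i % len(sites)]].append(fragment)
    return placement


def survives_site_loss(placement, k):
    """True if losing any single site still leaves at least k fragments."""
    total = sum(len(fragments) for fragments in placement.values())
    return all(total - len(fragments) >= k for fragments in placement.values())


# 14 fragments over 3 sites land as 5, 5 and 4 fragments per site.
placement = disperse(list(range(14)), ["site-a", "site-b", "site-c"])

print(survives_site_loss(placement, k=9))   # True: the worst case leaves 9
print(survives_site_loss(placement, k=10))  # False: the worst case leaves 9
```

The check shows why the number of fragments required for reconstruction has to be chosen with the number of sites in mind.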
Object storage systems are typically accessed using a REST-based interface that leverages simple PUT and GET commands to write and read data. This allows applications to directly access data without using traditional file system protocols. As the ‘language of the internet’, the RESTful interface makes object storage an ideal cloud storage platform.
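The PUT and GET pattern can be tried out with nothing but the Python standard library. The sketch below runs a toy in-memory HTTP server and a client against it; the handler, the URL path /bucket/report.txt and the demo function are hypothetical and do not represent any vendor's API, which would also involve authentication and richer headers.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

OBJECTS = {}  # in-memory stand-in for the object store: URL path -> bytes


class ObjectHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        """Write: store the request body under the request path."""
        length = int(self.headers.get("Content-Length", 0))
        OBJECTS[self.path] = self.rfile.read(length)
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        """Read: return the object stored under the request path."""
        body = OBJECTS.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo() -> bytes:
    server = HTTPServer(("127.0.0.1", 0), ObjectHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/bucket/report.txt"
    # Skip any proxy settings in the environment for this local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        put = urllib.request.Request(url, data=b"quarterly numbers", method="PUT")
        with opener.open(put) as response:
            if response.status != 201:
                raise RuntimeError("PUT failed")
        with opener.open(url) as response:  # plain GET
            return response.read()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The application never mounts a file system or opens a file handle; it only issues HTTP requests against a URL that names the object.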
Where to Use Object Storage
Unstructured data sets have gotten too big to economically store and handle using traditional NAS infrastructures. The huge archives now common in industries such as media and entertainment, cloud-based services, social media, video surveillance, oil and gas, etc., need a different storage architecture, like object storage, one that has these characteristics:
Nearly unlimited scalability
Object storage can expand flexibly in a scale-out fashion, adding modules to increase capacity, performance, connectivity, etc., while maintaining reasonable access and throughput performance.
Traditional RAID-based storage systems often add data replication, creating multiple full copies, in order to protect a given data set. When archives reach the hundreds of terabytes, this capacity overhead (which can reach 300%) can make these storage systems infeasible. Object storage with erasure coding can actually provide better data protection with overhead under 50%.
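The arithmetic behind these overhead figures is easy to check. In the sketch below, overhead is defined as extra raw capacity divided by usable capacity, which is an assumption about how the percentages above are measured; the 500 TB archive and the 10+4 erasure coding layout are hypothetical examples.

```python
def replication_overhead_pct(copies: int) -> float:
    """Extra capacity, as a percentage of usable data, when keeping N full copies."""
    return 100 * (copies - 1)


def erasure_overhead_pct(k: int, m: int) -> float:
    """Extra capacity when k data blocks are protected by m redundant blocks."""
    return 100 * m / k


def raw_capacity_tb(usable_tb: float, overhead_pct: float) -> float:
    """Raw capacity needed to hold usable_tb at the given overhead."""
    return usable_tb * (100 + overhead_pct) / 100


archive_tb = 500
for label, pct in [
    ("4 full copies", replication_overhead_pct(4)),
    ("erasure coding 10+4", erasure_overhead_pct(10, 4)),
]:
    print(f"{label}: {pct:.0f}% overhead, {raw_capacity_tb(archive_tb, pct):.0f} TB raw")
```

Under these assumptions, four full copies need 2,000 TB of raw capacity for a 500 TB archive, while a 10+4 layout needs 700 TB.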
Many object storage systems are software solutions that can be run on nodes using low-cost server hardware and high-capacity disk drives. Compared with proprietary NAS systems, the savings can be significant.
Better data protection
Erasure coding can provide several times the level of data resiliency that RAID does. Most object storage solutions that support erasure coding also allow the user to configure that level by changing the number of redundant blocks that are created and the number required to reconstruct the original data object. In addition, by leveraging its ability to physically distribute data objects, object storage can provide real disaster recovery protection without creating yet another copy.
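One way to see the effect of tuning the number of redundant blocks is a simple probability model. The sketch below assumes each block is lost independently with the same probability p, which is a strong simplification; the layouts and the value of p are hypothetical, and the function is an illustration rather than a durability calculator.

```python
from math import comb


def loss_probability(n: int, tolerated: int, p: float) -> float:
    """Chance that more than `tolerated` of n blocks are lost at once,
    assuming each block fails independently with probability p."""
    return sum(
        comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(tolerated + 1, n + 1)
    )


p = 0.02  # hypothetical per-block failure probability
k = 10    # data blocks per object

# Data is lost only when more than m of the k + m blocks fail together.
for m in (1, 2, 3, 4):
    print(f"{k}+{m}: {loss_probability(k + m, m, p):.2e}")
```

Under this model each additional redundant block lowers the chance of losing data, which is the knob that configurable erasure coding exposes.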
The flexible metadata architecture that object storage employs can enable a number of intrinsic features that would require dedicated software in a traditional storage system. For example, storing user information in each object can enhance access security, and detailed data tags can improve search performance and enable efficient data tiering.
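As a sketch of how such features could be driven purely by per-object metadata, the example below makes an access decision and a tiering decision from a metadata dictionary. The field names (owner, readers, last_access), the 90-day threshold and both functions are hypothetical.

```python
from datetime import date


def can_read(metadata: dict, user: str) -> bool:
    """Allow access to the owner and to anyone on the object's reader list."""
    return user == metadata.get("owner") or user in metadata.get("readers", ())


def choose_tier(metadata: dict, today: date, cold_after_days: int = 90) -> str:
    """Pick a storage tier from the object's last access date."""
    idle_days = (today - metadata["last_access"]).days
    return "archive" if idle_days > cold_after_days else "active"


metadata = {
    "owner": "alice",
    "readers": ["bob"],
    "last_access": date(2024, 1, 10),
}

print(can_read(metadata, "bob"))                 # True
print(can_read(metadata, "mallory"))             # False
print(choose_tier(metadata, date(2024, 2, 1)))   # active
print(choose_tier(metadata, date(2024, 6, 1)))   # archive
```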