Object storage has been around for a number of years but is now getting more attention. This is due to a couple of factors: the amount of data, especially file data, is continuing to grow and the length of time it must be retained is increasing. The need for storage systems that can keep up with this demand for long-term capacity is greater than ever and storage vendors are responding with object-based storage systems. But what exactly is object storage and why is it a better fit for scalable file storage than traditional storage architectures?
What Object Storage Is
Object storage is an architecture in which data is stored in discrete ‘buckets’ (called objects) instead of in large volumes of address space (like block storage) or in a hierarchy of directories and folders (like NAS). Each object is assigned a unique identifier called an “object ID number” (OID), which is recorded in a flat index; that index is searched to locate the object and the files it holds. This structure provides a simpler, more efficient way to organize and access files than navigating a folder hierarchy: users or applications look up the OID and fetch the data they need using an offset into the object.
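To make the flat-index idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration of lookup by OID plus an offset, not how any particular product works; the class name FlatObjectStore, the UUID-based OID scheme and the put/get method names are all assumptions made for the example.

```python
import uuid
from typing import Optional


class FlatObjectStore:
    """Toy object store: one flat index from OID to object bytes."""

    def __init__(self) -> None:
        # OID -> object data; there are no directories and no nesting.
        self._index = {}

    def put(self, data: bytes) -> str:
        # Hypothetical OID scheme: a random UUID rendered as hex.
        oid = uuid.uuid4().hex
        self._index[oid] = data
        return oid

    def get(self, oid: str, offset: int = 0, length: Optional[int] = None) -> bytes:
        # A single index lookup finds the object; the offset selects a range.
        data = self._index[oid]
        end = len(data) if length is None else offset + length
        return data[offset:end]


store = FlatObjectStore()
oid = store.put(b"hello object storage")
whole = store.get(oid)                     # the full object
part = store.get(oid, offset=6, length=6)  # six bytes starting at offset 6
```

Note that no path is ever walked: the OID is the only thing a caller needs to reach the data.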
This architecture also requires less metadata, the information that’s used to organize data sets, than a traditional file system. This metadata efficiency means object storage generates less overhead storing and handling files. But object storage is also more flexible than traditional storage, because systems can be configured with custom metadata fields to support advanced functionality.
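As a rough illustration of custom metadata fields, the Python sketch below attaches free-form key/value pairs to each object and filters on them. The field names (retention_years, department), the StoredObject class and the find_by_metadata helper are hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    oid: str
    data: bytes
    # Custom metadata: arbitrary key/value pairs chosen by the operator.
    metadata: dict = field(default_factory=dict)


def find_by_metadata(objects, key, value):
    """Return the OIDs of objects whose metadata has key == value."""
    return [o.oid for o in objects if o.metadata.get(key) == value]


objects = [
    StoredObject("oid-1", b"...", {"retention_years": 7, "department": "finance"}),
    StoredObject("oid-2", b"...", {"retention_years": 1}),
    StoredObject("oid-3", b"...", {"retention_years": 7, "department": "legal"}),
]

# Select everything that must be kept for seven years.
long_term = find_by_metadata(objects, "retention_years", 7)
```

A retention policy, for example, could then act on the objects selected this way without inspecting their contents.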
The object-based storage software architecture is also ideal for a modular storage topology. Since each object is a discrete ‘container’, large data sets are easily divided into subsets and stored in a scale-out fashion, on multiple storage modules or “nodes”. This logical cluster of nodes can then be physically separated to provide greater data protection.
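One simple way to picture objects being spread across nodes is to hash each OID and use the result to pick a node. The Python sketch below does that; the node names, the place function and the choice of two copies per object are assumptions for illustration, and real systems often use more elaborate placement (consistent hashing, for instance) so that adding a node moves less data.

```python
import hashlib

# Hypothetical cluster of three storage nodes.
NODES = ["node-a", "node-b", "node-c"]


def place(oid, nodes=NODES, copies=2):
    """Pick `copies` distinct nodes for an object, derived from its OID."""
    digest = hashlib.sha256(oid.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as a number to choose a starting node.
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    # Take consecutive nodes from the starting point, wrapping around.
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]


placement = place("oid-0001")
```

Because placement depends only on the OID, any node can work out where an object lives without consulting a central directory.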
Object storage systems are accessed through a REST-based interface built on low-level PUT and GET commands. This lets applications access data directly, without traditional file system protocols, and makes object storage easy to reach over the internet.
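The PUT and GET calls described above might look like the following Python sketch, which only builds the HTTP requests and does not send them. The endpoint https://objects.example.com, the /objects/<oid> URL layout and the use of an HTTP Range header for offset reads are assumptions; actual products define their own URL schemes and authentication.

```python
import urllib.request

# Hypothetical endpoint; a real system would also require credentials.
BASE_URL = "https://objects.example.com"


def build_put(oid, data):
    """Build a PUT request that would store `data` under `oid`."""
    return urllib.request.Request(
        f"{BASE_URL}/objects/{oid}",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


def build_get(oid, offset=None, length=None):
    """Build a GET request, optionally limited to a byte range."""
    req = urllib.request.Request(f"{BASE_URL}/objects/{oid}", method="GET")
    if offset is not None and length is not None:
        # Standard HTTP byte ranges are inclusive at both ends.
        req.add_header("Range", f"bytes={offset}-{offset + length - 1}")
    return req


put_req = build_put("abc123", b"payload")
get_req = build_get("abc123", offset=0, length=4)
# To send against a live endpoint: urllib.request.urlopen(put_req)
```

Nothing here depends on a mounted file system or a storage-specific protocol, which is the point: any client that can speak HTTP can reach the data.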
What Object Storage Is Not
Object storage is not a storage system, per se, but an architecture as described above, one that can be integrated into storage systems in many different configurations. Some object-based solutions are software-only products that run on user-supplied hardware. Others are appliances, typically 1U or 2U nodes, often supplied as turnkey systems that leverage commodity hardware.
Object storage is not a file system either, or a NAS; however, object-based storage systems are often used in the same large file-storage environments as scale-out NAS solutions. Such object-based systems typically contain a file system layer that essentially performs a protocol translation, mapping a file name to the object that contains that file.
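The translation layer described above can be sketched as a table that maps file names to OIDs, sitting in front of the flat object index. This Python example is a deliberately simplified assumption of how such a layer could work; the FileGateway class, its methods and the sequential OID format are hypothetical.

```python
class FileGateway:
    """Toy file system layer: translates file names into OID lookups."""

    def __init__(self):
        self._objects = {}  # OID -> bytes (the flat object index)
        self._names = {}    # file name -> OID (the translation table)
        self._next = 0

    def write_file(self, path, data):
        # Hypothetical OID scheme: a zero-padded sequence number.
        oid = f"oid-{self._next:08d}"
        self._next += 1
        self._objects[oid] = data
        self._names[path] = oid
        return oid

    def read_file(self, path):
        # Two steps: name -> OID, then OID -> data.
        return self._objects[self._names[path]]


gateway = FileGateway()
report_oid = gateway.write_file("/reports/2023/q4.pdf", b"%PDF-...")
contents = gateway.read_file("/reports/2023/q4.pdf")
```

The path here is just a string key; the "folders" in it exist only in the name, not in how the data is stored.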
Object storage is not erasure coding, a data resiliency process that uses redundant blocks of data and a parity-like calculation to recover from an infrastructure failure. However, just as many object storage systems include a file system layer, many also include features like erasure coding and data distribution (or dispersion), both processes that are well suited to the object-based architecture.
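To show the parity-like calculation at its simplest, the Python sketch below splits an object into k data fragments, adds one XOR parity fragment, and rebuilds a single lost fragment from the survivors. This is a stand-in chosen for brevity: a single XOR parity tolerates only one lost fragment, whereas production erasure codes are typically Reed-Solomon-style schemes that tolerate several. The function names are hypothetical.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def encode(data, k):
    """Split data into k equal fragments and append one parity fragment."""
    size = -(-len(data) // k)             # ceiling division
    padded = data.ljust(size * k, b"\0")  # zero-pad to a multiple of k
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def recover(fragments, missing):
    """Rebuild the fragment at index `missing` from all the others."""
    survivors = [f for i, f in enumerate(fragments) if i != missing]
    return xor_blocks(survivors)


fragments = encode(b"object storage!!", k=4)  # 4 data fragments + 1 parity
rebuilt = recover(fragments, missing=2)       # pretend fragment 2 was lost
```

In a dispersed system each fragment would sit on a different node, so losing one node costs one fragment, which the remaining fragments can regenerate.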
Object-based storage systems have some unique characteristics which can provide compelling advantages for storage vendors designing systems to handle the deluge of unstructured data that many companies are facing. In future posts we’ll discuss what these characteristics are, how they’re being leveraged by storage vendors and where object storage systems are being used.