In a recent column we talked about what exactly object storage is: a storage software architecture that puts data into discrete ‘containers’ (called objects) and accesses them through an ID number and a simple index. We also discussed what object storage is not, namely a storage system (it’s a software architecture), a file system (though it can support one) or a NAS (it typically replaces one). Object-based architectures are ideal for supporting features like scale-out hardware, erasure coding and data dispersal. In this entry we’ll discuss what these unique characteristics are and what they enable object storage systems to provide for the users and applications that rely on them.
Object storage is a natural fit for unstructured data, since files are also discrete data entities. By making each file an object, as is common with larger files, data access is achieved with a simple index instead of a complex system of i-nodes, directories and folders. A file system layer can be added to an object storage system to essentially perform protocol translation, creating a compelling alternative to a traditional NAS system.
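To make the contrast concrete, here’s a minimal sketch (purely illustrative, not any product’s API) of an object store keyed by ID with a single flat index, plus a thin file system layer that translates familiar paths into object IDs. The `ObjectStore` and `FileGateway` names, and the use of a content hash as the object ID, are assumptions for the example.

```python
import hashlib

class ObjectStore:
    """Flat object store: object ID -> (data, metadata). No inodes,
    no directories -- the entire 'index' is one mapping."""
    def __init__(self):
        self._objects = {}

    def put(self, data, metadata=None):
        # Derive the object ID from the content (one common approach).
        oid = hashlib.sha256(data).hexdigest()
        self._objects[oid] = (data, metadata or {})
        return oid

    def get(self, oid):
        return self._objects[oid][0]

class FileGateway:
    """File system layer on top: translates paths into object IDs,
    so applications can keep using file semantics."""
    def __init__(self, store):
        self._store = store
        self._paths = {}   # path -> object ID

    def write(self, path, data):
        self._paths[path] = self._store.put(data, {"path": path})

    def read(self, path):
        return self._store.get(self._paths[path])
```

An application can go through `FileGateway` with paths, or talk to `ObjectStore` directly with IDs; the underlying storage is the same flat namespace either way.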
An object storage architecture is an excellent fit for a scale-out, clustered topology which can expand in a modular fashion, adding processing power as it adds capacity. Also, due to its efficient metadata structure, an object storage system generates less processing overhead than a comparable NAS, something that can help maintain performance as the system grows. The end result is a storage system that’s easy to scale and can do so more effectively than a traditional ‘scale-up’ NAS system.
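One common way such clusters grow in a modular fashion is consistent hashing: object IDs are mapped onto a ring of nodes, so adding a node moves only a fraction of the objects rather than reshuffling everything. This is a hedged sketch of the general technique, not a description of any particular vendor’s implementation; the node names are hypothetical.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring: maps object IDs to cluster nodes so that
    adding a node relocates only a small share of objects."""
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes          # virtual nodes smooth out the load
        self._ring = []               # sorted list of (hash, node)
        for n in nodes:
            self.add_node(n)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Each physical node gets several positions on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, object_id):
        # An object lives on the first node clockwise from its hash.
        h = self._hash(object_id)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

Because each new node claims only its own arc of the ring, capacity and processing power scale together without a wholesale data migration.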
Some object storage systems use RAID and replication to maintain data safety and integrity, a process that can generate extra copies of data. But even with those extra copies, object storage’s ability to support scale-out configurations built on commodity hardware can still make these systems lower in cost than traditional, scale-up storage. When erasure coding is included, however, object-based architectures can be used to create even lower-cost disk storage infrastructures. Erasure coding, as a disk-level data protection mechanism, generates much less redundant data than RAID. In typical implementations this efficiency can reduce the amount of storage capacity needed by 2-3x, driving significant cost savings.
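The capacity arithmetic behind that claim can be shown with a small sketch. The triple-replication and 10+4 erasure-code parameters below are illustrative examples, not figures from any specific product, and the XOR single-parity demo is the simplest possible erasure code (akin to RAID-5), shown only to make the recovery idea tangible.

```python
def replication_overhead(copies):
    """Raw capacity needed per byte of user data with N full copies."""
    return float(copies)

def erasure_overhead(data_frags, parity_frags):
    """Raw capacity per byte with k data + m parity fragments."""
    return (data_frags + parity_frags) / data_frags

# Example: 3 copies -> 3.0x raw capacity; a 10+4 code -> 1.4x,
# yet the 10+4 code survives any 4 fragment losses vs. 2 for 3 copies.

def xor_parity(fragments):
    """Single parity fragment: XOR of equal-length data fragments."""
    parity = bytes(len(fragments[0]))
    for f in fragments:
        parity = bytes(a ^ b for a, b in zip(parity, f))
    return parity

def recover(surviving, parity):
    """Rebuild the one missing fragment from parity plus survivors."""
    missing = parity
    for f in surviving:
        missing = bytes(a ^ b for a, b in zip(missing, f))
    return missing
```

Production systems use stronger codes (e.g. Reed-Solomon) that tolerate multiple simultaneous losses, but the overhead comparison holds: parity fragments cost far less capacity than whole extra copies.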
Object storage with RAID and replication can ensure there are always a minimum number of copies of each object in a particular data set, and that copies are maintained in a different location in order to provide disaster recovery protection. Object storage with erasure coding can provide even better data protection, enabling the system to sustain many more failures and recover much more quickly than a comparable RAID system. And when objects are distributed (or “dispersed”) geographically, object storage can provide a disaster recovery capability as well, one that’s more efficient than simply making replicated copies.
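A minimum-copy placement policy of this kind can be sketched in a few lines. The site names and the `place_copies` helper are hypothetical, intended only to show the shape of such a policy: a set number of copies, with at least one guaranteed to land at a remote site.

```python
def place_copies(sites, min_copies=3, home_site="us-east"):
    """Pick sites for an object's copies: the home site first, then
    remote sites, so at least one copy is always off-site for DR."""
    remote = [s for s in sites if s != home_site]
    placement = [home_site] + remote[:min_copies - 1]
    if len(placement) < min_copies:
        raise ValueError("not enough sites to satisfy the copy policy")
    return placement
```

A dispersal-based system would instead spread erasure-coded fragments across those same sites, achieving the off-site protection with far less raw capacity than full copies.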
An object storage system’s REST interface allows it to be accessed directly by applications that are also REST-enabled, bypassing traditional storage protocols. And, as the ‘language of the internet’, REST makes web connectivity easier. Its flexible metadata structure also enables object storage systems to support advanced features like tagging, security, data tiering, encryption, etc.
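The shape of that direct access is simple to illustrate. The endpoint, bucket and header names below are made up for the example (real S3-style stores also require authentication headers); the point is that an application addresses an object with an HTTP verb and a URL, not a mounted file share.

```python
def rest_request(method, endpoint, bucket, object_id, metadata=None):
    """Build the pieces of a REST call to an object store:
    HTTP verb, object URL, and custom metadata as headers."""
    url = f"https://{endpoint}/{bucket}/{object_id}"
    headers = {f"x-meta-{k}": v for k, v in (metadata or {}).items()}
    return method, url, headers

# e.g. rest_request("PUT", "storage.example.com", "archive",
#                   "obj-123", {"tier": "cold"})
```

Because the object’s metadata rides along as headers, features like tagging and tiering policies can be driven by the application at write time.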
It’s clear that object-based architectures provide some unique capabilities that can be used to address some of the storage challenges facing companies today. Handling Big Data, large file content archives, long-term data retention and cloud-based storage are among the challenges these systems are being designed for. In the final entry of this series we’ll look at where object storage systems are being used and why companies have chosen them over existing technologies.