The value of object storage is that it can serve as the storage backend for a wide variety of unstructured data use cases, ranging from primary file stores to unstructured data archives to backup targets that protect transactional primary storage. But how should IT protect the object storage systems themselves?
Data protection processes have to guard against two main types of damage: physical and logical. Physical damage occurs when a drive fails, a node fails, or something physically happens to a node or an entire site. Fires, floods, and other natural disasters can all physically damage a storage system. Logical damage happens at the file, object, or block level and is caused by humans or by things humans create, such as software. Someone accidentally deleting or corrupting a file is logical damage. Viruses, ransomware, and other attacks on your infrastructure also cause logical damage.
Parity protection (RAID and erasure coding), drive mirroring, and replication are all designed to protect against physical damage. Backup systems of all types are designed to recover from logical damage. The question is, how do object storage systems protect against both types of damage?
Replication vs. Erasure Coding
Replication is the historical way to protect against many different types of physical damage, and some object storage systems use it as well. The most common approach among these systems is to replicate at the object level, ensuring that every object is stored in its entirety in at least three locations. Others may replicate drives or nodes, but the goal is always the same: ensure that all data exists in at least three locations. The main advantages of triple replication are that it requires little computational overhead, is simple to understand, and is well tested.
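Object-level replication needs some rule for deciding which three locations hold each object, and systems differ in how they make that choice. As a minimal sketch, assuming a flat list of hypothetical node names and using rendezvous (highest-random-weight) hashing as a stand-in placement method, the function below deterministically picks three distinct nodes for a given object key:

```python
import hashlib


def place_replicas(object_key: str, nodes: list[str], copies: int = 3) -> list[str]:
    """Pick `copies` distinct nodes for an object via rendezvous hashing (illustrative)."""
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies")

    def weight(node: str) -> int:
        # Hash the (node, object) pair; the highest-scoring nodes get a replica.
        digest = hashlib.sha256(f"{node}/{object_key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # Rank every node by its weight for this object and keep the top `copies`.
    return sorted(nodes, key=weight, reverse=True)[:copies]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d", "node-e"]
    print(place_replicas("photos/cat.jpg", cluster))
```

One reason to consider this kind of scheme: removing a node that does not hold a given object leaves that object's placement unchanged, so only objects that actually lived on a failed node need new copies.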
In most systems, replication is configurable at the object level. An administrator can set policies, based on object type, age, or activity, that determine how many copies the system maintains. These systems also allow additional copies to be removed as data ages or becomes less active.
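A replica-count policy of this kind can be pictured as a small function. The sketch below is hypothetical throughout: the `ObjectInfo` fields, the protected categories, and the one-year/90-day thresholds are illustrative assumptions, not any product's defaults.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical object categories that always keep the full three copies.
PROTECTED_CATEGORIES = {"backup", "legal"}


@dataclass
class ObjectInfo:
    category: str            # hypothetical object type label
    created: datetime
    last_accessed: datetime


def replica_count(obj: ObjectInfo, now: datetime) -> int:
    """Return how many copies to keep, based on type, age, and activity."""
    if obj.category in PROTECTED_CATEGORIES:
        return 3
    age = now - obj.created
    idle = now - obj.last_accessed
    # Old, inactive data drops its extra copy (illustrative thresholds).
    if age > timedelta(days=365) and idle > timedelta(days=90):
        return 2
    return 3
```

A background task could periodically evaluate a function like this and add or remove copies to match; the sketch never drops below two copies, since each removed copy trades protection for capacity.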
The main disadvantage of triple replication is its 200 percent storage overhead: storing one terabyte of data consumes three terabytes of raw capacity.
An increasingly common method for object storage systems is erasure coding, a parity-based method of protecting against the loss (erasure) of data on a damaged device. Erasure coding systems divide objects into pieces called “shards” or “chunks” and add parity shards to them. For example, an object may be divided into nine data shards with three parity shards (referred to as a “9+3” configuration). The resulting 12 shards are distributed across multiple drives, nodes, and even sites. Since the object can be read from any nine of the 12 shards, a 9+3 configuration can survive three simultaneous outages with only 33 percent storage overhead (three parity shards for every nine data shards). The greatest advantage of erasure coding, therefore, is that it can protect against multiple outages with significantly less storage overhead than replication. Its main disadvantages are that it is more computationally expensive than replication and that multi-site configurations can add significant latency to writes.
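Configurations like 9+3 typically rely on Reed-Solomon or similar codes, whose finite-field arithmetic is too long for a short example. To show the shard-and-rebuild idea, the sketch below swaps in the simplest erasure code, a single XOR parity shard (a “k+1” layout), which survives the loss of any one shard. The function names are illustrative, not any system's API.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equal-length shards.
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal data shards (zero-padded) plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def decode(shards: list[Optional[bytes]], k: int, length: int) -> bytes:
    """Rebuild the original data if at most one of the k+1 shards is missing (None)."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover only one lost shard")
    if missing:
        survivors = [s for s in shards if s is not None]
        shards = list(shards)
        # XOR of every surviving shard (data and parity) equals the lost shard.
        shards[missing[0]] = reduce(xor_bytes, survivors)
    return b"".join(shards[:k])[:length]
```

With k=9 this layout costs only about 11 percent overhead but tolerates just one lost shard; surviving three losses, as 9+3 does, requires the extra parity shards that Reed-Solomon-style codes provide.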
Most object storage systems offer both methods. Organizations with smaller capacity requirements (under 1PB) may find replication the simpler protection method. As the environment grows, however, most organizations turn to erasure coding for the cost and data center floor space savings alone. Additionally, most object storage systems allow a blended protection strategy in which data is erasure coded within a data center but replicated between data centers, which avoids the write latency of spreading shards across sites.
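The overhead figures in this comparison follow from simple arithmetic. This sketch assumes the blended strategy keeps a complete 9+3 erasure-coded set in each of two data centers; the function names are illustrative.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Extra raw capacity per unit of user data with N full copies."""
    return Fraction(copies - 1)


def erasure_overhead(data_shards: int, parity_shards: int) -> Fraction:
    """Extra raw capacity per unit of user data with a k+m erasure code."""
    return Fraction(parity_shards, data_shards)


def blended_overhead(data_shards: int, parity_shards: int, sites: int) -> Fraction:
    """Erasure code within each site, then keep a full copy of that set at every site."""
    per_site = Fraction(data_shards + parity_shards, data_shards)
    return sites * per_site - 1


def as_percent(overhead: Fraction) -> int:
    return round(float(overhead) * 100)


if __name__ == "__main__":
    print(as_percent(replication_overhead(3)))   # 200
    print(as_percent(erasure_overhead(9, 3)))    # 33
    print(as_percent(blended_overhead(9, 3, 2)))  # 167
```

For 1 PB of user data, that works out to 3 PB of raw capacity for triple replication, about 1.33 PB for 9+3, and about 2.67 PB for the two-site blended layout.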
Object storage systems can protect against logical damage, too. They can keep a write once, read many (WORM) version of every object, protecting against accidental or deliberate deletion or corruption. Recovering from logical damage is then as simple as changing pointers to older versions of objects. They can also put safeguards in place to provide short-term protection from large-scale attacks, such as placing recently changed or deleted objects in a virtual trash bin.
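The pointer-based recovery described above can be sketched with a toy in-memory bucket. It assumes a hypothetical `VersionedBucket` class in which stored versions are never modified (the WORM property), deletes append a marker instead of erasing data, and recovery simply moves the current-version pointer back; the trash-bin time window is likewise an illustrative parameter.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    data: Optional[bytes]  # None marks a delete
    timestamp: float


class VersionedBucket:
    """Hypothetical toy store: versions are write-once; 'current' is just a pointer."""

    def __init__(self) -> None:
        self._versions: dict[str, list[Version]] = {}
        self._current: dict[str, int] = {}  # key -> index of the current version

    def put(self, key: str, data: Optional[bytes]) -> int:
        # Append a new immutable version; earlier versions are never touched.
        versions = self._versions.setdefault(key, [])
        versions.append(Version(data, time.time()))
        self._current[key] = len(versions) - 1
        return self._current[key]

    def delete(self, key: str) -> int:
        # A delete is only a marker version, so the old data survives.
        return self.put(key, None)

    def get(self, key: str) -> bytes:
        version = self._versions[key][self._current[key]]
        if version.data is None:
            raise KeyError(key)
        return version.data

    def restore(self, key: str, index: int) -> None:
        # Recovery from logical damage: repoint to an older, intact version.
        if not 0 <= index < len(self._versions[key]):
            raise IndexError(index)
        self._current[key] = index

    def trash(self, window_seconds: float) -> list[str]:
        """Keys whose current version is a delete marker newer than the window."""
        cutoff = time.time() - window_seconds
        return [
            key
            for key, idx in self._current.items()
            if self._versions[key][idx].data is None
            and self._versions[key][idx].timestamp >= cutoff
        ]
```

For example, if ransomware overwrites `report.docx`, calling `restore("report.docx", good)` with the index returned by the last clean `put` brings the original back without copying any data.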
Object storage systems can protect stored objects from both physical and logical damage, although not all systems support recovery from both. Erasure coding is becoming the more common method for protecting against physical damage, but users must understand that they will either suffer a latency penalty during writes or have to use asynchronous writes to send shards to other locations. Triple replication is simpler and easier to understand, but it carries a much higher storage overhead.