Object storage is often thought of as the storage system where data goes to die. Its cost-effectiveness and data integrity make it an ideal storage mechanism for long-term data archives. But there are times when a higher performing object storage solution will better match the needs of many use cases, including analytics and social discovery. The faster these use cases can process data and find a result, the happier the user (or customer) is. The problem is that most object storage solutions are not tuned for high performance, and as a result data centers are left using legacy NAS or block storage solutions as an expensive and complicated workaround.
Performance Object Storage is More than Just Flash
Most object storage solutions were not originally designed with performance in mind. Instead, the focus was on keeping costs down and ensuring the long-term viability of the data they store. Their only performance upgrade was to equip the storage nodes with flash solid state drives (SSDs) instead of hard disk drives.
Flash-enabled nodes have two benefits, and performance isn’t the main one. The primary benefit is density. An object storage system equipped with SSDs can offer greater capacity per node, which means fewer nodes and a smaller data center footprint. The secondary benefit is an increase in performance, but the overhead of the object storage software may prevent the flash storage from reaching its full performance potential.
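The density argument is simple arithmetic: for a fixed capacity target, the node count is the target divided by per-node capacity, rounded up. The sketch below uses hypothetical drive counts and sizes chosen only for illustration (they are not vendor figures), and it counts raw capacity only, ignoring the overhead of replication or erasure coding.

```python
import math

def nodes_needed(target_tb: float, drives_per_node: int, drive_tb: float) -> int:
    """Smallest number of nodes whose raw capacity covers target_tb."""
    per_node_tb = drives_per_node * drive_tb
    return math.ceil(target_tb / per_node_tb)

# Hypothetical configurations for illustration only, not vendor figures.
hdd_nodes = nodes_needed(target_tb=2000, drives_per_node=12, drive_tb=8)
ssd_nodes = nodes_needed(target_tb=2000, drives_per_node=24, drive_tb=16)

print(hdd_nodes, ssd_nodes)  # 21 6
```

With these assumed numbers, the denser flash nodes cover the same 2 PB target with roughly a quarter of the nodes.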
Object Storage vs. Flash – What Went Wrong
One of the primary inhibitors to achieving maximum flash (or even hard disk drive) performance in an object storage system is the architecture itself. Most object storage systems are scale-out, meaning that as IT adds nodes to the object storage cluster, storage capacity and compute performance scale with it.
The use of flash in scale-out designs exposes a couple of weaknesses. First, the cluster needs management: nodes have to communicate with each other to make sure each is participating correctly in the object store. Cluster management creates overhead and latency. Better software development can address these issues, but many object storage vendors are only now starting to focus on removing latency from their software.
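To give a sense of why cluster management costs grow with the cluster, here is a toy heartbeat tracker, assuming a simple all-to-all scheme in which every node periodically pings every other node. The `Membership` class and its timeout rule are hypothetical; production systems often use gossip protocols or a dedicated coordinator precisely to avoid the quadratic message count this model exposes.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Membership:
    """Toy heartbeat membership: a node counts as alive if it reported within timeout_s."""
    timeout_s: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node_id: str, now: float) -> None:
        # Record the most recent time this node was heard from.
        self.last_seen[node_id] = now

    def alive(self, now: float) -> List[str]:
        # Nodes whose last heartbeat falls within the timeout window.
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.timeout_s)

    def messages_per_round(self) -> int:
        # All-to-all heartbeating: each of n nodes pings the other n - 1.
        n = len(self.last_seen)
        return n * (n - 1)

m = Membership(timeout_s=2.0)
m.heartbeat("node-a", now=0.0)
m.heartbeat("node-b", now=1.5)
m.heartbeat("node-c", now=3.0)
print(m.alive(now=3.0))        # ['node-b', 'node-c']
print(m.messages_per_round())  # 6

# Message count per round as the cluster grows.
print([n * (n - 1) for n in (4, 16, 64)])  # [12, 240, 4032]
```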
In addition to cluster management, the object storage software needs to disperse data to various nodes in the cluster. It also needs to create a protected copy of the data that it is writing, typically through replication (a fixed number of mirrored copies) or by erasure coding (a parity-based protection scheme similar to RAID).
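The difference between the two protection schemes shows up in raw capacity overhead and in how data is split. The sketch below is a minimal illustration that assumes single XOR parity (the RAID 5 idea) rather than the Reed-Solomon codes most erasure-coded object stores use; `encode`, `rebuild` and the overhead helpers are hypothetical names. With k data fragments plus one parity fragment, any single lost fragment equals the XOR of the survivors.

```python
from functools import reduce
from typing import List

def split(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size fragments, zero-padding the last one."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> List[bytes]:
    """Return k data fragments followed by one XOR parity fragment."""
    frags = split(data, k)
    return frags + [reduce(xor, frags)]

def rebuild(frags: List[bytes], lost: int) -> bytes:
    """Recover the fragment at index `lost` by XOR-ing every surviving fragment."""
    return reduce(xor, [f for i, f in enumerate(frags) if i != lost])

def replication_overhead(copies: int) -> float:
    # Raw capacity consumed per unit of usable capacity.
    return float(copies)

def erasure_overhead(k: int, m: int) -> float:
    # k data fragments plus m parity fragments.
    return (k + m) / k

data = b"object storage payload"
frags = encode(data, k=4)
print(rebuild(frags, lost=2) == frags[2])  # True
print(replication_overhead(3), erasure_overhead(4, 1))  # 3.0 1.25
```

Each fragment is roughly 1/k the size of the object, which is one reason erasure coding tends to generate the small-block I/O described in the next paragraph.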
As the number of nodes increases, so does the number of communications across the network. More bandwidth, or faster networks, don’t entirely fix the problem. The object storage system, especially with erasure coding, may write very small blocks, and small-block I/O doesn’t need more bandwidth; it needs lower latency. The real concern is the latency of multiple network round trips. As with cluster management, network transmissions also need to be managed, often via a custom network protocol that is efficient for the type of traffic the object cluster will create. Once again, it is a development issue.
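A back-of-the-envelope latency model makes the point. Assume a small write costs a fixed number of sequential network round trips plus the time to serialize its bytes onto the wire. The round-trip time, block size and link speeds below are hypothetical values for illustration, and the model ignores queuing, CPU and media latency.

```python
def io_time_us(size_bytes: int, link_gbps: float, rtt_us: float, round_trips: int) -> float:
    """Simple model: sequential round trips plus wire serialization time."""
    bits = size_bytes * 8
    serialization_us = bits / (link_gbps * 1_000)  # 1 Gb/s = 1,000 bits per microsecond
    return round_trips * rtt_us + serialization_us

# Hypothetical parameters for illustration only.
BLOCK = 4096  # a 4 KiB fragment
RTT = 100.0   # assumed round-trip time in microseconds

t_10g = io_time_us(BLOCK, link_gbps=10, rtt_us=RTT, round_trips=2)
t_100g = io_time_us(BLOCK, link_gbps=100, rtt_us=RTT, round_trips=2)
t_fewer_trips = io_time_us(BLOCK, link_gbps=10, rtt_us=RTT, round_trips=1)

print(round(t_10g, 2), round(t_100g, 2), round(t_fewer_trips, 2))  # 203.28 200.33 103.28
```

Under these assumptions, a tenfold faster link saves about 3 microseconds on a 4 KiB block, while eliminating one of the two round trips saves 100 microseconds.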
Finally, there is metadata management, a hallmark of object storage. Metadata provides the rich tagging that allows stored data to be easily found in the future. The metadata is placed in a table, so managing it efficiently becomes very similar to optimizing a database. Once again, it’s a development focus.
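Because the metadata lives in a table, familiar database techniques apply. The following sketch assumes a simple key/value schema in SQLite; the `object_meta` table and the `tag` and `find` helpers are hypothetical, not any vendor's design. It shows how a secondary index on (key, value) lets a tag lookup avoid scanning every object's metadata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object_meta (
    object_id TEXT NOT NULL,
    key       TEXT NOT NULL,
    value     TEXT NOT NULL,
    PRIMARY KEY (object_id, key)
);
-- Secondary index so lookups by tag don't scan the whole table.
CREATE INDEX idx_meta_key_value ON object_meta (key, value);
""")

def tag(object_id: str, **tags: str) -> None:
    """Attach (or overwrite) key/value tags on an object."""
    conn.executemany(
        "INSERT OR REPLACE INTO object_meta VALUES (?, ?, ?)",
        [(object_id, k, str(v)) for k, v in tags.items()],
    )

def find(key: str, value: str) -> list:
    """Return the IDs of objects carrying the given tag, sorted."""
    rows = conn.execute(
        "SELECT object_id FROM object_meta WHERE key = ? AND value = ? ORDER BY object_id",
        (key, value),
    )
    return [r[0] for r in rows]

tag("img-001", camera="X100", project="survey")
tag("img-002", camera="X100", project="archive")
tag("log-17", source="web", project="survey")
print(find("project", "survey"))  # ['img-001', 'log-17']
```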
Data Integrity is still Critical
While performance is increasingly vital in object storage systems, one of their original traits, data integrity, can’t be lost in the shuffle. Active or not, data placed in an object store is likely to stay there for a long time. The system must be able to guarantee that the data will be readable for years to come. To accomplish this, high performance object storage systems should associate a strong cryptographic checksum with each stored object, covering not only the data but also the metadata. The checksums can then be used to continuously verify data integrity. If there is a mismatch, the solution should be able to leverage its data protection methods to restore the data to its correct state.
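A minimal scrubbing loop might look like the following sketch. It assumes SHA-256 as the cryptographic checksum, a canonical JSON encoding of the metadata so the same tags always hash the same way, and three-way replication as the protection method; `digest` and `scrub` are hypothetical helpers. An erasure-coded system would rebuild the damaged fragment from parity instead of copying a replica.

```python
import hashlib
import json
from typing import Dict, List

def digest(data: bytes, metadata: dict) -> Dict[str, str]:
    """SHA-256 over the payload and over a canonical JSON encoding of the metadata."""
    meta_bytes = json.dumps(metadata, sort_keys=True).encode()
    return {
        "data": hashlib.sha256(data).hexdigest(),
        "meta": hashlib.sha256(meta_bytes).hexdigest(),
    }

def scrub(replicas: List[dict], expected: Dict[str, str]) -> List[int]:
    """Verify every replica and overwrite corrupt ones with a verified good copy.

    Returns the indices of the replicas that were repaired.
    """
    good = [r for r in replicas if digest(r["data"], r["meta"]) == expected]
    if not good:
        raise RuntimeError("no intact replica available")
    repaired = []
    for i, r in enumerate(replicas):
        if digest(r["data"], r["meta"]) != expected:
            replicas[i] = {"data": good[0]["data"], "meta": dict(good[0]["meta"])}
            repaired.append(i)
    return repaired

obj = {"data": b"quarterly report", "meta": {"owner": "finance", "retention": "7y"}}
expected = digest(obj["data"], obj["meta"])  # stored alongside the object at write time
replicas = [dict(obj, meta=dict(obj["meta"])) for _ in range(3)]
replicas[1]["data"] = b"quarterly rep0rt"  # simulated bit rot in the payload
replicas[2]["meta"]["owner"] = "unknown"   # simulated metadata corruption
print(scrub(replicas, expected))  # [1, 2]
```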
StorageSwiss Take
Optimizing object storage performance is not a “throw hardware at the problem” situation, as so much of IT is. Instead, it requires the object storage vendor to focus development priorities on performance without sacrificing data integrity. Vendors need to optimize areas like cluster management, data dispersal and metadata management. If they do, they can deliver a high performance object storage system that doesn’t necessarily need flash storage but can take full advantage of it when it is there.
Sponsored by Nexenta
Well, Mr. Crump is right that object storage cluster performance is not necessarily improved by throwing SSDs at it. The object-based storage software, and how it works within its storage cluster, is where the engineering effort needs to be applied. When selecting an object-based storage software vendor, you need to find out how their software addresses performance, including the items Mr. Crump touched on in his blog.