SwiftStack Briefing Note
IT knows object storage for its ability to store trillions of files, or objects, scale to nearly limitless capacity, and do so inexpensively. Cloud storage providers have embraced the technology with open arms, but enterprises are more cautious, choosing instead to stretch their network-attached storage (NAS) investments a little further. While the capabilities of object storage appeal to enterprises, they have existing architectures that need integration. They need more than just an object storage system; they need a system that can support all of their secondary storage needs.
SwiftStack is a software-defined storage (SDS) solution based on OpenStack Swift. It is a turnkey object storage solution designed to help all organizations, not just OpenStack shops, manage their unstructured data. SwiftStack is announcing version 4.0, which adds many features to address the needs of enterprise customers.
SwiftStack Drive
Desktop and laptop users cannot easily interact with object storage because their built-in clients speak file protocols. SwiftStack Drive is an optional agent for Mac or Windows that lets users mount object storage containers and write to a SwiftStack object store directly. Applications that speak the object API natively can read and modify those same files. For example, a device could dump its data to the SwiftStack cluster via SwiftStack Drive, and a Hadoop analytics engine could then read that data.
Metadata Search
SwiftStack has always had the ability to attach metadata to objects as they are written, and that metadata could be searched. But the application creating the data had to perform the search itself. If another application wanted to search that data, it would first need to index all the object metadata, and in the era of petabyte-sized systems that re-indexing is very time consuming. SwiftStack 4.0 integrates with Elasticsearch to provide a more global query of stored data. As the SwiftStack cluster stores data, it feeds the metadata into Elasticsearch, so any application or user can use Elasticsearch to find the data they are looking for. The combination also works with Kibana to provide a visualization of the stored data.
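To make the idea concrete, here is a minimal sketch of the kind of query an application might send to an Elasticsearch index fed by the cluster. The index layout, the `container` and `meta.*` field names, and the sample values are illustrative assumptions, not SwiftStack's actual schema.

```python
def build_metadata_query(container, key, value):
    """Build an Elasticsearch query DSL body that finds objects in a
    given container whose custom metadata field matches a value.
    Field names ("container", "meta.<key>") are hypothetical."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"container": container}},
                    {"term": {f"meta.{key}": value}},
                ]
            }
        }
    }

# Example: find all objects in a "sensor-data" container written by one device.
query = build_metadata_query("sensor-data", "device-id", "unit-42")
```

An application would POST a body like this to the index's `_search` endpoint; the point is that the search happens in Elasticsearch, not in the application that originally wrote the objects.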
Built-in Load Balancer
The value of a scale-out storage architecture is that as IT adds each node, performance and capacity scale automatically. The problem is that as the number of nodes grows, it becomes more difficult to ensure each node's capabilities are used uniformly. To overcome this, organizations often have to purchase hardware load balancers, which are expensive and make scaling harder because they don't scale in unison with the rest of the cluster. SwiftStack 4.0 includes load balancing software that resides on every node, scales with the cluster, and provides built-in high availability. The result is a simpler way of making sure there are no performance hot spots that require additional hardware. It also means the storage team no longer has to rely on the network team to manage load balancing; they can do it themselves.
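The core idea can be sketched in a few lines: spread incoming requests evenly across the cluster's proxy nodes, and keep the distribution even as nodes join. This toy round-robin balancer is illustrative only; SwiftStack's built-in balancer is a real product component with health checking and high-availability failover, and the node names here are made up.

```python
import itertools

class ProxyBalancer:
    """Toy round-robin balancer: rotate requests across proxy nodes,
    and let new nodes join the rotation as the cluster scales."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._cycle = itertools.cycle(self.nodes)

    def add_node(self, node):
        # Scaling with the cluster: a new node simply joins the rotation.
        self.nodes.append(node)
        self._cycle = itertools.cycle(self.nodes)

    def route(self):
        # Return the node that should handle the next request.
        return next(self._cycle)

lb = ProxyBalancer(["node-1", "node-2"])
targets = [lb.route() for _ in range(4)]  # node-1, node-2, node-1, node-2
```

Because the balancing logic lives on the storage nodes themselves, adding a node grows both capacity and load-balancing capacity at once, which is the property the hardware appliances lack.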
StorageSwiss Take
We’ve been tracking SwiftStack for a few years now, and it hits all the marks IT professionals look for in object storage: it is open, affordable, and scalable. Now, in 4.0, the team delivers key features that enterprises need while maintaining its object roots.

Well, SwiftStack has been weak when it comes to supporting the AWS S3 API, probably because OpenStack Swift prefers its own API. There is nothing wrong with SwiftStack supporting the Swift API, but the S3 API is the de facto standard for object storage.
SwiftStack 4.0’s Elasticsearch integration is a nice addition. Caringo already does the same thing using Elasticsearch, and Cloudian will include Elasticsearch and Kibana in its next release later this year; Cloudian showed its use of both in a demo at the most recent Storage Field Day. Elasticsearch integration will become table stakes for object storage vendors going forward.
SwiftStack Drive is optional with SwiftStack 4.0, and you can do the same thing using CloudBerry Lab Drive with practically every object storage vendor or service out there.
A built-in load balancer is another welcome addition in SwiftStack 4.0 and represents real savings for SwiftStack customers. Other object storage software vendors who recommend load balancers when operating at scale should do the same.
Here are some real questions about SwiftStack 4.0 that were not addressed in this post by Mr. Crump. Has SwiftStack addressed its scalability issues related to the use of SQLite for metadata storage and rsync for replication? Does SwiftStack 4.0 still rely on a non-P2P architecture that makes Swift harder and more costly to scale? Is erasure coding finally working in SwiftStack 4.0?
Inquiring minds want to know.
Thanks for the questions. I work for SwiftStack and would be glad to address these topics.
SwiftStack, along with others in the community, has focused over the last 18-24 months on ensuring that all of the necessary S3 features are covered for developers, and is now moving on to the more niche functions. Since I work with third-party software vendors on supporting both the S3 and Swift APIs, I can tell you that today I can hardly think of any commercial software implementing more than basic CRUD (create, read, update, delete). I’ve given several talks at industry events to try to expand developers’ use of the rich API that object storage offers. That will come, but for now, SwiftStack and our competition support 99% of the commercial applications on the market using S3, because those applications are just not using some of the really cool stuff.
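The "basic CRUD" mentioned above maps directly onto four S3 operations: PutObject (create and update, since an update is an overwrite), GetObject (read), and DeleteObject (delete). This in-memory sketch mimics that surface to show how small it is; the class, bucket names, and payloads are all illustrative, not any vendor's API.

```python
class MiniObjectStore:
    """In-memory stand-in mirroring the tiny slice of the S3 API
    that most commercial applications actually use."""

    def __init__(self):
        self._buckets = {}

    def put_object(self, bucket, key, body):   # Create / Update (overwrite)
        self._buckets.setdefault(bucket, {})[key] = body

    def get_object(self, bucket, key):         # Read
        return self._buckets[bucket][key]

    def delete_object(self, bucket, key):      # Delete
        del self._buckets[bucket][key]

    def list_objects(self, bucket):
        return sorted(self._buckets.get(bucket, {}))

store = MiniObjectStore()
store.put_object("backups", "db.dump", b"v1")
store.put_object("backups", "db.dump", b"v2")  # "update" is just an overwrite
assert store.get_object("backups", "db.dump") == b"v2"
store.delete_object("backups", "db.dump")
```

Everything beyond this surface (object metadata queries, multipart uploads, lifecycle policies, and so on) is the "really cool stuff" that most applications have yet to adopt.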
SwiftStack Drive, to your point, does the same thing as CloudBerry Lab’s Drive, Mountain Duck, and other third-party applications, and I fully endorse those products as well. You will see that CloudBerry is a SwiftStack partner; we appear in their dropdowns, and we work with their entire suite of products, supporting both the S3 and Swift APIs. SwiftStack Drive gives value to customers that want a single throat to choke.
When you talk about SQLite for metadata storage, I believe you are conflating two issues. SwiftStack has always stored metadata with the object in a completely durable fashion and has not relied on SQLite for that. SQLite is simply used for listing the objects in containers. It has presented issues in the past with the number of objects a bucket can handle, but that number has risen steadily with each release. Rackspace spoke at OpenStack Tokyo about users with a billion objects in a bucket, while I gave a talk in Austin detailing performance numbers up to 50 million objects. SwiftStack allows for millions of user accounts, each having millions of buckets, with each bucket containing millions of objects.
Below is a link to my talk:
https://www.openstack.org/videos/video/swift-102-beyond-crud-more-real-demos
SwiftStack still utilizes rsync to move data between nodes, and this has always been done in a P2P fashion. If you watch my talk above, you’ll see that new improvements now allow drive-to-drive P2P movement of data. While rsync is a tried and tested utility, we are continuously investigating ways to increase the efficiency of how data is replicated between nodes in SwiftStack.
Erasure coding in SwiftStack is GA and shipped in our 3.0 release in October 2015; several of our customers have deployed it in production. Again, I think you might be confusing the community launch of erasure coding in OpenStack Swift with the release in SwiftStack. The community release was labeled beta, so SwiftStack did not release a commercial version until several months after. SwiftStack licensed the Intel ISA-L erasure coding library and does not use the open source EC libraries that OpenStack Swift deploys.