Taking Advantage of Object Stores Like EMC ECS – Right Now

Object storage is the storage methodology for the future of unstructured data and more. Next generation applications are able to write directly to object storage and take advantage of its sophisticated resource management capabilities. But enterprises have unstructured data problems worth solving right now and today’s users and applications have no idea how to write to an object storage system. The creation of files and other data using older methods isn’t going to stop as the data center moves to more modern applications and storage platforms.

Enterprises not only need to plan for the future of unstructured data, they also need to manage it in the present.

What is Object Storage?

Object storage is simply another way to store data. It is different from the traditional POSIX-based file systems that most network attached storage (NAS) systems and file servers use. It has no folder hierarchy. Instead, each file is given a unique ID within the system as it is written, and that ID is used to find the file in the future.

This unique ID is also leveraged for data protection and data integrity checking. A simple data protection scheme might be to make sure that there are always X number of copies of object ABC on Y many storage nodes. The object storage uses a simple hash algorithm to create the unique ID as new information is stored. The same file (contents) should always generate the same ID. If for some reason data is modified, or the media that the object is on has errors, the ID will be different, the object storage system will know, and it can correct the errors that may have occurred.
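The hash-based ID and integrity check described above can be sketched in a few lines of Python. This is only an illustration, not how ECS or any particular product works internally: the `TinyObjectStore` class, its in-memory dictionary, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def object_id(data: bytes) -> str:
    """Derive an ID from the content itself (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


class TinyObjectStore:
    """Hypothetical in-memory store: flat namespace, no folder hierarchy."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The same contents always produce the same ID.
        oid = object_id(data)
        self._objects[oid] = data
        return oid

    def get(self, oid: str) -> bytes:
        data = self._objects[oid]
        # Re-hash on read: a mismatch means the stored bytes have changed.
        if object_id(data) != oid:
            raise ValueError(f"integrity check failed for object {oid}")
        return data
```

A real system would respond to a failed check by repairing the object from another copy rather than simply raising an error; the sketch only shows how the mismatch is detected.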

EMC Elastic Cloud Storage (ECS)

One of the leading solutions on the market is EMC Elastic Cloud Storage (ECS). ECS is a software-defined object storage solution that is often bundled with the appropriate hardware to provide a cost-effective alternative to traditional unstructured data storage methods like NAS and file servers. For enterprises, it is also a viable alternative to public cloud storage. Many enterprises’ data needs are so large that the ongoing cost of cloud storage is often higher than the cost of maintaining the same data on-premises.

ECS is also available as a software download that the enterprise can run on its existing hardware or as a turnkey solution that enables a faster implementation and simpler long-term operation. The solution provides complete data resiliency from hardware failure or media degradation. In addition to unstructured data, it is an ideal solution for the storing of data created by modern applications like Hadoop or Internet of Things (IoT) projects, which can natively store and access data on an object store.

ECS helps organizations meet the challenges they have with their current unstructured data applications, which don’t have direct support for object storage protocols. ECS itself can provide emulation of standard protocols like SMB and NFS. But legacy data needs more than just a “connection and another place to put things.” It needs a process that intelligently moves and organizes data once it is stored there.

Leveraging Object Storage Today

There are two use cases that are immediately apparent for today’s data centers to leverage EMC ECS. The first is as a secondary backup target. While not a replacement for something like EMC’s Data Domain, it can augment it. The goal is to use Data Domain data protection storage for the most recent versions of the backup and then use ECS for the longer-term retention of protected data.

The second use case, ECS as an active archive, potentially has an even greater payoff.

Today, most of the enterprise’s unstructured data set is stored on NAS systems or file servers, which are the inverse of object storage. Designed primarily for performance, they are not able to scale to the same levels as object storage while maintaining the same (lower) costs. Also, for most of these legacy systems, performance declines as the number of files and the amount of capacity grow.

To make matters worse, most of the data on NAS systems and conventional file systems, upwards of 85%, has not been accessed in years, so it doesn’t need the performance that higher-cost NAS systems claim to provide. While the cost savings offered by new storage platforms are always important, the value of an active archive is more than just the cost per GB savings that ECS enjoys over traditional NAS systems.

An active archive also reduces the ongoing cost of data protection hardware and software. And it protects the organization from the newest threat facing the data center: cyberattacks. If 80% of the data is secured in the archive and not directly accessible as it would be on a NAS, there is a massive reduction in the organization’s exposure. Malware and hacking attempts can be discovered and stopped before meaningful damage is done.

The concept of an active archive has been around for a while, and has always sounded appealing, but it has not been executed well. The primary challenge is in identifying the right data to move to the archive, actually moving that data, and subsequently restoring that data quickly if a user or application needs it. ECS provides the right level of performance and certainly the cost effectiveness to resolve the backend of that concern. The remaining problem is the front-end intelligence: identifying the data, moving it, and setting it up for rapid, seamless recalls.

NTP Software Enables EMC ECS for Today’s Data Center

NTP Software offers a suite of products to solve the problems of managing unstructured data that organizations face today. At the heart of its portfolio is NTP Software’s VFM™, a virtual file management solution that monitors and manages a wide variety of unstructured storage systems, simultaneously if that is what your organization needs.

The first step in creating an active archive is identifying the data that may qualify for movement to the archive. The problem is often that the bulk of this data is scattered across multiple NAS systems and file servers from different vendors and across multiple operating systems. While there are many point solutions to identify and move data to an archive, they tend to only work on a specific platform, which means that the organization has to buy a separate solution for each NAS vendor and operating system in their environment.
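As a rough illustration of that first identification step, the sketch below walks a directory tree and reports files that have been neither accessed nor modified within a given number of days. It is not how VFM or any other product works; the function name `archive_candidates` and its parameters are hypothetical, and it assumes the file system records access times reliably, which is not the case for volumes mounted with options such as `noatime`.

```python
import os
import time
from pathlib import Path
from typing import Iterator, Optional


def archive_candidates(
    root: str, max_idle_days: float, now: Optional[float] = None
) -> Iterator[Path]:
    """Yield files under `root` that have been idle longer than `max_idle_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds per day

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Treat the most recent of access and modification as "last use".
            if max(st.st_atime, st.st_mtime) < cutoff:
                yield path
```

Running the same scan against every mounted NAS share and file server, for example `list(archive_candidates("/mnt/nas1", 365))`, gives a crude single inventory across platforms, which is the convenience the paragraph above argues point solutions lack.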

NTP Software’s VFM will work across all of the tier-1 and tier-2 vendor NAS systems and file systems on the market today. NTP has invested the development effort to make sure its software interfaces with the NAS and operating system file systems in an approved method with feature parity across platforms. For the organization, this means that from a single interface it can inventory its unstructured data across a variety of NAS platforms and use a single set of policies for its movement to ECS.

As the data is moved to the ECS active archive, the software should give the user or application seamless access back to the moved data in case it is needed in the future. In most cases this means that the software will use stub files. It is important to realize that not all stubs are created equal. NTP provides a variety of stubbing methods based on how users or applications will typically recall the files. It can eventually remove the stubs as the need to access a particular file declines even further.
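To make the idea of a stub concrete, here is a deliberately simplified sketch. It is not one of NTP's stubbing methods: the JSON stub format, the `STUB_MARKER` value, the function names, and the plain dictionary standing in for the archive are all assumptions for illustration. Production tools generally rely on file-system features rather than a small placeholder file like this one.

```python
import hashlib
import json
from pathlib import Path

STUB_MARKER = "archive-stub-v1"  # hypothetical marker identifying a stub


def archive_with_stub(path: Path, archive: dict) -> str:
    """Copy the file's contents into the archive and leave a small stub behind."""
    data = path.read_bytes()
    oid = hashlib.sha256(data).hexdigest()
    archive[oid] = data
    stub = {"marker": STUB_MARKER, "object_id": oid, "size": len(data)}
    path.write_text(json.dumps(stub))
    return oid


def recall(path: Path, archive: dict) -> bytes:
    """Replace a stub with the original contents fetched from the archive."""
    stub = json.loads(path.read_text())
    if stub.get("marker") != STUB_MARKER:
        raise ValueError(f"{path} is not a stub")
    data = archive[stub["object_id"]]
    path.write_bytes(data)
    return data
```

Note that the stub still occupies a directory entry on the source system, so the file count there is unchanged until the stub itself is removed.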

A stub removal capability addresses the issue that stubs, while they are in place, do not lower the file count on the NAS. File count can be a source of some of the NAS system’s performance problems. Ideally, archiving software should, again through policy, provide the ability to remove stubs when it deems that the chance of future access is so small that the overhead associated with a stub is no longer justified.

Once a stub is removed (or lost somehow), the active archive software still needs to provide a way to find and access the data. NTP provides a web portal interface to VFM so users or administrators can search VFM’s catalogs to find the data they need.

Finally, the active archive solution, both the software and the object storage, needs to be scalable. Archive systems will likely be in place for decades. Over time the number of files (objects) that the system will need to track and store could be in the billions if not trillions. ECS is one of the most scalable object storage systems on the market, so the software partnered with it needs to scale as well.

It’s easy for a software vendor to claim scalability, but how can an organization test that claim? It’s almost impossible to simulate years of storing billions of files. NTP is not a startup. It has customer relationships that are decades old, in which customers are already storing and managing billions of files. This track record means that an organization does not have to rely on simulated tests of the product’s scalability; it can look at how the product already performs for those customers.

StorageSwiss Take

Object storage is the future of unstructured data. Eventually many of the applications in the enterprise will write natively to it. But there will also always be the need for NAS systems and file servers. With an active archive solution those servers can be kept “thin” and only need to store the most active of data sets. The key though is an end-to-end archive solution that includes intelligent software and cost-effective hardware to deliver cost savings and better data security.

Eight years ago George Crump founded Storage Switzerland with one simple goal: to educate IT professionals about all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With 25 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS and SAN, Virtualization, Cloud and Enterprise Flash. Prior to founding Storage Switzerland he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.

  1. Doug Jones says:

    What are some of the paths or bridges you might suggest for applications now archiving to CIFS folders but want to use S3 in the future? If we write to ECS or other object storage using a CIFS conversion how can we seamlessly transition to pure S3 a couple of years later? Even the various storage manufacturers don’t seem to provide a clear answer. We’re stuck on CIFS to Object now with no BRIDGE. If SQL points to the location of folder file pathnames now and that is packaged onto object. How can we later access this through S3 without a major migration or some other work? SQL or the app would need to know where those same records are without a CIFS path and file information.

