Data center planners tasked with lowering the cost of storage have an unenviable job. Several industry sources state that data is doubling every two years, and the principal source of this growth is unstructured data (user files, PDFs, rich multimedia, machine sensor data, email). The challenge is that, as many organizations deploy data mining and data analytics systems, there is an increasing need to retain all of this information.
Storage Catch-22
Likewise, there is often a need to retain these records in order to comply with federal statutes like Sarbanes-Oxley. This puts storage infrastructure planners in something of a “Catch-22”: lower the cost of storage, but keep all of the data around. In a recent Storage Switzerland webinar, “How To Attain Sustainable Storage Savings”, we explored how organizations can get a better handle on their unstructured data repositories to reduce storage costs while meeting business intelligence (BI) objectives.
One of the fundamental problems with cost-effectively managing storage is the lack of insight most businesses have into the types of data they have on hand and how relevant that data is to the business.
To get a comprehensive view of all the information assets contained within the data center, IT managers will typically conduct a storage assessment. Assessments can be useful since they provide detailed reports on file types, data ownership (the users that created the files), the size of each file and the last access or modification time of each file.
Armed with this information, it is possible to make decisions about data deletion or retention, and about whether data should be migrated to a lower-cost storage tier. One of the problems with storage assessments is that the process is typically performed only annually or, at best, semi-annually. It is also often done by an outside third party who is not intimately familiar with the value of an organization’s data or, worse, may have other motives, like selling you more hardware. For businesses to maintain any meaningful, consistent storage efficiencies, this needs to be a regular process done internally.
The only feasible way to perform internal and continuous storage assessments is to implement “data profiling” or data indexing technologies. These solutions can typically be deployed as either a hardware or a virtual appliance. Data indexing products can scan NAS systems, file servers and even tape storage archives to provide an extensive set of reports on all the data stored across the enterprise.
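As a rough illustration of what such a scan collects, the Python sketch below walks a directory tree and tallies the metadata an assessment report is built from: file counts and bytes per file type, the owning user ID, and files whose last access time has gone stale. This is a minimal sketch for illustration only; commercial data indexing products gather far more metadata and scale out across NAS systems, file servers and archives.

```python
import os
import time
from collections import defaultdict

def profile_tree(root, stale_days=365):
    """Walk a directory tree and summarize it the way a data-profiling
    appliance would: file count and bytes per extension, plus files whose
    last access is older than `stale_days`."""
    by_ext = defaultdict(lambda: {"count": 0, "bytes": 0})
    stale = []
    cutoff = time.time() - stale_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            ext = os.path.splitext(name)[1].lower() or "(none)"
            by_ext[ext]["count"] += 1
            by_ext[ext]["bytes"] += st.st_size
            if st.st_atime < cutoff:                     # last access time
                stale.append((path, st.st_size, st.st_uid))  # uid = data owner
    return dict(by_ext), stale
```

Run regularly (say, from a nightly scheduler), even a simple scan like this turns the annual assessment into the continuous process described below.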
Just as importantly, data profiling technologies enable organizations to run storage assessments as a continuous process, so storage planners can keep a close watch on the information inundating their environments and make data placement changes before a problem arises.
With that visibility, storage managers can then make intelligent decisions about which data can be pruned or removed from the environment, which data can be migrated to a lower-cost storage tier, like tape, and which data must be retained to satisfy compliance mandates.
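Those three outcomes (prune, migrate, retain) amount to a placement policy applied to each file's profile. The hypothetical sketch below encodes one such policy; the thresholds and the legal-hold flag are illustrative assumptions, since actual retention rules come from the business and its regulators.

```python
from datetime import datetime, timedelta

# Illustrative thresholds -- real values come from business and legal policy.
MIGRATE_AFTER = timedelta(days=180)      # untouched ~6 months: lower-cost tier
DELETE_AFTER = timedelta(days=365 * 7)   # past the assumed retention window

def classify(last_access, on_legal_hold=False, now=None):
    """Map a file's last-access age to a placement decision."""
    now = now or datetime.now()
    age = now - last_access
    if on_legal_hold:
        return "retain"   # compliance mandates override cost savings
    if age > DELETE_AFTER:
        return "delete"   # candidate for pruning
    if age > MIGRATE_AFTER:
        return "migrate"  # e.g. to object storage or tape
    return "keep"         # active data stays on primary storage
```

Feeding the output of a continuous profiling scan through a policy like this is what turns raw assessment reports into actionable tiering decisions.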
Objectified Data Storage
The “Sustainable Storage Savings” webinar also delved into how object storage technology can be leveraged to store vast amounts of unstructured data. Object storage is used today by cloud service providers (CSPs) as a way to cost-effectively store multiple petabytes of information. As my colleague Eric Slack explained during the webinar, object storage may look like a new paradigm for storing data, but in fact it has been in use for well over a decade. One of the earliest examples is EMC’s Centera offering, which uses object storage technology to store and preserve documents and files, like medical images and email, for compliance purposes.
Because object storage systems can scale massively using commodity storage resources, many organizations are considering object storage as a way to supplant traditional NAS filesystem architectures. Some object storage systems also provide application programming interfaces (APIs) to public cloud storage offerings, like Amazon’s S3, allowing businesses to cost-effectively move data offsite for backup and DR purposes.
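To make the contrast with a hierarchical NAS filesystem concrete, here is a toy, in-memory model of the object storage paradigm: a flat namespace of keys, where each object carries its data plus arbitrary metadata, and “folders” exist only as key prefixes. The class and method names are invented for illustration; a real deployment would speak an API such as S3’s PUT/GET over HTTP.

```python
class ObjectStore:
    """Toy in-memory object store illustrating the paradigm: a flat key
    namespace where each object carries its data plus arbitrary metadata,
    instead of a hierarchy of directories and inodes."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        self._objects[key] = {"data": data, "meta": metadata}

    def get(self, key):
        return self._objects[key]["data"]

    def head(self, key):
        # Metadata travels with the object -- no separate database needed.
        return dict(self._objects[key]["meta"])

    def list(self, prefix=""):
        # "Folders" are just key prefixes; there is no real hierarchy.
        return sorted(k for k in self._objects if k.startswith(prefix))
```

Storing retention metadata alongside each object (rather than in filesystem attributes) is exactly what made this model attractive for compliance archives like Centera.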
Tale of the Tape
Then, of course, there is always tape. Over the years, tape has repeatedly been declared dead by various “market-eers”. The fact is, with recent advancements in tape solutions like LTO and LTFS (Linear Tape File System) technology, tape remains a very attractive medium for cost-effectively storing petabytes of information. LTFS continues to be developed by a consortium of IBM, HP and Quantum, so clearly it has major backing and staying power.
Through LTFS, data can be directly accessed by applications via an operating system driver without going through the preliminary step of a data restore operation from a proprietary backup application. This makes the long-term proposition of storing data on tape much more viable as there is no requirement to maintain a copy of a particular vendor’s backup application onsite in perpetuity.
Some tape backup suppliers have even built an S3 interface into their LTFS-based tape libraries to enable businesses to leverage cloud-based tape repositories. This can be an effective strategy for maintaining a local tape backup footprint while using cloud-based tape for select data sets.
The Power of 2
While retrieving data from tape is not as fast as from disk, some vendors have introduced data archiving solutions that use a hybrid of disk and tape to give organizations the best of both worlds: a low-cost, highly dense tape storage repository on the back end, with a fast NAS disk cache front-end for business applications. In this architecture, when data is recalled from the LTO archive, it is loaded onto the disk cache so that subsequent access requests run at hard disk speeds. Then, as data cools, it is pushed back down to LTO storage to make room on the disk cache for more active data.
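The recall-and-cool-down cycle can be sketched in a few lines. The toy class below (all names invented for illustration) fronts a “tape” backend with a fixed-size disk cache: a read miss recalls the data into the cache, and when the cache fills, the least recently used, i.e. coldest, item is evicted, which is safe because a copy already lives on tape.

```python
from collections import OrderedDict

class HybridArchive:
    """Sketch of a disk-cache-fronted tape archive: reads recall data from
    'tape' into a fixed-size disk cache; when the cache fills, the coldest
    (least recently used) item is pushed back down to tape."""
    def __init__(self, cache_slots=2):
        self.tape = {}               # stands in for the LTO back end
        self.cache = OrderedDict()   # ordered coldest -> most recently used
        self.cache_slots = cache_slots

    def write(self, name, data):
        self.tape[name] = data       # archived data lands on tape

    def read(self, name):
        if name in self.cache:       # cache hit: disk-speed access
            self.cache.move_to_end(name)
            return self.cache[name]
        data = self.tape[name]       # cache miss: recall from tape
        self.cache[name] = data
        while len(self.cache) > self.cache_slots:
            # Evict the coldest item; its copy still lives on tape.
            self.cache.popitem(last=False)
        return data
```

The first read of any item pays the tape-recall penalty; repeat reads of active data are served from the disk cache, which is exactly the "power of 2" behavior these hybrid archives promise.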
Storage Swiss Take
Attaining storage savings in today’s “internet of things” era, where data is collected at breakneck speed by an overwhelming number of devices, requires organizations to deploy solutions that provide greater intelligence on the business value, or relevancy, of the data on hand. When technologies like data indexing are run as a regular process, storage planners can be more proactive about pruning data or migrating it to a more cost-effective storage repository like object storage or tape.
Furthermore, since many Tier-2 and Tier-3 storage offerings today are designed with the cloud in mind, layering in the APIs needed to interface with cloud storage systems, organizations can start to lay the foundation for a hybrid cloud storage infrastructure that could further reduce storage and DR costs.