The term Active Archive is thrown around all the time by IT vendors, but what is it really, other than an attempt to make something as old as data management sound cool again? An Active Archive is an archive from which users recall data much more frequently than they would from a traditional archive. An Active Archive provides a rapid return on investment, drives down the cost of buying and managing storage and, frankly, may be the only way to live through the coming data deluge.
These more frequent recalls occur because data is archived more aggressively, soon after it stops changing. The recalls can also be specific to a use case. For example, archived surveillance video, healthcare imaging, seismic data, and media and entertainment content is, or can be, accessed much more frequently than regular cold data. In these use cases cold data does not gradually warm; it goes from cold to hot very rapidly. Access time is therefore critical for an Active Archive to be usable by the organization.
The goal of an Active Archive is to dramatically reduce the growth of more expensive primary storage and instead have the growth occur on less expensive secondary storage. The strategy sounds good but creates a unique set of challenges that IT has to overcome.
The Movement Challenge
Most archive storage systems are entirely separate from primary storage. The first Active Archive challenge facing IT, then, is how to get data from primary storage to archive storage. There are three steps involved. First, the data to be moved needs to be identified. Second, the data has to be physically moved to the secondary location. Third, data has to remain transparently accessible to users and applications. While archiving software has improved over the years, qualifying data to move from primary storage to some form of secondary storage remains a big challenge. The challenge of correctly identifying the right data to move to the archive is big enough that most organizations spend their limited IT budget expanding primary storage instead of managing it better.
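The first of those steps, identifying the data to move, can be sketched in a few lines of Python. This is a minimal illustration only, assuming that "stopped changing" can be approximated by a file's last-modified time; the function name find_archive_candidates and the min_idle_days threshold are hypothetical and do not come from any particular archiving product.

```python
import time
from pathlib import Path


def find_archive_candidates(root, min_idle_days, now=None):
    """Return files under root not modified in the last min_idle_days days.

    Assumption: last-modified time stands in for "the data stopped changing".
    Real archiving software typically weighs more signals than this.
    """
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400  # seconds in a day
    candidates = []
    for path in Path(root).rglob("*"):
        # Only regular files are candidates; directories are skipped.
        if path.is_file() and path.stat().st_mtime < cutoff:
            candidates.append(path)
    return sorted(candidates)
```

A more aggressive Active Archive policy would simply use a smaller min_idle_days value, which is also why recall rates from the archive rise.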
Getting down to a single storage platform is the right idea, but expanding primary storage is the wrong choice because it is the most expensive tier. It makes more economic sense to expand secondary storage and keep primary storage small. The problem with counting on secondary storage is that it truly becomes an Active Archive, and suddenly performance matters. The speed at which data can be delivered back to users and applications is critical for acceptance. Most archive storage systems, however, leverage either high-capacity (and slow) hard disk drives or the cloud, which introduces latency.
You can solve the performance issue by adding a distributed flash front end to the archive and placing that cache near the users and applications. In fact, assuming the solution can manage multiple cache endpoints, it could also act as a data distribution mechanism.
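The caching idea can be illustrated with a small read-through cache. This is a sketch under stated assumptions, not a description of any vendor's implementation: the class name ReadThroughCache, the fetch_from_archive callable and the least-recently-used eviction policy are all hypothetical choices made for the example.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Small fast cache (think flash) in front of a slow archive tier."""

    def __init__(self, fetch_from_archive, capacity):
        self._fetch = fetch_from_archive  # hypothetical slow archive read
        self._capacity = capacity         # maximum number of cached objects
        self._items = OrderedDict()       # ordered oldest to most recent
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._items:
            # Cache hit: serve locally and mark as most recently used.
            self._items.move_to_end(key)
            self.hits += 1
            return self._items[key]
        # Cache miss: pay the archive latency once, then keep a local copy.
        self.misses += 1
        data = self._fetch(key)
        self._items[key] = data
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict least recently used
        return data
```

Data that suddenly goes from cold to hot pays the archive latency on the first read only; repeat reads are served from the cache near the user. Running one such cache per site is one way to picture the multiple cache endpoints mentioned above.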
The Storage Challenge
Assuming you can solve the data movement challenge, the second challenge is the design of the archive storage system itself. For many data centers the ideal target storage location is public cloud storage, which gets the majority of the data off-site and, if it leverages a caching model, then the latency of the cloud is also overcome. The result should be a dramatic reduction in on-premises resources, which means lowering not only physical storage investment but also space, power and cooling costs.
The movement solution should not require the use of public cloud storage exclusively, though. For many organizations there are legitimate reasons to keep data in-house, such as security or long-term storage cost concerns. For these organizations, data movement software should support on-premises cloud/object storage.
The return on investment of moving inactive data to secondary storage has always been compelling, and Active Archive makes it even more impressive. The problem is that the operational cost and effort of managing the data outweigh the capital savings of moving data to a secondary storage system. A potential solution is to eliminate the two tiers of storage and consolidate them into a single tier. To some extent IT did this by continuing to add capacity to the primary storage system. Unfortunately, that's the wrong tier. Instead, IT needs to create a powerful secondary tier that can deliver the performance of primary storage for the data that needs it and the cost savings of secondary storage for the majority of data that does not.
To learn more about how to create an Active Archive that leverages the cloud to significantly reduce storage management costs without increasing operational expenses, join Storage Switzerland for its upcoming webinar, “How to Use Cloud Storage to Overcome The 3 Challenges to ACTIVE Data Archiving”.