The traditional backup process doesn’t work well any more. It’s being overwhelmed by a new kind of file-based data that’s created, seldom modified, and saved – sometimes forever. Driven by industry trends such as big data analytics, image-based digital content and a “save everything” mindset, companies are finding they need something different to store and protect data.
Traditional backup was designed to provide protection for files and database applications that changed on a regular basis over a period of days or weeks. Regular backups were taken and multiple copies of multiple versions of this data were created and saved. These backups were used for immediate restore of critical assets, such as a database that needed to be rolled back to a previous point in time or an entire production system that was destroyed. Backups also provided file and folder-level recovery from user error or a system crash.
As it aged out, this data became inactive and was typically kept in an archive state (within the backup infrastructure) and seldom recovered. The backup process itself involved a data flow that was mostly one-way, from the primary data store into the backup infrastructure, with the occasional restore done during the first few months.
Backup systems were essentially tied to the growth of primary storage, and deduplication was the fundamental technology used to handle that data growth as disk-based systems started to replace tape. Dedupe was ideal for this process since it allowed multiple similar copies of a data set to be saved with minimal impact on the backup infrastructure – in storage capacity, processing overhead or network bandwidth.
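To illustrate the principle, here's a minimal sketch of block-level deduplication using fixed-size chunks and SHA-256 fingerprints. It's a deliberate simplification, not any particular vendor's dedupe engine, and the function and variable names are hypothetical.

```python
# Minimal dedupe sketch: fixed-size chunks are fingerprinted with SHA-256
# and only chunks that haven't been seen before consume new capacity.
# Illustrative only - not a specific product's dedupe implementation.
import hashlib

CHUNK_SIZE = 64 * 1024  # 64 KB chunks (illustrative; real systems tune this)

def dedupe_store(path, chunk_store):
    """Back up one file into chunk_store (fingerprint -> chunk bytes),
    returning the ordered fingerprint list ("recipe") and new bytes written."""
    recipe, new_bytes = [], 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in chunk_store:      # only previously unseen chunks use capacity
                chunk_store[fp] = chunk
                new_bytes += len(chunk)
            recipe.append(fp)
    return recipe, new_bytes
```

Backing up the same, largely unchanged file in next week's job would write almost no new bytes, which is why dedupe coped so well with repeated backups of slowly changing data.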
A new type of file data
Now there’s a different kind of data, led by the explosion of digital content, that’s filling up storage systems in company data centers. Several industry trends are creating this new type of unstructured data:
- Digital content (videos, images and audio assets) is becoming more common in regular corporate data sets to support social media and a consumer preference for videos, infographics and animation.
- “Monetization” – reusing existing assets to create new business value – is encouraging a save-everything mentality and is increasing the demand for storage.
- The Internet of Things and the practice of capturing a wider range of data objects for use in analytics are creating larger archives of all types of files.
A new problem for backup systems
This file-based data is typically created once and seldom changed, so it doesn't get backed up over and over again throughout its lifecycle, as is the case with documents, spreadsheets and much of the data that's historically been backed up. This, and the fact that many of these file types are largely image-based, means they don't deduplicate well, creating a capacity problem for backup storage systems.
But capacity isn’t the only issue. When needed, this data must be recovered faster and is often accessed repeatedly, replacing the one-way backup/restore process with a new workflow. This access requirement is forcing data to be stored on disk-based storage systems that, for many environments, must scale into the PB range while remaining economical.
A new workflow
These characteristics are shifting the value and the lifecycle of retained corporate data sets overall and causing a change in the demand for backup and long-term data storage. Saving valuable digital assets is cheaper than recreating them (if recreation is even possible), prompting longer retention periods. This, together with the larger file sizes common with digital content, is creating a demand for higher capacity archive storage systems.
A backup system and an archive system
The answer to this problem is to first pull much of this new unstructured data out of the backup process, and perhaps even off primary storage itself, and create workflows to store, protect and manage it through this new lifecycle. Companies need to establish a new archive with hardware and software specifically designed to meet the needs of this new type of data. They also need to leverage advanced technologies in the traditional backup infrastructure that make it more efficient and more cost effective for the data that’s still appropriate for backup.
A multi-tier architecture
In order to handle the sheer volume of data coming in from these new unstructured data sources, this archive system needs an infrastructure that leverages multiple tiers of storage. For the most active data, object storage as the first tier would provide economical scalability into the PB range with the reliability and data protection these assets require. While not as fast as primary disk arrays, object storage is fast enough for this new workflow and especially well suited for streaming larger files.
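As a rough illustration, the sketch below lands an incoming media file on an S3-compatible object store acting as that first archive tier. The endpoint, bucket, credentials and file paths are hypothetical placeholders, not a recommendation of any specific product.

```python
# Sketch: write an archive asset to an S3-compatible object store serving
# as the most active archive tier. All names below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.internal",  # hypothetical on-prem endpoint
    aws_access_key_id="ARCHIVE_KEY",
    aws_secret_access_key="ARCHIVE_SECRET",
)

# Large media files stream well to object storage; upload_file handles
# multipart uploads automatically for big objects.
s3.upload_file(
    Filename="/ingest/footage/cam01_2015-06-01.mov",
    Bucket="active-archive",
    Key="video/2015/06/cam01_2015-06-01.mov",
)
```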
Object storage, tape and the cloud
Cloud storage can be a tier as well, providing off-site protection and maximum flexibility, with physical or virtual on-premises appliances to improve performance and simplify implementation. New file-aware tape technologies that leverage LTFS (Linear Tape File System) are another attractive solution for a long-term tier in this infrastructure. With LTFS, data is available directly from the LTO tape cartridge for simple file retrieval, even outside of the archive system.
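Because LTFS presents the cartridge as an ordinary mounted file system, archiving to tape can be as simple as a file copy. The sketch below assumes a hypothetical LTFS mount point and adds a checksum check before the disk copy is released; it's illustrative only, not a specific product's archive mover.

```python
# Sketch: move a cold asset onto an LTFS-formatted LTO cartridge.
# LTFS exposes the tape as a normal mounted file system, so standard
# file operations are enough. The mount point below is a placeholder.
import hashlib
import shutil
from pathlib import Path

def archive_to_tape(src_path, ltfs_mount="/mnt/ltfs"):
    src = Path(src_path)
    dst = Path(ltfs_mount) / "archive" / src.name
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)                      # plain file copy onto the tape file system

    # Verify before the disk copy is released (a sketch; large files would stream).
    digest = lambda p: hashlib.sha256(p.read_bytes()).hexdigest()
    if digest(src) != digest(dst):
        raise IOError("checksum mismatch - keep the disk copy")
    return dst
```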
High performance, global file and archive system
Managing these new data types across a tiered storage infrastructure requires a high performance file and archive system, like StorNext 5 from Quantum. This software architecture can provide file-level access to data on any storage tier and make it available to users running all the standard compute platforms.
In order to achieve this performance, the system should use sophisticated data handling processes that keep metadata separate, on the fastest disk tiers or on flash storage. Metadata is the indexing information – the “data about the data” – used to access the files themselves, so speeding up metadata processing can improve performance significantly. The files themselves can be placed on slower tiers, including capacity-centric disk storage with data integrity checking to assure long-term asset protection. The architecture should also be able to leverage solid state disk to provide better data streaming.
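The sketch below illustrates the general idea of separating metadata from data: a small catalog on a fast flash tier pointing at payloads written to a capacity tier. It is not StorNext's actual internals, and all paths and field names are hypothetical.

```python
# Illustrative sketch (not StorNext internals): a compact metadata catalog
# lives on a fast flash/SSD tier while file payloads go to a capacity tier,
# so lookups and browsing touch only the small, fast catalog.
import hashlib
import json
from pathlib import Path

FAST_CATALOG = Path("/flash/catalog.json")   # hypothetical fast metadata location
CAPACITY_TIER = Path("/capacity/pool")       # hypothetical bulk-data location

def ingest(src_path):
    src = Path(src_path)
    data = src.read_bytes()
    checksum = hashlib.sha256(data).hexdigest()

    CAPACITY_TIER.mkdir(parents=True, exist_ok=True)
    (CAPACITY_TIER / checksum).write_bytes(data)         # payload on the slow, cheap tier

    catalog = json.loads(FAST_CATALOG.read_text()) if FAST_CATALOG.exists() else {}
    catalog[str(src)] = {                                # the "data about the data"
        "size": len(data),
        "sha256": checksum,                              # used later for integrity checking
        "location": str(CAPACITY_TIER / checksum),
    }
    FAST_CATALOG.write_text(json.dumps(catalog, indent=2))
```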
Data protection is still important
The active archive infrastructure must also provide effective data protection since valuable data is now being stored outside of the backup system. This can mean technologies like erasure coding and data integrity checking at the disk level instead of traditional RAID methods, which aren’t efficient at the PB scale these archive systems often reach. Creating automated copies to tape can be another option to provide affordable data protection.
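To show why erasure coding protects data more efficiently than keeping full extra copies, here's the simplest possible sketch: k data shards plus a single XOR parity shard, which can rebuild any one lost shard. Production archives typically use Reed-Solomon codes with several parity shards spread across disks or nodes; this only illustrates the principle.

```python
# Simplest erasure-coding sketch: k data shards plus one XOR parity shard.
# Any single lost shard (data or parity) can be rebuilt from the survivors,
# at a storage overhead of 1/k instead of a full second copy.
# Real archives generally use Reed-Solomon codes with multiple parity shards.

def encode(data: bytes, k: int = 4):
    """Split data into k equal shards (zero-padded) plus one parity shard."""
    shard_len = -(-len(data) // k)                       # ceiling division
    shards = [bytearray(data[i * shard_len:(i + 1) * shard_len].ljust(shard_len, b"\0"))
              for i in range(k)]
    parity = bytearray(shard_len)
    for shard in shards:
        for i, b in enumerate(shard):
            parity[i] ^= b
    return shards + [parity]

def rebuild(shards):
    """Given k+1 shards with exactly one entry replaced by None, rebuild it."""
    survivors = [s for s in shards if s is not None]
    rebuilt = bytearray(len(survivors[0]))
    for shard in survivors:
        for i, b in enumerate(shard):
            rebuilt[i] ^= b
    return rebuilt

# Example: lose shard 2 and recover it from the remaining shards.
pieces = encode(b"valuable digital asset that must survive a disk failure")
lost = bytes(pieces[2])
pieces[2] = None
assert bytes(rebuild(pieces)) == lost
```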
Management
Also important are effective monitoring, reporting and diagnostics to simplify the management of data sets across multiple storage tiers. Because these new workflows focus on gaining insight by comparing large amounts of data, admins need tools for efficient data handling and for capturing real-time information.
Improved backup technologies
While moving these new types of unstructured data into an active archive is the primary strategy, companies still need a backup system – and it needs an upgrade too. The data explosion is hitting backup-appropriate data as well and straining storage capacities. Backup systems need the ability to handle more data, more files and smaller files efficiently, and to provide more granular recoveries.
This means an integrated, high-performance file system with efficient metadata handling to provide better overall scalability, one that’s journaled for extra protection in case of system failure. And, while deduplication doesn’t solve all storage problems, it’s still important. To this end, the backup system needs to leverage the latest technology, like Quantum’s DXi with variable-length dedupe, which the company claims can provide a 90% data reduction rate. In addition to reducing storage capacity requirements, it also minimizes the time and bandwidth required to move data across the data center and across the WAN to a DR location.
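The sketch below illustrates the general principle behind variable-length dedupe, content-defined chunking with a rolling hash, so that inserting a few bytes early in a file only disturbs the chunks around the edit. It is not Quantum's DXi algorithm; the window size, mask and chunk limits are arbitrary illustrative values.

```python
# Sketch of the idea behind variable-length dedupe: chunk boundaries are chosen
# by a rolling hash over the content, so an insertion near the start of a file
# only changes the chunks around the edit instead of shifting every boundary
# after it, the way fixed-size chunking would. Illustrative only.

BASE = 257                              # rolling-hash base
WINDOW = 48                             # bytes the hash "remembers"
BASE_POW = pow(BASE, WINDOW, 1 << 32)   # factor to drop the byte leaving the window
CUT_MASK = 0xFFF                        # a boundary roughly every 4 KB on average
MIN_CHUNK, MAX_CHUNK = 1024, 16 * 1024  # keep chunk sizes within sane bounds

def chunk_boundaries(data: bytes):
    """Yield (start, end) offsets of content-defined chunks."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) & 0xFFFFFFFF
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * BASE_POW) & 0xFFFFFFFF   # slide the window
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & CUT_MASK) == 0) or size >= MAX_CHUNK:
            yield start, i + 1
            start = i + 1
    if start < len(data):
        yield start, len(data)
```

Each variable-length chunk would then be fingerprinted and stored exactly like the fixed-size chunks in the earlier dedupe sketch.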
Summary
Traditional backup systems are getting overwhelmed by a new type of unstructured data. These new data files don’t deduplicate well, are saved for longer periods of time and are accessed more frequently than backup systems were designed for. The answer is to move this data out of the backup process altogether and create a new workflow that’s supported by a multi-tiered, active archive storage infrastructure.
This Article Sponsored by Quantum
