It’s no secret that unstructured data is consuming data center tiles across the world. Structured data may have been king at one point, but the vast majority of storage in today’s data centers is now consumed by unstructured data. This explains the advent of object storage systems specifically designed for such data, along with cloud-based versions built to handle tremendous amounts of it. But there are still a lot of file servers in today’s data centers doing nothing but holding unstructured data. Is there a better way to back them up and archive them?
How To Back Up Unstructured Data
NAS filers are typically backed up in a couple of different ways. The historic “head in the sand” approach is to simply mount the filer to a server that then backs it up just like any other directory. This approach has two downsides: backup performance is limited by the performance of the server through which the filer is backed up, and requests for files from the backup server look the same as requests from users, so a full backup can bog down the performance of the filer.
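A minimal sketch of that mount-and-copy approach might look like the following. The paths and function name are hypothetical; in practice the filer export would first be NFS-mounted on the backup server, and every file would then be read through that server, which is exactly the bottleneck described above.

```python
import os
import shutil

def backup_tree(src_root, dst_root):
    """Copy every file under src_root (e.g. an NFS-mounted filer export)
    into dst_root, preserving the directory layout. A full backup like
    this pulls every byte through the backup server."""
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # copy2 preserves timestamps, which matter for later incrementals
            shutil.copy2(os.path.join(dirpath, name),
                         os.path.join(target_dir, name))
            copied += 1
    return copied
```

Every run of this style of full backup re-reads the entire tree, which is why the approaches below exist.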
The official way to back up many filers is via NDMP, which has a number of advantages over the NFS mount approach. NDMP allows customers to connect a tape drive (or virtual tape drive) directly to a filer, which allows for much faster performance than the previous method. NDMP is also written with some intelligence that forces it to back off if it is causing a performance impact for regular users. The main downside to NDMP is that it is still based on the idea of tape, which places it somewhat at odds with modern backup design.
Both the network mount approach and the NDMP approach also suffer from slow restores when restoring a large amount of data. For this reason – as well as the previously mentioned downsides to those backup methods – the preferred method of many customers for backing up a NAS filer is to simply replicate to another filer. It solves the slow restore problem because the second filer can act as the primary filer during any kind of outage. It also solves the performance problems during backup because it is a forever incremental technology. The only downside to this approach is the cost of the second filer, which is often very similar to the cost of the first.
The Igneous Solution
Igneous decided to tackle this problem in a different way, starting with an object storage system based on a unique design. As my colleague George Crump covered in his briefing note, Igneous is an S3 API compatible storage system designed to run in your data center. It has chosen to attach what it calls a nano server to each hard drive. Imagine a circuit board the size of one end of a hard drive that contains a small ARM processor and a little bit of RAM. This one-to-one relationship of CPU to disk allows for some very interesting design considerations, even to the point that Igneous uses the “cattle, not pets” mantra when speaking about its systems. Igneous is coining the term JBOND to describe this unique architecture: just a bunch of networked drives.
How Igneous Handles Backup
The Igneous solution to the filer backup problem is also unique. It merges two of the techniques above: it backs up files via NFS, but uses an incremental forever approach like the replication method. By combining the methods this way, Igneous gets the advantages of each without the disadvantages.
All backups are done by accessing the filer like any other network client requesting files. An initial full backup would transfer all files into the Igneous system. Once that is done, it simply needs to scan for new or changed files and transfer them to its object storage system. While doing this, by the way, it is also indexing all files using its integrated search and retrieval system.
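The scan-for-changes step can be sketched as follows. This is a hypothetical illustration of an incremental-forever scan, not Igneous code: it keeps an index of each file's modification time and size from the last pass, and anything whose metadata differs becomes a candidate for transfer to object storage.

```python
import os

def changed_files(root, index):
    """Return (changed, new_index): files new or modified since the last
    scan, plus an updated index mapping path -> (mtime, size)."""
    new_index = {}
    changed = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            sig = (st.st_mtime, st.st_size)
            new_index[path] = sig
            if index.get(path) != sig:
                changed.append(path)  # new file, or metadata differs
    return changed, new_index
```

The first pass (an empty index) naturally returns everything, which matches the initial full backup; every later pass returns only the delta.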
It’s important to state that Igneous does not want to impact the performance of the filer it is backing up, so it constantly monitors performance attributes like how long it takes to access and transfer each file. If it sees performance deviating from the norm, it will simply back off for a while to give preference to the “real” requests for files that are causing the performance degradation. Once performance returns to the norm, it resumes a normal transfer rate.
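One way such latency-based backoff could work is sketched below. This is purely illustrative, assuming a moving average of per-file transfer times and made-up thresholds; Igneous has not published its actual heuristics.

```python
import time

class BackoffThrottle:
    """Track a moving average of per-file transfer times and pause
    when the filer looks busier than normal (illustrative values)."""
    def __init__(self, threshold=2.0, pause=1.0, alpha=0.2):
        self.avg = None             # moving average of transfer seconds
        self.threshold = threshold  # back off when sample > threshold * avg
        self.pause = pause          # seconds to wait before the next file
        self.alpha = alpha          # smoothing factor for the average

    def record(self, seconds):
        """Feed one transfer-time sample; return True if we should back off."""
        if self.avg is None:
            self.avg = seconds      # first sample just seeds the average
            return False
        slow = seconds > self.threshold * self.avg
        self.avg = self.alpha * seconds + (1 - self.alpha) * self.avg
        return slow

    def maybe_pause(self, seconds):
        if self.record(seconds):
            time.sleep(self.pause)  # yield to the "real" client requests
```

After each file transfer the scanner would call `maybe_pause()` with the observed transfer time, slowing itself down whenever user traffic appears to be contending for the filer.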
The idea is that customers would have the Igneous system scan all data on the filers via this process, after which the product would give them some insight into file attributes like access time and file type. This would allow customers to identify files that could be permanently migrated to the Igneous system, freeing up space on the primary filer, which can yield immediate savings. The product also supports migrating some or all of the data to a public cloud provider.
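The kind of access-time and file-type insight described above can be approximated with a simple report like the one below. The function name and the one-year "cold" threshold are illustrative assumptions, not product behavior.

```python
import os
import time
from collections import Counter

def access_report(root, cold_days=365):
    """Summarize files by extension and flag 'cold' files not accessed
    in cold_days -- candidates for permanent migration off the filer."""
    by_ext = Counter()
    cold = []
    cutoff = time.time() - cold_days * 86400
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            by_ext[os.path.splitext(name)[1] or "(none)"] += 1
            if os.stat(path).st_atime < cutoff:  # last access before cutoff
                cold.append(path)
    return by_ext, cold
```

Note that this relies on the filesystem recording access times; filers mounted with atime updates disabled would need a different signal, such as modification time.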
Igneous believes that backing up your unstructured data in this way could free up significant amounts of capacity and performance on the backup server. It would allow your backup server to concentrate its efforts on backing up structured data, VMs, and containers.
The idea of attaching a nano server to each drive is certainly a unique one, which is why Igneous patented it. It seems very scalable, and it ensures every disk drive has the right amount of CPU. It also allows for a “failure in place” design, since the failure domain is so small. Igneous’ backup design seems to combine the best of two different backup approaches (network mount and forever incremental) without the downsides typically associated with such designs.