Advanced Data Protection for MongoDB, Cassandra and Hadoop

Posted on August 12, 2016 by wcurtispreston

Datos IO Briefing Note

Customers with massive, petabyte-sized cloud databases are exactly who Datos IO is looking for. Companies creating giant product catalogs that thousands of people a day are simultaneously accessing and updating find that they cannot meet their needs with typical SQL databases. In turn, they start looking at products like Cassandra and MongoDB to solve their performance problems.

The challenge is, as mentioned in my previous brief about Datos IO, that backups did not seem to be at the forefront when these products were designed. They do replicate data to multiple locations, and too many people think replication is the same thing as backup, but it clearly is not. If some sort of logical corruption happens, replication will make sure that corruption is spread everywhere.

Backups serve multiple purposes; they protect against both logical corruption and physical device or site failure. Replication cannot protect against logical failure unless it has a historical component to it, and the replication built into these products does not have that. Replication can protect against site failure, but with these databases there is a problem: the replication functionality operates on an eventually consistent model. Data is asynchronously replicated to multiple sites and it will eventually be consistent if given enough time. This means that different sites are at different points in time as far as database transactions go.
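To make the distinction concrete, here is a toy sketch in Python. It is not how Cassandra, MongoDB, or Datos IO work internally; the `Replica` class, the `snapshot_at` helper, and the shared write log are all hypothetical names invented for illustration. It simply shows two sites sitting at different points in time, replication copying a bad write everywhere, and a historical copy surviving it.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica that applies writes from a shared log at its own pace."""
    name: str
    data: dict = field(default_factory=dict)
    applied: int = 0  # how many log entries this replica has applied so far

    def catch_up(self, log, upto=None):
        """Apply pending writes from the log, optionally only up to a given index."""
        end = len(log) if upto is None else min(upto, len(log))
        for key, value in log[self.applied:end]:
            self.data[key] = value
        self.applied = max(self.applied, end)


def snapshot_at(log, position):
    """Build a point-in-time copy of the data as of a given log position."""
    state = {}
    for key, value in log[:position]:
        state[key] = value
    return state


log = [("sku-1", "price=10"), ("sku-2", "price=25")]
east, west = Replica("east"), Replica("west")

east.catch_up(log)          # east has seen both writes
west.catch_up(log, upto=1)  # west lags: it has only seen the first write
lagging = east.data != west.data  # the sites are at different points in time

backup = snapshot_at(log, len(log))  # historical copy taken before the bad write

log.append(("sku-1", "price=-999"))  # logical corruption, e.g. an application bug
east.catch_up(log)
west.catch_up(log)  # replication faithfully copies the bad value to every site

converged = east.data == west.data
print(lagging, converged, east.data["sku-1"], backup["sku-1"])
```

Once both replicas catch up they agree with each other, but what they agree on is the corrupted value. Only the copy with a historical component still holds the good one.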

The eventually consistent replication model results in an interesting situation when one or more sites go offline and other sites have to take over. The database is now in an inconsistent state and must go through a significant recovery process that can take days or weeks. The database will eventually recover, but the system will be offline during that time. This is why Datos IO developed a product that creates a single snapshot across an entire database. It can back up that database and restore it to a single point in time, removing the need for a lengthy recovery process.

The latest enhancements from Datos IO include the ability to back up and restore these large databases anywhere they reside. A customer running a massive database in Amazon can restore that database to Amazon, to an on-premises system, or even to another cloud provider. A customer running an on-premises system can restore that database to Amazon S3. Datos IO RecoverX includes support for MongoDB, DataStax, Apache HBase, Cassandra, Hadoop, and Amazon DynamoDB. Datos IO claims to have a number of early adopters, including IoT vendors as well as customers in the retail and media and entertainment spaces.

StorageSwiss Take

It’s amazing to learn about the number of large enterprises using a database product that doesn’t have a good backup and recovery story. It should only take one public incident to scare the rest of them into doing something better. It’s good to see a company tackling this new market with a product that provides the scalability and performance that databases of this size need.

About wcurtispreston

W. Curtis Preston (aka Mr. Backup) is an expert in backup & recovery systems, a space he has been working in since 1993. He has written three books on the subject: Backup & Recovery, Using SANs and NAS, and Unix Backup & Recovery. Mr. Preston is a writer and has spoken at hundreds of seminars and conferences around the world. Preston’s mission is to arm today’s IT managers with truly unbiased information about today’s storage industry and its products.

Tagged with: Backup, Cloud, Datos IO, Replication, Restore
Posted in Briefing Note
