NoSQL and Hadoop typically distribute data through built-in replication technology. The replication process also provides protection from media or server failure. As a result, many IT professionals think that they don’t need to back up these environments. In this ChalkTalk Video, join Storage Switzerland and Imanis Data to learn why NoSQL databases and Hadoop need point-in-time backup protection and how to correctly implement that protection.
Although NoSQL and Hadoop replication provides protection against media or server failure, it is flawed in several critical ways. The problem with counting on instant replication is that it replicates bad data as quickly as good data. Data corrupted by a bug gets replicated. Fat-finger deletions get replicated. Data encrypted by a ransomware attack gets replicated. With replication, any corruption spreads to the other nodes almost immediately. If it takes a few hours or days to identify the corruption, all traces of the former version of the data may be lost.
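To make the difference concrete, here is a minimal sketch in Python of a toy key-value store that copies every write to all of its replicas. It is only an illustration under simplifying assumptions: the `ReplicatedStore` class, its methods, and the sample key are all hypothetical, and it does not model how any particular NoSQL database, Hadoop, or backup product actually works. It shows that a bad write lands on every replica, while a point-in-time copy taken earlier still holds the good version.

```python
import copy


class ReplicatedStore:
    """Toy key-value store (hypothetical) that applies every change to all replicas."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.snapshots = {}  # label -> independent copy of the data at that moment

    def put(self, key, value):
        # Replication does not judge the write: good or bad, it goes everywhere.
        for replica in self.replicas:
            replica[key] = value

    def delete(self, key):
        for replica in self.replicas:
            replica.pop(key, None)

    def snapshot(self, label):
        # A point-in-time copy that later writes cannot touch.
        self.snapshots[label] = copy.deepcopy(self.replicas[0])

    def restore(self, label):
        saved = self.snapshots[label]
        self.replicas = [copy.deepcopy(saved) for _ in self.replicas]


store = ReplicatedStore()
store.put("order-1001", "paid")
store.snapshot("monday")

# A bug, a fat-finger mistake, or ransomware overwrites the record.
store.put("order-1001", "<<corrupted>>")
corrupted_everywhere = all(r["order-1001"] == "<<corrupted>>" for r in store.replicas)

# No replica holds the good value any more; only the snapshot does.
store.restore("monday")
recovered_everywhere = all(r["order-1001"] == "paid" for r in store.replicas)

print("corruption reached every replica:", corrupted_everywhere)
print("snapshot restored every replica:", recovered_everywhere)
```

In this toy model the three replicas protect against losing one copy, but none of them remembers what the data looked like before the bad write. That history exists only in the snapshot.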
That’s why Hadoop and NoSQL environments require point-in-time backups! The challenge is how to actually back up these eventually consistent environments. Their massive scalability, distributed architecture, hybrid cloud deployment, and eventual consistency require an approach to data protection that is built for petabyte scale, incorporates data awareness, and leverages machine learning.
In the video, Storage Switzerland and Imanis Data discuss how to properly back up Hadoop and NoSQL environments.