The Challenge of Protecting Next Generation Databases

Posted on October 2, 2015 by Joseph Ortiz

Database Technology Evolves Beyond Traditional Data Protection

For over three decades, relational databases like Oracle, MS-SQL, DB2, MySQL, Sybase, and Informix have been the primary databases for core business applications. They are used to store and analyze what used to be considered large amounts of structured data important to the organization. They typically use Structured Query Language (SQL), with strict rules on how the data is formatted, stored, and accessed.

For the last few years, however, there has been an unparalleled explosion in both structured and unstructured data, with the bulk of it unstructured. This is driving the need for new ways to store and analyze the tremendous volumes of what has come to be called “big data,” which outstrip the capabilities of traditional database servers.

This in turn has led to the development of next-generation scale-out, cloud, and NoSQL (read as either “No SQL” or “Not only SQL”) database products such as Cassandra, MongoDB, HBase, and DynamoDB, which can ingest very large quantities of data in real time. Unlike most of their predecessors, they use a distributed computing model: copies of the database software are installed on numerous commodity servers and run in parallel.

New Databases Need Protection

These next-generation database products are designed with fault tolerance in mind and try to protect against node and disk failures by replicating multiple copies of the data across nodes and data centers. Many organizations think this means they do not need to concern themselves with additional application recovery protection, but that is a grave mistake. Replication only addresses availability requirements. It does not provide point-in-time recovery, so an operational error or corrupted write is copied to every replica and there is no earlier state to go back to.
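The difference between replication and point-in-time recovery can be sketched with a toy model. The following is a minimal illustration only, not how any particular database or backup product works; the `ReplicatedStore` class and its methods are hypothetical names invented for this example.

```python
import copy


class ReplicatedStore:
    """Toy key-value store that applies every write to all replicas (hypothetical)."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.versions = []  # (label, snapshot) pairs kept apart from live data

    def put(self, key, value):
        for replica in self.replicas:
            replica[key] = value

    def delete(self, key):
        # An accidental delete is replicated as faithfully as a good write.
        for replica in self.replicas:
            replica.pop(key, None)

    def take_version(self, label):
        # Point-in-time copy of the current state.
        self.versions.append((label, copy.deepcopy(self.replicas[0])))

    def restore(self, label):
        # Roll every replica back to the named version.
        for name, snapshot in self.versions:
            if name == label:
                self.replicas = [copy.deepcopy(snapshot) for _ in self.replicas]
                return
        raise KeyError(label)


def demo():
    store = ReplicatedStore()
    store.put("order:1001", "paid")
    store.take_version("before-cleanup")
    store.delete("order:1001")  # the operational error
    lost_everywhere = all("order:1001" not in r for r in store.replicas)
    store.restore("before-cleanup")
    recovered = all(r.get("order:1001") == "paid" for r in store.replicas)
    return lost_everywhere, recovered
```

In this sketch the mistaken delete removes the record from all three replicas at once, so the extra copies offer no way back. Only the separately kept version makes recovery possible.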

Clearly, these next-generation databases need proper recovery tools just like their predecessors. Unfortunately, current data protection products from legacy vendors have no means to protect them, and until now there have been few good point solutions either.

Datos IO Reinvents Recovery with Distributed Versioning Platform

To meet the application recovery needs of scale-out databases, Datos IO is introducing what it describes as the industry’s first distributed versioning platform. The company claims the platform guarantees consistent versions across all scale-out databases and provides enterprises with what Datos IO calls a single state of truth for their distributed applications.

The first version of the Datos IO product brings several industry-first innovations:

Cluster-wide consistency: Cluster-consistent versions can be taken at any point in time and support configurable RPOs (Recovery Point Objectives) ranging from as low as 15 minutes up to hours.
Orchestrated “repair free” recovery: Users can restore an entire keyspace/database or a single column family/collection without manual steps. The Datos IO versions are stored in native format and are database-consistent, so no repair is needed on restore, which reduces the RTO (Recovery Time Objective).
Industry-first semantic deduplication: Built from the ground up for scale-out, eventually consistent databases, Datos IO extends the traditional notion of deduplication to also include semantic equivalents of data values.
Scale-out software platform: The Datos IO platform grows horizontally with application recovery needs, from a single node to scale-out Datos IO clusters.
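
To give a feel for what deduplicating “semantic equivalents” could mean, here is a minimal sketch. It assumes that replicas of the same logical record may differ in physical layout (here, field order) while carrying identical values. This is an illustrative interpretation only, not Datos IO’s actual algorithm; the function names and sample records are hypothetical.

```python
import hashlib
import json


def record_fingerprint(record):
    """Hash a canonical form of the record so field order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def semantic_dedupe(node_dumps):
    """Keep one copy of each logically identical record across all node dumps."""
    unique = {}
    for dump in node_dumps:
        for record in dump:
            # First occurrence wins; later equivalent copies are skipped.
            unique.setdefault(record_fingerprint(record), record)
    return list(unique.values())


def demo():
    # Three nodes each hold overlapping replicas, with differing field order.
    node_a = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
    node_b = [{"name": "alice", "id": 1}, {"id": 3, "name": "carol"}]
    node_c = [{"name": "bob", "id": 2}, {"id": 3, "name": "carol"}]
    kept = semantic_dedupe([node_a, node_b, node_c])
    return sorted(r["id"] for r in kept)
```

A byte-level comparison would treat `{"id": 1, "name": "alice"}` and `{"name": "alice", "id": 1}` as different data, while comparing canonical forms recognizes them as the same record and stores it once.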

The first release of Datos IO will cover Cassandra and MongoDB databases with more databases targeted in the future.

StorageSwiss Take

Big data, NoSQL, and cloud databases are moving into the enterprise, but enterprises will be more concerned about data protection. They understand that redundancy is not backup. As a result, there is a clear need for enterprise-class application recovery tools to protect these next-generation databases, and Datos IO appears to have a very well-engineered solution to address those needs. Not only does Datos IO help cover enterprises, but it may very well be the tipping point that makes next-generation databases more than a corner project for these organizations.

About Joseph Ortiz

Joseph is a Lead Analyst with DSMCS, Inc. and an IT veteran with over 35 years of experience in the high-tech industries. He has held senior technical positions with several major OEMs, VARs, and System Integrators, providing them with technical pre- and post-sales support for a wide variety of data protection solutions. He also contributed numerous technical analyst articles to Storage Switzerland and acted as its chief editor for all technical content until Storage Switzerland closed upon its acquisition by StorONE. In the past, he also designed, implemented, and supported backup, recovery, and encryption solutions, in addition to providing Disaster Recovery planning, testing, and data loss risk assessments in distributed computing environments on UNIX and Windows platforms for various OEMs, VARs, and System Integrators.

Posted in Briefing Note