Can You Fall in Love with Data Classification Again?

Posted on July 28, 2017 by George Crump

Veritas Integrated Classification Engine Briefing Note

Almost a decade ago, data classification was the new “in thing” for the data center. The project looked like a great idea on the whiteboard, but then the wheels came off during implementation and operation. There was too much dependence on the human element.

But there is still a need for data classification. In fact, with regulations like the European Union’s General Data Protection Regulation (GDPR), it is needed more than ever. IT must have technology that gets around the human element and delivers a solution that will help meet all of those governmental and organizational regulations.

Why is Data Classification Broken?

Most data classification projects started off with the best of intentions. But the project counted on humans to tag or classify their data as they went along. There were two problems with this strategy. First, the process counts on someone to make their own decisions about what the document is about, where it needs to be stored and how long it should be retained. Second, the process counts on that same someone to make those decisions every single time a document is created or modified, no matter how busy they get.

In hindsight, the failure of the first wave of data classification should have been predictable. Humans don’t have a great track record of doing mundane tasks consistently over a long period of time.

The alternative is bulk tagging, which applies the same tag to everything within a group. A simple example is that everything Human Resources (HR) creates gets an HR tag. The problem is, of course, that not everything in HR needs the same level of retention, and even among sensitive documents, the time required to retain data will vary. Bulk tagging also makes finding the right data difficult, since everything with the HR tag will be presented as a search result.

Finally, unstructured data has grown at a rapid pace, and the amount of unstructured data that most organizations store now is dramatically larger than it was ten or even five years ago. Even if much of the data growth is from machines and devices, which tend to tag better, the task in front of IT is massive.

How To Fix Data Classification

Fixing data classification requires automation. Leaving it to humans leads to inconsistent and often incorrect or incomplete tagging of data. To help this process, Veritas is announcing its new Integrated Classification Engine, which uses pre-configured patterns to detect over 100 different sensitive data types. It also includes over 60 pre-loaded policies to help organizations adhere to regulations like GDPR and HIPAA.
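To make the idea of pattern-based classification concrete, here is a minimal Python sketch. It is not the Veritas engine or its pattern set; the pattern names, the regular expressions and the classify function are all assumptions made up for illustration, and real products use far more robust detection.

```python
import re

# Hypothetical patterns for illustration only; not the Veritas pattern set.
PATTERNS = {
    "email_address": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def classify(text):
    """Return the sorted list of tags whose pattern matches the text."""
    return sorted(tag for tag, pattern in PATTERNS.items() if pattern.search(text))


if __name__ == "__main__":
    print(classify("Contact jane@example.com, SSN 123-45-6789"))
    print(classify("Lunch menu for Friday"))
```

The point of the sketch is that the tag comes from the content of each document rather than from the person or department that created it, which is what separates this approach from manual or bulk tagging.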

One of the challenges with automated classification is accuracy: how reliably does the software tag files? The Integrated Classification Engine uses confidence scoring and quality assurance tools to minimize false positives. The solution is integrated with Veritas Data Insight 6.0 and will be available as an add-on option to Veritas Enterprise Vault in August. Over time the engine will be integrated into the entire Veritas portfolio.
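One way to picture confidence scoring is the Python sketch below. It is an assumption-heavy illustration, not how Veritas computes its scores: the weights, the context words, the 0.7 threshold and the function names are all hypothetical. A bare pattern match earns a low score, and the score rises only when supporting evidence (a valid Luhn checksum, a nearby payment-related word) is present.

```python
import re

# Hypothetical pattern, context words and weights, chosen for illustration only.
CARD_RE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")
CONTEXT_WORDS = ("card", "visa", "mastercard", "payment")


def luhn_valid(number):
    """Check a digit string against the Luhn checksum used by payment cards."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def card_confidence(text):
    """Score from 0.0 to 1.0 that the text contains a payment card number."""
    match = CARD_RE.search(text)
    if match is None:
        return 0.0
    score = 0.4  # a 16-digit pattern match alone is weak evidence
    if luhn_valid(match.group()):
        score += 0.4  # checksum passes
    if any(word in text.lower() for word in CONTEXT_WORDS):
        score += 0.2  # supporting context in the document
    return round(score, 2)


def should_tag(text, threshold=0.7):
    """Tag only when the confidence clears the (assumed) threshold."""
    return card_confidence(text) >= threshold
```

With these assumed weights, a 16-digit order reference that fails the checksum and has no payment-related context scores 0.4 and is left untagged, which is the kind of false positive a threshold is meant to filter out.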

StorageSwiss Take

Regulations like GDPR and HIPAA are moving data classification from a “nice to have” to a “must have” data center capability. But both of these regulations, as well as many others, are really just reinforcing best practices when it comes to managing data. Organizations have lived in the digital economy for years now; it’s time to manage data as the important asset that it is. Storing all data without knowing what is there or being able to search it is just as bad as deleting it all. Data classification solutions like Veritas Integrated Classification Engine make data asset management a reality.

About George Crump

George Crump is the Chief Marketing Officer at VergeIO, the leader in Ultraconverged Infrastructure. Prior to VergeIO he was Chief Product Strategist at StorONE. Before assuming roles with innovative technology vendors, George spent almost 14 years as the founder and lead analyst at Storage Switzerland. In his spare time, he continues to write blogs on Storage Switzerland to educate IT professionals on all aspects of data center storage. He is the primary contributor to Storage Switzerland and is a heavily sought-after public speaker. With over 30 years of experience designing storage solutions for data centers across the US, he has seen the birth of such technologies as RAID, NAS, SAN, Virtualization, Cloud, and Enterprise Flash. Before founding Storage Switzerland, he was CTO at one of the nation's largest storage integrators, where he was in charge of technology testing, integration, and product selection.

Tagged with: Cloud, GDPR, HIPAA, Hybrid, SDS, Veritas
Posted in Briefing Note