A smart and holistic data management platform is the linchpin of enterprise competitiveness, but for now it remains largely a nirvana that storage managers are chasing. In this article, Storage Switzerland explains what constitutes data management, how it is evolving, and why it is crucial for modern enterprises.
What is Data Management?
Data management is the collection of administrative processes that span the data lifecycle, including the creation, validation, protection, storage (e.g. archiving), processing and removal of data. The core objective of data management is not the reduction of storage costs (although that is an important side benefit). Data management’s primary objective is ensuring data quality, or the data’s ability to serve specific purposes such as analytics queries or eDiscovery requests. A number of factors impact data quality, including:
- The data’s accuracy, integrity and reliability. It must be complete, consistent across the organization and up-to-date, as well as free of corruption or errors.
- The data’s accessibility and interoperability. It must adhere to a common format and be readily available when the user needs it.
- The data’s relevancy to the job at hand.
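The quality factors above can be sketched as simple validation rules. The following is a minimal illustration, not a prescribed schema: the record fields, the `max_age_days` threshold, and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "email": "a@example.com", "updated": datetime(2024, 5, 1)},
    {"id": 2, "email": None,            "updated": datetime(2020, 1, 1)},
]

def quality_issues(record, now, max_age_days=365):
    """Flag basic completeness and freshness problems in one record."""
    issues = []
    if any(value is None for value in record.values()):
        issues.append("incomplete")  # missing fields hurt accuracy/reliability
    if now - record["updated"] > timedelta(days=max_age_days):
        issues.append("stale")       # data must be up-to-date to be trusted
    return issues

now = datetime(2024, 6, 1)
report = {r["id"]: quality_issues(r, now) for r in records}
# record 1 passes both checks; record 2 is both incomplete and stale
```

A real platform would of course apply far richer rules (cross-system consistency, corruption detection), but the principle is the same: data quality is something that can be measured and enforced programmatically.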
Data quality (and thus, data management) matters because it underpins day-to-day business operations and provides the intelligence base on which businesses innovate, make strategic decisions, and facilitate internal collaboration and engagement with customers. For example, an analytics query may help a business to better predict demand. Application developers are employing data-driven, agile development to bring more relevant applications to market faster. Meanwhile, we are at the tipping point of artificial intelligence (AI), which will have broad implications ranging from customer support chatbots to refined infrastructure planning. Larger amounts of data are being stored for longer periods of time, and they are being used more actively by lines of business. Data is being heavily repurposed, mined and monetized.
At the same time, data privacy regulations such as the European Union’s (EU’s) General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) add to the quantity of data that must be stored, and they create the need for more finely tuned control over data. For example, the enterprise might need to quickly identify files that contain personally identifiable information (PII), and per “the right to be forgotten,” it might need to quickly locate and erase or anonymize any files that contain a specific user’s data. Data breaches and failure to comply with data privacy regulations can significantly tarnish the business’s reputation.
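To make the “right to be forgotten” workflow concrete, here is a toy sketch of locating and redacting a user’s identifier across a directory tree. This is purely illustrative: the function name and redaction approach are assumptions, and a production platform would consult a metadata index rather than brute-force scanning every file.

```python
import os

def erase_user_data(root, user_token, replacement="[REDACTED]"):
    """Walk a directory tree and redact every occurrence of a user's
    identifier -- a toy stand-in for right-to-be-forgotten tooling.
    Returns the list of files that were modified."""
    touched = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    text = f.read()
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            if user_token in text:
                with open(path, "w", encoding="utf-8") as f:
                    f.write(text.replace(user_token, replacement))
                touched.append(path)
    return touched
```

Note that real compliance tooling must also handle anonymization in backups, archives and replicas, which is precisely why centralized visibility across all storage locations matters.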
Finally, as storage capacity requirements grow and as solid-state drives (SSDs) and non-volatile memory express (NVMe) introduce a new price premium to storage environments, it becomes more important than ever to use storage resources as cost-effectively as possible. Not all data and workloads require Tier 0 performance and maximum levels of availability. A data management platform with intelligent tiering can help businesses cut costs by better utilizing their infrastructure resources. Meanwhile, the integration of AI will support better capacity and refresh planning (and, as a result, budget planning). At the same time, intelligent data management can help to support business continuity (thus avoiding revenue and productivity loss) by helping storage managers to identify and remedy issues more rapidly.
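The policy-based tiering described above can be sketched in a few lines. The tier names, the catalog structure, and the 90-day cold threshold below are hypothetical illustrations; real platforms weigh access frequency, SLAs and cost models, not just last-access age.

```python
from datetime import datetime, timedelta

# Hypothetical catalog: file name -> last access time.
catalog = {
    "report_q1.pdf":  datetime(2024, 1, 5),
    "hot_dataset.bin": datetime(2024, 5, 30),
}

def place_tier(last_access, now, cold_after_days=90):
    """Policy-based placement: recently touched data stays on the fast
    (premium) tier; everything else moves to cheap capacity storage."""
    age = now - last_access
    return "capacity" if age > timedelta(days=cold_after_days) else "performance"

now = datetime(2024, 6, 1)
placement = {name: place_tier(t, now) for name, t in catalog.items()}
# the stale Q1 report is demoted; the active dataset stays on fast media
```

As the article notes, such policies are administrator-defined today; the next step is for the platform to learn usage patterns and set these thresholds predictively.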
Why is Data Management So Difficult?
To deliver on more demanding levels of capacity and performance, storage infrastructures have become highly heterogeneous. For instance, NVMe has introduced a new tier of storage performance for production workloads, and object storage is becoming more commonly used for near-line and archive use cases. At the same time traditional file systems are being modernized to be more parallel for hyperscale workloads, scalable to support large file counts, and cloud-integrated to support data mobility. Modern hyperconverged and composable architectures coexist alongside traditional solutions – as do containerized, virtualized and bare metal environments.
Meanwhile, secondary storage environments are characterized by the need to balance requirements for instantaneous recovery and seemingly endless capacity according to a growing number of application-specific service level agreements (SLAs). Finally, data must be integrated across on-premises and cloud resources, as well as a growing number of edge locations.
At the same time, the advent of business analytics, AI and DevOps blurs the line between primary and secondary storage. Although these workloads are not directly revenue-generating day-to-day, they still add to the need for a faster tier of performance within the secondary storage environment, as well as for the reduction of data silos. The new compliance-driven world we are moving towards also necessitates fast recall and granular search and control.
What to Look for in a Next-Generation Data Management Solution
To achieve these ends, there are a few key capabilities that storage professionals should look for in a data management solution. These include:
- The creation of a centralized global namespace across storage infrastructure resources (public cloud, hyperconverged infrastructure, solid-state drives, hard drives, etc.), access protocols (NFS, SMB, S3 and block), and data types (structured and unstructured). One size does not fit all in the storage space; certain architectures have benefits over others (such as scalable capacity versus faster performance). At the same time, however, data silos are not acceptable. Analytics and machine learning (ML) are only as good as the data they are fed, so a comprehensive picture is required. Meanwhile, the enterprise needs to be able to respond efficiently to eDiscovery and data privacy requests, no matter where the data lives.
- In a similar vein, the ability to index, tag, classify and search data granularly is becoming table stakes. When it comes to compliance and legal discovery, a data access map, an immutable file access history, and policy modeling can also be of significant help.
- Automatic tiering and removal of data. Today, these functionalities are largely driven by policies set by the storage administrator. In the future, they will become more predictive based on usage patterns. Data needs to be available as quickly as possible upon a user access request. However, storing all data on a premium tier of storage is cost prohibitive.
- Reporting and chargeback capabilities, so that the business has visibility into what is being used and by whom.
- Granular but efficiently executed metadata operations. Metadata is fundamental to facilitating data access, but metadata processing can eat up valuable compute cycles (sometimes accounting for 80% or more of traffic).
- Data protection, including SLA-driven backup, replication, failover and recovery. Many data protection solutions are beginning to incorporate broader data management functionality, but it is important to understand that one does not necessarily equal the other.
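Several of the capabilities above, particularly granular tagging, classification and search, can be illustrated with a small inverted index. The class name, tag vocabulary and file paths below are purely hypothetical; a real platform maintains such an index at massive scale and keeps it consistent across all storage locations.

```python
from collections import defaultdict

class MetadataIndex:
    """Toy inverted index mapping tags (e.g. 'pii', 'finance') to files,
    illustrating the kind of classification and granular search a data
    management platform performs across its global namespace."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, path, *tags):
        """Attach one or more classification tags to a file."""
        for t in tags:
            self._by_tag[t].add(path)

    def search(self, *tags):
        """Return the files carrying ALL of the given tags."""
        sets = [self._by_tag[t] for t in tags]
        return set.intersection(*sets) if sets else set()

idx = MetadataIndex()
idx.tag("/hr/payroll.csv", "pii", "finance")
idx.tag("/mkt/leads.csv", "pii")
# e.g. an eDiscovery request for files that are both PII and financial
hits = idx.search("pii", "finance")
```

The same structure underpins the privacy use cases discussed earlier: once every file is tagged, answering "which files contain PII?" is a lookup rather than a crawl.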
The data management marketplace is in flux, as a slew of vendors jockey for leadership in addressing this top-of-mind budget priority. It is important for storage managers to collaborate closely with lines of business on forward-looking priorities, to ensure that technology budgets are spent in a way that facilitates the highest possible levels of data quality and access. Data privacy and serving secondary business operations such as DevOps and business analytics will remain among the hottest topics surrounding data management, and these priorities are typically quite specific to the organization. Last but not least, storage managers are pressured to oversee a growing and complex ecosystem of resources and to meet increasingly demanding line-of-business requirements. Against this backdrop, analytics and AI will go far – as will the ability to create a data fabric that is agile and can draw from a range of underlying storage infrastructure resources.