Data management is moving to the top of many IT project whiteboards, and the reason is simple: organizations are drowning in unstructured data. There is too much data, spread across too many files and stored in too many locations, and IT just can’t keep up. The problem is that traditional data management solutions focus primarily on moving data from expensive storage to inexpensive storage. They don’t provide the insight organizations need to intelligently manage, move and find data. Intent-based data management instead focuses on the request of the user or application, making sure the data is in the best available location and directing the requester to it.
Today, data centers have many choices for where to store their unstructured data. Users can store data on-premises on a network attached storage (NAS) system or an on-premises object store, in the cloud, or in multiple clouds. In many cases the organization stores copies of the same data in all of these locations, contributing to data sprawl. The sheer amount of data, combined with the diversity of storage locations, makes managing and finding data almost impossible for humans.
Another reality is that users tend not to use the tools available to help them find and manage data. For example, most file systems today store only POSIX-defined metadata, which gives little insight into the contents of a file and may not be consistent across storage types.
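To illustrate how little POSIX metadata says about a file, the Python sketch below reads the standard attributes the file system keeps for a small CSV file. It uses only the standard library (`os.stat`, `stat.filemode`); the sample file contents are made up for illustration. Every attribute describes size, ownership, permissions or timestamps, and none of them reveals what the file actually contains.

```python
import os
import stat
import tempfile


def posix_metadata(path: str) -> dict:
    """Return the standard POSIX attributes the file system keeps for a file."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "mode": stat.filemode(st.st_mode),  # e.g. "-rw-------"
        "modified": st.st_mtime,
        "accessed": st.st_atime,
        "changed": st.st_ctime,
    }


if __name__ == "__main__":
    # Illustrative file: its contents matter, but POSIX metadata never mentions them.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as f:
        f.write(b"patient_id,diagnosis\n42,flu\n")
        path = f.name
    try:
        for key, value in posix_metadata(path).items():
            print(f"{key}: {value}")
    finally:
        os.remove(path)
```

Nothing in the output indicates that the file holds medical records, which is exactly the kind of insight a search or compliance request needs.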
Managing Data is Serious Business
Compounding the data management problem further is the increased importance placed on unstructured data. Organizations mine unstructured data for analytics and machine learning, and a growing number of regulations set specific requirements on how long organizations should retain information, for what purposes it may be used, and who should have access to it.
What the Data Center Needs
The data center needs to get out of the data management business, or at least out of managing data manually by copying it. The first step in this process is to virtualize data, disaggregating it from the physical storage and the physical location. Organizations need a global metadata engine that knows the exact location of every file and every copy of every file. Such an engine allows data to move between on-premises systems and the cloud on demand, at file-level granularity. All of this should be fully automated and driven by machine learning, so that it continuously adapts to changes in the infrastructure or in business objectives. An essential part of this step is also breaking down protocol barriers, so that data is accessible anywhere via standard NFS or SMB protocols or object storage protocols.
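A global metadata engine is, at its core, a catalog that maps each logical file path to every physical location holding a copy. The Python sketch below is a minimal, hypothetical illustration of that idea; `MetadataCatalog`, `FileRecord` and the location names are invented for this example and are not Hammerspace’s API. It shows file-level (rather than volume-level) moves that change only where the catalog says a file lives, while applications keep using the same logical path.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Metadata for one logical file: every location holding a copy of it."""
    path: str
    locations: set[str] = field(default_factory=set)


class MetadataCatalog:
    """Hypothetical global metadata engine: logical path -> physical copies.

    Applications address files only by logical path; the catalog is the
    single place that knows which storage systems hold the copies.
    """

    def __init__(self) -> None:
        self._files: dict[str, FileRecord] = {}

    def register(self, path: str, location: str) -> None:
        """Record that a copy of `path` exists at `location`."""
        record = self._files.setdefault(path, FileRecord(path))
        record.locations.add(location)

    def move(self, path: str, source: str, target: str) -> None:
        """Move one file (not a whole volume) from source to target."""
        record = self._files[path]
        if source not in record.locations:
            raise ValueError(f"{path} has no copy on {source}")
        # A real system would copy the bytes first; this sketch only updates metadata.
        record.locations.discard(source)
        record.locations.add(target)

    def locations(self, path: str) -> set[str]:
        """Return a copy of the set of locations holding `path`."""
        return set(self._files[path].locations)


if __name__ == "__main__":
    catalog = MetadataCatalog()
    catalog.register("/projects/genome/run1.bam", "nas-east")
    catalog.register("/projects/genome/run1.bam", "s3://archive")
    catalog.move("/projects/genome/run1.bam", "nas-east", "azure-blob")
    print(sorted(catalog.locations("/projects/genome/run1.bam")))
    # ['azure-blob', 's3://archive']
```

Because the logical path never changes, the same file could be exposed over NFS, SMB or an object protocol without the application knowing which system currently stores it.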
Reaching this first step is the goal of most data management vendors. Once there, they consider the job complete, but it should be only the beginning of the data management journey. The next step is to provide hybrid cloud data management. Most data management solutions have a one-way relationship with the cloud: data is moved there and never retrieved. Instead, solutions need to move data between the cloud and on-premises systems and between multiple cloud providers. They should also leverage machine learning and automation so that data flows seamlessly between locations and arrives before the user who needs it requests it.
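Having data arrive before the user asks for it means predicting demand and copying data toward where it will be used, in either direction. The Python sketch below is a deliberately simple, hypothetical stand-in: instead of a machine learning forecast, it uses a known job schedule to decide which files to copy ahead of time. `ScheduledJob`, `plan_prefetch`, the two-hour lead time and the site names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ScheduledJob:
    site: str           # where the job will run, e.g. "on-prem" or "cloud-a"
    start: datetime
    inputs: list[str]   # logical paths the job will read


def plan_prefetch(jobs: list[ScheduledJob], locations: dict[str, set[str]],
                  now: datetime,
                  lead_time: timedelta = timedelta(hours=2)) -> list[tuple[str, str]]:
    """Return (path, target_site) copies to start now so data is in place before each job.

    `locations` maps each logical path to the sites that already hold a copy.
    Transfers can run in either direction: cloud to on-premises or the reverse.
    """
    plan: list[tuple[str, str]] = []
    for job in jobs:
        if not (now <= job.start <= now + lead_time):
            continue  # too far out (or already started) to act on yet
        for path in job.inputs:
            move = (path, job.site)
            if job.site not in locations.get(path, set()) and move not in plan:
                plan.append(move)
    return plan


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 8, 0)
    jobs = [ScheduledJob("on-prem", now + timedelta(hours=1),
                         ["/sim/mesh.vtk", "/sim/params.json"])]
    locations = {"/sim/mesh.vtk": {"cloud-a"}, "/sim/params.json": {"on-prem", "cloud-a"}}
    print(plan_prefetch(jobs, locations, now))  # [('/sim/mesh.vtk', 'on-prem')]
```

Only the file missing from the job’s site is copied; here the transfer runs from the cloud back on-premises, the direction a one-way archive tool never supports.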
The final step is providing metadata-as-a-service, which includes extensible tags and keywords and leads to the creation of data catalogs and instant global indexing and search. The metadata-as-a-service component should analyze data on its own to index it and self-tag it for improved searches. With tags in place, users and data administrators can find the right data when needed and confirm compliance with industry regulations.
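Global tag search is typically built on an inverted index that maps each tag to the files carrying it. The Python sketch below shows that structure under stated assumptions: `TagIndex` and `auto_tag` are hypothetical names, and the extension-based tagging rule is a crude stand-in for the content analysis a real self-tagging service would perform.

```python
from collections import defaultdict


class TagIndex:
    """Minimal inverted index: tag -> set of logical file paths."""

    def __init__(self) -> None:
        self._index: dict[str, set[str]] = defaultdict(set)

    def tag(self, path: str, *tags: str) -> None:
        """Attach one or more tags (case-insensitive) to a file."""
        for t in tags:
            self._index[t.lower()].add(path)

    def search(self, *tags: str) -> set[str]:
        """Return paths carrying every requested tag (AND semantics)."""
        if not tags:
            return set()
        # .get() avoids creating empty entries for unknown tags.
        sets = [self._index.get(t.lower(), set()) for t in tags]
        return set.intersection(*sets)


# Hypothetical self-tagging rule: a stand-in for real content analysis.
EXTENSION_TAGS = {".dcm": "medical-image", ".csv": "tabular", ".mp4": "video"}


def auto_tag(index: TagIndex, path: str) -> None:
    """Tag a file based on its extension, without user involvement."""
    for ext, t in EXTENSION_TAGS.items():
        if path.lower().endswith(ext):
            index.tag(path, t)


if __name__ == "__main__":
    index = TagIndex()
    auto_tag(index, "/scans/p42.dcm")
    index.tag("/scans/p42.dcm", "retain-7y")  # ad hoc tag added by a user
    auto_tag(index, "/reports/q3.csv")
    print(index.search("medical-image", "retain-7y"))  # {'/scans/p42.dcm'}
```

Combining automatic and user-supplied tags in one index is what lets a single query answer a compliance question such as “which medical images fall under a seven-year retention rule?”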
Hammerspace is delivering an intent-based data management solution. It installs seamlessly into an organization’s on-premises environment and its various cloud accounts. It then creates a metadata index that contains the exact location of all data, and creates a global file system. Users simply see their data; they don’t need to know where that data is. IT or users can move data between locations without changing any applications or workflows. Think of it as a DNS server for data: when a user requests a specific set of data, the metadata engine routes the request to the data’s actual location, and once the solution makes that connection, data flows directly to the user or application without any further redirection.
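The DNS analogy is essentially a separation of the control path (one lookup that answers “where is this file?”) from the data path (reading the bytes directly from storage). The Python sketch below illustrates that split; `MetadataResolver`, `Copy`, the in-memory `STORAGE` dictionary and the URLs are hypothetical and do not describe how Hammerspace implements its protocol. The resolver prefers a copy at the client’s own site, much as DNS can return a nearby address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    site: str  # e.g. "on-prem" or "aws" (illustrative names)
    url: str   # where the bytes actually live


class MetadataResolver:
    """Control path only: answers 'where is this file?' and never serves data."""

    def __init__(self, catalog: dict[str, list[Copy]]) -> None:
        self._catalog = catalog

    def resolve(self, path: str, client_site: str) -> Copy:
        copies = self._catalog[path]
        # Prefer a copy at the client's own site; otherwise fall back to the first copy.
        for copy in copies:
            if copy.site == client_site:
                return copy
        return copies[0]


# Data path: stand-in for real storage systems, keyed by physical URL.
STORAGE = {
    "nfs://filer1/vol/q3.parquet": b"on-prem bytes",
    "s3://bucket/q3.parquet": b"cloud bytes",
}


def read(resolver: MetadataResolver, path: str, client_site: str) -> bytes:
    copy = resolver.resolve(path, client_site)  # one metadata lookup
    return STORAGE[copy.url]                    # bytes never pass through the resolver


if __name__ == "__main__":
    resolver = MetadataResolver({"/finance/q3.parquet": [
        Copy("on-prem", "nfs://filer1/vol/q3.parquet"),
        Copy("aws", "s3://bucket/q3.parquet"),
    ]})
    print(read(resolver, "/finance/q3.parquet", "aws"))  # b'cloud bytes'
```

Keeping the resolver out of the data path means it handles only small metadata requests, so it does not become a bandwidth bottleneck as data volumes grow.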
Metadata can come from a variety of sources. When users or applications create data, the file system also creates metadata such as the filename, who created the data, and which application can open it. When data is later accessed, the file system captures more metadata, such as who accessed the data, when it was accessed, and whether and when it was modified. Finally, users, applications or the creating device can add metadata ad hoc. These tags or keywords help with the classification of data. Hammerspace aggregates all of this metadata into a single engine that drives its intent-based data management, and that intent also becomes part of the metadata engine.
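The three metadata sources described above can be merged into a single per-file record. The Python sketch below shows one possible shape for such a record; `MetadataRecord` and its field names are hypothetical choices made for this illustration, not a documented Hammerspace schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class MetadataRecord:
    """One merged record per file, fed by three metadata sources (hypothetical schema)."""
    # 1. Written by the file system when the file is created.
    path: str
    owner: str
    created: datetime
    # 2. Accumulated by the file system as the file is used.
    last_accessed: Optional[datetime] = None
    last_modified: Optional[datetime] = None
    accessed_by: set[str] = field(default_factory=set)
    # 3. Added ad hoc by users, applications, or the creating device.
    tags: dict[str, str] = field(default_factory=dict)

    def record_access(self, user: str, when: datetime, modified: bool = False) -> None:
        """Fold one access event into the record."""
        self.accessed_by.add(user)
        self.last_accessed = when
        if modified:
            self.last_modified = when

    def add_tag(self, key: str, value: str) -> None:
        """Attach or overwrite an ad hoc key/value tag."""
        self.tags[key] = value


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    rec = MetadataRecord("/lab/scope/img001.tif", owner="alice", created=t0)
    rec.record_access("bob", datetime(2024, 2, 1, tzinfo=timezone.utc))
    rec.add_tag("instrument", "confocal-3")  # e.g. supplied by the capturing device
    print(rec)
```

With all three sources in one record, a placement or retention decision can consider who owns a file, how recently it was used and how it has been classified at the same time.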
Once the solution creates the metadata index, Hammerspace uses machine learning and analytics to provide further insight into the data. It lets users manage data through an automated feedback control structure that continually optimizes the cost and performance of data placement.
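A feedback control loop repeatedly observes how data is used, decides where it should live and acts on that decision. The Python sketch below shows a single pass of such a loop; a fixed threshold rule stands in for the machine learning the article describes, and `FileState`, the tier names, the thresholds and the per-GB prices are all hypothetical values chosen for illustration, not real vendor pricing.

```python
from dataclasses import dataclass

# Hypothetical per-GB monthly prices for two tiers (illustrative, not real pricing).
TIER_COST_PER_GB = {"performance": 0.10, "capacity": 0.01}


@dataclass
class FileState:
    path: str
    size_gb: float
    tier: str
    recent_accesses: int  # accesses observed in the last measurement window


def decide_tier(f: FileState, hot: int = 10, cold: int = 0) -> str:
    """Threshold rule standing in for a learned placement model."""
    if f.recent_accesses >= hot:
        return "performance"
    if f.recent_accesses <= cold:
        return "capacity"
    return f.tier  # in between: leave it where it is to avoid flapping


def control_step(files: list[FileState]) -> list[tuple[str, str, str]]:
    """One pass of the loop: observe, decide, act. Returns (path, from, to) moves."""
    moves = []
    for f in files:
        target = decide_tier(f)
        if target != f.tier:
            moves.append((f.path, f.tier, target))
            f.tier = target  # a real system would migrate the data here
    return moves


def monthly_cost(files: list[FileState]) -> float:
    return sum(f.size_gb * TIER_COST_PER_GB[f.tier] for f in files)


if __name__ == "__main__":
    files = [
        FileState("/render/frame_0001.exr", 50, "capacity", recent_accesses=25),
        FileState("/render/old_project.tar", 500, "performance", recent_accesses=0),
    ]
    print(f"before: ${monthly_cost(files):.2f}/month")
    print(control_step(files))
    print(f"after:  ${monthly_cost(files):.2f}/month")
```

In this example the hot file is promoted and the idle archive is demoted, cutting the assumed monthly cost from $50.50 to $10.00 while improving access to the data that is actually in use. Running `control_step` on every measurement window is what closes the feedback loop.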
Once IT installs the Hammerspace solution, organizations can distribute data across multiple clouds and take advantage of price differences when cloud providers change their pricing models. They can also run analytics jobs from a single location while accessing data across all locations. Additionally, they can use Hammerspace to find specific data faster to meet compliance and discovery requests.
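Whether a price change justifies moving data depends on more than the new per-GB rate: leaving a provider usually incurs egress charges. The Python sketch below compares the total cost of staying against moving over a planning horizon. The provider names, storage prices and the flat $0.09/GB egress rate are hypothetical placeholders, and a real decision would also account for request, retrieval and region-specific fees.

```python
# Hypothetical per-GB monthly storage prices; real prices vary by region and tier.
PRICES = {"provider-a": 0.023, "provider-b": 0.020}


def total_cost(size_gb: float, price_per_gb_month: float, months: int,
               egress_per_gb: float = 0.0) -> float:
    """Storage cost over the horizon plus any one-time egress charge to move there."""
    return size_gb * price_per_gb_month * months + size_gb * egress_per_gb


def best_provider(size_gb: float, current: str, prices: dict[str, float],
                  egress_per_gb: float, months: int) -> str:
    """Pick the cheapest home for a data set, charging egress only if it moves."""
    best = current
    best_cost = total_cost(size_gb, prices[current], months)
    for provider, price in prices.items():
        if provider == current:
            continue
        cost = total_cost(size_gb, price, months, egress_per_gb)
        if cost < best_cost:
            best, best_cost = provider, cost
    return best


if __name__ == "__main__":
    # 10 TB currently on provider-a, with an assumed flat egress rate of $0.09/GB.
    print(best_provider(10_000, "provider-a", PRICES, 0.09, months=12))  # provider-a
    print(best_provider(10_000, "provider-a", PRICES, 0.09, months=36))  # provider-b
```

Over 12 months the cheaper provider saves $360 in storage, less than the $900 egress charge, so the data stays put; over 36 months the savings reach $1,080 and the move pays off.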
The amount of unstructured data, in terms of capacity, number of files and number of destinations, is scaling beyond human comprehension. Even if IT could manage the data manually, it doesn’t have the time. Leveraging machine learning and data analytics to manage data better not only makes sense; it may also be the only option. The first step, though, is separating the control path from the data path. Once Hammerspace consolidates and isolates the metadata, it takes the breakthrough step of analyzing it and applying machine learning to make data placement autonomous.