The new era of data analytics has given storage professionals an opportunity to become strategic partners and advisers to line-of-business stakeholders. The problem is that completing analytics queries takes far too long, frequently weeks or even months. One of the key logjams in this process is identifying, obtaining and preparing the data that is relevant to a business query. A major part of the problem is that only a limited number of individuals at a given enterprise have the required skills.
Funneling the correct data to an analytics query requires answering four basic questions: What data is relevant? What data needs to be moved, and how? How does the data need to be cleaned and integrated? Who has access to the data?
To say that organizations are capturing a vast amount of data is an understatement. To judge what the data means and whether it is relevant to an analytics query, the data engineer first needs to understand what is being captured across the tables that constitute a given database, which often number in the thousands.
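As a rough illustration of that first discovery step, the sketch below searches a database catalog for columns whose names contain a keyword. It assumes an in-memory SQLite database as a stand-in for an enterprise database; the tables (`customers`, `orders`, `inventory`) and the helper `find_columns` are hypothetical, and this is not how any particular product performs discovery.

```python
import sqlite3


def find_columns(conn, keyword):
    """Return (table, column) pairs whose column name contains keyword."""
    matches = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            if keyword.lower() in row[1].lower():
                matches.append((table, row[1]))
    return matches


# Hypothetical schema standing in for a much larger database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL);
    CREATE TABLE inventory (sku TEXT, quantity INTEGER);
""")

hits = find_columns(conn, "customer")
print(hits)
```

Matching on column names alone is a crude proxy for relevance, since real schemas often use cryptic or inconsistent names, which is part of why this step consumes so much time.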
This issue leads into the data access issue. Data governance rules and regulations are becoming stricter while also constantly evolving, so the data engineer should not need to request access to all of these tables. In practice they frequently must, because they cannot narrow down at the outset where the relevant data sits within the database.
Relevant data is also likely to live across multiple databases. This complicates the data access issue and demands another layer of technical expertise: understanding what data needs to be joined, and how.
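To make the cross-database join concrete, here is a minimal sketch that assumes two SQLite databases attached to a single connection. The database alias `billing`, the tables, and the sample rows are all hypothetical; in a real enterprise the two sources would usually be separate systems with their own access controls and would need a federation or integration layer rather than `ATTACH`.

```python
import sqlite3

# "main" plays the role of one database; "billing" is a second, separate one.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

# Hypothetical tables and sample rows for illustration only.
conn.executescript("""
    CREATE TABLE main.customers (customer_id INTEGER, region TEXT);
    CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL);
    INSERT INTO main.customers VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO billing.invoices VALUES (1, 100.0), (1, 50.0), (2, 75.0);
""")

# The engineer has to know that customer_id is the shared key across both sources.
totals = conn.execute("""
    SELECT c.region, SUM(i.amount)
    FROM main.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(totals)
```

The query itself is short; the expertise lies in knowing which tables hold the data, which key links them, and whether access to both sources has been granted.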
Once the relevant data has been identified, the data engineer requires visibility into the data's quality. They need to know what data needs to be cleansed, and how. Determining whether data has been corrupted, whether it is inconsistent with other databases, or whether it is incorrect due to an input error is by itself a lengthy and challenging process, especially against a tide of data sprawl. The data engineer then also needs to understand how to rectify the problem, which may include modifying or deleting data.
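The kinds of checks described above can be sketched in a few lines. The example below assumes records are already loaded as Python dictionaries; the helpers `profile_quality` and `find_mismatches`, the field names, and the sample records are hypothetical and cover only missing values, duplicate keys, and disagreement between two sources.

```python
def profile_quality(rows, key, required):
    """Count rows with missing required fields and rows that repeat a key."""
    seen = set()
    report = {"missing": 0, "duplicate_keys": 0}
    for row in rows:
        # Treat None and empty strings as missing values.
        if any(row.get(field) in (None, "") for field in required):
            report["missing"] += 1
        if row[key] in seen:
            report["duplicate_keys"] += 1
        seen.add(row[key])
    return report


def find_mismatches(rows_a, rows_b, key, field):
    """Return keys present in both sources whose field values disagree."""
    b_by_key = {row[key]: row[field] for row in rows_b}
    return sorted(
        row[key]
        for row in rows_a
        if row[key] in b_by_key and row[field] != b_by_key[row[key]]
    )


# Hypothetical records from two systems that should agree on email addresses.
crm = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "c@example.com"},
]
billing = [
    {"id": 1, "email": "a@example.com"},
    {"id": 3, "email": "c@example.org"},
]

report = profile_quality(crm, key="id", required=["email"])
mismatches = find_mismatches(crm, billing, key="id", field="email")
print(report, mismatches)
```

Checks like these flag where problems are, but not which source is right. Deciding whether to modify or delete a record still requires knowledge of how the data was produced.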
For its part, Promethium has developed a Data Navigation System tool that greatly streamlines the process of identifying data relevance and quality across the multiple databases and data warehouses that enterprises use. Our recent on-demand webinar, “The Top Four Challenges of Data Analytics and How to Solve Them,” has additional context.