By an often-cited estimate, 90% of all the data in the world was created in the past two years. That’s every text you send, every movie you stream, every podcast you listen to, every purchase you make, every Google search you key in: basically every activity you undertake online. As a result, companies today are sitting on a tottering mountain of data.

Today, most of this data sits in company data lakes, accessed mainly by CIOs and their IT teams. But somewhere in this junkpile are nuggets of knowledge: insights and trends that could bring tremendous value to the business if the data is collated and analyzed correctly.

Sorting out the “junk drawer”

Hadoop and other Big Data frameworks make it very easy to get data into data lakes. This has made organizations eager to spend money on putting massive amounts of data into storage; in fact, the global data lakes market was valued at USD 3.24 billion in 2017.

[Figure: Four principles of data governance.]

But doing this is like putting things into the junk drawer: it’s all in there somewhere, but finding what you need when you need it becomes almost impossible.

Data governance is built on four principles: availability, accessibility, data integrity, and security. Data lakes can take care of only the first, availability. To carry out any kind of analysis and derive business intelligence, the data stored in lakes must meet the other three criteria as well.
From a data privacy and security point of view, too, it is essential to know what data is going into the lake, how it is transformed, and who is accessing it. This is where having a catalog of this data becomes important.

So, what really is a data catalog?

A data catalog is exactly what the name suggests: an organized registry of all the data an organization possesses. The data remains in its original location, but a copy of its metadata is preserved in the catalog. This allows users to find relevant data irrespective of where it’s stored across systems, learn more about these data sources, and collaborate with each other to carry out analyses on this data. And because all of this stays in a centralized system where access is monitored, data integrity is not compromised. The catalog thus becomes a trusted source of clean, usable data for the business.

Data catalogs are made up of metadata containing the definitions of database objects such as base tables, views (virtual tables), indexes, users, and user groups. Users can not only search and find data sets, but also contribute to the catalog by tagging or annotating them, so the catalog keeps improving as it is used.
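
To make the idea concrete, here is a minimal sketch in Python (3.9+) of a catalog that stores only metadata, lets users search it, and lets them contribute tags. The class names, fields, and the `warehouse://` and `lake://` locations are hypothetical and chosen purely for illustration; a real catalog product will look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata about one data set; the data itself stays in its source system."""
    name: str
    location: str      # where the data actually lives (hypothetical URI)
    object_type: str   # e.g. "table", "view", "index"
    description: str = ""
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """An in-memory registry of metadata entries, keyed by data set name."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def tag(self, name: str, tag: str) -> None:
        # Users enrich the catalog by tagging existing entries.
        self._entries[name].tags.add(tag.lower())

    def search(self, term: str) -> list[str]:
        # Match the term against names, descriptions, and exact tags.
        term = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if term in e.name.lower()
            or term in e.description.lower()
            or term in e.tags
        )


catalog = DataCatalog()
catalog.register(CatalogEntry(
    "crm_customers", "warehouse://sales/crm_customers", "table",
    "Customer master records"))
catalog.register(CatalogEntry(
    "web_clicks", "lake://raw/web_clicks", "table",
    "Raw clickstream events"))
catalog.tag("web_clicks", "user-behavior")

print(catalog.search("customer"))       # ['crm_customers']
print(catalog.search("user-behavior"))  # ['web_clicks']
```

Note that the second search succeeds only because a user added the tag: the raw description of `web_clicks` says nothing about user behavior.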

Who benefits from data cataloging?

In many ways, data catalogs support democratization of data beyond the reach of just the IT team. Depending on the need at hand, organizations can allow different user groups differentiated access to the repository. Some of these users include:

  • CXOs who want a top-level view of the whole business or of parts of it, for instance CRM or accounts.
  • Analysts who are trying to solve specific business problems like user behavior or sales trends.
  • Data stewards who are putting in place policies, processes and responsibilities to meet compliance requirements.
  • Data creators who generate the data that goes into these systems as part of their everyday work and who will need to retrieve and study it to make business decisions.
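
One simple way to picture differentiated access for these user groups is a role-to-permission table. The sketch below is an assumption for illustration only: the role names mirror the list above, but the action names and the policy itself are hypothetical, and each organization would define its own.

```python
from enum import Enum


class Role(Enum):
    CXO = "cxo"
    ANALYST = "analyst"
    DATA_STEWARD = "data_steward"
    DATA_CREATOR = "data_creator"


# Hypothetical policy: which catalog actions each role may perform.
PERMISSIONS: dict[Role, frozenset[str]] = {
    Role.CXO: frozenset({"view_summary"}),
    Role.ANALYST: frozenset({"view_summary", "search", "annotate"}),
    Role.DATA_STEWARD: frozenset(
        {"view_summary", "search", "annotate", "edit_policy"}),
    Role.DATA_CREATOR: frozenset({"search", "annotate"}),
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if the given role may perform the given action."""
    return action in PERMISSIONS.get(role, frozenset())


print(is_allowed(Role.ANALYST, "search"))   # True
print(is_allowed(Role.CXO, "edit_policy"))  # False
```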

The search for the ideal data catalog

The benefits of a data catalog have been clear for a while now, and there are many data catalog solution providers in the market. When selecting a solution for your business, ask yourself these questions.

Is the solution user-friendly?

The catalog system must help your users navigate easily and find what they are looking for; otherwise, it defeats the very purpose of its existence. This means a user-friendly interface and easy-to-understand language that is consistent with your business glossary, so users can relate what they are seeing to what they already know. It also speeds up access to relevant data, which shortens the time the business needs to act on it.

Is the solution collaboration-friendly?

Different teams of stakeholders must be able to access and collaborate on the data, from contributing to its quality and usability through tagging and annotations to working together to construct or manipulate datasets as a part of enterprise-wide projects.

Is the solution adaptable?

We need to be prepared for an era of ever-changing legislation in the field of data, and the data catalog solution you choose must be agile enough to adapt to such requirements.

Is the solution integration-friendly?

Through open APIs, the new data catalog system must be able to integrate efficiently with existing tools and data sources, and understand and apply current business glossaries.
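
As a rough illustration of what applying a business glossary can mean in practice, the Python sketch below maps physical column names to the business terms users already know. The glossary contents and the `apply_glossary` helper are hypothetical; an actual catalog product would expose its own API for this.

```python
# Hypothetical business glossary: physical column name -> business term.
GLOSSARY = {
    "cust_id": "Customer ID",
    "ord_dt": "Order Date",
    "rev_amt": "Revenue Amount",
}


def apply_glossary(columns: list[str], glossary: dict[str, str]) -> dict[str, str]:
    """Return a display label for each column, falling back to the raw name."""
    return {col: glossary.get(col.lower(), col) for col in columns}


labels = apply_glossary(["CUST_ID", "ord_dt", "sku"], GLOSSARY)
print(labels)
# {'CUST_ID': 'Customer ID', 'ord_dt': 'Order Date', 'sku': 'sku'}
```

Columns with no glossary entry, such as `sku` here, keep their raw name, which also makes gaps in the glossary easy to spot.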

Is the solution scalable?

As your organization grows, the volume of data to be handled, the number of sources it is pulled from, and where these sources reside (cloud, on-premises, or hybrid) will all change and grow in complexity. The data catalog system you adopt must be able to scale and adapt to these needs.

The data you possess is your most powerful weapon, and we can help you generate maximum value for your business from it. Explore our data governance solution with Collibra to understand how.

