Data Catalog vs Data Registry
Metadata has long been foundational to data governance, but as data evolves, so do the challenges in managing that data – across privacy, security, and governance. In recent years, data catalogs have emerged as foundational tools in data governance for capturing, managing and searching metadata. But is that enough?
Metadata is information that describes data: for example, the names, types, and locations of the tables and columns inside databases and similar data sources. It’s the data about data – and critical to data professionals. It provides a map of what data resides where, helping data professionals more quickly locate the best data for AI and BI, manage data commercialization, fulfill data regulatory requests, and more.
Traditional data catalogs, however, have serious limitations:
Limited Data Source Coverage: While structured databases and warehouses get most of the spotlight, there’s little focus on the blind spots – which include files, documents, images, messages, messaging platforms, SaaS, data pipelines, development environments, NoSQL, and a lot more.
Lacking in Scale: Typical metadata catalogs can’t scale to cover the entire data estate of an enterprise, leaving organizations with a map of, say, Winnipeg when what they really need is a map of the world.
Siloed Capabilities: If you’re limited in what data you can see, that means no global data profiling, no consistent data inventory, and very limited security and privacy awareness.
The result? These types of catalogs alone provide an incomplete picture of an organization’s data universe – and that’s problematic not only for identifying and managing high-value data, but also for identifying and managing high-risk data.
Data Fabric and Expanding the Data Visibility Aperture
One way organizations have recently begun to address these challenges is by expanding their field of vision via virtualization strategies – like data fabric. While traditional catalogs perform best on highly concentrated data sets in specific data sources, a fabric can scale beyond this. A data fabric provides a means to virtualize access to distributed, rather than concentrated, data sets.
Not all abstraction or virtualization strategies, however, are equal. Most require proxies – adding latency and a single point of failure. These approaches tend to limit views to SQL-supported data sets, leaving a blind spot across the rest of the data landscape (from SaaS to NoSQL, to files and messaging).
Products like BigID have emerged to address this: becoming the foundation for running data discovery and governance functions on all data – across the entirety of the fabric.
Getting a Searchable Global View of Your Metadata, Sensitive Data, and Personal Data
In response to some of the limitations of today’s catalogs in terms of coverage and context, some organizations have begun exploring data registries to complement the data fabric. With a data registry layer, organizations are able to:
- Expand the field of vision for the data beyond just a limited pool of data lakes and warehouses
- Provide scale to look across the entirety of a corporate data estate
- Enable discoverability for metadata, privacy data and security data (critically important given the growing complexity of environments)
- Apply global profiling capabilities – whether to improve data quality, minimize duplicate data, or simply rationalize costs – across the entire volume of company data assets
- Provide the necessary business and operational metadata to simplify actions in data governance – as well as privacy and security activities
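
To make the registry idea above concrete, here is a minimal Python sketch of a single searchable index that spans several kinds of data sources. It is an illustration under stated assumptions, not how BigID or any other product works: the names RegistryEntry and DataRegistry, the tag values, the source names, and the content-hash approach to spotting duplicates are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """One data asset known to the registry (all fields are illustrative)."""
    source: str                     # e.g. "warehouse", "file_share", "nosql"
    location: str                   # table name, file path, or collection name
    fields: tuple = ()              # column or attribute names found in the asset
    tags: frozenset = frozenset()   # e.g. {"personal", "sensitive"}
    content_hash: str = ""          # hypothetical fingerprint for duplicate detection


class DataRegistry:
    """A toy in-memory registry covering many source types in one index."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def search(self, term=None, tag=None):
        """Return entries matching a name fragment and/or carrying a tag."""
        results = []
        for entry in self._entries:
            if term is not None:
                needle = term.lower()
                names = [entry.location.lower()]
                names.extend(f.lower() for f in entry.fields)
                if not any(needle in name for name in names):
                    continue
            if tag is not None and tag not in entry.tags:
                continue
            results.append(entry)
        return results

    def duplicates(self):
        """Group entries that share the same non-empty content hash."""
        groups = defaultdict(list)
        for entry in self._entries:
            if entry.content_hash:
                groups[entry.content_hash].append(entry)
        return [group for group in groups.values() if len(group) > 1]

    def sources(self):
        """List every source type the registry currently covers."""
        return sorted({entry.source for entry in self._entries})


# Example: the same customer data lives in a warehouse and in a file export.
registry = DataRegistry()
registry.register(RegistryEntry(
    "warehouse", "sales.customers", ("id", "email", "region"),
    frozenset({"personal"}), "abc123"))
registry.register(RegistryEntry(
    "file_share", "/exports/customers.csv", ("id", "email"),
    frozenset({"personal"}), "abc123"))
registry.register(RegistryEntry(
    "nosql", "events.clickstream", ("session_id", "url")))

personal = registry.search(tag="personal")   # assets holding personal data
with_email = registry.search(term="email")   # assets exposing an email field
copies = registry.duplicates()               # candidate duplicate data sets
```

In this sketch, one query surfaces personal data in both the warehouse and the file share, and the duplicate check flags the exported copy. Those are the kinds of cross-source questions that a catalog scoped to a single warehouse would struggle to answer.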
Data Registry + Fabric for Finding and Actioning All Data
Data is the lifeline of every modern digital enterprise. Traditional approaches to data management give organizations an incomplete picture of their data – leaving them blind to the bigger picture and vulnerable to risk. Organizations need to take a more modern approach to managing the entire data estate. There is a need now, more than ever before, to use virtualization and data fabric together to act on all possible data.
And that’s where data registries come in: helping organizations discover and manage data in context – across not just metadata, but also other data artifacts for understanding risk and regulation, like people data and sensitive data. Data practitioners can then get full context for the content they are searching – and streamline action, whether in data governance, security, or privacy.