ML-Augmented Data Catalogs with Active Metadata

March 9, 2020


Data catalogs are a critical part of any data (and metadata) management strategy. As the complexity of data ecosystems and the volume of data flowing through them grows, the traditional approach to a data catalog needs to evolve.

Rethinking data cataloging requires deeper context, breadth of data source coverage, and orchestrated automation to map and catalog sensitive & personal data with deep data insight – incorporating active metadata, direct and inferred attributes, and classifiers.

Data intelligence is a driving force & catalyst for investing in data catalogs today: organizations need to be able to layer in context from personal, sensitive, and situational data to get value out of metadata management. It’s not just a question of where and how data is found, but how data elements relate to business terms – and how the foundation of a data catalog can be used to refine key points of data understanding like data quality and lineage.

Without the ability to incorporate additional data insights and perspective, many enterprises are taking unnecessary risks. Automating data intelligence reduces that exposure and enables organizations to use their data to the fullest extent.

By managing privacy and security risk while providing a broader view of the data through automated discovery and classification, organizations can get better use of their data, reduce risk, and leverage more context and greater accuracy for advanced analytics.

Inventory Distributed and Siloed Data Assets

Two of the most pressing operational challenges with traditional data catalogs are their limited coverage of enterprise data sources and the degree of manual curation required to tag data elements in the data catalog with descriptions.

Effective data catalogs need to scale, cover every data source where sensitive and personal data is stored, populate automatically with the right data (and propagate business term tags), and go beyond purely technical metadata to provide the context needed for privacy-aware data governance. It’s critical to be able to find and inventory all of an organization’s data, regardless of what type of data it is, where it’s stored, or whether the organization already knows what it is.

To manage risk while driving data management & analytics strategies, organizations can leverage an ML-augmented catalog with active metadata and broad data coverage. Incorporating machine learning classification helps organizations systematically surface the relationships between data elements, automate the process of associating metadata with data elements for activities like privacy protection and data quality assessment, and establish relationships between data that might otherwise be missed. By doing so, organizations can gain a unified view across distributed, siloed assets and more easily inventory and find relevant data across the entire data ecosystem.
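As a rough illustration of how classification results can be associated with data elements automatically, here is a minimal Python sketch. It is not BigID's implementation: simple regular-expression matching stands in for a trained ML classifier, and every name in it (PATTERNS, classify_column, tag_catalog, the sample column names) is hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern-based classifiers. A trained ML model would replace
# these in a real system; the regexes only stand in for that step here.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "credit_card": re.compile(r"^\d{4}([ -]?\d{4}){3}$"),
}


def classify_column(values, threshold=0.8):
    """Return labels whose pattern matches at least `threshold` of the non-empty values."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return []
    hits = Counter()
    for value in non_empty:
        for label, pattern in PATTERNS.items():
            if pattern.match(value):
                hits[label] += 1
    return sorted(label for label, n in hits.items() if n / len(non_empty) >= threshold)


def tag_catalog(samples):
    """Build catalog entries {column: {"tags": [...]}} from sampled column values."""
    return {column: {"tags": classify_column(values)} for column, values in samples.items()}


# Sampled values per column (made-up data for illustration only).
samples = {
    "crm.contacts.contact": ["a@example.com", "b@example.org", ""],
    "hr.staff.tax_id": ["123-45-6789", "987-65-4321"],
    "support.tickets.notes": ["hello", "world"],
}
catalog = tag_catalog(samples)
```

The threshold keeps a single stray match from tagging a whole column; where to set it, and how values are sampled, would depend on the data source.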

BigID combines an ML-augmented catalog with classification, correlation, and cluster analysis to give deeper data contextual awareness – enabling organizations to get more out of their data while ensuring that policies for privacy compliance and ethical use of data are enforced.

Catalog in Context

By representing, describing and organizing more types of data from more sources into a data catalog, organizations are able to gain a consolidated view that incorporates privacy, security, and business insight – all within a single pane of glass. A privacy-aware unified data inventory combines attributes, metadata, and context around an organization’s personal and sensitive data, providing deeper data insight and understanding.

BigID discovers and classifies more types of data across data sources, from data lakes to file systems to relational databases, wherever they live (on-premises, hybrid, and cloud environments). The result is a unified inventory that enables data stewards to make intelligent decisions across an organization’s entire ecosystem. By leveraging a discovery-in-depth approach, BigID can uncover relationships between connected and disparate datasets and incorporate that insight into the data catalog.

What is Active Metadata?

A next-generation approach to data catalogs takes into account not just technical metadata, but business and operational metadata for added context and data-driven insights. By associating more types of data on an ongoing basis, organizations can make more informed and intelligent decisions around their data.

Active metadata focuses on connecting rather than collecting: it takes the (passive) metadata that organizations have so far collected only at periodic intervals and adds context to it. Active metadata connects separate metadata points that may have gone stale under a passive-only approach, and leverages machine learning to add content and context, relate data sets, and help find the right data for analysis, governance, and self-driving data management.
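The "connecting vs collecting" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the MetadataRecord type, the 30-day staleness window, and the shared-attribute rule for relating data sets are all hypothetical choices, not a description of any product's behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations


@dataclass
class MetadataRecord:
    """Hypothetical passive metadata point collected by a periodic scan."""
    dataset: str
    attributes: set
    last_scanned: datetime


def stale_datasets(records, now, max_age=timedelta(days=30)):
    """Names of data sets whose metadata is older than `max_age` (assumed window)."""
    return [r.dataset for r in records if now - r.last_scanned > max_age]


def related_datasets(records):
    """Connect data sets that share attributes: {(a, b): [shared attribute names]}."""
    links = {}
    for a, b in combinations(records, 2):
        shared = a.attributes & b.attributes
        if shared:
            links[(a.dataset, b.dataset)] = sorted(shared)
    return links


# Made-up records for illustration only.
records = [
    MetadataRecord("crm.contacts", {"email", "name"}, datetime(2020, 3, 1)),
    MetadataRecord("billing.invoices", {"email", "amount"}, datetime(2020, 1, 1)),
    MetadataRecord("logs.clicks", {"url"}, datetime(2020, 3, 5)),
]
now = datetime(2020, 3, 9)
stale = stale_datasets(records, now)
links = related_datasets(records)
```

Here the staleness check flags metadata that needs a fresh scan, and the link step relates data sets that a passive, one-record-at-a-time view would leave disconnected.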

Metadata Exchange, Interoperability, and Open Architectures

For full visibility across data & silos, it’s important to be able to integrate with other catalog, data management, and governance technologies: organizations must be able to exchange data, tags, and metadata with other catalogs, apply business terms, consume data hierarchies, aggregate results, and incorporate data context.

In order to drive more value and utility across the data ecosystem, BigID created a bi-directional metadata exchange – enabling organizations to import business glossaries, map business terms, and consolidate their governance landscape for more automated policy management and enforcement. By leveraging a bi-directional, extensible metadata exchange, organizations can benefit from the foundational elements of data discovery & classification while integrating with – and enriching – traditional metadata management solutions, from catalogs to governance workflows.
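To make the two directions of such an exchange concrete, the following Python sketch imports a business glossary, maps its terms onto catalog entries, and exports the enriched entries for another tool to consume. The JSON shapes and the function names (import_glossary, map_terms, export_tags) are assumptions made for illustration and do not reflect BigID's actual exchange format or API.

```python
import json


def import_glossary(glossary_json):
    """Parse a glossary {business term: [attribute names]} into attribute -> term."""
    glossary = json.loads(glossary_json)
    return {attr.lower(): term for term, attrs in glossary.items() for attr in attrs}


def map_terms(catalog, lookup):
    """Attach a business term to each entry whose column name is in the glossary."""
    enriched = {}
    for column, entry in catalog.items():
        short_name = column.split(".")[-1].lower()
        enriched[column] = {**entry, "business_term": lookup.get(short_name)}
    return enriched


def export_tags(catalog):
    """Serialize entries as a JSON list that another catalog could ingest."""
    rows = [
        {"column": column, "tags": entry["tags"], "business_term": entry["business_term"]}
        for column, entry in sorted(catalog.items())
    ]
    return json.dumps(rows)


# Inbound: a glossary from a governance tool (made-up content).
glossary_json = '{"Customer Email": ["email", "email_addr"], "Tax ID": ["ssn"]}'
# Local catalog entries carrying classification tags (made-up content).
catalog = {
    "crm.contacts.email_addr": {"tags": ["email"]},
    "hr.staff.ssn": {"tags": ["us_ssn"]},
    "logs.clicks.url": {"tags": []},
}
enriched = map_terms(catalog, import_glossary(glossary_json))
# Outbound: enriched entries for another catalog.
exported = export_tags(enriched)
```

Matching on the last segment of the column name is the simplest possible mapping rule; a production exchange would more likely match on classifier output or curated synonyms.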

The BigID Approach to Data Catalogs

An ML-driven data catalog is the first step in metadata management to discover, identify, classify, and manage data. BigID leverages an ML-driven data catalog to automate discovery & streamline manual workflows for greater accuracy, data intelligence, and shorter time to value.

Catalog in context: See attributes, metadata, and context around your data (PI, PII, sensitive data, data relationships) to make intelligent decisions
Holistic view: We don’t just collect data: we connect the dots around it. Our catalog is context- and content-aware, adding value by associating different types of data to get the big picture
Visibility across silos: Get a consolidated view of your organization’s data, incorporating personal data, sensitive data, and metadata (even aggregated results from other catalogs and inputs)

See the BigID approach to data catalogs in action with a live demo – or download the white paper Data Catalog 2.0: Rethinking Data Catalogs.

BigID

Meet BigID's author collective, a diverse team featuring product marketers, subject matter experts, and copywriters deeply versed in data privacy, security, and governance. Our collaborative approach harnesses a wealth of industry expertise to craft insightful and informative content, ensuring you stay informed in this ever-evolving landscape.
