Two Heads Are Better than One: When Data Discovery Meets Data Catalogs

James A. H. Murray is famously credited as the creator of the Oxford English Dictionary (OED). Over the course of 40 years, the professor and lexicographer—versed in over 14 languages—undertook the most momentous linguistic enterprise since Samuel Johnson: To categorize all the words in the English language, in all their various forms and connotations.

As tenacious as Murray was, he might never have reached his goal without access to the vast “data” stored and processed in the mind of William Chester Minor, who contributed countless quotations illustrating how those English words were used in multiple contexts.

So, the OED—the universal authority on the English language—results from one of the most fruitful partnerships (read: data integrations) in history. And it continues to set the bar for the highest accuracy standards that exist in its discipline.

Data Catalogs and Data Discovery: An integration bigger than the sum of its parts

CDOs, data stewards, and compliance officers who face the growing challenge of efficiently unlocking value from their data without ever jeopardizing compliance will understand the Murray-meets-Minor takeaway: Data understanding seeks data intelligence, and vice-versa.

In the world of data management, data governance teams might in fact maintain both a data dictionary and a business glossary. The data dictionary is typically system-specific and defines how data elements and their meanings relate to technical parameters such as column names. A glossary typically serves an enterprise-wide need: it defines business terms that can have multiple technical relationships. In other words, glossaries are useful for definitions but still need context on where those terms are used.
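
To make the distinction concrete, here is a minimal sketch of the two structures. Every name in it (`DictionaryEntry`, `GlossaryTerm`, `where_used`, and the sample systems and columns) is hypothetical and chosen for illustration; it is not drawn from any particular catalog product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DictionaryEntry:
    """One system-specific data dictionary row (hypothetical shape)."""
    system: str
    table: str
    column: str
    meaning: str


@dataclass
class GlossaryTerm:
    """One enterprise-wide business term that may map to many columns."""
    name: str
    definition: str
    mapped_columns: List[DictionaryEntry] = field(default_factory=list)


def where_used(term: GlossaryTerm) -> List[str]:
    """Return the fully qualified columns a glossary term is mapped to."""
    return [f"{e.system}.{e.table}.{e.column}" for e in term.mapped_columns]


# Illustrative data: one business term, two technical locations.
crm_email = DictionaryEntry("crm", "contacts", "email_addr", "Primary contact email")
shop_email = DictionaryEntry("webshop", "orders", "cust_email", "Email given at checkout")
customer_email = GlossaryTerm(
    "Customer Email",
    "Email address used to reach a customer",
    [crm_email, shop_email],
)

print(where_used(customer_email))
# ['crm.contacts.email_addr', 'webshop.orders.cust_email']
```

The glossary term carries the definition, while the mapped dictionary entries supply the "where it is used" context that the definition alone lacks.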

Data catalogs emerged over the course of the last few years to provide the business-friendly terms and technical metadata mapping to facilitate collaboration, crowdsourcing, and data sharing across an organization. In plain English. In terms that are understandable. Across loads of data sources.

Data understanding, check.

Enter BigID, an intelligence platform that uses bottom-up data discovery to align those business-friendly terms and metadata with an inventory of sensitive and personal data. Organizations can now get a unified inventory and data catalog that includes Personally Identifiable Information (PII), Personal Information (PI), metadata, business terms, and more. In other words, they can expand their “vocabulary” to speak the language of compliance. And to be more searchable. And to scale amidst the era of big data.

Data intelligence, check.

Great. So what does this mean for me?

Say you’re a global retailer looking to balance self-service data access with compliance and privacy.

You probably have unstructured data from thousands of sources across a private cloud, on premises, and in SaaS environments, plus structured data in all forms of SQL, flat files, file shares, Cassandra, MongoDB, and various other applications and connection points. Any data mapping and data curation you do may be largely manual, relying on spreadsheets that are perpetually out of date, with little to no unified visibility for privacy and governance.

As a forward-thinking CDO who knows that privacy regulations and ethical concerns aren’t going anywhere (and are only getting more detailed and robust), you might also want to build a strategy that will help you automate data management continuously and at scale.

How it works: Behind the scenes of a unified front

Let’s dig into how privacy-centric data discovery can automate and enrich governance so you and your organization can:

Integrate data understanding and data intelligence

  • Automatically find, inventory, and map PI—not just PII
  • Connect business and policy terms from data catalogs to data elements, objects, and physical assets at scale
  • Enhance data trust and quality through bi-directional, ongoing metadata exchange
  • Ensure descriptions, physical data assets, and metadata are accurate and up-to-date
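
As a rough illustration of connecting glossary terms to physical columns through discovery, the toy sketch below samples column values and tags a column when most values match a pattern. This is an assumption-laden simplification, not BigID's method or API: the regular expressions, the 0.8 threshold, and the names `CLASSIFIERS`, `classify_column`, and `build_inventory` are all hypothetical.

```python
import re
from typing import Dict, List, Optional

# Hypothetical, deliberately simple patterns keyed by glossary term.
CLASSIFIERS = {
    "Customer Email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "Phone Number": re.compile(r"^\+?[\d\s\-()]{7,}$"),
}


def classify_column(values: List[str], threshold: float = 0.8) -> Optional[str]:
    """Return the first glossary term whose pattern matches at least
    `threshold` of the non-empty sampled values, or None."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    for term, pattern in CLASSIFIERS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return term
    return None


def build_inventory(columns: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each column name to a glossary term, skipping unclassified columns."""
    inventory = {}
    for name, values in columns.items():
        term = classify_column(values)
        if term is not None:
            inventory[name] = term
    return inventory


# Illustrative sample values for three columns.
sampled = {
    "crm.contacts.email_addr": ["a@example.com", "b@example.org"],
    "crm.contacts.notes": ["called twice", "prefers mail"],
    "crm.contacts.tel": ["+1 212 555 0100", "(212) 555-0199"],
}
inventory = build_inventory(sampled)
```

Real discovery tools generally go well beyond pattern matching, but the output shape (column mapped to business term) is the piece a catalog integration would consume.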

Save time and reduce manual efforts

  • Accelerate population of data catalog via automated discovery
  • Extend the data catalog business glossary terms across the enterprise data landscape, including unstructured data
  • Automate data sharing and analytics by synchronizing data catalog attribute and asset tags with BigID active metadata
  • Automate mapping of all data by person, type, server, application, physical location, etc., without copying or duplicating

Reduce ongoing risk at scale

  • Automate data governance policies and integrate ongoing discovery
  • Monitor for new sensitive information findings for ongoing compliance
  • Enhance ability to manage privacy risk within workflows
  • Monitor and automatically learn from changes to sensitive information
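
Monitoring for new findings can be pictured as comparing two inventory snapshots. The sketch below assumes an inventory is a plain mapping from column name to classification; the function name `new_findings` and the sample data are hypothetical.

```python
from typing import Dict


def new_findings(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, str]:
    """Return columns that are newly classified, or whose classification
    changed, between two discovery snapshots."""
    return {
        column: term
        for column, term in current.items()
        if previous.get(column) != term
    }


# Illustrative snapshots from two consecutive scans.
last_scan = {"crm.contacts.email_addr": "Customer Email"}
this_scan = {
    "crm.contacts.email_addr": "Customer Email",
    "webshop.orders.cust_email": "Customer Email",
}

alerts = new_findings(last_scan, this_scan)
print(alerts)
# {'webshop.orders.cust_email': 'Customer Email'}
```

Each entry in `alerts` is a candidate for review or for triggering a governance policy.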

Operationalize and scale privacy compliance

In essence, an integrated data privacy and data governance capability will help you create a bidirectional exchange of information that maps the data catalog business glossary to BigID’s inventory. The automation is designed to enhance accuracy, reduce workload, and get (and keep!) you up to compliance standards—quickly and sustainably.
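
One way to picture a bidirectional exchange is as a merge in which tags curated in the catalog and tags produced by discovery both end up on each side. The sketch below is a hypothetical simplification that uses in-memory dictionaries rather than any real catalog or BigID interface; `sync_tags` and the sample tags are invented for illustration.

```python
from typing import Dict, Set


def sync_tags(
    catalog_tags: Dict[str, Set[str]],
    discovery_tags: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return the union of tags per column, so that both systems can be
    updated to the same merged view."""
    merged = {}
    for column in catalog_tags.keys() | discovery_tags.keys():
        merged[column] = catalog_tags.get(column, set()) | discovery_tags.get(column, set())
    return merged


# Illustrative inputs: steward-curated tags and discovery-generated tags.
catalog = {"crm.contacts.email_addr": {"Customer Email"}}
discovered = {
    "crm.contacts.email_addr": {"PI"},
    "webshop.orders.cust_email": {"PI", "Customer Email"},
}

merged_view = sync_tags(catalog, discovered)
```

A union is the simplest possible merge rule. A production integration would also need to handle conflicts, deletions, and provenance, which this sketch ignores.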

CDOs can then use the BigID data inventory to enhance data governance policies across enterprise datasets, propagate tagging of attributes defined in the business glossary at scale, incorporate personal information context, and improve the data catalog with active metadata generated from ongoing discovery and analysis.

So, while data catalogs make data governance business-friendly, BigID turns “business-friendly” into “regulation-friendly,” which, in turn, enhances data catalogs. In a lot less than 40 years.