Data Governance 101: How to Build a Business Glossary

A business glossary is a key artifact that a data governance program produces to demonstrate that the organization has an agreed-upon understanding of key business concepts, business terms and the relationships between them. It is also used to demonstrate adherence to data policies and regulations.

Beyond the audit requirements, the business glossary is meant to serve as a centralized knowledge set that documents:

  1. the definition and usage of the business term amongst the different Lines of Business,
  2. the physical instantiation of the data including the authoritative source,
  3. acceptable data quality rules for measuring the business terms, and
  4. the owners and partners responsible for creating and defining the usage of the term.
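
As a rough illustration, the four kinds of information listed above could be held in a single record per term. The sketch below is only one possible shape: the class and field names (`GlossaryEntry`, `PhysicalInstance`, `missing_sections`) are assumptions made for this example and are not taken from any particular glossary tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PhysicalInstance:
    """One place where the term's data is physically stored (hypothetical shape)."""
    system: str
    table: str
    column: str
    data_type: str
    authoritative: bool = False  # True only for the authoritative source


@dataclass
class GlossaryEntry:
    """One business term with the four kinds of information listed above."""
    term: str
    # 1. definition and usage, keyed by line of business
    definitions: dict = field(default_factory=dict)
    # 2. physical instantiations, including the authoritative source
    instances: list = field(default_factory=list)
    # 3. data quality rules, kept here as plain-text descriptions
    quality_rules: list = field(default_factory=list)
    # 4. owners and partners responsible for the term
    owners: list = field(default_factory=list)

    def authoritative_source(self) -> Optional[PhysicalInstance]:
        """Return the first instance flagged as authoritative, if any."""
        for instance in self.instances:
            if instance.authoritative:
                return instance
        return None

    def missing_sections(self) -> list:
        """Return the names of the sections that are still empty."""
        sections = {
            "definitions": self.definitions,
            "instances": self.instances,
            "quality_rules": self.quality_rules,
            "owners": self.owners,
        }
        return [name for name, value in sections.items() if not value]
```

A helper such as `missing_sections` gives a quick view of which parts of an entry still need input from stewards or system owners before the term can go to review.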

Business Glossary Challenges

Creating a business glossary, the start of a formalized data governance program, is still a labor-intensive task: subject matter experts are asked to document their definitions and enter them into a centralized spreadsheet.

Technology owners and operations teams also contribute their understanding of the business term to the business glossary. This cycle repeats until all stakeholders have weighed in and agreed upon the final definition and usage, an inefficient process considering the large number of data elements that must be included in the business glossary.

After the manual documentation of the metadata describing the business term, the next biggest challenge in creating a business glossary is communicating and negotiating a process for agreeing upon the final information to document. Multiple layers of people provide input, review and approve entries to ensure that all points of view on the business term are captured. It is common for business terms to go through multiple iterations, including steering committee reviews, before the definition is signed off.

Because building a business glossary is such a prolonged and manually intensive effort, its contents can quickly become stale: as additional copies of the data are created, a manually curated business glossary does not capture all the physical instantiations of the business terms, nor all the relevant data quality rules. With data privacy and regulations now front and center among business priorities, a timely and comprehensive data governance program is more critical than ever.

Why Business Glossaries Aren’t More Common

Even companies that have not formalized a data governance program under a Chief Data Officer will have some semblance of a business glossary, either one centralized across the organization or a list of managed data elements collected from databases. Traditionally, technology groups in financial services companies have provided evidence of a business glossary through their data model documentation, which may contain the column name, definition and data type. This is often the rudimentary start of a business glossary.

Why is building and maintaining a business glossary so difficult to achieve? If you ask any S&P 500 company, you’ll hear some common responses:

First, the knowledge needed to create a business glossary resides in the minds of many individuals in the business and among IT professionals. Second, it has historically been a manual effort to collect, curate and maintain the information. Even after an organization creates an impressive list of 500 data elements with complete definitions, authoritative sources, data quality rules and sample data, the metadata becomes stale. Or worse, the business glossary is obsolete upon completion because no one in the business will actually use the information to comply with any new regulations.

10 Steps for Manually Building out a Business Glossary

Since regulators require a business glossary as evidence of compliance with a data governance program, the need for one will never go away. Let’s break down the process for collecting the minimum attributes of a single business term in the glossary, in order to determine where process efficiencies can help:

Step 1: The Enterprise Data Office contacts a Data Steward to define a business term, for example “Market Value.”

Step 2: The Data Steward defines the term as it is relevant to their line of business and context of usage. The Steward spends a day writing the definition and then emails it back to the requester.

Step 3: The Technology System Owner provides an instance where the data is stored along with the data type and the table and column name.

Step 4: The Enterprise Data Office reviews the terms and identifies that alternate definitions are possible depending on the line of business and system used.

Step 5: The Enterprise Data Office reaches out to additional Data Stewards in other lines of business and to other Subject Matter Experts for their definitions, then waits a few days for an email response.

Step 6: The Enterprise Data Office follows up on the outstanding definition requests.

Step 7: Additional system owners whose systems may store Market Value may also be contacted.

Step 8: The Enterprise Data Office realizes that there are derived data elements that go into the calculation for Market Value. Should these be created in the business glossary as individual data elements? The Enterprise Data Office concludes that the full lineage of the business term must be captured, so three additional business terms need to be defined.

Step 9: A meeting with all the Data Stewards, Subject Matter Experts and Technology owners is arranged to discuss and review the results collected for one term, Market Value.

Step 10: Rinse and repeat for the next business term.
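
Step 8 is where the scope of the work quietly grows: defining one term pulls in every term it is derived from. A minimal sketch of that expansion follows, assuming the derivations are already known and recorded as a simple mapping. The function name `terms_to_define` and the three input terms are hypothetical and chosen only for illustration; the source does not name the terms that feed Market Value.

```python
def terms_to_define(term, derived_from):
    """Return the term plus every term it is derived from, in discovery order."""
    ordered = []
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already queued for definition; also guards against cycles
        seen.add(current)
        ordered.append(current)
        # reversed() so that inputs are visited in the order they were listed
        stack.extend(reversed(derived_from.get(current, [])))
    return ordered


# Hypothetical lineage: these three input names are illustrative only.
derived_from = {"Market Value": ["Quantity", "Price", "FX Rate"]}
print(terms_to_define("Market Value", derived_from))
# ['Market Value', 'Quantity', 'Price', 'FX Rate']
```

Each name in the returned list would then go through Steps 1 to 9 itself, which is why a single request can multiply into several rounds of email and review.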

Once the business term is defined and approved for inclusion in the business glossary, business stakeholders and data consumers need to ensure that the term is used consistently throughout the organization in data models, reports, dashboards, and new applications. The Enterprise Data Office then needs to ensure, as part of the firm’s data strategy, that the business terms are used in line with policies and regulations, especially new privacy mandates.

How to Use Machine Learning to Populate a Business Glossary

The biggest challenges with a manually curated business glossary include:

  1. identifying the full scope of business terms that may be correlated to the initial set,
  2. coordinating the number of technology owners who are impacted by these business terms and involved in the review process, and
  3. quickly identifying new instantiations of the same or similar data, a task suited to classification techniques.

New approaches that leverage automated discovery, metadata collection and analysis, and machine learning-driven classification hold the promise of minimizing manual steps and facilitating more seamless collaboration between business and technical stakeholders.
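
To make the idea of matching discovered data to glossary terms concrete, here is a deliberately simple sketch. It is not machine learning and not any vendor's method: it uses plain string similarity from Python's standard `difflib` module as a stand-in for a trained classifier, and it looks only at column names rather than at the data itself. The function names `normalize` and `suggest_terms` and the 0.8 threshold are assumptions made for this example.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and replace any run of non-alphanumerics with one space."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def suggest_terms(columns, terms, threshold=0.8):
    """Map each column to its best-matching term at or above the threshold."""
    suggestions = {}
    for column in columns:
        best_term, best_score = None, 0.0
        for term in terms:
            score = SequenceMatcher(None, normalize(column), normalize(term)).ratio()
            if score > best_score:
                best_term, best_score = term, score
        # Columns with no sufficiently close term are left out for manual review.
        if best_term is not None and best_score >= threshold:
            suggestions[column] = (best_term, round(best_score, 2))
    return suggestions


# Hypothetical column names as they might be reported by a metadata scan.
columns = ["MARKET_VALUE", "customer_id"]
print(suggest_terms(columns, ["Market Value"]))
```

Name matching of this kind only catches columns that are labeled close to the glossary term. Abbreviated or renamed columns may fall below the threshold, which is the gap that classification based on the data content is meant to close.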

Through programmatic integration with machine learning-driven classification, data governance teams can align the business terms in their glossary with the data findings that classification identifies. This integration can help organizations not only bridge business and technical perspectives but also leverage active metadata for a range of data governance activities. Get a demo to see how BigID’s discovery-in-depth technology can help populate a business glossary, minimize manual steps (and potential manual error), and help build a better data governance program.