As Bob Marley once observed: “If you know your history, then you would know where you coming from.” The same principle could be said to hold for the management of customer data. If you know where your data has been, who has accessed it and how it has moved from one application to another, you will be in a much better position to detect privacy violations and prove the integrity of privacy controls.

Keeping track of and reporting on data history, access patterns and flows is a data lineage challenge. Knowing who has touched your customers’ PII, when and how, is increasingly mandated under regulatory requirements. For large organizations that collect high volumes of personal data and need to satisfy compliance and policy demands for privacy, understanding what PII you have and how that data flows is no longer a nice-to-have; it’s a need-to-have.

It’s not just where you are coming from — it’s where you are going

Intelligence about data provenance and lineage has long been an important goal for data governance professionals. Understanding PII lineage, however, has a broader importance to compliance, risk, security and privacy professionals. A map of how PII moves around an organization can support early detection of breach vulnerabilities or privacy regulation violations. In this regard, a map of data flows can be viewed as an essential complement to an identity map showing the location and risk of personal data fragments. Together they provide critical privacy and data protection intelligence.

One area where data lineage intelligence plays a growing role is the analysis of data in data lakes. Data lakes aggregate multiple big data sources into a unified data platform. In many ways, though, data lakes are opaque about how data gets consumed and moved around. Tools that provide insight into access history and flows are therefore important for ensuring data doesn’t get used in ways that violate internal rules (like customer consent) or external regulations. Knowing history can forewarn of future problems.


The Tao of privacy: to know your customer is to know their data

Tracking data lineage can prove invaluable in the area of privacy management. By tracking PII access and flows, companies can get better insight into the chain of custody for sensitive personal information and prove compliance. However, PII is not limited to the fragments of identity data collected by web, mobile and IoT apps. It also encompasses the preferences, risks and history metadata that are unique to an individual.

Take consent preferences, for example. Many new applications collect user consent on how users want their data to be used. Consent preferences need to be viewed as a guide to acceptable use, and therefore to acceptable flows, of personal data. Consent should be used to regulate flows and alert privacy administrators to potential violations. If, for instance, only specific applications are authorized to access data because of consent constraints, a data system that reports on data flows, lineage or provenance can provide evidence that those constraints were honored. Conversely, if access to a user’s data violates that user’s consent setting, the organization could be notified in real time.
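To make the idea concrete, here is a minimal sketch of checking access events against consent settings. Everything in it is an assumption for illustration: the `AccessEvent` record, the `find_consent_violations` function, and the shape of the consent map (user ID to the set of applications that user has authorized) are hypothetical and do not describe any particular product or lineage tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    """One entry in a hypothetical lineage log: which app touched whose data."""
    user_id: str
    application: str
    purpose: str


def find_consent_violations(events, consent):
    """Return the events whose application is not covered by the user's consent.

    `consent` maps user_id -> set of applications the user has authorized.
    Assumption: a user with no consent record has authorized nothing.
    """
    violations = []
    for event in events:
        allowed = consent.get(event.user_id, set())
        if event.application not in allowed:
            violations.append(event)
    return violations


# Illustrative data only.
consent = {
    "user-1": {"billing", "support"},
    "user-2": {"billing"},
}
events = [
    AccessEvent("user-1", "billing", "invoice"),
    AccessEvent("user-2", "marketing", "campaign"),
    AccessEvent("user-3", "support", "ticket"),
]

violations = find_consent_violations(events, consent)
for v in violations:
    print(f"ALERT: {v.application} accessed data for {v.user_id}")
```

In this sketch the marketing access to user-2 and the support access to user-3 are flagged, while the billing access to user-1 passes. A real system would run a check like this against a stream of access events rather than a list, but the principle is the same: the access history is also the compliance evidence.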

But the potential to enhance privacy management through the marriage of data lineage and privacy metadata is not limited to consent information. The combination of data provenance information and privacy risk scores, for instance, can help isolate and flag activity on especially sensitive or vulnerable data, helping to prevent loss or misuse before it happens.

History is never a perfect predictor of the future. But, increasingly, organizations facing cyber and privacy risks to their data need more intelligence on the customer data they collect. They need to know its provenance, access history and flow; they need to know their customer data as well as they know their customer. The future is data driven. But if you don’t know where your data’s been, you may not like where it takes you.