Some of the new EU GDPR (General Data Protection Regulation) privacy requirements can seem daunting at first blush to a company that is subject to them. Unlike SOX, for example, which prescribes controls for data access without detailing specifics, GDPR grants EU residents very specific rights over their data, which places a new data accounting and accountability burden on enterprises governed by the regulation. And with fines that can reach 4% of global annual revenue, companies are highly motivated to find ways to automate the new data subject access requirements.

The GDPR data subject rights include:

● An individual’s right to access their data

● An individual’s right to port their data to a new service provider

● An individual’s right to erasure, or the right to be forgotten

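● An individual’s right to be notified without undue delay of a breach that puts them at high risk, alongside the organization’s duty to notify the supervisory authority within 72 hours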

For enterprises this creates a new set of responsibilities around EU resident data. Organizations that collect and process personal data must also be able to account for that data down to a discrete individual. This is a significant departure from compliance requirements of the past wherein organizations simply needed to document privacy policies and show general security controls. The EU GDPR emphasizes that as custodians of consumer and employee data, organizations must now be able to account for every individual’s data stored inside the company. The question for organizations therefore becomes how to do this at scale for all EU resident customers and employees.

A New Software Approach to Data Subject Access

While GDPR provisions broadly outline data subject rights (including data access, rectification of inaccurate or incomplete data, blocking of data whose accuracy is contested, and erasure of data), how companies are to meet the requirements is not specified.

Historically, those companies that volunteered to honor data subject access requests did so through manual processes — requests would be assigned to technical teams to discover data and then report back on location and usage. This manual process, however, does not lend itself to scaling, is haphazard rather than accurate, and misses a critical opportunity to better protect personal data based on data intelligence.

Traditional data discovery tools fall short in helping organizations understand anything beyond basic data classification. They typically require some level of manual coding to prescribe what data types to search for, thus overlooking unknown types; they are optimized for either structured or unstructured data, but not both; they have no context awareness and so can’t differentiate between similar-looking data elements; and, perhaps most importantly, they don’t reveal data ownership or data provenance. That means traditional data discovery tools can’t identify what data belongs to whom, and they can miss large swaths of data that are unknown or dark.
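To make the limitation concrete, here is a minimal sketch of pattern-based discovery. The pattern, the record layout and the field names are all hypothetical and do not come from any particular product; the point is only that a hand-coded rule matches on shape, not on meaning or ownership.

```python
import re

# Hypothetical hand-coded rule: any standalone 9-digit number is "sensitive".
NINE_DIGITS = re.compile(r"\b\d{9}\b")

# Hypothetical sample records from three made-up data sources.
records = [
    {"source": "crm.customers", "field": "national_id", "value": "123456789"},
    {"source": "erp.orders", "field": "order_no", "value": "987654321"},
    {"source": "support.tickets", "field": "note", "value": "customer since 2015"},
]

def classify(records):
    """Return every record whose value contains a 9-digit number."""
    return [r for r in records if NINE_DIGITS.search(r["value"])]

hits = classify(records)
for hit in hits:
    # The rule flags both the national ID and the order number, and it
    # says nothing about which person either value belongs to.
    print(hit["source"], hit["field"])
```

The rule cannot tell an identifier from an order number that happens to look the same, and any data type nobody wrote a pattern for is simply never found.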

Tools like BigID upend the usual data discovery process with a new big data software approach, purpose-built for finding and inventorying personal data by data subject. Using data science, machine learning and identity context, a tool like BigID provides a simple scanner that can find, inventory and track personal data by data subject across the enterprise and cloud.

Data Science and Machine Learning Meet Data Privacy

Privacy automation software like BigID makes data mapping automatic. BigID, for instance, scans on-premises and cloud data sources, identifies personal information and then catalogues that personal information by data subject at scale. Where traditional data discovery tools require agent instrumentation or some level of custom coding in arcane techniques such as regular expressions, a tool like BigID uses unsupervised learning to help the software first understand known personal data and its relationship to a data subject. The system then uses this learning set to find and catalogue other personal data across the company’s data stores.
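The idea of cataloguing by data subject can be illustrated with a deliberately simplified sketch. It is not BigID’s method: it swaps machine learning for plain exact-value matching, and the seed identities, source names and the `build_index` helper are all hypothetical. It only shows the shape of the output, a map from each person to the places their data lives.

```python
from collections import defaultdict

# Hypothetical seed set: values already known to belong to each data subject.
known_identities = {
    "subject-1": {"alice@example.com", "+33 1 23 45 67 89"},
    "subject-2": {"bob@example.com"},
}

# Hypothetical data sources: source name -> list of (location, value) pairs.
data_sources = {
    "crm": [
        ("customers.email", "alice@example.com"),
        ("customers.email", "bob@example.com"),
    ],
    "logs": [
        ("access.user", "alice@example.com"),
        ("access.ip", "203.0.113.7"),
    ],
}

def build_index(known_identities, data_sources):
    """Map each data subject to the (source, location) pairs holding their data."""
    # Invert the seed set so each known value points back to its subject.
    value_to_subject = {
        value: subject
        for subject, values in known_identities.items()
        for value in values
    }
    index = defaultdict(list)
    for source, rows in data_sources.items():
        for location, value in rows:
            subject = value_to_subject.get(value)
            if subject is not None:
                index[subject].append((source, location))
    return dict(index)

index = build_index(known_identities, data_sources)
```

In this toy version the IP address in the logs is left uncatalogued because it matches no known value; correlating such unknown values to a person is exactly the part a learning-based approach is meant to handle.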

The resulting index provides a map to a user’s data with additional metadata that helps organizations attest to data flows, retention and residency. Data subject requests can be answered in seconds and easily assigned to an analyst for additional processing, whether for access, portability or erasure. Lastly, in the event of a data breach, an organization can compare a purloined data dump against its data map to determine whether it was in fact breached and identify impacted users in minutes, well below the 72-hour threshold set down by GDPR.
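The breach comparison described above reduces, in its simplest form, to a set intersection. The sketch below assumes a hypothetical data map that stores the known values per data subject; the `impacted_subjects` helper and the sample values are illustrative only, and a real comparison would also need normalization and fuzzy matching.

```python
def impacted_subjects(dump_values, data_map):
    """Return {subject: matched values} for subjects whose data appears in the dump.

    data_map is a hypothetical structure: subject id -> set of known values.
    """
    dump = set(dump_values)
    matches = {}
    for subject, values in data_map.items():
        found = values & dump  # values present in both the map and the dump
        if found:
            matches[subject] = found
    return matches

# Hypothetical data map and leaked dump.
data_map = {
    "subject-1": {"alice@example.com", "123456789"},
    "subject-2": {"bob@example.com"},
    "subject-3": {"carol@example.com"},
}
leaked = ["alice@example.com", "carol@example.com", "unknown@example.org"]

impacted = impacted_subjects(leaked, data_map)
```

An empty result suggests the dump did not originate from the mapped data stores, while a non-empty one gives the list of people affected.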

by @dimitrisirota