DSARs: The Essential Privacy Problem
DSARs, or data subject access requests, are in some ways the most elemental requirement to emerge from the new wave of privacy regulations like GDPR and the California Consumer Privacy Act (CCPA). DSARs establish that individuals retain a legal right to their data even after sharing it with an organization. This requirement, often labeled a personal data right, lets individuals access or delete their data if they so wish. Companies effectively become stewards of personal information, while consumers, employees, and contractors retain title to their data.
For businesses, this upending of who owns personal data poses some grave challenges. Companies collect personal data in many places across the data center and cloud. Data is collected in all manner of data stores and then processed in all manner of applications. And while organizations have developed technology for enumerating their data stores and applications, they have nothing similar for the data residing inside those stores and applications. What’s worse, with DSARs, finding PII in the vast stores of data is not enough. To provide data accountability to people, it’s essential to first account for all the data belonging to each person. This means being able to look across all manner of data stores and applications and identify what data is personal, to whom it belongs, and whether consent exists. Moreover, it requires that an organization be able not only to manage the requests but also to fulfill them, and to do so at scale.
But to make matters worse for companies eager to comply, the risk of failure is not just the possible fine from a regulator. Unlike other parts of privacy regulations like GDPR where compliance is determined by designated regulators, for personal data rights, it is the consumer or employee who ultimately decides compliance. So in the case of GDPR, the motivation for accurate and effective DSARs is not the 28 state regulators, it’s the 500M residents of the EU.
The Hard Truth About DSARs: Classification Is Not Enough
GDPR and similar regulations enshrine the idea of personal data rights via DSARs. For companies, however, the obligation to deliver on DSARs proves challenging. It requires that an organization be able to locate all the data it keeps on an individual, wherever that data resides. This is hard to do in practice because current technology cannot identify contextual personal information (PI), cannot automatically determine to whom it belongs, and cannot look everywhere an organization keeps personal data.
In the past, companies were able to fall back on manually finding a person’s data when asked. The requests were infrequent, there was no standard for performing the task, and there were no penalties for inaccuracy. Hence a request would be routed to a person who would, in turn, ask various application owners and data store owners to report back on what each system contained. This process has obvious drawbacks: it is labor intensive, imprecise, and reliant on search, which fails to find contextual PI. It also does not scale.
With the advent of GDPR, however, organizations are starting to turn to technology to help automate the DSAR request and fulfillment process. Before purpose-built tools like BigID emerged, the default technology would be a data-classification-based security or eDiscovery tool. These products were designed to find keywords or PII in files, email, and databases, relying on pattern matching using regular expressions. While slow, they worked reasonably well for use cases in PCI or HIPAA where there was an exact search criterion, a limited volume of data to scan, and target systems that consisted of file systems, mail, or relational data stores. However, they prove inadequate for privacy use cases like DSARs since they are unable to look everywhere, can’t identify general PI, and, most critically, have no way of correlating any PI accurately back to an individual, i.e. they are not identity aware.
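To make the limitation concrete, here is a minimal sketch of regular-expression classification. The patterns, labels, and sample text are illustrative assumptions, not taken from any particular product. It shows the kind of data such a scan can flag and what it leaves out.

```python
import re

# Illustrative patterns only (assumed for this sketch); production
# classifiers use far more careful rules and validation.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def classify(text):
    """Return (label, matched_text) pairs for every pattern hit in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


sample = (
    "Contact jane.doe@example.com, SSN 123-45-6789. "
    "Born 1984-03-12, last seen near 40.71,-74.00."
)
hits = classify(sample)
for label, value in hits:
    print(label, value)
```

The scan flags the email address and the SSN because they fit a predefined format. The birthday and the coordinates pass unnoticed, and nothing in the output says which person any of the values belongs to.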
A Big Idea from BigID: DSAR Automation at Scale
As more jurisdictions introduce privacy regulations that stipulate legal data rights for individuals, corporations will face the reality that they need to build automation for managing DSAR activities from request through fulfillment. Automating DSARs at scale will require two types of innovation not afforded by older data classification tools. First, organizations will need to find and inventory what data they have on any individual. Second, once they have the raw data, they will need to operationalize the request and fulfillment activity to accommodate various request portals, configurable response types, workflow for analysts, batch processing for larger-volume scenarios, and consent integration.
Now, finding and inventorying data on every individual an organization has records on is no easy task. There is the challenge of identifying what is personal, there is the need to look across unstructured, structured, Big Data, and cloud sources, and there is the obvious requirement to be able to sort data by person. Finding all personal information is impossible with traditional classification, which can only find predefined data classes. Finding PI, as opposed to PII, requires an ability to determine whether data is by, or about, someone, because what makes it personal is its context in relation to that person. Looking across all data from cloud to application also requires a new method of interrogating data stores that avoids classification-style dependencies. It must also map or visualize the data without copying or duplicating it. PI is highly sensitive. The last thing any organization wants to do is centralize its most sensitive data, creating a giant honeypot for bad guys. Lastly, the new technology will need to automatically correlate data back to a person. While that is fairly easy for uniquely identifying data like a credit card number, it is very hard for semi-identifying data like a birthday, GPS coordinate, IP address, cookie, or shopping preference, to give some examples. Moreover, the technology will need to both resolve identities, so that any requester gets all their data accurately, and disambiguate similar identities, so that persons with the same name don’t get confused with one another.
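The correlation and disambiguation problem can be illustrated with a small sketch. Everything here is an assumption made for illustration: the attribute weights, the threshold, and the names `Identity`, `score`, and `correlate` are hypothetical and do not describe how BigID or any other product works. The sketch scores a discovered record against known identities and refuses to assign it when the evidence is too weak or when two people match equally well.

```python
from dataclasses import dataclass


@dataclass
class Identity:
    person_id: str
    attributes: dict  # e.g. {"name": ..., "email": ..., "birthday": ...}


# Hypothetical integer weights: a unique identifier counts for more than
# semi-identifying attributes, which only help in combination.
WEIGHTS = {"email": 10, "name": 4, "birthday": 3, "ip": 3}
THRESHOLD = 6  # assumed minimum evidence needed to assign a record


def score(record, identity):
    """Sum the weights of the attributes the record shares with the identity."""
    total = 0
    for key, value in record.items():
        if key in WEIGHTS and identity.attributes.get(key) == value:
            total += WEIGHTS[key]
    return total


def correlate(record, identities):
    """Return the person_id of the single best match, or None when the
    evidence is below THRESHOLD or the top two candidates tie."""
    scored = sorted(
        ((score(record, ident), ident.person_id) for ident in identities),
        reverse=True,
    )
    if not scored or scored[0][0] < THRESHOLD:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: do not guess between two people
    return scored[0][1]


people = [
    Identity("p1", {"name": "John Smith", "email": "john.smith@example.com",
                    "birthday": "1980-01-01"}),
    Identity("p2", {"name": "John Smith", "email": "jsmith2@example.com",
                    "birthday": "1992-07-15"}),
]

by_birthday = correlate({"name": "John Smith", "birthday": "1992-07-15"}, people)
name_only = correlate({"name": "John Smith"}, people)
by_email = correlate({"email": "john.smith@example.com"}, people)
print(by_birthday, name_only, by_email)
```

In this toy setup, a shared name alone is never enough to assign a record, which keeps the two John Smiths apart. A name combined with a birthday, or a unique email address, is enough.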
As it happens, BigID was designed for this very purpose. As the first and arguably only identity-centric data discovery tool, BigID leverages the latest in ML and scale-out technology to find PI specifically, look across any data, and then inventory it by person without centralizing the data. Entity resolution and the ability to disambiguate people are built in.
But BigID goes beyond discovery to fully operationalize DSAR creation and reporting. This includes tools to integrate with company data request portals as well as third-party systems from vendors like ServiceNow and OneTrust. BigID provides just-in-time data fetching and report formatting. It provides an ability to include reporting on consent using BigID’s consent governance capabilities. It provides workflow so analysts can review and approve reports. It provides tools to authenticate a requester using their own data. It also provides bulk management capabilities to process hundreds of DSARs simultaneously. BigID even allows administrators to confirm data deletion following an erasure request.
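For readers who think in code, the request-through-fulfillment flow described above can be pictured as a small state machine. This is only a sketch under stated assumptions: the state names, the allowed transitions, and the `DsarRequest` class are hypothetical and are not BigID's API or data model.

```python
from enum import Enum


class State(Enum):
    RECEIVED = "received"
    AUTHENTICATED = "authenticated"    # requester identity verified
    DATA_COLLECTED = "data_collected"  # report assembled from data sources
    APPROVED = "approved"              # analyst reviewed and approved the report
    FULFILLED = "fulfilled"
    REJECTED = "rejected"


# Assumed transition rules: each state lists the states it may move to.
ALLOWED = {
    State.RECEIVED: {State.AUTHENTICATED, State.REJECTED},
    State.AUTHENTICATED: {State.DATA_COLLECTED},
    State.DATA_COLLECTED: {State.APPROVED, State.REJECTED},
    State.APPROVED: {State.FULFILLED},
    State.FULFILLED: set(),
    State.REJECTED: set(),
}


class DsarRequest:
    def __init__(self, request_id, kind):
        self.request_id = request_id
        self.kind = kind  # e.g. "access" or "erasure"
        self.state = State.RECEIVED
        self.history = [State.RECEIVED]  # audit trail of states visited

    def advance(self, new_state):
        """Move to new_state, refusing any step the rules do not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.request_id}: cannot go from "
                f"{self.state.value} to {new_state.value}"
            )
        self.state = new_state
        self.history.append(new_state)


HAPPY_PATH = [State.AUTHENTICATED, State.DATA_COLLECTED,
              State.APPROVED, State.FULFILLED]

# Bulk handling: drive a batch of requests through the same steps.
batch = [DsarRequest(f"req-{n}", "access") for n in range(3)]
for request in batch:
    for step in HAPPY_PATH:
        request.advance(step)
```

Making the transitions explicit means a report cannot be released before the requester is authenticated and an analyst has approved it, and the history list gives a simple audit trail for each request.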
Getting Privacy Right
New privacy regulations are popping up around the globe and across the US at a frenetic pace. While not all prescribe identical requirements for organizations, they share some common foundations: companies need to know what data they have on individuals and whether they are using that data in legitimate, approved ways. For individuals, these new privacy regulations will most strongly manifest themselves as a set of new personal data rights to access, port, correct, or delete one’s data. For companies, complying with these new personal data rights will also represent the greatest challenge, since it requires them to know their own data to a degree of detail never before needed.
BigID’s technology was purpose-built to help organizations find personal data by identity across the petabyte-scale volumes of information they keep in their data stores and applications. That’s why BigID is called BigID. But finding and understanding data in a privacy-centric way is not sufficient to help organizations meet the emerging requirements of modern DSARs. Companies will also need all the operational capabilities to report on the data to an individual or act on it at that individual’s request. That’s why BigID has developed the most comprehensive solution for automating DSARs in the market. You can’t protect what you can’t find, which is why BigID helps organizations rethink how they find and protect their customers’ information.