Discovery and protection of sensitive, confidential and regulated data has been at the top of almost every CISO’s to-do list for the past 10 years. In response, the IT security industry has developed a range of solutions to help discover and classify information so it can be better protected. However, these solutions have contended with a persistent problem: data protection tools function with virtually no context of whose data they presumably protect.

The stakes for keeping personal and private data protected are getting higher: not only are disclosures about breaches almost a daily occurrence, but there is a growing list of data privacy rules and regulations that emphasize the obligation of enterprises to secure the data they hold about customers and users. Other rules require businesses to implement proper controls and to set up breach response and notification processes for users and customers affected by the exposure of their private data.
With privacy now paramount for consumers, and enterprises compelled to meet new compliance requirements, the question many organizations are wrestling with is: How can I protect the privacy of my customers if I don’t know anything about the data I’m protecting?

The environment has evolved to take on more of a privacy focus, but data protection tools have yet to adjust to this change. IT security solutions have focused on building walls around information and ensuring that it stays encrypted unless access is authorized: firewalls at the perimeter, agents on the server, gateways in front of the server, IPS, AV, DLP, SIEM, EDR, UEBA, PIM, IAM, SSO, and the list goes on. Still, a fundamental question remains unanswered: whose data is it, and what is its relevance? Is it the SSN of a customer? Of an employee? A random person? A child? A foreign resident?
One key element is missing in data protection today: knowing the identity of the data, or who the ‘data subject’ is, as privacy professionals and regulations refer to it. Why does knowing the identity of the data help? Here are five quick reasons:

1. Improved Accuracy

Most data protection tools today rely on regular expressions to identify personal and sensitive information, such as an SSN or credit card number. The problem is that what looks like an SSN isn’t always an SSN: it might be a phone number or another identifier that happens to consist of nine digits and hyphens. However, if you know that the number belongs to one of your customers or employees, you know you are looking at a real SSN.
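
To illustrate the difference, here is a minimal Python sketch. It assumes a hypothetical in-memory table, KNOWN_SSNS, standing in for your customer and HR systems of record. The pattern-only function flags every SSN-shaped string, while the identity-aware one keeps only numbers that resolve to a known person.

```python
import re

# Pattern-only detection: anything shaped like NNN-NN-NNNN is flagged.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical identity store: SSNs of known customers and employees,
# mapped to the person they belong to (stands in for real systems of record).
KNOWN_SSNS = {
    "123-45-6789": ("customer", "alice"),
    "987-65-4321": ("employee", "bob"),
}


def regex_only_matches(text):
    """Return every SSN-shaped string, regardless of whose it is."""
    return SSN_PATTERN.findall(text)


def identity_confirmed_matches(text, known=KNOWN_SSNS):
    """Return only SSN-shaped strings that map to a known data subject."""
    return [(m, known[m]) for m in SSN_PATTERN.findall(text) if m in known]
```

On the text "Ticket 555-12-3456 escalated; customer SSN 123-45-6789 on file.", the regex alone flags both numbers, while the identity check keeps only the one that belongs to a known customer. A real deployment should not keep raw SSNs in a lookup table like this; comparing keyed hashes (for example HMAC) or tokens is safer, but plain strings keep the sketch readable.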

2. Identity Context Defines Sensitivity

Personally identifiable information (PII) is considered sensitive only in the context of a specific individual. Personal information such as gender, religion, age, running routes, shopping preferences and residential address is sensitive only if it can be linked to a specific person. Current data protection tools can’t determine whether a file or record containing the gender, height and weight of an individual describes a specific customer. No regular expression or machine learning algorithm can tell you that without knowing whose data it is. You need the identity of the data for that.
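
As a rough sketch of that idea, the Python below treats a record’s attributes as personal only when one of its identifiers resolves to a known data subject. SUBJECT_INDEX, the linking field names and the Record type are all hypothetical. The sketch checks direct identifiers only and does not attempt re-identification from combinations of attributes, so an "unlinked" result does not prove a record is harmless.

```python
from dataclasses import dataclass, field

# Hypothetical identity index: any identifier we hold for a person
# (customer ID, email address) maps to that person's canonical subject ID.
SUBJECT_INDEX = {
    "cust-001": "cust-001",
    "alice@example.com": "cust-001",
    "cust-002": "cust-002",
}

# Assumed names of fields that can link a record to a person.
LINKING_FIELDS = ("customer_id", "email")


@dataclass
class Record:
    attributes: dict  # e.g. gender, height, weight
    identifiers: dict = field(default_factory=dict)  # linking fields, if any


def linked_subject(record, index=SUBJECT_INDEX):
    """Return the subject ID the record resolves to, or None if it links to no one."""
    for name in LINKING_FIELDS:
        value = record.identifiers.get(name)
        if value is not None and value in index:
            return index[value]
    return None


def classify(record):
    """Label attributes 'personal' only when they belong to a known individual."""
    subject = linked_subject(record)
    if subject is None:
        return ("unlinked", None)
    return ("personal", subject)
```

The same gender, height and weight values are classified differently depending solely on whether the record carries an identifier that points to one of your customers.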

3. Breach Response Plan

Your working assumption must be that you are going to be breached. That’s why 25 US states and the impending GDPR require you to have a breach response plan in place; the GDPR, for example, requires you to notify the supervisory authority within 72 hours of becoming aware of a breach, and affected individuals without undue delay when the breach poses a high risk to them. Unless you want to notify all of your customers whenever you uncover a breach of a data center or database, you need to know which customers’ data is stored where. Likewise, if you obtain a dump of hacked personal records and want to know which of your customers were affected, so you can notify them or at least prompt them to reset their passwords, you need to be able to correlate that dump with your customers’ data. This, again, requires knowing the identity of the data.
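
Correlating a dump with your customer base can be as simple as normalizing a shared identifier and joining on it. The sketch below assumes, hypothetically, that the dump is CSV text with an email column and that CUSTOMERS maps your customer IDs to the emails on file; real dumps vary widely in format and usually need more cleanup than shown here.

```python
import csv
import io

# Hypothetical customer table: customer ID -> email address on file.
CUSTOMERS = {
    "cust-001": "Alice@Example.com",
    "cust-002": "bob@example.org",
    "cust-003": "carol@example.net",
}


def normalize_email(email):
    """Trim and lower-case so 'Alice@Example.com ' matches 'alice@example.com'."""
    return email.strip().lower()


def impacted_customers(dump_text, customers=CUSTOMERS):
    """Return the sorted IDs of customers whose email appears in the dump.

    Assumes the dump is CSV text with an 'email' header column.
    """
    by_email = {normalize_email(addr): cid for cid, addr in customers.items()}
    hits = set()
    for row in csv.DictReader(io.StringIO(dump_text)):
        email = row.get("email")
        if email:
            cid = by_email.get(normalize_email(email))
            if cid is not None:
                hits.add(cid)
    return sorted(hits)
```

The result is the list of customers to notify or prompt for a password reset, rather than your entire customer base.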

4. Honoring Data Subject Rights

While the term ‘data subject rights’ is typically associated with the EU General Data Protection Regulation (GDPR), the principle of consumer ownership of personal data is broadly embedded in the US through various existing regulations such as HIPAA, and is increasingly emphasized in new rules issued by the FTC and FCC. Data subject rights are the rights individuals have with respect to their personal data, including limits on how it is collected and processed by service providers, government agencies and other bodies. They include the right to know what data is stored about you, the right to access that data, and the right to have it erased or modified. Failing to honor these rights can translate into hefty penalties: up to 4% of global annual revenue under the GDPR. To satisfy these rights, an organization must know where every individual’s data resides. Tackling data subject rights requires knowing the identity of the data.
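
Once you know where each individual’s data lives, access and erasure requests become lookups. The Python sketch below assumes a hypothetical INVENTORY that maps a subject ID to (system, location, fields) entries. It does not model legal exceptions such as records you are required to retain.

```python
# Hypothetical personal-data inventory: subject ID -> list of
# (system, location, fields) entries recording where that person's data lives.
INVENTORY = {
    "cust-001": [
        ("crm", "contacts/8812", ["name", "email", "phone"]),
        ("billing", "invoices/2019-044", ["name", "address"]),
    ],
    "cust-002": [
        ("crm", "contacts/9034", ["name", "email"]),
    ],
}


def access_report(subject_id, inventory=INVENTORY):
    """Right of access: every system and the fields it holds for this subject."""
    report = {}
    for system, _location, fields in inventory.get(subject_id, []):
        report.setdefault(system, set()).update(fields)
    return {system: sorted(fields) for system, fields in report.items()}


def erasure_plan(subject_id, inventory=INVENTORY):
    """Right to erasure: every (system, location) that must be deleted or anonymized."""
    return [(system, location) for system, location, _fields in inventory.get(subject_id, [])]
```

For an unknown subject both functions return empty results, which is itself useful: it lets you answer a request truthfully instead of guessing.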

5. Determine Data Residency and Sovereignty

The applicability of many data protection regulations is contingent on the residency of the individuals whose data is processed. Most notably, the GDPR applies based on the individual’s residency in the EU, and Russia, China and other countries have introduced their own residency-based rules. This means that if you process the data of residents of these jurisdictions, you are subject to their data protection regulations. How can you know which data systems hold which residents’ data? Again, you need to know the identity of the data.
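
With identity attached to the data, the residency question becomes a join between where people live and where their data sits. In this sketch RESIDENCY, SYSTEM_SUBJECTS and the abbreviated EU country set are hypothetical placeholders, not a complete or authoritative list.

```python
# Hypothetical subject registry: subject ID -> country of residence (ISO code).
RESIDENCY = {"cust-001": "DE", "cust-002": "US", "cust-003": "RU", "cust-004": "FR"}

# Hypothetical inventory: system -> set of subject IDs whose data it holds.
SYSTEM_SUBJECTS = {
    "crm": {"cust-001", "cust-002", "cust-003"},
    "billing": {"cust-002"},
    "analytics": {"cust-004"},
}

# Illustrative, deliberately incomplete subset of EU member states.
EU = {"DE", "FR", "IE", "NL"}


def systems_holding(countries, residency=RESIDENCY, systems=SYSTEM_SUBJECTS):
    """Map each system to the subjects in it who reside in `countries`."""
    result = {}
    for system, subjects in systems.items():
        hits = sorted(s for s in subjects if residency.get(s) in countries)
        if hits:
            result[system] = hits
    return result
```

Here systems_holding(EU) reports the crm and analytics systems, while billing, which holds only a US resident's data, drops out.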

Existing data protection tools need to adapt to these new requirements if they are going to address privacy requirements. In practice, this means building an inventory of your customers’ personal data that points to where you store the personal data of each individual. The good news is that such an inventory can not only improve privacy and security, but can actually help you monetize that data.
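
One way to picture such an inventory is as an index from each individual to the locations that hold their data. The minimal sketch below builds it by scanning text from a set of sources for email addresses and SSN-shaped strings and keeping only the values that resolve to a known subject. IDENTITY_INDEX, the source labels and the two patterns are illustrative assumptions; production discovery tools cover far more identifier types and data formats.

```python
import re
from collections import defaultdict

# Hypothetical identity index: known identifier value -> subject ID.
# A real index would store keyed hashes rather than raw values.
IDENTITY_INDEX = {
    "alice@example.com": "cust-001",
    "123-45-6789": "cust-001",
    "bob@example.org": "cust-002",
}

# Candidate identifiers worth looking up: email addresses and SSN-shaped strings.
CANDIDATE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+|\b\d{3}-\d{2}-\d{4}\b")


def build_inventory(sources, index=IDENTITY_INDEX):
    """Scan (location, text) pairs and map each subject to where its data lives.

    `sources` stands in for whatever connectors enumerate your databases,
    file shares and SaaS apps (hypothetical here).
    """
    inventory = defaultdict(set)
    for location, text in sources:
        for match in CANDIDATE.findall(text):
            subject = index.get(match.lower())
            if subject:
                inventory[subject].add(location)
    return {subject: sorted(locs) for subject, locs in inventory.items()}
```

Values that match a pattern but belong to no known subject, such as an unrecognized email address in a log file, are simply not indexed, which mirrors the accuracy argument made in the first reason above.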
