Where’s Waldo: Finding Identity in Big Data
Big Data creates big problems for finding identity information. As organizations shift to predominantly online interactions with their customers, they are collecting petabytes of data about individuals at an ever more breakneck pace. Organizations want to respond quickly to, and even anticipate, their customers’ needs. That means increasingly surrounding their customers with digital services across Web, mobile, home, auto, wearable and AI channels. Personal data of all sorts, ranging from highly identifiable attributes to preferences and geolocation, is collected across application touchpoints, creating a sprawl of personal data that has proved impossible to track or trace.
And that’s the rub: data that is unaccounted for is effectively unknown. Unknown data is not, however, invisible; it’s just vulnerable. Knowing a customer today requires that a company know that customer’s data. Preserving that customer’s loyalty requires that a company protect that data. But you can’t protect what you don’t know about, and so now more than ever organizations must know their customer data. Yet finding specific customer data in big data can feel like trying to find Waldo in a sea of Waldos: lots of similar-looking stuff with no way to figure out who’s who and what’s what.
Hard Doesn’t Mean Impossible, and GDPR Doesn’t Mean Voluntary
There was a time in the not-too-distant past when knowing your customer data seemed like more burden than benefit to many companies. Data accounting meant accountability, and shining a bright light on something too sensitive could reveal surprises that suddenly became the company’s liability. But times have changed. Breaches are now a daily occurrence, raising pressure on companies to make their customer data less vulnerable. Moreover, as businesses compete online, whoever knows the customer best will win the customer. If knowledge is power, data knowledge is rocket fuel.
But even if revenue and security are still not motivation enough for a company to know its customer data, organizations are increasingly waking up to the reality that data knowledge is the law. Around the world, new privacy regulations require organizations to know what data they hold on an individual, and the penalties for not knowing are steep. Nowhere is this better exemplified than in Europe, where the right to privacy is increasingly viewed as a constitutional right, and a right to privacy means a right to one’s data.
With the introduction of the EU General Data Protection Regulation (GDPR), organizations are legally obligated to provide or delete a customer’s data upon that customer’s request. Penalties for not doing so can reach as high as 4% of global revenue across the EU or even 10% in select countries. GDPR enshrines the concept that companies are only custodians of consumer or employee data. The data remains the property of the citizen. Failure to live up to that standard may hobble a company. Privacy protection may not be enough of a carrot for all organizations to find and inventory their data, but regulations like GDPR will certainly provide a motivating stick for the unconvinced.
Have You Heard the One About the Needle and the Haystack?
Knowing your identity data is good for business. It’s good for security. It’s good for privacy. Increasingly, it’s also the law in a growing number of countries around the world. But finding identity data in big data is already hard; finding an identity in big data is even harder.
Finding certain types of PII (personally identifiable information) is not a revolutionary ask. Enterprises have been finding certain kinds of PII in their data for years, for reasons ranging from marketing to security. Take the example of a national ID like a Social Security number. This data is at once highly identifiable and highly sensitive. It also happens to be protected by regulation in many industries. For that reason, many organizations have already embarked on an effort to find and catalogue this data. But the tools for doing so leave much to be desired. They can find nine-digit numbers in databases but can’t span all data sources. Usually, they can’t distinguish between similar-looking numbers. They can’t provide any visibility into usage. And, perhaps most importantly, they can’t figure out whom the data belongs to.
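To make the limitation concrete, here is a minimal Python sketch of the pattern-matching approach described above. It is an illustration only, not any particular product's scanner: the sample text, the function names and the pattern are all assumptions made for this example. The structural check uses the published Social Security number rules (area not 000, 666 or 900-999; group not 00; serial not 0000).

```python
import re

# Naive pattern: a standalone run of nine digits, optionally written as 3-2-4.
NINE_DIGITS = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")


def find_candidates(text):
    """Return every nine-digit string that looks like it could be an SSN."""
    return NINE_DIGITS.findall(text)


def passes_ssn_structure(candidate):
    """Apply the basic SSN structure rules to weed out impossible values."""
    digits = candidate.replace("-", "")
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    if area in ("000", "666") or area >= "900":
        return False
    if group == "00" or serial == "0000":
        return False
    return True


# Hypothetical sample text mixing an order number, an SSN-like value,
# a bank routing number and an invalid reference code.
sample = (
    "order 123456789 shipped; SSN on file 078-05-1120; "
    "routing 021000021; ref 000-12-3456"
)

candidates = find_candidates(sample)
plausible = [c for c in candidates if passes_ssn_structure(c)]

print(candidates)  # all four nine-digit strings match the pattern
print(plausible)   # the order number still survives the structure check
```

The pattern flags all four values, and even the structural check still lets the order number through. Neither step says anything about who the number belongs to or how it is used, which is the gap the paragraph above describes.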
Of course, knowing your customer data is more than just knowing their Social Security number. It’s knowing their name, their address, preferences, documents, geolocation and IP address: everything about them or by them. This is a harder ask. It requires an ability to find all kinds of data, sometimes without advance knowledge, in all kinds of places. It’s like finding multiple needles in multiple haystacks. But even that may not be enough: to satisfy regulations like GDPR, it is also necessary to know which needles come from which pack. It’s about finding the hard to find and then organizing it by person, or “data subject”.
Finding identity in Big Data requires an ability to find identity information and then figure out which identity information belongs to which identity. But that’s not the end. To really know your customer’s data, you need to know what it is, where it is, who it belongs to, where it is going and where it has been. If knowledge is power, then why settle for cola and baking soda when you can have metallic hydrogen?
New tools like BigID upend traditional approaches to finding, inventorying and mapping personal data. They dispense with structured searches based on archaic regular expressions. They rely on identity context to find and sort identity data at scale. BigID builds on big data, machine learning and identity correlation to figure out what’s what and who’s who. Its aim is not to find one social security number but to map all social security numbers and related identity information into an identity graph. It’s about building an Atlas: maps that show locations, maps that show access, maps that show residency, maps that show data flows.
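As a rough illustration of what correlating identity information by person can mean, the Python sketch below groups scattered findings into per-person clusters whenever they share a linking value such as an email address or phone number. This is not BigID's actual method, which the article does not describe; the data, the field names and the choice of a simple union-find are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical findings: (data source, attribute type, value, record key).
findings = [
    ("crm_db", "email", "ana@example.com", "crm:17"),
    ("crm_db", "name", "Ana Diaz", "crm:17"),
    ("web_logs", "email", "ana@example.com", "log:902"),
    ("web_logs", "phone", "+1-555-0100", "log:902"),
    ("billing", "phone", "+1-555-0100", "bill:5"),
    ("billing", "email", "raj@example.com", "bill:88"),
]


def correlate(findings, linking_types=("email", "phone")):
    """Group findings by person: records that share a linking value are merged."""
    parent = {}

    def find(record):
        # Union-find lookup with path halving.
        parent.setdefault(record, record)
        while parent[record] != record:
            parent[record] = parent[parent[record]]
            record = parent[record]
        return record

    def union(a, b):
        parent[find(a)] = find(b)

    first_record_for_value = {}
    for _source, attr, value, record in findings:
        find(record)  # register the record even if it links to nothing
        if attr in linking_types:
            key = (attr, value)
            if key in first_record_for_value:
                union(record, first_record_for_value[key])
            else:
                first_record_for_value[key] = record

    subjects = defaultdict(list)
    for source, attr, value, record in findings:
        subjects[find(record)].append((source, attr, value))
    return list(subjects.values())


groups = correlate(findings)
for group in groups:
    print(sorted({source for source, _attr, _value in group}), len(group))
```

In this toy data, the CRM record, the web log entry and one billing record collapse into a single data subject spanning three sources, while the unrelated billing record stays on its own. A real system would need far more careful matching, since shared values such as a household phone number can wrongly merge different people.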
Knowing your customer begins with knowing their data. But finding Waldo in your data requires a map. Tools like BigID help organizations build a data atlas for better customer service, better customer security and better customer privacy.