BigIDeas

BigIDeas in Data Privacy, Protection and Perspective

Rethinking Privacy Protection for GDPR and CCPA

When BigID formed in 2016, privacy was a matter of policy, process, and people – but not product. Specialist CPOs and DPOs assessed privacy risk through data maps and inventories built from surveys, not scans. Privacy, while intended to help protect the integrity of personal data, was in many ways disconnected from the data. Instead, it relied on data recollections more than data records for assuring data privacy.

However, with the advent of new regulations like GDPR and CCPA, individual data rights took center stage. Companies would be required to account for every individual’s data to deliver individual data accountability. This required a rethink of how companies approached data privacy and protection. It would require data knowledge built on identity-aware data discovery. Data rights needed actionable data intelligence beyond what was possible with either legacy data classification or catalog solutions. Traditional data security or governance products lacked the data detail, context, and coverage to identify PI vs just PII, identify whose data vs just what data, and provide a mechanism to look across all of an enterprise’s data estate vs just a silo or two.

BigID completely rethought data discovery and intelligence for the privacy era. BigID was the first company to deliver enterprises the technology to know their data to the level of detail, context and coverage they would need to meet core data privacy protection requirements. Fast forward to today and BigID remains the pioneer and market leader in privacy-aware data discovery and intelligence, building on its many firsts to help organizations truly “know their data” to deliver sustainable privacy compliance, actionable data protection, and ultimately situational data value.

With BigID, data liability is transformed into a data asset.

You Can’t Protect What You Can’t Find

Historically, data discovery was regarded as an afterthought in data protection. Priority was placed on transformation or interdiction. Discovery was the tail, not the dog. New privacy regulations like CCPA and GDPR, however, place new emphasis on knowing your data.

Previously, privacy regulations required that organizations account for the data they collect, process and share on individuals. CCPA and GDPR placed individual data rights at their center, forcing companies to know not just what data they have, but whose data they have.

As a result, BigID pioneered the technology to allow companies to account for the data they collect, process and share on individuals. BigID remains the only company that can identify contextual PI and not just PII, correlate data back to identity, and also look across any data source or pipeline in the data center or cloud.

The result is the first and only solution that can effectively deliver on the privacy promise of individual data rights.

By identifying sensitive and personal data, BigID offers organizations and the individuals they serve another critical benefit: data protection. It’s a truism that you can’t protect what you can’t find.

Undiscovered data is not invisible; it’s just dark data – and vulnerable. Traditional approaches like classification or cataloging alone lack the detail, context or data source coverage to identify all sensitive data, everywhere.

BigID rethinks data discovery for privacy – and in so doing remade how organizations go about protecting their most important assets: their customer and client data.

A Single Source Of Data Truth for the CPO, CSO, and CDO

Traditionally, different data stakeholders had to contend with different and incompatible views of their data assets.

  • Privacy professionals like CPOs and DPOs relied on interviews and surveys to build inventories of personal data.
  • Security professionals used pattern-based classification technologies inside three-letter products like DLP, DRM and DAM, designed in the mid-2000s, to find sensitive data in file folders, mail or SQL databases.
  • Data governance professionals depended on metadata catalogs that ingested column names from databases, data lakes and relational data warehouses to help map what type of data resided in what tables.

Besides the fact that none of the approaches was compatible with the others, each also presents arguably insurmountable problems for the goal of providing authoritative data truth and trust in data.

  • Privacy-based data surveys (as opposed to scans) rely on data recollections instead of data records, making them imprecise and error-prone by definition.
  • Pattern-based data classification technologies can’t disambiguate similar-looking data, can’t map data to an owner, and lack coverage of modern data sources.
  • Governance-based metadata catalogs provide only a narrow lens into a modern data landscape, and can only surface what a developer wrote in a column header, without validation against the column content.
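The last limitation above, trusting a column header without checking the column content, can be illustrated with a minimal sketch. It is not BigID's implementation: the function names, the single email pattern and the 0.8 threshold are all assumptions chosen for illustration. It compares the label a catalog would infer from the header with the label a content sample supports, and reports the columns where the two disagree.

```python
import re
from typing import Dict, List, Tuple

# Deliberately simple email pattern; an assumption for illustration only.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def label_from_header(name: str) -> str:
    """What a metadata-only catalog would infer: trust the column name."""
    return "email" if "email" in name.lower() else "unknown"


def label_from_content(values: List[str], threshold: float = 0.8) -> str:
    """Infer the label from sampled values instead of the header."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "unknown"
    share = sum(bool(EMAIL.match(v)) for v in non_empty) / len(non_empty)
    return "email" if share >= threshold else "unknown"


def find_mismatches(table: Dict[str, List[str]]) -> Dict[str, Tuple[str, str]]:
    """Return columns whose header-based and content-based labels differ."""
    mismatches = {}
    for column, values in table.items():
        by_name = label_from_header(column)
        by_content = label_from_content(values)
        if by_name != by_content:
            mismatches[column] = (by_name, by_content)
    return mismatches
```

In this sketch, a free-text "notes" column that actually holds email addresses is flagged, as is a "contact_email" column that holds only placeholders. Both are cases a header-only catalog would get wrong.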

BigID transforms how organizations see and understand their data, providing the first-of-its-kind Discovery-in-Depth technology to look at data four ways in order to provide the data content and context necessary for privacy, security and governance.

With BigID, organizations can create a single source of data truth, without compromising the views necessary for a CPO, CSO or CDO to conduct their business.

Unmatched Data Coverage for the Modern Data Landscape

Enterprises in 2020 look nothing like enterprises in 2002 – so why should their data discovery technologies look the same?

In 2002, an enterprise data landscape consisted of a mainframe, a SQL server, a file share and email. Fast-forward to 2020, and enterprise data landscapes include all sorts of systems, from data warehouses to data pipelines, data lakes, NoSQL, cloud and more.

Traditional data classification or catalog products provide only a small window into a modern data landscape, since they support only a small number of data store types. Yet the privacy regulations of the last few years require organizations to find sensitive and personal data everywhere.

BigID therefore broke down the silos and remains the first and only data discovery technology to look across any data source or pipeline, in the data center or cloud. With BigID, organizations can find sensitive, privileged and personal data anywhere inside a modern enterprise.

Finding PI and Not Just PII

Privacy demands redefining what data qualifies as personal.

Historically, regulations dealing with personal data like HIPAA, PCI and Breach Response defined personal data by specific types of PII (Personally Identifiable Information). PII was exact and uniquely identifiable. However, regulations like GDPR and CCPA broadened the definition of what is personal to include data that is not just uniquely personal, but is personal because it is in the context of a person.

For instance, a written date in and of itself is not personal. However, when it’s a birthday – it is.

Similarly, a geolocation is not explicitly personal. It is only personal if it can be associated with a person’s Web or mobile session. Examples of data that may be considered personal abound: session keys, IP addresses, cookies, passwords, click streams, gender and more can be characterized as personal when they are by or about a person.

Legacy data discovery technologies, whether classification- or catalog-based, were not made for identifying PI. Being pattern-based, they could on occasion discern what the data was but not whether it belonged to a person. That requires an ability to understand both content and context, and the ability to trace a piece of data’s connection to a person.

One of BigID’s bigger ideas is that identity matters in data discovery.

BigID therefore remains the only vendor purpose-built from the ground up to be able to identify what data is personal, even if only contextually personal. This is essential for meeting data rights requirements in GDPR and CCPA – and it’s also essential for locating other kinds of sensitive data that are sensitive because of their relationship to other data.

Privacy-aware Correlation, In Addition to Classification and Cataloging

Software-based data discovery typically falls into one of two camps depending on whether the tool is aimed at security or data governance professionals. Security professionals typically avail themselves of classification-centric products that emerged post PCI to find specific types of sensitive data like credit card numbers or national IDs. Data governance professionals, conversely, are most familiar with metadata catalogs that surface technical metadata in structured data sources that can provide a cursory view of what kind of data resides where.

Neither approach is adequate for privacy. Both lack the ability to identify contextually personal information (PI); they lack the data coverage to give a broad-based view of what data resides where, and perhaps most importantly – they have no identity context and so cannot map or correlate data back to a person.

Privacy is about people.

Without identity context, it’s impossible to identify what data belongs to what individual. Individual data rights are the primary purpose of privacy regulations like GDPR and CCPA. BigID therefore rethought data discovery for the privacy era and patented a first-of-its-kind approach to data discovery and intelligence that puts identity at the center. With BigID’s privacy-aware correlation based approach to discovery, organizations can both find “what” data and “whose” data.
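The difference between finding “what” data and “whose” data can be sketched in a few lines. This is a simplified illustration, not BigID's patented method: it assumes a hypothetical seed list of known identifiers per person, and attributes every field of a scanned record to a person when exactly one known person's identifier appears in that record. The names `build_identity_index` and `correlate` are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize(value: str) -> str:
    """Normalize identifiers so 'Ana@Example.com' matches 'ana@example.com'."""
    return value.strip().lower()


def build_identity_index(people: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each known identifier value to the person it belongs to."""
    index = {}
    for person_id, identifiers in people.items():
        for value in identifiers:
            index[normalize(value)] = person_id
    return index


def correlate(
    records: Iterable[Tuple[str, Dict[str, str]]],
    index: Dict[str, str],
) -> Dict[str, List[Tuple[str, str, str]]]:
    """Attribute all fields of a record to a person when the record
    contains a known identifier of exactly one person."""
    found = defaultdict(list)
    for source, record in records:
        owners = {
            index[normalize(v)]
            for v in record.values()
            if normalize(v) in index
        }
        if len(owners) == 1:
            owner = owners.pop()
            for field, value in record.items():
                found[owner].append((source, field, value))
    return dict(found)
```

Under these assumptions, an IP address stored next to a known email address is attributed to that person, while the same kind of IP address in a record with no identity link stays unattributed. That is the sense in which data becomes personal through context.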

Providing data accountability to consumers requires data accounting down to an identity level.

With BigID, organizations no longer have to choose. They get market-leading classification and catalog capabilities, in combination with correlation, so that they can get the privacy-centric detail required for meeting data rights regulations – all without compromising the views and insight that security and data governance professionals need.

One source of data truth across three disciplines.

Knowing Your Data: A Journey From Data Discovery to Data Intelligence

Know your customer, know their data!

Data knowledge is the critical ingredient for solving critical privacy, security and data governance use cases. You can’t accurately trust data compliance without an ability to trust the underlying data. Privacy requires a rethink of how organizations find and understand their data. It requires a level of content and context about not just “what” data a company collected but also “whose” data was collected.

BigID created the first platform purpose-built for the kind of data discovery required for privacy. In so doing, we also fashioned the first platform able to capture context around data: for example, to whom it belonged, whether an associated permission existed, or who had access to it.

Some of these were essential for privacy. But they also play a larger role in providing deeper insights into the what, where, who, why, and when of how data was collected, processed and shared.

As the BigID platform has evolved to encompass the best of cataloging, classification and correlation, the platform has delved further beyond discovery alone.

Today, BigID is the most comprehensive platform in the market to provide organizations insight and intelligence on their most important assets: the data they collect and process on their customers and clients.

Finding and Protecting Crown Jewels

Part of BigID’s original innovation was the ability to identify contextual personal information (PI) and to correlate it to a person no matter the data source. As BigID expanded the variety and sophistication of its data discovery and intelligence capability, it became evident that a larger opportunity existed to protect any sensitive and privileged data.

Personal data has increasingly become one of the highest liability types of data a company can collect and process. When it comes to value or sensitivity, however, personal data is not alone.

Intellectual property, account details, health data, credentials and transaction histories can all be inherently both valuable and high risk. That’s why BigID has introduced more ways for companies to define what is important and privileged to them, along with more ways to identify that data across unstructured, structured, NoSQL, cloud, mainframe and more.

With BigID, companies can define what is high risk and high value in more kinds of ways, while providing more ways to identify and action it.

Data At Rest and Data In Motion

If data is the new oil, it’s not enough to know what data you have in storage tankers, you also need to be able to account for what data you have in pipelines.

BigID’s promise to its customers is to give them insight into their sensitive data where it resides or flows. That means an ability to provide unmatched data coverage across all manner of data at rest from databases, to file shares, to data warehouses, to data lakes, to data clouds and more. It also means providing coverage for all kinds of streaming platforms like Kafka, Kinesis, FTP, APIs and more. Data is everywhere; therefore BigID has to be anywhere.

Support for Images, OCR and Biometrics

Text is terrific but it represents only one fragment of how personal and sensitive data is stored inside an organization. Increasingly companies gather all kinds of other sensitive and personal data that is encoded inside an image or picture or some form of biometric.

Whether it is a passport scan or a headshot used to verify an identity, it’s essential that organizations trying to find and inventory their personal and sensitive data can also identify relevant image and biometric data for privacy, security and data governance purposes. BigID is the first privacy-aware data discovery and intelligence platform to go beyond text.

Unstructured Discovery Unbounded

Unstructured data, such as files and emails, is among the hardest kinds of data to process for personal and sensitive information. The data is not organized into specific tables and places, which makes locating and identifying what’s sensitive harder. Examples of unstructured data stores can include SMB, NFS and CIFS file servers, or Box, Google Drive and O365 cloud file stores. They can also include IaaS object stores like AWS S3, GCP Cloud Storage or Azure Blob. They could mean mail and chat like MS Exchange, Google Gmail, MS Outlook and Teams or Slack. They could also include all the nooks and crannies inside SAP or Salesforce where organizations can store communications, files or media. Unstructured data resides in countless places, and for companies aiming to find personal or crown jewel data in those places, the options were few.

Before BigID, companies wanting to scan unstructured data for privacy- or security-sensitive data had to rely on tools limited to old-school pattern matching for specific silos of unstructured data stores. The tools lacked the scale and scope to look beyond basic files and email. The technologies, whether DLP, Data Access Governance or e-Discovery, were stuck in 2006, when many of them were first introduced. BigID completely rethinks how companies scan unstructured data at scale.

With BigID’s microservice architecture, additional scanners can be spun up dynamically to add lateral scale to processing data. Machine learning (ML) is used to pre-process dense documents to speed analysis and processing. Classification is married with Correlation, Cataloging and Cluster Analysis to better analyze and organize data. Supported data sources are increased with new support for IaaS, Big Data, SaaS, ERP and more. Supported document formats are broadened from just PDF, Office and Google productivity files to media, Parquet, Zip, ORC and more. With BigID, finding sensitive privacy and security data in unstructured data is completely rethought for 2020 and beyond.

Smarter Classification for Data Attributes, Metadata and Files

BigID uses multiple methods to find and categorize data. This includes traditional pattern-based classification for locating data of a specific form or type. Traditional classification technology, as found in legacy DLP or data access governance security tools, relies primarily on regular expressions to find exact matches in strings of data like a credit card number or national ID. BigID modernizes these approaches with new smart validation while also marrying them with newer ML and AI approaches for increasing accuracy and broadening their scope to metadata and documents.

BigID provides enterprises an expandable library of predefined classifications with smart rules and validation checks to remove false positives. BigID couples this classification method with fuzzier patterns using ML for accurately separating similar structured data entity names and attributes. Moreover, BigID’s smart entity classifiers can operate not just across unstructured files and structured databases but also across anything BigID can connect to, which is almost everything. This includes NoSQL, Big Data, SaaS, IaaS, mainframe, data pipelines and streams, and more.
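To make the idea of a pattern plus a validation check concrete, here is a minimal sketch for one data type, payment card numbers. It is an illustration and not BigID's classifier: the regular expression and function names are assumptions, while the Luhn checksum is the standard check-digit test for card numbers. The pattern alone matches any 13 to 19 digit run; the checksum discards runs that only look like card numbers.

```python
import re
from typing import List

# Candidate pattern: 13 to 19 digits, optionally separated by single
# spaces or hyphens. Illustrative only; real classifiers are stricter.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return pattern matches that also pass validation, separators removed."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

In this sketch both `1234 5678 9012 3456` and the well-known test number `4111 1111 1111 1111` match the pattern, but only the second passes the checksum, so the first is dropped as a false positive.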

But BigID doesn’t stop there: as data governance organizations increasingly adopt metadata management tools to govern their information lifecycle, BigID has also brought its smart classification to metadata management for the first time. Not only can BigID help organizations re-classify miscategorized metadata, but organizations can now also simplify the tedious and error-prone process of mapping their physical data to their logical data definitions. Using BigID, existing registries can more easily be mapped to their actual data attributes. Moreover, where no logical data definitions or registry exist, BigID can help recommend definitions from the classified metadata and data attributes.

However, data attributes, entities and metadata are not the only formats of data organizations want to classify when categorizing their information. In recent years, companies have faced an explosion in the creation and storage of unstructured documents and forms. In many companies these documents or files represent many petabytes of information stored across legacy NAS systems like NetApp or EMC, or modern file stores like Box, O365, GDrive, S3, Salesforce, SharePoint and more. BigID has therefore introduced deep learning-based file classification to help organizations categorize and label their vast quantity of unstructured data. BigID provides many pre-trained document classifications out of the box, but also gives organizations the option to train BigID’s document AI on their own data.
