Smarter Classification for Data Attributes, Metadata and Files

By Dimitri Sirota, Chief Executive Officer

March 21, 2020


BigID uses multiple methods to find and categorize data. These include traditional pattern-based classification for locating data of a specific form or type. Traditional classification technology, as found in legacy DLP or data access governance tools, relies primarily on regular expressions to find exact matches in strings of data, such as a credit card number or national ID. BigID modernizes these approaches with smart validation and combines them with newer ML and AI techniques to increase accuracy and extend classification to metadata and documents.
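To make the difference between plain pattern matching and validated matching concrete, here is a minimal Python sketch. It is not BigID's implementation: the pattern, the function names, and the choice of the Luhn checksum as the validation step are all assumptions made for illustration. The regular expression finds anything shaped like a card number, and the checksum then discards candidates that cannot be real card numbers.

```python
import re

# Hypothetical candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens. Illustrative only, not a BigID classifier.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Regex match first, then keep only candidates that pass validation."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


sample = "Card 4111 1111 1111 1111 on file, order 1234 5678 9012 3456."
print(find_card_numbers(sample))
```

Both numbers in the sample match the pattern, but only the first (a widely used test card number) passes the checksum. A regex-only classifier would have reported the order number as a credit card.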

BigID provides enterprises with an expandable library of predefined classifiers, with smart rules and validation checks that remove false positives. BigID couples this classification method with fuzzier, ML-based patterns for accurately distinguishing between similar structured data entity names and attributes. Moreover, BigID’s smart entity classifiers operate not just across unstructured files and structured databases but across nearly any source BigID can connect to, including NoSQL, big data, SaaS, IaaS, mainframe, and data pipelines and streams.

But BigID doesn’t stop there. Because data governance organizations are increasingly adopting metadata management tools to govern their information lifecycle, BigID has also brought its smart classification to metadata management for the first time. Not only can BigID help organizations reclassify miscategorized metadata, but organizations can now also simplify the tedious and error-prone process of mapping their physical data to their logical data definitions. Using BigID, existing registries can be mapped more easily to the actual data attributes they describe. Moreover, where no logical data definitions or registry exist, BigID can recommend definitions based on the classified metadata and data attributes.
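As a rough illustration of what mapping physical data to logical definitions involves, the sketch below suggests a glossary term for each physical column name using simple string similarity from Python's standard library. This is a deliberately simplified stand-in, not how BigID performs the mapping: it looks only at names, whereas the approach described above draws on classified data and metadata. The function names, the sample columns, and the 0.6 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def normalize(name: str) -> str:
    """Lowercase and drop separators so 'CUST_EMAIL' and 'Cust Email' compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def suggest_mapping(
    columns: List[str], glossary: List[str], threshold: float = 0.6
) -> Dict[str, Optional[str]]:
    """Suggest the closest logical term per physical column, or None if nothing is close."""
    suggestions: Dict[str, Optional[str]] = {}
    for col in columns:
        best_term, best_score = None, 0.0
        for term in glossary:
            score = SequenceMatcher(None, normalize(col), normalize(term)).ratio()
            if score > best_score:
                best_term, best_score = term, score
        # Below the (arbitrary) threshold, leave the column unmapped for human review.
        suggestions[col] = best_term if best_score >= threshold else None
    return suggestions


columns = ["customer_email", "PHONE_NUMBER", "zz_tmp9"]   # hypothetical physical columns
glossary = ["Customer Email", "Phone Number"]             # hypothetical logical terms
for column, term in suggest_mapping(columns, glossary).items():
    print(column, "->", term)
```

Name similarity alone breaks down quickly in practice (a column called `col_17` may well hold email addresses), which is why classifying the underlying data, rather than relying on names, matters for this kind of mapping.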

However, data attributes, entities, and metadata are not the only kinds of data organizations want to classify when categorizing their information. In recent years, companies have faced an explosion in the creation and storage of unstructured documents and forms. In many companies, these files amount to petabytes of information stored across legacy NAS systems such as NetApp or EMC, or in modern file stores such as Box, O365, GDrive, S3, Salesforce, SharePoint, and more. BigID has therefore introduced deep-learning-based file classification to help organizations categorize and label their vast quantities of unstructured data. BigID provides many pre-trained document classifiers out of the box, and also gives organizations the option to train BigID’s document AI on their own data.
