Cracking the Code of Unstructured Data: BigID’s Approach to Data Discovery
Unstructured data, such as files and emails, is among the hardest data to scan for personal and sensitive information. Unlike structured data, it is not organized into specific tables and columns, which makes it harder to locate and identify what is sensitive. Unstructured data stores can include SMB, NFS and CIFS file servers, or Box, Google Drive and O365 cloud file stores. They also include IaaS object stores like AWS S3, GCP Cloud Storage or Azure Blob. They cover mail and chat platforms such as MS Exchange, Google Gmail, MS Outlook, MS Teams and Slack. They even extend to all the nooks and crannies inside SAP or Salesforce where organizations store communications, files or media. Unstructured data resides in countless places, and companies aiming to find personal or crown-jewel data in those places had few options.
Before BigID, companies that wanted to scan unstructured data for privacy- or security-sensitive information had to rely on tools limited to old-school pattern matching, each covering a specific silo of unstructured data stores. These tools lacked the scale or scope to look beyond basic files and email. Whether DLP, Data Access Governance or e-Discovery, the technologies were stuck in 2006, when many of them were first introduced. BigID completely rethinks how companies scan unstructured data at scale.
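To make the limitation of fixed-pattern matching concrete, here is a minimal Python sketch of the kind of rule such tools rely on. The SSN regex, the function name and the sample strings are illustrative assumptions, not taken from any particular DLP, Data Access Governance or e-Discovery product.

```python
import re

# Illustrative "old-school" rule: a US Social Security number written as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_candidates(text: str) -> list[str]:
    """Return every substring that matches the fixed SSN pattern (no context is considered)."""
    return SSN_PATTERN.findall(text)


# Hypothetical sample strings chosen to show a false positive and a miss.
samples = {
    "real SSN, dashed": "Employee SSN: 123-45-6789",
    "part number that looks like an SSN": "Replacement part 555-12-3456 shipped",
    "real SSN, spaces": "SSN 123 45 6789 on file",
}

if __name__ == "__main__":
    for label, text in samples.items():
        print(f"{label}: {find_ssn_candidates(text)}")
```

The rule flags a part number that merely looks like an SSN and misses a real SSN written with spaces, because a pattern on its own carries no context about what the surrounding data actually is.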
With BigID’s microservice architecture, additional scanners can be spun up dynamically to scale data processing horizontally. Machine learning (ML) pre-processes dense documents to speed up analysis and processing. Classification is married with Correlation, Cataloging and Cluster Analysis to better analyze and organize data. Support for data sources expands to include IaaS, Big Data, SaaS, ERP and more. Supported document formats extend beyond PDF, Office and Google productivity files to media, Parquet, Zip, ORC and more. With BigID, finding sensitive privacy and security data in unstructured data is completely rethought for 2020 and beyond.
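The idea of adding scanners for horizontal scale can be illustrated with a small Python sketch. It is not BigID's implementation: worker threads in a single process stand in for independently deployed scanner microservices, and `scan_document`, `scan_all` and the email regex are hypothetical names chosen only for illustration.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical detector: a simple email-address-like pattern.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def scan_document(doc_id: str, text: str) -> tuple[str, int]:
    """Hypothetical scanner: count email-address-like strings in one document."""
    return doc_id, len(EMAIL_PATTERN.findall(text))


def scan_all(documents: dict[str, str], num_scanners: int) -> dict[str, int]:
    """Fan documents out across a pool of `num_scanners` workers and collect the counts."""
    with ThreadPoolExecutor(max_workers=num_scanners) as pool:
        results = pool.map(lambda item: scan_document(*item), documents.items())
        return dict(results)


if __name__ == "__main__":
    docs = {
        "a.txt": "contact alice@example.com",
        "b.txt": "no personal data here",
        "c.txt": "x@y.org and z@w.net",
    }
    print(scan_all(docs, num_scanners=4))
```

Because each document is scanned independently, raising `num_scanners` spreads the work across more workers without changing the results; in a microservice deployment, that same independence is what would allow scanner instances to be added or removed on demand.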