BigID & Cloudera

Scan, map, and inventory sensitive data at scale in Cloudera

Find and Analyze Data Across Cloudera

BigID scans, maps, and inventories sensitive data at scale with unmatched support for Big Data and Hadoop. BigID leverages the native MapReduce compute capabilities of the stack to find and analyze data across its structured, unstructured, and semi-structured interfaces, including Hive, HDFS, and HBase.

BigID can scan any of these three common interfaces (Hive, HDFS, and HBase) using data cataloging, classification, or correlation. This gives organizations a detailed view of their data for data governance, security, and privacy. Organizations have the option of leveraging native MapReduce to take advantage of local compute capabilities. Scans can be scheduled for preferred time windows, or around availability KPIs via application performance monitoring (APM) integrations.
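As a rough illustration of what scheduling a scan for a preferred time window involves, the sketch below checks whether a given time of day falls inside a window that may wrap past midnight. The function name `in_scan_window` and the 22:00 to 04:00 window are hypothetical choices made for this example; they are not part of BigID's product or API.

```python
from datetime import time


def in_scan_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` is inside [start, end), including windows that wrap past midnight."""
    if start <= end:
        # Same-day window, e.g. 01:00 to 05:00
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 to 04:00
    return now >= start or now < end


# Hypothetical nightly window, assumed for illustration only
WINDOW_START = time(22, 0)
WINDOW_END = time(4, 0)
```

A scheduler could call `in_scan_window` before launching a scan job and defer the job when the check fails.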

For organizations that only want to scan changes, BigID offers options that include scanning the Kafka / Confluent data pipelines that stream data to and from Hadoop / Cloudera. BigID is certified for Cloudera up to its latest release.
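To make the idea of scanning data in motion more concrete, here is a minimal sketch that inspects a sequence of messages and reports which ones contain values that look sensitive. It uses two simple regular expressions, which is a deliberate simplification and not BigID's ML-based classification. The names `PATTERNS` and `scan_stream` are hypothetical, and in a real pipeline the messages would come from a Kafka consumer instead of an in-memory list.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical patterns for illustration only; production classifiers are far more thorough.
PATTERNS: Dict[str, re.Pattern] = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_stream(messages: Iterable[str]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (message index, sorted matched labels) for every message with at least one match."""
    for index, message in enumerate(messages):
        labels = sorted(
            name for name, pattern in PATTERNS.items() if pattern.search(message)
        )
        if labels:
            yield index, labels
```

Because `scan_stream` is a generator, it processes one message at a time and never needs to hold the whole stream in memory.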

Technical Benefits

  • Hive, HDFS, and HBase interface support
  • ML-based classification, cataloging, and correlation
  • Kafka stream scanning for data entering or leaving Hadoop / Cloudera
  • Optional MapReduce scan support to leverage native compute and locality
  • Support for Parquet and other Big Data structured file formats

Business Benefits

  • CCPA and GDPR privacy compliance for Big Data, including DSAR fulfillment
  • Simplified sensitive data and crown jewel discovery across a data lake
  • Detailed data profiling and de-duplication
  • Data hygiene and minimization