
Ethical Baseline
BigID’s AI features are built on a foundation of strong ethical principles: every decision made while designing an AI feature should align with fairness, respect and integrity.
Maintaining the security and privacy of customer data is prioritized from ideation through implementation of all AI features. AI features are designed to protect customer data throughout the data lifecycle.
BigID is committed to being transparent about how its AI features work, so that customers understand how their data is used.
BigID’s AI features are designed to put control in the hands of our customers, giving you tools to make informed decisions about how your data is used.
The optional AI components within the BigID platform serve several supporting functions, such as improving data insights, accelerating scans and enhancing product usability. The platform operates under the principle of “Know Your Data,” and AI features further enhance data understanding by delivering more precise, actionable insights. Where customers elect to use these AI features, AI can help users gain a deeper understanding of their data more quickly, making data governance and compliance easier.
All AI features in BigID are disabled by default to facilitate user control. Detailed documentation is available to guide customers through the process of enabling and configuring AI features according to each customer’s preferences. Customers are able to assess BigID’s AI features before use and deploy these features only in accordance with their organization’s AI governance policies.
Machine Learning (ML) is a type of artificial intelligence that enables systems to find patterns and make predictions by learning from data rather than by being explicitly programmed. ML capabilities in the BigID platform help improve the efficiency and precision of data scans in several ways, as described in more detail in this section.
BigID develops its ML models in a lab environment using publicly available and synthetic data. Two of the BigID platform’s ML features, Hyperscan and Classifiers, can be further fine-tuned to the customer’s environment using customer metadata. These models are only fine-tuned using customer metadata if the customer specifically elects to use their metadata for this purpose. Hyperscan and Classifiers models trained on customer metadata are always stored and run locally, meaning that they are limited to the individual customer’s environment and are available only to that customer. Fine-tuned models are not shared across customer environments or used by BigID on behalf of any other customers. Customers can use all other AI features, even if they decide not to use their metadata to fine-tune the Hyperscan and Classifiers models to their environment.
Document and file clustering uses an unsupervised machine learning algorithm to group similar files based on their content. This feature helps BigID users organize and manage their documents more efficiently. By analyzing the text of each document, BigID can group files such as contracts, NDAs and invoices into separate clusters without needing to know the number of clusters in advance.
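This section does not name the clustering algorithm BigID uses. As a minimal sketch of how documents can be grouped without fixing the number of clusters in advance, the Python below swaps in simple leader clustering over bag-of-words vectors compared with cosine similarity. The `threshold` value, the helper names and the sample files are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a simple stand-in for real document features."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_documents(docs: dict, threshold: float = 0.3) -> list:
    """Leader clustering: each document joins the cluster whose first member (the leader)
    it is most similar to, provided that similarity reaches `threshold`; otherwise it
    starts a new cluster. The number of clusters is never fixed in advance."""
    clusters = []  # list of (leader_vector, [member names])
    for name, text in docs.items():
        vec = bag_of_words(text)
        best_index, best_sim = None, 0.0
        for i, (leader_vec, _) in enumerate(clusters):
            sim = cosine(vec, leader_vec)
            if sim > best_sim:
                best_index, best_sim = i, sim
        if best_index is not None and best_sim >= threshold:
            clusters[best_index][1].append(name)
        else:
            clusters.append((vec, [name]))
    return [members for _, members in clusters]


# Hypothetical sample files.
docs = {
    "nda_acme.txt": "This non disclosure agreement protects confidential information shared between the parties.",
    "nda_globex.txt": "Mutual non disclosure agreement: each party shall keep confidential information secret.",
    "invoice_001.txt": "Invoice number 001. Amount due: 500 USD. Payment due within 30 days.",
    "invoice_002.txt": "Invoice number 002. Amount due: 120 USD. Payment due on receipt.",
}
print(cluster_documents(docs))
# [['nda_acme.txt', 'nda_globex.txt'], ['invoice_001.txt', 'invoice_002.txt']]
```

Leader clustering makes a single pass and needs only a similarity cutoff, which is why it suits the "unknown number of clusters" case; its results depend on document order, so it is illustrative rather than a production choice.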
Predictive Discovery, or Hyperscan, is an ML model designed to reduce the time required to scan unstructured data sources by predicting the presence of sensitive information from metadata (e.g., file path, owner, file extension). BigID offers customers the option to opt in to having the model learn from the metadata collected during their data scans in order to inform its predictions. Because it relies on metadata, the model speeds up the scanning process and lets users identify files containing sensitive information more quickly.
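This section does not say which model Hyperscan uses. The sketch below swaps in a multinomial naive Bayes classifier to show, under stated assumptions, how metadata alone can estimate the probability that a file holds sensitive data and rank files so the likeliest ones are scanned first. The token scheme (`dir:`, `ext:`, `owner:`), the helper names, the file paths and the training labels are all hypothetical.

```python
import math
from collections import Counter


def metadata_tokens(path: str, owner: str) -> list:
    """Turn file metadata into categorical tokens: parent folders, extension and owner."""
    parts = path.lower().split("/")
    filename = parts[-1]
    ext = filename.rsplit(".", 1)[-1] if "." in filename else "<none>"
    tokens = [f"dir:{p}" for p in parts[:-1] if p]
    tokens.append(f"ext:{ext}")
    tokens.append(f"owner:{owner.lower()}")
    return tokens


class MetadataNaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over metadata tokens."""

    def __init__(self):
        self.token_counts = {True: Counter(), False: Counter()}
        self.file_counts = Counter()
        self.vocab = set()

    def fit(self, samples):
        """samples: iterable of (tokens, is_sensitive) pairs from earlier scan results."""
        for tokens, is_sensitive in samples:
            self.file_counts[is_sensitive] += 1
            self.token_counts[is_sensitive].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, tokens) -> float:
        """Estimated P(file contains sensitive data | metadata tokens)."""
        total_files = sum(self.file_counts.values())
        log_scores = {}
        for label in (True, False):
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log((self.file_counts[label] + 1) / (total_files + 2))
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            log_scores[label] = score
        # Normalize in log space to avoid floating-point underflow.
        top = max(log_scores.values())
        p_true = math.exp(log_scores[True] - top)
        p_false = math.exp(log_scores[False] - top)
        return p_true / (p_true + p_false)


def scan_order(model, files):
    """Order (path, owner) pairs so files most likely to hold sensitive data come first."""
    return sorted(files, key=lambda f: model.predict_proba(metadata_tokens(*f)), reverse=True)


# Hypothetical labeled results from earlier full-content scans.
history = [
    ("/hr/payroll/salaries_2023.xlsx", "hr-team", True),
    ("/hr/employees/ssn_list.csv", "hr-team", True),
    ("/finance/cards/customers.csv", "finance", True),
    ("/marketing/logos/banner.png", "design", False),
    ("/marketing/assets/brochure.pdf", "design", False),
    ("/eng/builds/app.log", "ci-bot", False),
]
model = MetadataNaiveBayes().fit(
    (metadata_tokens(path, owner), label) for path, owner, label in history
)
print(scan_order(model, [("/marketing/logos/icon.png", "design"),
                         ("/hr/payroll/bonus_2024.xlsx", "hr-team")]))
# [('/hr/payroll/bonus_2024.xlsx', 'hr-team'), ('/marketing/logos/icon.png', 'design')]
```

Naive Bayes is used here only because it trains in one counting pass and is easy to follow; any probabilistic classifier over the same metadata features would fit the same role.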
ML-Enhanced Classifiers in BigID are designed to reduce false positives in RegEx-based data classification. By analyzing metadata from true and false positives, the model learns to adjust classification results and reduce errors. This improves classification accuracy and thereby the precision of data discovery; the feature currently applies only to structured data sources.
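The section does not specify the model behind ML-Enhanced Classifiers. As a minimal sketch, assume a RegEx classifier for US Social Security Number-shaped strings and a logistic regression trained on reviewer feedback about past hits, using two hypothetical metadata features per column: whether the column name contains a hint word, and the fraction of sampled values that match the pattern. `SSN_PATTERN`, `NAME_HINTS`, `HitFilter` and all sample data are illustrative assumptions.

```python
import math
import re

# Hypothetical RegEx classifier for US Social Security Number-shaped strings.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")
# Hypothetical metadata cues; a real system would learn from far richer metadata.
NAME_HINTS = ("ssn", "social")


def match_ratio(values):
    """Fraction of sampled column values that fully match the RegEx."""
    if not values:
        return 0.0
    return sum(bool(SSN_PATTERN.fullmatch(v)) for v in values) / len(values)


def features(column_name, ratio):
    """Metadata features for one RegEx hit: [column-name hint, match ratio]."""
    hint = 1.0 if any(h in column_name.lower() for h in NAME_HINTS) else 0.0
    return [hint, ratio]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class HitFilter:
    """Logistic regression trained on RegEx hits that reviewers marked true or false positive."""

    def __init__(self, lr=0.5, epochs=5000):
        self.lr, self.epochs = lr, epochs
        self.w = [0.0, 0.0]
        self.b = 0.0

    def score(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, X, y):
        n = len(X)
        for _ in range(self.epochs):  # full-batch gradient descent on log loss
            grad_w, grad_b = [0.0] * len(self.w), 0.0
            for x, label in zip(X, y):
                err = self.score(x) - label  # derivative of log loss w.r.t. the logit
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            self.w = [wj - self.lr * g / n for wj, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def keep(self, column_name, ratio):
        """Keep the RegEx classification only if the model predicts a true positive."""
        return self.score(features(column_name, ratio)) >= 0.5


# Hypothetical reviewer feedback: (column name, match ratio, 1 = true positive).
reviewed = [
    ("ssn", 0.98, 1), ("social_security_no", 0.95, 1), ("customer_ssn", 0.90, 1),
    ("part_number", 0.40, 0), ("phone_ext", 0.15, 0), ("order_ref", 0.30, 0),
]
model = HitFilter().fit([features(n, r) for n, r, _ in reviewed], [v for _, _, v in reviewed])
print(model.keep("employee_ssn", 0.92), model.keep("sku_code", 0.35))  # True False
```

Here the RegEx still does the detection; the learned model only decides whether each hit is kept, which mirrors the "adjust the classification results" role described above.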
Named Entity Recognition (NER) is a Natural Language Processing (NLP) task that identifies named entities (e.g., people, locations) in unstructured data, usually documents or free-text columns. BigID uses NER to classify personal information by analyzing unstructured data sources. The NER models are developed using deep learning and run locally within each customer’s own BigID scanners for enhanced efficiency and security.
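Many NER models, including deep learning token classifiers, label each token with a tag, and those tags must then be merged into entity spans before anything can be classified. The sketch below assumes BIO-style tags (`B-` begins an entity, `I-` continues it, `O` is outside any entity). It does not run a model: the tag sequence stands in for hypothetical model output, and which entity types count as personal information (`PERSONAL_TYPES`) is likewise an assumption.

```python
def bio_to_spans(tokens, tags):
    """Merge per-token BIO tags into (entity_type, text) spans.
    A stray I- tag (no matching open span) is treated as the start of a new span."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")  # "B-PER" -> ("B", "-", "PER")
        if prefix == "I" and entity_type == current_type:
            current_tokens.append(token)  # continue the open span
            continue
        if current_type is not None:  # close the span that just ended
            spans.append((current_type, " ".join(current_tokens)))
        if prefix in ("B", "I"):  # start a new span
            current_type, current_tokens = entity_type, [token]
        else:  # "O": token is outside any entity
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Assumed label set; real label names depend on the NER model.
PERSONAL_TYPES = {"PER", "LOC"}


def personal_entities(spans):
    """Keep only the spans whose type is treated as personal information."""
    return [text for entity_type, text in spans if entity_type in PERSONAL_TYPES]


tokens = ["Contact", "Jane", "Doe", "in", "New", "York", "about", "the", "renewal"]
tags = ["O", "B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O", "O"]  # hypothetical model output
spans = bio_to_spans(tokens, tags)
print(spans)  # [('PER', 'Jane Doe'), ('LOC', 'New York')]
print(personal_entities(spans))  # ['Jane Doe', 'New York']
```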
Column/Dataset Clustering uses an unsupervised algorithm to group columns with similar data patterns; for example, columns containing phone numbers are clustered together. By comparing column vectors using cosine similarity, BigID can manage and analyze large datasets more efficiently. This feature also helps detect near-duplicate datasets and suggest higher-quality data for analysis.
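The paragraph above names cosine similarity over column vectors but not how those vectors are built. As a minimal sketch, assume each column is profiled by the fraction of its values containing each character-class bigram (digits mapped to `9`, letters to `a`, punctuation kept), and that columns are grouped by single-link clustering with a union-find structure. The profile, the `threshold` of 0.8 and the sample columns are hypothetical.

```python
import math
from itertools import combinations


def char_class(ch: str) -> str:
    """Map a character to a coarse class: '9' for digits, 'a' for letters, else itself."""
    if ch.isdigit():
        return "9"
    if ch.isalpha():
        return "a"
    return ch


def column_profile(values) -> dict:
    """Vector of 'fraction of values containing each character-class bigram'."""
    if not values:
        return {}
    counts = {}
    for value in values:
        shape = "".join(char_class(c) for c in value)  # "555-0100" -> "999-9999"
        for bigram in {shape[i:i + 2] for i in range(len(shape) - 1)}:
            counts[bigram] = counts.get(bigram, 0) + 1
    return {bigram: n / len(values) for bigram, n in counts.items()}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_columns(columns: dict, threshold: float = 0.8) -> list:
    """Single-link grouping via union-find: any two columns whose profiles have cosine
    similarity >= threshold end up in the same group."""
    parent = {name: name for name in columns}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    profiles = {name: column_profile(values) for name, values in columns.items()}
    for a, b in combinations(columns, 2):
        if cosine(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in columns:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical columns drawn from several tables.
columns = {
    "home_phone": ["555-0100", "555-0199", "555-0142"],
    "work_phone": ["555-867-5309", "555-123-4567"],
    "email": ["ann@example.com", "bob@example.org"],
    "contact_email": ["carol@test.io"],
    "city": ["Paris", "Lima"],
}
print(cluster_columns(columns))
```

Here the two phone columns share a group despite different formats, as do the two email columns, while `city` stays alone. This profile is deliberately coarse: dates such as 2024-01-05 would look like phone numbers, so a real system would need richer features.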
BigID develops its Large Language Model (LLM)-based AI features using pre-trained models, following strict security procedures and Privacy by Design principles. BigID does not train its own LLMs or share customer data with third-party providers for LLM development or training. BigID’s GenAI features employ appropriate security measures, including private networks and private endpoints. BigChat also uses Limited Life Memory servers, which do not retain any transmitted prompts or responses. BigID uses Azure OpenAI GPT to power BigChat and the Business Asset Mapping feature. Both features are usable only over encrypted channels, and connections to Azure are managed via VPN so that traffic does not traverse untrusted networks.
BigID has introduced a Q&A bot named BigChat, based on GenAI technology. BigChat is optional for customers and operates on an opt-in basis only. Its function is limited to assisting users in navigating and troubleshooting the BigID platform. BigChat interacts only with BigID’s software documentation and product-related information, and it does not store or use any shared user information or customer data to train or fine-tune the model.
The Business Asset Mapping feature, which is optional for customers and operates on an opt-in basis only, uses GenAI to ingest a customer-provided business glossary and label the table columns of connected data sources with the glossary terms. It determines an appropriate label by analyzing the contents of the table, nearby columns and other context cues. The feature aims to reduce the manual effort and errors associated with traditional data stewardship, and it is limited to the parameters each customer sets through the business glossary they provide. Customers can edit the labels the feature suggests.
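This section says Business Asset Mapping is powered by Azure OpenAI GPT but does not describe how the model is prompted. The sketch below is hypothetical: it shows one way the context cues named above (glossary terms, sample table contents, neighboring columns) could be assembled into a prompt, and how a reply could be rejected unless it names a term from the customer's own glossary. No model is called, and every name (`ColumnContext`, `build_prompt`, `accept_label`, `max_samples`) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ColumnContext:
    """Context cues gathered for one column (all field names are hypothetical)."""
    table: str
    column: str
    sample_values: list
    neighbors: list = field(default_factory=list)


def build_prompt(glossary: dict, ctx: ColumnContext, max_samples: int = 5) -> str:
    """Assemble the glossary plus column context into one instruction for an LLM."""
    terms = "\n".join(f"- {term}: {meaning}" for term, meaning in sorted(glossary.items()))
    neighbors = ", ".join(ctx.neighbors) or "none"
    samples = ", ".join(ctx.sample_values[:max_samples])  # cap how much data is included
    return (
        "Pick the single best glossary term for the column below, or answer NONE.\n"
        f"Glossary:\n{terms}\n"
        f"Table: {ctx.table}\nColumn: {ctx.column}\n"
        f"Neighboring columns: {neighbors}\n"
        f"Sample values: {samples}\n"
        "Answer with the term only."
    )


def accept_label(model_answer: str, glossary: dict) -> Optional[str]:
    """Accept the model's answer only if it names a term from the customer's glossary,
    so suggestions never go beyond the parameters the customer supplied."""
    answer = model_answer.strip().lower()
    for term in glossary:
        if term.lower() == answer:
            return term
    return None


# Hypothetical glossary and column.
glossary = {
    "Customer Email": "Email address used to contact a customer",
    "Order Total": "Total amount charged for an order",
}
ctx = ColumnContext("orders", "cust_mail", ["a@example.com", "b@example.org"], ["order_id", "order_total"])
print(build_prompt(glossary, ctx))
# The prompt would be sent to an LLM; its reply is then checked against the glossary.
print(accept_label(" customer email\n", glossary))  # Customer Email
print(accept_label("Shipping Address", glossary))   # None
```

An accepted label would still be only a suggestion that a data steward can edit, consistent with the paragraph above.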
BigID AI features are developed using publicly available and synthetic data or pre-trained models. BigID does not use any user information or customer data to train Generative AI features or the base Machine Learning models. Two of the BigID platform’s Machine Learning features, Hyperscan and Classifiers, can be fine-tuned to each customer’s environment using customer metadata. However, these models are only fine-tuned using customer metadata if the customer specifically elects to use their metadata for this purpose. In addition, fine-tuned models are always stored and run locally, meaning that they are limited to the individual customer’s environment and are available only to that customer. Fine-tuned models are never shared across customer environments or used by BigID on behalf of any other customers.
BigID has implemented tailored security controls and testing to mitigate risks associated with AI usage within our platform. BigID conducts AI assessments on all initiatives prior to deployment with a focus on critical risk mitigation. These assessments evaluate the privacy and security controls in place and review for algorithmic bias and discrimination, prioritizing compliance with all laws and regulations applicable to the provision of the BigID platform and alignment with applicable ethical standards. BigID also places control over the AI features in the hands of the user by disabling AI features by default.
BigID prioritizes compliance with all relevant laws and regulations applicable to the provision of the BigID platform in the jurisdictions where we operate, including applicable laws governing the development and provision of AI features. We understand the importance of meeting these standards to protect your data.
BigID strives to regularly update and improve its AI capabilities. The company pursues a continuous improvement strategy, which includes adding new optional features to enhance data insights and provide more powerful tools for our customers. BigID’s goal is to keep AI capabilities at the forefront of innovation while continuing to pursue high standards of data protection and security.