What Is Dark Data?
In the most straightforward terms, dark data is data that organizations don’t know they have. It is part of the massive, complex, sprawling world of Big Data — and the biggest part, at that.
Think about all the data that organizations collect and process for a specific purpose. If they’re actively analyzing it, chances are they know about it. But then there’s the rest of the data that organizations collect and store — the data that doesn’t get used, processed, or analyzed; the data that lurks in the shadows and hides below the surface, gathering risk and sleeping on missed business opportunities; the unorganized, untapped, unprotected, and unknown data that organizations inevitably have, but just don’t know it.
That’s dark data. And there’s a lot of it — likely more than half of your organization’s total data, right now.
55% to over 80% of the data that a business stores [is] dark. Lurking in this dark data are risks unknown to the organization.
– Richard Bartley, Dennis Xiu, Anthony Carpino, Gartner Analysts (Gartner’s 2023 Planning Guide for Security)
Dark Data Challenges
Dark data often gets captured right alongside purpose-driven data — and therefore regularly contains sensitive, personal, regulated, vulnerable, or high-risk information that must be kept out of the wrong hands. The fact that this data remains unanalyzed creates both active and passive problems for companies — problems that can lead to substantial costs.
Actively, dark data increases security risk merely by existing in a company’s system, unnoticed, without having the proper safeguards around it — sometimes for a very long time. Since the data is unknown, it also goes without the necessary regulatory processes a company would normally put in place for compliance. And since unknown data is essentially ignored, malicious attackers consider it ripe for the picking.
Additionally, untapped data may contain valuable information that companies could leverage for insight if they only knew that it existed, what it contained, and how to locate and utilize it. Businesses might spend millions collecting or analyzing new data to derive insights they could have drawn from information they already hold, and could uncover and leverage with the right technology.
Types of Dark Data
Data that organizations hold breaks down into three categories:
- critical business data, the highly valuable information that is relevant to a business’s continuous growth and to meeting its goals
- redundant, obsolete, and trivial (ROT) data hiding in internal networks that, once discovered, can be marked for deletion or moved into remediation workflows
- dark data that companies don’t know they have, don’t use — and that poses constant risk
Unknown data can reside in any type of data source, although unstructured data makes up the lion’s share of dark data.
Untapped data may consist of forgotten data, metadata, expired time-sensitive data that is no longer relevant, and more. Some common examples include:
- emails and email attachments
- zip files that are downloaded and then forgotten
- former employee data, including project files and notes
- presentations and spreadsheets
- geolocation data
- log files and account information
- transaction histories
- customer call logs and records
- audio, video, image, and text files
- financial statements
Where Is Dark Data Generated?
Gartner calls dark data “the information assets organizations collect, process, and store during regular business activities, but generally fail to use for other purposes.”
Therefore, unused data is often collected right along with data that gets utilized and processed. Any data, anywhere — stored across any type of data source, on-prem or in the cloud — can be dark. Of the average organization’s data, 15% is critical business data, 33% is ROT data, and 52% is dark — and dark data by its very hidden nature is vulnerable and subject to constant risk.
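To make the percentages above concrete, the short sketch below applies them to a storage total. The 500 TB figure and the function name `estimate_breakdown` are assumptions chosen only for illustration; actual proportions will vary from one organization to another.

```python
# Shares quoted in the text above, as whole percentages.
SHARES = {"critical": 15, "rot": 33, "dark": 52}


def estimate_breakdown(total_tb: float) -> dict:
    """Split a total storage volume (in TB) according to the quoted shares."""
    return {category: total_tb * pct / 100 for category, pct in SHARES.items()}


# Hypothetical 500 TB data estate, used purely as an example.
for category, volume in estimate_breakdown(500).items():
    print(f"{category}: {volume:.0f} TB")
```

On these assumed numbers, more than half of the estate would be data nobody is actively tracking.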
Dark Data Discovery and Classification
Dark data is one of the most significant risks in an organization’s daily operations. Powerful data discovery tools can automatically and accurately inventory, validate, and classify data across an organization’s entire environment. Although much of it is unstructured or semi-structured, dark data can provide beneficial insights with the help of machine learning classification.
ML classification provides automated and accurate deep data insights, giving businesses valuable context for what their data is, where it’s stored, and how it’s being used. Deep data discovery is the first step to safeguarding an enterprise’s most valuable assets. Only then can an organization’s dark data be leveraged for true data intelligence.
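As a rough illustration of what content-based classification means in practice, the sketch below labels a piece of text by the kinds of sensitive values it appears to contain. It uses simple regular expressions as a stand-in for ML classification, not the ML approach itself, and the pattern set, labels, and function name `classify_text` are all hypothetical.

```python
import re

# Illustrative patterns only; real discovery tools use far richer detection.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify_text(text: str) -> dict:
    """Count matches per sensitive-data label found in a piece of text."""
    counts = {}
    for label, pattern in PATTERNS.items():
        hits = pattern.findall(text)
        if hits:
            counts[label] = len(hits)
    return counts


# Made-up sample record, not real personal data.
sample = "Contact jane.doe@example.com, SSN 123-45-6789, ref 2023-01."
print(classify_text(sample))
```

Rule-based matching like this tends to miss context and produce false positives, which is part of why the text above points to ML-driven classification for large, messy data stores.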
How Is Dark Data Used?
Often overlooked and unsecured, dark data represents a significant risk for enterprises. Hackers can sift through large amounts of data looking for sensitive information like login credentials, financial data, or other personally identifiable information (PII).
In addition to the risk of data breaches and exploitation by malicious attackers, dark data poses a risk to an organization’s compliance. Frameworks such as those from NIST and HITRUST, and regulations such as the GDPR and CCPA, expect organizations to secure and protect the personal data of individuals. Without proper management of their dark data, organizations can be subject to harsh fines and other penalties.
Regular assessment of data collection and storage practices is essential for organizations to protect their most critical data from falling into the wrong hands. Whether in active use or not, dark data must be secured so that organizations maintain compliance, protect against breaches, and reduce their overall risk.
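One simple starting point for such an assessment is to list files that nobody has touched in a long time. The sketch below walks a directory tree and returns files whose last modification is older than a chosen threshold. The function name `find_stale_files` and the age threshold are assumptions for illustration, and file age alone does not prove that data is dark; it only suggests candidates for review.

```python
import os
import time
from typing import List, Optional


def find_stale_files(root: str, max_age_days: int,
                     now: Optional[float] = None) -> List[str]:
    """Return paths under root last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime < cutoff:
                    stale.append(path)
            except OSError:
                # Files that vanish or cannot be read are skipped.
                continue
    return sorted(stale)
```

Calling `find_stale_files("/some/share", 365)` on a hypothetical file share would list everything untouched for roughly a year, which could then be handed to a classification step or a retention review.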
Dark Data Trends
Dark data is difficult to analyze but easy to grow, and as cloud adoption continues to rise, so does the volume of dark data.
In 2023, dark data will continue to pose significant risk for organizations that aren’t proactive about their data security posture. Many organizations have data that is stored in different systems and departments, making it difficult to access and analyze.
How Should You Handle Dark Data?
If you don’t know your data is there, you can’t ensure that it is compliant, and you can’t meet data privacy standards if you can’t associate your data with an identity. Additionally, you can’t protect what you don’t know you have, or know what level of protection it needs. Unknown data therefore carries unknown levels of risk and is often more vulnerable to data breaches and data leaks, which is troubling given that it very likely contains personal and sensitive information.
For many businesses, beginning to capture untapped data may seem overwhelming, but the process of finding, classifying, analyzing, and unlocking value from it is just a matter of implementing the right discovery solution. Companies need ML-driven technology with a deep discovery foundation that can find data across all systems and sources — everywhere in an organization, no matter where it’s hiding.
Dark Data Analytics
Dark data analytics refers to the technology solutions that companies use to locate unknown data so that its value can be unlocked to inform better business decisions.
Companies that prioritize mining dark data are well positioned to reduce risk and unlock valuable business insights that can help their organization grow and thrive. Moving previously untapped data to a data analytics platform provides a broader and far more accurate view of customer data across an entire enterprise.
How BigID Identifies and Eliminates Dark Data
BigID is purpose-built to discover all enterprise data — the data you know about, and the data you don’t. The powerful, machine-learning platform leverages a deep data discovery foundation that automatically finds, classifies, and catalogs all hidden data a company holds, no matter where it lives, how long it’s been hiding, or how buried it is.
Using BigID, businesses can:
- Automatically discover and classify all dark data — including personal and sensitive data that must meet compliance standards — based on the content and structure of the data.
- Clean up all untapped data, find relationships, and add context.
- Identify, measure, and manage risk on hidden data so it can be appropriately protected.
- Integrate hidden data into a unified inventory that serves as the enterprise’s single source of truth.
- Automatically uncover unknown data that links to an existing identity or entity.
- Take action to unleash the value of dark data — and establish workflows for retention, remediation, and risk reduction.
- Meet compliance for any regulation that pertains to the business.
BigID helped a global airline whose sensitive data spanned decades, technologies, and structures. The airline discovered, classified, and cataloged petabytes of data that even the system owners did not know how to find. Using BigID, the airline:
- discovered regulated data in locations where it shouldn’t have been and brought it into compliance
- uncovered overexposed databases containing SSNs from decades before, and was able to lock down and protect that data
- gained visibility into their current systems and legacy data stores