How to Prevent Data Leaks in Amazon S3

September 15, 2020


The past several years have seen the migration of massive amounts of data to cloud platforms like Amazon Web Services (AWS). While many businesses have established and enforced measures to ensure that data stored in the cloud stays protected, cautionary tales about exposed data from Amazon S3 buckets tell a different story.

AWS launched S3 (Simple Storage Service) in 2006, and it quickly became popular across industries, including healthcare, finance, government, and many others that handle personal and sensitive data.

Since then, one leaky S3 bucket has followed another in a torrent of high-profile data breaches affecting organizations from Netflix to the United States Department of Defense. The sensitive data compromised in these breaches runs the gamut: health details, passport information, political affiliations, tax histories, credit reports, system passwords, emails, salaries, government intelligence, and even records of medical and recreational marijuana use. You name it, and it’s been exposed in an S3 data breach.

Are Amazon S3 Buckets Insecure?

While Amazon S3 buckets are private by default and can be accessed only by the bucket owner and by users who have been explicitly granted permission, they are also notoriously complicated to configure. Not every company migrating or managing data in the AWS cloud has the IT infrastructure and expertise to handle this complex environment properly.

Determining or establishing whether data is public or private, for instance, isn’t easy. Access is governed by several overlapping mechanisms, including bucket policies, IAM policies, and access control lists (ACLs), which can change frequently and override one another, and by nested folder structures (key prefixes) that may carry conflicting levels of restriction. A bucket may be configured as private, for example, yet contain objects whose ACLs override that setting, effectively exposing some of the data inside while unsuspecting administrators go about their day.
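To make the override problem concrete, here is a minimal Python sketch of how a single object can end up publicly readable even when its bucket is meant to be private. It is a deliberately simplified model, assuming only two exposure paths (a public object ACL and a bucket policy that grants anonymous access) plus the four S3 Block Public Access settings; it ignores bucket ACLs, Object Ownership, access points, policy conditions, and explicit denies. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublicAccessBlock:
    """The four S3 Block Public Access flags (names mirror the S3 settings)."""
    block_public_acls: bool = False        # rejects new public ACLs only
    ignore_public_acls: bool = False       # makes existing public ACLs ineffective
    block_public_policy: bool = False      # rejects new public bucket policies only
    restrict_public_buckets: bool = False  # limits public-policy buckets to the owner account and AWS services


def merge_blocks(account: PublicAccessBlock, bucket: PublicAccessBlock) -> PublicAccessBlock:
    """S3 enforces the most restrictive combination of account and bucket settings."""
    return PublicAccessBlock(
        block_public_acls=account.block_public_acls or bucket.block_public_acls,
        ignore_public_acls=account.ignore_public_acls or bucket.ignore_public_acls,
        block_public_policy=account.block_public_policy or bucket.block_public_policy,
        restrict_public_buckets=account.restrict_public_buckets or bucket.restrict_public_buckets,
    )


def object_is_publicly_readable(
    bucket_policy_is_public: bool,
    object_acl_is_public: bool,
    block: PublicAccessBlock,
) -> bool:
    """Simplified model: only object ACLs and bucket policies are considered."""
    # A public object ACL exposes the object unless public ACLs are ignored.
    via_acl = object_acl_is_public and not block.ignore_public_acls
    # A public bucket policy exposes it unless public buckets are restricted.
    via_policy = bucket_policy_is_public and not block.restrict_public_buckets
    return via_acl or via_policy
```

With no Block Public Access settings, a public object ACL exposes the object even though the bucket policy is private. Enabling IgnorePublicAcls at either the account or the bucket level closes that path, because S3 applies the most restrictive combination of the two levels.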

Whether the culprit is a lack of know-how or simple negligence, most S3 data breaches and leaks result from user or administrator error. That makes the exposed data low-hanging fruit for attackers. And while Amazon has released security updates to address misconfiguration, the problem has grown too vast for quick fixes.
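One such control is S3 Block Public Access, which can be turned on per bucket or per account. The sketch below assumes a boto3 S3 client and uses its put_public_access_block call to enable all four settings on a single bucket. The helper name block_all_public_access is hypothetical, and any object exposing a compatible method (such as a test stub) can stand in for the client.

```python
from typing import Any, Dict


def block_all_public_access(s3_client: Any, bucket: str) -> Dict[str, bool]:
    """Enable all four S3 Block Public Access settings on one bucket.

    Assumes `s3_client` is a boto3 S3 client, e.g. boto3.client("s3").
    """
    config = {
        "BlockPublicAcls": True,        # reject requests that add public ACLs
        "IgnorePublicAcls": True,       # ignore public ACLs that already exist
        "BlockPublicPolicy": True,      # reject bucket policies that grant public access
        "RestrictPublicBuckets": True,  # restrict access to buckets with public policies
    }
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=config,
    )
    return config
```

For example, block_all_public_access(boto3.client("s3"), "example-bucket") would apply the settings to a placeholder bucket named example-bucket. BlockPublicAcls and BlockPublicPolicy stop new public grants, while IgnorePublicAcls and RestrictPublicBuckets neutralize existing ones, so enabling all four is a common baseline for buckets that should never be public.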

How to Protect Sensitive Data in AWS S3

As is often the case with data management in the cloud, the solution isn’t to try harder; it’s to look deeper, drilling into your data to gain full visibility. Without the ability to find, classify, and catalog your organization’s personal and sensitive data, you can’t effectively protect it in the cloud.

Discover your data

BigID’s ML-based data discovery automatically finds the sensitive, regulated, and personal data stored in your S3 buckets so you can secure it effectively in the cloud. Discovery-in-depth enables organizations to know where their data is, what sensitive and personal information it contains, whose data it is, and where it’s vulnerable.
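Discovery starts with enumerating what is actually stored. As a rough illustration (not BigID’s implementation), the following Python sketch assumes a boto3 S3 client and walks every object in a bucket with the list_objects_v2 paginator, which handles buckets holding more than the 1,000 keys a single listing call returns. The helper name iter_objects is hypothetical.

```python
from typing import Any, Iterator, Tuple


def iter_objects(s3_client: Any, bucket: str, prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (key, size_in_bytes) for every object under `prefix` in `bucket`.

    Assumes `s3_client` is a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" key at all.
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]
```

Each yielded key could then be fetched (for example with get_object) and handed to a content classifier; a real scanner would likely also sample very large objects rather than download everything.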

Classify and correlate sensitive data

Traditional classification leaves sensitive data in the dark and can’t identify broader types of sensitive or personal data. BigID combines multiple techniques, from pattern-based RegEx to deeper ML-based content classification, to classify the data in your S3 buckets by sensitivity, policy, person, type, category, attribute, and more.
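To show what the pattern-based end of that spectrum looks like, here is a minimal Python sketch of regex classification. The patterns are simplified assumptions for illustration (they will miss many valid formats and flag some false positives), and credit card candidates are additionally filtered with the Luhn checksum to discard random digit strings. It does not represent BigID’s ML-based classifiers, and the category names are hypothetical.

```python
import re
from typing import Dict, List

# Simplified, illustrative patterns; real classifiers need far more coverage.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "credit_card": re.compile(r"\b(?:\d[ -]?){12,15}\d\b"),
}


def luhn_valid(candidate: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str) -> Dict[str, List[str]]:
    """Return {category: [matched strings]} for every category that matches."""
    findings: Dict[str, List[str]] = {}
    for category, pattern in PATTERNS.items():
        matches = [m.group(0) for m in pattern.finditer(text)]
        if category == "credit_card":
            # Keep only digit runs that pass the checksum.
            matches = [m for m in matches if luhn_valid(m)]
        if matches:
            findings[category] = matches
    return findings
```

Running classify on a line such as "card 4111 1111 1111 1111, order 4111 1111 1111 1112" reports only the first number as a card, since the second fails the Luhn check. Checksums like this are a cheap way to raise the precision of pure pattern matching before heavier, context-aware classification.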

Catalog for context and clarity

With a clear catalog view of your data in a unified inventory, you can understand data context and align your S3 configuration settings with your data’s varying levels of sensitivity. BigID’s catalog incorporates active metadata for business context and clarity, so you know exactly what you’re looking at and exactly how it should be handled.
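A catalog can be as simple as an inventory keyed by object location that records which categories of sensitive data each object holds and an overall sensitivity level. The Python sketch below assumes classifier output in the form of (bucket, key, category) tuples; the category names, the sensitivity ranking, and the level labels are hypothetical placeholders that an organization would replace with its own policy.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical ranking of categories; real policies are organization-specific.
CATEGORY_SENSITIVITY = {"email": 1, "us_ssn": 3, "credit_card": 3}
LEVEL_NAMES = ["public", "internal", "confidential", "restricted"]


def build_catalog(findings: Iterable[Tuple[str, str, str]]) -> Dict[str, dict]:
    """Group (bucket, key, category) findings into one entry per S3 object."""
    categories: Dict[str, Set[str]] = defaultdict(set)
    for bucket, key, category in findings:
        categories[f"s3://{bucket}/{key}"].add(category)

    catalog: Dict[str, dict] = {}
    for uri, cats in categories.items():
        # Unrecognized categories default to "confidential" so they are not
        # under-protected; the object takes its most sensitive category's level.
        level = max(CATEGORY_SENSITIVITY.get(c, 2) for c in cats)
        catalog[uri] = {"categories": sorted(cats), "sensitivity": LEVEL_NAMES[level]}
    return catalog
```

Taking the highest level among an object’s categories means a single restricted finding, such as a Social Security number, sets the handling requirements for the whole object, which is the conservative choice when aligning S3 settings to sensitivity.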

Identify duplicate and derivative data

BigID highlights duplicate, redundant, and derivative (similar) data so you can minimize risk and manage that data more securely in S3.
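Two common techniques for this are content hashing, which finds byte-for-byte duplicates, and shingle-based Jaccard similarity, which flags near-duplicates such as an edited copy of a file. The Python sketch below applies both to in-memory objects; the function names, the shingle size k=3, and the 0.5 threshold are illustrative assumptions, not BigID’s method.

```python
import hashlib
from itertools import combinations
from typing import Dict, List, Set, Tuple


def exact_duplicates(objects: Dict[str, bytes]) -> List[List[str]]:
    """Group object keys whose contents have the same SHA-256 digest."""
    by_digest: Dict[str, List[str]] = {}
    for key, data in objects.items():
        by_digest.setdefault(hashlib.sha256(data).hexdigest(), []).append(key)
    return sorted(sorted(keys) for keys in by_digest.values() if len(keys) > 1)


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Overlapping k-word sequences; texts shorter than k yield one shingle."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: Set, b: Set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_duplicates(texts: Dict[str, str], threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return (key_a, key_b, similarity) for pairs at or above `threshold`."""
    sets = {key: shingles(text) for key, text in texts.items()}
    pairs = []
    for a, b in combinations(sorted(sets), 2):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs
```

Comparing every pair is quadratic in the number of objects, so large-scale systems typically approximate Jaccard similarity with techniques such as MinHash and locality-sensitive hashing.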

Get insight into open access

Uncover overexposed data, find over-permissioned and open-access files at a glance, and protect sensitive data across the organization. BigID scans any object stored in an S3 bucket, including its content and metadata, to identify sensitive data and its access status. BigID’s file access intelligence then highlights overexposed and over-privileged sensitive data in AWS S3, so organizations can reduce the risk of data breaches and data leaks.
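One concrete signal of open access is an ACL grant to one of S3’s predefined groups: AllUsers (anyone on the internet) or AuthenticatedUsers (anyone with an AWS account). The Python sketch below checks grants shaped like the Grants list that boto3’s get_object_acl returns and reports objects that are both public and flagged as sensitive. The layout of the objects dictionary and the sensitive flag are hypothetical, and a fuller check would also consider bucket policies and Block Public Access settings.

```python
from typing import Dict, Iterable, List

# URIs of S3's predefined groups; grants to either are treated as public.
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"
PUBLIC_GROUPS = {ALL_USERS, AUTHENTICATED_USERS}


def acl_is_public(grants: Iterable[dict]) -> bool:
    """`grants` has the shape of the "Grants" list from get_object_acl."""
    return any(
        g.get("Grantee", {}).get("Type") == "Group"
        and g.get("Grantee", {}).get("URI") in PUBLIC_GROUPS
        for g in grants
    )


def overexposed(objects: Dict[str, dict]) -> List[str]:
    """Keys of objects that are sensitive AND publicly granted via ACL.

    Hypothetical input: {key: {"grants": [...], "sensitive": bool}}.
    """
    return sorted(
        key for key, info in objects.items()
        if info["sensitive"] and acl_is_public(info["grants"])
    )
```

Intersecting access status with sensitivity is what separates a harmless public logo from a public payroll export; either signal alone produces far more alerts than either risk warrants.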

Identify compromised data

BigID enables organizations to accurately determine which individuals were affected by a data breach or data leak, helping them meet breach notification requirements and speed up investigation and response.
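Once the compromised objects are known, determining who was affected amounts to a lookup in an index from objects to the people whose data they contain. The Python sketch below assumes such an index already exists (here keyed by email address, a hypothetical choice of identifier) and inverts it into a per-person list of exposed objects, a convenient shape for drafting notifications. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def impacted_subjects(
    subjects_by_object: Dict[str, Set[str]],
    compromised_keys: Iterable[str],
) -> Dict[str, List[str]]:
    """Map each affected person to the compromised objects holding their data."""
    impacted: Dict[str, List[str]] = defaultdict(list)
    # Deduplicate and sort keys so each person's list is stable and unique.
    for key in sorted(set(compromised_keys)):
        # Objects with no known personal data contribute nobody.
        for subject in subjects_by_object.get(key, set()):
            impacted[subject].append(key)
    return dict(impacted)
```

The quality of the result depends entirely on the index: if discovery missed personal data in an object, the people in it will not appear here.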

Take action to manage S3 data

Take action with BigID’s apps for privacy, protection, and perspective to proactively manage and monitor S3 data — from fulfilling privacy and compliance requirements to identifying breached data to improving data quality.

See BigID in action today, and learn more about how BigID helps organizations protect their data in AWS — and everywhere else.

BigID

Meet BigID's author collective, a diverse team featuring product marketers, subject matter experts, and copywriters deeply versed in data privacy, security, and governance. Our collaborative approach harnesses a wealth of industry expertise to craft insightful and informative content, ensuring you stay informed in this ever-evolving landscape.
