Where do data breaches come from?

on December 3, 2018

I recently did some research on the sources of data breaches. In this post, I’ll talk about my current favorite source for breach information and share some of what I learned.

A man in a zip-up flannel jacket holding a laptop awkwardly, wearing a mask and hat. Text over the photo reads, “Common hacker fashion”

Verizon publishes the ‘Data Breach Investigations Report’ annually

The 2018 edition of this free report by Verizon Enterprise Solutions is the 11th edition – they’ve had some practice. The reports are extremely well detailed, and shockingly, they’re even entertaining to read.

The reports don’t claim to discover all data breaches. After all, not all data breaches are discovered, and those that are discovered aren’t necessarily reported.

2,216 breaches, analyzed

The 2018 report covers 53,000 incidents, defined as “a security event that compromises the integrity, confidentiality or availability of an information asset”.

It also covers 2,216 breaches, which are defined as “an incident that results in the confirmed disclosure – not just potential exposure – of data to an unauthorized party.”

These numbers (and the screenshots I’m sharing below) do NOT include breach data involving botnets. The 43,000 “successful accesses via stolen credentials” associated with botnets are handled in a special insights section of the report.

Are data breaches caused mainly by insiders or outsiders?

A colleague of mine mentioned that he’d recently seen some numbers suggesting that data breaches were mainly perpetrated by insiders to an organization, but he hadn’t been able to track down the source of those figures or any substantiating data. With the number of data breaches we see these days, that’s a pretty dark view of employee-employer relationships!

Here’s what the Verizon report shows in terms of who is behind the breaches:

A screenshot from the Verizon Data Breach Investigations Report 2018, showing 73% perpetrated by outsiders, 28% involving internal actors, 2% involving partners, 2% featuring multiple parties, 50% carried out by organized criminal groups, 12% involving actors identified as nation-state or state-affiliated

2018 Data Breach Investigations Report, 11th Edition, Verizon, page 5

These figures cover confirmed data breaches, not all security incidents. While 28% involve internal actors, the bulk of data breaches come from people outside the organization, who find their way in by using malware or social attacks, or by exploiting vulnerabilities created by errors.

Who can a database administrator trust?

For those internal actors involved in data breaches, my first thought was, “Well, so WHO WAS IT?”

That’s answered a couple pages later. While the exact internal actors weren’t found for all of the reported data breaches, analysis was done for 277 data breaches:

A screenshot from the Verizon Data Breach Investigations Report 2018, showing internal actors: 72 system admin, 62 end user, 62 other, 32 doctor or nurse, 15 developer, 9 manager, 8 executives

2018 Data Breach Investigations Report, 11th Edition, Verizon, page 9

As much as database administrators like to focus on denying permissions to developers for production, developers were much less likely to be involved in data breaches than system admins.

And who exactly are system admins?

Well, I’m guessing that includes… the DBAs.

Awkward.

This is remarkable given that you don’t need production access to cause a data breach

It’s a pretty normal practice in an enterprise to make copies of production data for use in internal environments. Copies of data are used by analysts, developers, product managers, marketing professionals, and more.

Redgate’s 2018 State of Database DevOps Report found that 67% of respondents use production data in development, test, or QA environments, and that 58% of respondents reported that production data should be masked when in use in these environments.

The 2018 State of Database DevOps, Redgate, page 12

There are good reasons that production data is spread around like this: performance is extremely difficult to predict using data that doesn’t have a very similar distribution and similar size to production data.

But after many years of working in IT, I know that most often this data is not modified or masked after being duplicated. These environments tend to be far less secure than production environments, and they are a very rich target for data breaches – even if it’s not the developers themselves intentionally causing the data breach.
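To make "masked" a little more concrete, here is a minimal sketch of what masking a copied row might look like before it lands in a dev or test environment. Everything in it is an assumption for illustration: the column names, the sample rows, the `MASKING_KEY`, and the `pseudonym` and `mask_row` helpers are all hypothetical, and this is not how Redgate's tooling or any particular product does it. The idea is simply to replace identifying values with stable stand-ins while leaving row counts and non-sensitive columns alone, so the data keeps roughly the same size and shape.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real one would be kept
# somewhere the copied data set cannot reach.
MASKING_KEY = b"example-key-not-for-real-use"


def pseudonym(value: str, length: int = 12) -> str:
    """Return a stable token that does not expose the original value."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def mask_row(row: dict) -> dict:
    """Mask the sensitive columns of one (hypothetical) customer row."""
    masked = dict(row)  # copy, so the input row is left untouched
    masked["name"] = "customer_" + pseudonym(row["name"])
    masked["email"] = pseudonym(row["email"]) + "@example.invalid"
    # Keep only the last four digits of the phone number.
    digits = [c for c in row["phone"] if c.isdigit()]
    masked["phone"] = "*" * (len(digits) - 4) + "".join(digits[-4:])
    return masked


# Made-up sample rows standing in for a production table.
production_rows = [
    {"id": 1, "name": "Ada Lovelace", "email": "ada@example.com",
     "phone": "555-010-1234", "order_total": 120.50},
    {"id": 2, "name": "Alan Turing", "email": "alan@example.com",
     "phone": "555-010-9876", "order_total": 75.00},
]

masked_rows = [mask_row(r) for r in production_rows]
```

Because the same input always maps to the same token, joins and duplicate counts on the masked columns still behave the way they did before masking, which matters if the copy is being used for performance testing.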

The rise of malware and social attacks means that all environments in our company can be the source of a data breach.