Adam Caudill

Security Leader, Researcher, Developer, Writer, & Photographer

LinkedIn: The Breach That Isn't but Is

The definition of a data breach seems reasonably straightforward and easy to understand, but that isn’t always the case. LinkedIn is back in the news thanks to a dataset containing 700 million records of profile information being traded among the darker actors on the internet. LinkedIn, however, is very clear about how they view this situation:

This was not a LinkedIn data breach and our investigation has determined that no private LinkedIn member data was exposed.

This statement is sufficiently unambiguous, but the fact remains that data about their members is being traded freely. So, is this a breach or not?

What is a data breach?

According to Wikipedia, it’s “the intentional or unintentional release of secure or private/confidential information to an untrusted environment.” This doesn’t settle the matter, though, as it leaves open what counts as secure or private/confidential information. ISO 27040 defines it as a “compromise of security that leads to the accidental or unlawful destruction, loss, alteration, unauthorized disclosure of, or access to protected data transmitted, stored or otherwise processed.” This doesn’t exactly answer the question either, since it hinges on what counts as protected data.

Perhaps the right way to define a data breach is to craft a new definition that is a bit clearer: the release of information that is expected to remain private or otherwise restricted. I think most people would agree that this aligns with their mental model of a data breach.

A Data Breach or Not?

According to reports, the attacker claims the data was scraped through a LinkedIn API, which was used to collect as much information as possible (though LinkedIn has stated that the dataset combines their data with data from other sources). The data includes email addresses, names, phone numbers, geolocation data, Facebook profiles, and more. All of this is data that users have provided to LinkedIn to build their profiles. In addition, LinkedIn provides this data to others, from third parties using their integration products to other users.

This raises the question: did users have a reasonable expectation that LinkedIn would protect this data? When data is provided to a social media company, most users aren’t aware of how it will be used or exposed. This can result in a situation where a user’s expectations don’t align with what’s actually happening.

In this case, LinkedIn is knowingly making data available, and that data has been collected and enhanced with data from other sources. This isn’t a breach, in that the information is public rather than protected or otherwise private. So while users may see this as a breach, both of data and of trust, it likely doesn’t meet the definition.

LinkedIn did clarify that this type of data scraping violates their terms of service, but relying on a legal document that almost no one reads to keep member data from being used in illegal attacks doesn’t seem to be the most effective strategy. Still, that’s the hill they’ve decided to make a stand on.

Distinction without Difference

It should be acknowledged, however, that it likely doesn’t matter whether this is technically a breach; the impact on users is essentially the same. Data has been exposed that can be leveraged to simplify spearphishing, social engineering, and other attacks; it can also be further enriched in the future to enable new attacks.

So even though LinkedIn’s position that there wasn’t a breach is likely correct (from a technical point of view), it’s also wrong in that it’s a distinction without a difference. Making a technical argument that discounts the impact on users for the sake of protecting their brand does a disservice to the very users who have made them successful.

