The definition of a data breach seems reasonably straightforward and easy to understand, but that isn’t always the case. LinkedIn is back in the news thanks to a dataset containing 700 million records of profile information being traded among the darker actors on the internet. But LinkedIn is very clear about how they view this situation:
This was not a LinkedIn data breach and our investigation has determined that no private LinkedIn member data was exposed.
This statement is sufficiently unambiguous, but the fact remains that data about their members is being traded freely. So, is this a breach or not?
According to Wikipedia, a data breach is “the intentional or unintentional release of secure or private/confidential information to an untrusted environment.” This doesn’t settle the matter, though, because it leaves open what counts as secure or private/confidential information. ISO 27040 defines a data breach as a “compromise of security that leads to the accidental or unlawful destruction, loss, alteration, unauthorized disclosure of, or access to protected data transmitted, stored or otherwise processed.” This doesn’t exactly answer the question either, since it depends on what counts as protected data.
Perhaps the right way to define a data breach is to craft a new definition that is a bit clearer: the release of information that is expected to remain private or otherwise restricted. I think most people would agree that this aligns with their mental model of a data breach.
According to reports, the attacker claims the data was scraped through a LinkedIn API to collect as much information as possible (though LinkedIn has stated that it’s a combination of their data and data from other sources). The data contains various information, such as email, name, phone number, geolocation data, Facebook profile, and more. All of this is data that users have provided to LinkedIn to build their profiles. In addition, LinkedIn provides this data to others, from third parties using their integration products to other users.
This raises the question: did users have a reasonable expectation that LinkedIn would protect this data? When data is provided to a social media company, most users aren’t aware of how it will be used or exposed. This can result in a situation where a user’s expectations don’t align with what’s actually happening.
In this case, LinkedIn knowingly makes this data available, and it has been collected and enhanced with data from other sources. This isn’t a breach, because the information is public rather than protected or otherwise private. So while users may see this as a breach, both of data and of trust, it likely doesn’t meet the definition.
LinkedIn did clarify that this type of data scraping is a violation of their terms of service, but relying on a legal document that almost no one reads to prevent their data from being used in illegal attacks doesn’t seem to be the most effective strategy. Still, that’s the hill they’ve decided to make a stand on.
What should be acknowledged is that it likely doesn’t matter whether this is technically a breach or not; the impact on users is essentially the same. Data has been exposed that can be leveraged to simplify spear phishing, social engineering, and other attacks; it can also be further enriched in the future to enable new attacks.
So even though LinkedIn’s position that there wasn’t a breach is likely correct (from a technical point of view), it’s also wrong in that it’s a distinction without a difference. Making a technical argument that discounts the impact on users for the sake of protecting their brand does a disservice to the very users who have made them successful.