Adam Caudill

Security Leader, Researcher, Developer, Writer, & Photographer

Trojan Source and Why It Matters

Yesterday, news broke of a new vulnerability that threatens the security of all code, dubbed Trojan Source by the researchers from the University of Cambridge. From an initial analysis, it does seem to impact just about everything, and the status of fixes is very hit or miss at this point. But the real question is, does this even matter? Is this issue worth spending your time on? Let’s look closer.

What’s Impacted? #

Impacted languages include at least:

  • C
  • C++
  • C#
  • JavaScript
  • Java
  • Rust
  • Go
  • Python

That’s not all; tools are impacted as well, including at least:

  • GitHub
  • BitBucket
  • VS Code
  • Atom
  • SublimeText
  • Notepad++
  • vim
  • emacs

This list of tools isn’t complete, as it appears that GitLab is impacted as well; it’s likely that the complete list of affected tools and languages is nearly as long as the list of tools and languages in common use today. With a list this long, there’s ample reason to give this issue serious attention; it can’t just be quickly dismissed.

What is the Issue? #

Unicode, the text encoding standard used to power pretty much everything today, supports both left-to-right languages and right-to-left languages — and it allows you to easily switch between them thanks to special control characters (Unicode Bidi). When used in a left-to-right language like English, ‮odd things happen‬.

This issue takes advantage of that fact to create code that a human reader understands to do one thing, but that the compiler or interpreter processes as something else. This means that during a code review, you see something like this (from their paper):

#include <stdio.h>
#include <stdbool.h>

int main() {
	bool isAdmin = false;
	/* begin admins only */ if (isAdmin) {
	printf("You are an admin.\n");
	/* end admins only */ }
	return 0;
}

This is somewhat odd formatting but simple enough to understand. What the compiler sees when it parses the file isn’t the same thing, though. The isAdmin check and both braces end up inside comments, so the printf runs for everyone:

#include <stdio.h>
#include <stdbool.h>

int main() {
	bool isAdmin = false;
	/* } if (isAdmin) begin admins only */
	printf("You are an admin.\n");
	/* end admins only { */
	return 0;
}
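To make the gap between the two listings concrete, here is a minimal Python sketch. It assumes the nine Bidi control characters named in the paper, and the stored line is an illustrative reconstruction of the first comment, not the paper’s exact bytes. The helper name `logical_view` is hypothetical. Dropping the invisible controls leaves the characters in stored order, which is the order a compiler processes them in.

```python
# Bidi control characters named in the Trojan Source paper.
BIDI_CONTROLS = frozenset(
    "\u202a"  # LRE, left-to-right embedding
    "\u202b"  # RLE, right-to-left embedding
    "\u202c"  # PDF, pop directional formatting
    "\u202d"  # LRO, left-to-right override
    "\u202e"  # RLO, right-to-left override
    "\u2066"  # LRI, left-to-right isolate
    "\u2067"  # RLI, right-to-left isolate
    "\u2068"  # FSI, first strong isolate
    "\u2069"  # PDI, pop directional isolate
)

RLO, LRI, PDI = "\u202e", "\u2066", "\u2069"

# Assumption: an approximation of the attack line, built for illustration only.
stored_line = (
    "/*" + RLO + " } " + LRI + "if (isAdmin)" + PDI + LRI + " begin admins only */"
)


def logical_view(text: str) -> str:
    """Return the text in stored order with the invisible Bidi controls removed."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


print(logical_view(stored_line))
# prints: /* } if (isAdmin) begin admins only */
```

The output is a single comment, which is why the check never runs, even though an editor that honors the control characters may draw the same line as a comment followed by an if statement.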

What is the Real World Risk? #

In short, your tools may not show you what’s actually in the code you are looking at; they can be tricked into displaying something different from what’s going to be executed. As a result, you can be misled into believing malicious code is innocent, which is especially dangerous during code reviews. Accepting a Pull Request that exploits this could have devastating consequences.

So we need to update these tools and compilers, and we’re good, right?

No, and probably not by a long shot. It’s been pointed out that this isn’t new — it’s been reported at least as early as 2017, though no serious action was taken (and again in 2018, and again in 2019, and again in 2020). In all likelihood, these issues exist just about everywhere. This issue will still exist in a decade. I wouldn’t be surprised if it still exists in two decades.

Okay. So there’s no point in worrying about it?

No again. While it has been overhyped, maybe it needed to be. Given that there’s been no real traction on addressing this issue over the last few years, perhaps this is what was needed. Maybe the website and name are needed to get people to finally act when they ignore security issues. Maybe, instead of looking at these named issues as researchers being more interested in fame and accusing them of overhyping their findings, we should see them as a tool to move an industry that isn’t paying attention. But I digress.

This attack is possible and (until recently) was easy to execute; even if you trust your developers, do you know their motivations and weaknesses well enough to know that they wouldn’t take the chance? If you maintain open-source projects, how confident are you that none of the Pull Requests you’ve approved have something like this? For high-profile projects, a backdoor could be worth thousands to millions; there are plenty of motivated people.

This issue raises interesting questions and risks. While I don’t see this as the most likely attack scenario (actually, I think it’s relatively unlikely), that doesn’t mean that there’s a justification to ignore it. There will always be people motivated to get something into a codebase, and they will use the tools they have available.

Fixing the Right Tools #

The Rust team acted quickly and released an update that adds a new lint to check for these Bidi control characters. This approach makes a lot of sense to me, in that linters and style-checking tools are a logical place to check for these things. That said, not everyone will follow their lead, and there’s an argument to be made that the compiler is the wrong place to fix this. Fixing every compiler, interpreter, transpiler, pre-processor, and every other tool that can be used as an attack point is a significant effort, and one that may never be entirely complete.
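For teams whose toolchain doesn’t have such a check yet, the idea is simple enough to sketch in a few lines of Python. This is not Rust’s implementation; the function name `find_bidi_controls` and the report format are assumptions made for illustration, and the character set is the nine Bidi controls named in the paper.

```python
import unicodedata

# The nine Bidi controls named in the paper: LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI.
BIDI_CONTROLS = frozenset("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def find_bidi_controls(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character name) for each Bidi control, 1-based."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col_no, unicodedata.name(ch)))
    return findings


# Hypothetical input: the second line hides an override inside a string literal.
sample = 'ok = 1\naccess = "user\u202e"\n'
for line_no, col_no, name in find_bidi_controls(sample):
    print(f"line {line_no}, column {col_no}: {name}")
```

A check like this could run in CI against every changed file and fail the build on any finding. Code that legitimately needs these characters can usually express them with escape sequences instead.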

There’s a far smaller group of tools that should be considered a higher priority, and those are the tools that represent the first line of defense for this and a variety of other attacks: tools used in code review. These tools matter most, and should be the focus of these efforts to make this issue impractical.

At least GitHub and GitLab have issued updates to address this issue, and that right there will prevent a substantial portion of attempts to use this. While you should care about addressing this, the thing that’s most important to the industry is that the most common tools used in code review are updated.
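One way a review tool can defuse the trick is to stop honoring the control characters when it draws a diff and show a visible marker in their place. The sketch below illustrates that idea only; it is not how GitHub or GitLab implemented their fixes, and the function name `make_visible` and the marker format are assumptions.

```python
# The nine Bidi controls named in the paper: LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI.
BIDI_CONTROLS = frozenset("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def make_visible(text: str) -> str:
    """Replace each Bidi control with a visible marker such as <U+202E>."""
    return "".join(
        f"<U+{ord(ch):04X}>" if ch in BIDI_CONTROLS else ch
        for ch in text
    )


print(make_visible("/*\u202e } \u2066if (isAdmin)\u2069 */"))
# prints: /*<U+202E> } <U+2066>if (isAdmin)<U+2069> */
```

With the markers in place, the reviewer sees the characters in stored order, and the odd comment stands out instead of blending in.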

Code Reviews & Supply Chains #

Today, the vast majority of developers are guilty of placing trust in strangers. That is, they rely heavily on third-party dependencies without really understanding either the code they are bringing into their work, or the security around these projects. Unfortunately, attacks against these dependencies are becoming more and more common, and yet the situation really isn’t improving.

What’s worse, it’s not just the dependencies you pull in that you have to worry about; there are also the dependencies they rely on. One small addition to the dependency list can easily add a dozen or more others, and each of those represents a real threat.

Any dependency in the chain could be compromised via this (or countless other attacks), and then your application is compromised. This is why it’s essential to understand everything being included in your work and how it impacts your security posture. Maybe you are careful with your own code: only trusted contributors, careful security controls, detailed code review, and robust unit tests to ensure that everything works just as it should. That would be great! But can you say the same for all the dependencies you rely on? Or their dependencies? Or the dependencies of your dependencies’ dependencies? These chains are so long and so complex that it’s extremely difficult to ensure that they are secure and aren’t an active threat.
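A full review of every transitive dependency is rarely realistic, but a coarse automated pass over the downloaded or vendored source tree is cheap. The following Python sketch assumes the dependencies are available as plain files under a single directory. The function name `scan_tree` and the choice to skip files that aren’t valid UTF-8 are illustrative decisions, not an established tool.

```python
from pathlib import Path

# The nine Bidi controls named in the paper: LRE, RLE, PDF, LRO, RLO, LRI, RLI, FSI, PDI.
BIDI_CONTROLS = frozenset("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def scan_tree(root: str) -> dict[str, int]:
    """Map each file under root to its Bidi control count; clean files are omitted."""
    hits = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        count = sum(ch in BIDI_CONTROLS for ch in text)
        if count:
            hits[str(path)] = count
    return hits
```

A hit is not proof of an attack, since right-to-left text in strings and comments can be legitimate. It does tell you which files deserve a human look.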

It’s All About Code Review #

While I see this issue as interesting but somewhat overhyped, I believe that it should be addressed, not just because of what it is, but because it’s a proxy for a larger issue. As noted above, this issue impacts the entire supply chain, which is too often ignored.

Detailed code review of changes to an application is considered standard practice today; code shouldn’t ever be added without meaningful review. Unfortunately, what’s happening is that dependencies are added and updated without this same level of care, which is a recipe for disaster. This issue is an excellent reminder that you need to look at everything and ensure that it is actually safe; in that way, it’s raising some very useful questions that shouldn’t be ignored.

Maybe it’s overhyped, but perhaps it’ll do enough good that it doesn’t matter that it’s overhyped. Maybe, just maybe, this is what the industry needed.

It’s time we take this a lot more seriously.
