How to protect yourself against Trojan Source Unicode attacks with Sonatype Repository Firewall

December 03, 2021 By Chris Good

Open source ecosystems and the tools developers use have faced increasing attacks over the past three to four years, and this past year alone brought a string of "novel" attack vectors. In November 2021, researchers at the University of Cambridge disclosed yet another one, called "Trojan Source."

The paper "Trojan Source: Invisible Vulnerabilities" details how adversaries can exploit the gap between how compilers and interpreters read source code and how editors display it to developers. In essence, this creates an invisible vulnerability that human reviewers cannot see, and it is a serious threat to development environments if not properly addressed.

What is a Trojan Source attack?

As the researchers, Nicholas Boucher and Ross Anderson, explain:

"This attack exploits subtleties in text-encoding standards such as Unicode to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed."

That makes it a high-risk attack for developers, because it can cause "comments and strings to appear to be code and vice versa."

The potential scope is massive, affecting almost every language and system from Linux to WebKit. The attack exploits Unicode's bidirectional ("Bidi") algorithm, which handles the display of text that mixes scripts with different reading directions, such as Arabic (right to left) and English (left to right). Because editors and code review tools apply the invisible Bidi control characters instead of showing them, bad actors can inject malicious code that looks harmless.
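To make the mismatch concrete, here is a minimal Python sketch. The `source` line is hypothetical and only loosely modeled on the paper's "stretched string" idea; `BIDI_CONTROLS` and the `reveal` helper are our own illustrative names, not any tool's API. In an editor that applies the Bidi algorithm, the control characters are invisible and the trailing text can look like a comment, yet the interpreter treats it as part of the string literal.

```python
import ast

# Bidirectional control characters from the Unicode Bidirectional Algorithm (UAX #9):
# explicit embeddings, overrides, and isolates, plus their terminators.
BIDI_CONTROLS = {
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}


def reveal(text: str) -> str:
    """Replace each invisible bidi control with a visible <NAME> tag."""
    return "".join(f"<{BIDI_CONTROLS[c]}>" if c in BIDI_CONTROLS else c for c in text)


# Hypothetical Trojan Source-style line: the bidi controls can make
# "# Check if admin" look like a trailing comment in a bidi-aware editor,
# but logically it sits *inside* the string literal.
source = 'access_level = "user\u202e \u2066# Check if admin\u2069 \u2066"\n'

tree = ast.parse(source)
value = tree.body[0].value.value     # the string the interpreter actually assigns
print(reveal(source), end="")        # hidden controls shown as <RLO>, <LRI>, <PDI>
print(value == "user")               # False: the "comment" is part of the string
print("# Check if admin" in value)   # True
```

The sketch spells the control characters as `\u` escapes, so the example file itself contains no raw Bidi characters and is safe to open in any editor.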

Targeted development teams can pull in malicious packages, review them carefully for suspicious activity, and still find no red flags. That makes it more important than ever for teams to have an effective, automated malware detection and protection system in place to secure their software supply chain.

Protection against Trojan Source Unicode attacks

So how can you protect yourself against these invisible Trojan Source attacks? If you're a Sonatype Repository Firewall user, we've already got you covered. If you're not, we should talk.

To address this issue, we've expanded Release Integrity protection with new signals that continuously monitor for "odd behaviors": they search the components you manage for the Unicode characters associated with this attack and flag any matches as suspicious. This new protection against Trojan Source attacks is available now to all Sonatype customers running Sonatype Repository Firewall, which includes Release Integrity.
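The following is not Sonatype's implementation. It is a minimal, hypothetical Python sketch of the general idea behind such a check: walk a component's files and report every line and column that contains a bidirectional control character. It covers only the embedding, override, and isolate controls from the Unicode Bidirectional Algorithm and reads files as UTF-8; a production check would likely weigh more signals (for example, the invisible LRM/RLM marks or homoglyphs) and handle other encodings. `find_bidi_controls` and `scan_tree` are illustrative names, not a Sonatype API.

```python
import tempfile
from pathlib import Path

# Explicit embedding, override, and isolate controls (plus terminators)
# from the Unicode Bidirectional Algorithm, UAX #9.
BIDI_CONTROLS = frozenset("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")


def find_bidi_controls(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for every bidi control; 1-based positions."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((lineno, col, f"U+{ord(ch):04X}"))
    return hits


def scan_tree(root: str) -> dict[str, list[tuple[int, int, str]]]:
    """Scan every file under root, skipping files that are not valid UTF-8."""
    results = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file: out of scope for this sketch
        hits = find_bidi_controls(text)
        if hits:
            results[str(path)] = hits
    return results


if __name__ == "__main__":
    # Demo on a throwaway directory: one clean file, one with a hidden RLO.
    with tempfile.TemporaryDirectory() as d:
        Path(d, "ok.py").write_text("x = 1\n", encoding="utf-8")
        Path(d, "bad.js").write_text('const s = "user\u202e";\n', encoding="utf-8")
        for file, hits in scan_tree(d).items():
            print(file, hits)
```

Reporting exact positions, rather than a yes/no verdict, lets a reviewer jump straight to the hidden character, which is exactly what the rendered view conceals.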

We are proud that Sonatype is among the first in the market to offer intelligent, automated protection against this sophisticated Trojan Source attack, and we are here to help you understand this new attack vector in any way we can.

Evolution of malicious attacks on developers and open source ecosystems

This discovery is further evidence that adversaries are now targeting developers, not just the software they write. Sonatype has been tracking brandjacking, typosquatting, and cryptomining malware lurking in software repositories for years. We've also uncovered critical vulnerabilities, next-gen supply chain attacks, and copycat packages targeting well-known tech companies.

This evolution is why we created our automated malware detection feature, Release Integrity, in the first place. Getting ahead of the constant onslaught of attacks on developer ecosystems takes more than due diligence and luck: it takes the expertise of experienced security professionals and hundreds of terabytes of data.

Powered by our Sonatype Intelligence data, Release Integrity combines more than 50 signals to identify potentially malicious activity and block risks before download. These signals feed into a first-of-its-kind automated malware detection and protection system powered by artificial intelligence and machine learning (AI/ML).

[Figure: Sonatype Repository Firewall flowchart]

Combined with the most precise and accurate vulnerability database, Sonatype Repository Firewall will automatically detect and block new and emerging threats in a manner that is non-disruptive, impacting developer productivity only when necessary.

Get protection from new vulnerabilities and other novel attacks and have confidence that your development pipelines are secure with Sonatype Repository Firewall.

Written by Chris Good

Chris is a Product Marketing Manager with Sonatype. Originally from Pittsburgh, PA, Chris studied Communications and Computer Science at the University of Pittsburgh. He enjoys working for Sonatype because of the company's culture: it's diverse and promotes creativity. When he's not working with the DevSecOps community, he loves snowboarding, cycling, and traveling.
