oss-sec mailing list archives

Trojan Source Attacks


From: Nicholas Boucher <nicholas.boucher () cl cam ac uk>
Date: Mon, 1 Nov 2021 17:27:53 +0000

OSS Security teams,

We have identified an issue affecting all compilers and interpreters that support Unicode. We believe that the techniques described hereafter can be used to generate adversarial encodings of source code files that can be used to craft targeted attacks against source code that cannot be seen by human reviewers in rendered text. This is of concern to the open source community because, absent defenses, supply chain attacks can be imperceptibly mounted against the ecosystem.

This vulnerability has undergone a coordinated disclosure process that has concluded today. The security advisory can be found at https://trojansource.codes.

Multiple organizations will be releasing parallel security advisories, such as Rust's advisory at https://blog.rust-lang.org/2021/11/01/cve-2021-42574.html, Red Hat's advisory at https://access.redhat.com/security/vulnerabilities/RHSB-2021-007 <https://access.redhat.com/security/vulnerabilities/RHSB-2021-007>, and GitHub's advisory at https://github.blog/changelog/2021-10-31-warning-about-bidirectional-unicode-text/ <https://github.blog/changelog/2021-10-31-warning-about-bidirectional-unicode-text/>.

The attached paper describes an attack paradigm -- which we believe to be novel -- discovered by security researchers at the University of Cambridge. There are two techniques for attack, both of which exploit Unicode's high expressiveness to craft source code files for which rendered text displays divergent logic from the underlying encoded bytes seen by compilers.

The first and primary technique, which we dub the Trojan Source attack, uses Unicode Bidirectional (Bidi) control characters embedded in comments and string literals to produce visually deceptive source code files. This technique enables an adversary to encode constructs that visually appear to be comments or string literals but execute as code, or vice versa. Complete details, as well as recommended mitigations, can be found in the attachment 001 Trojan Source.pdf. This vulnerability is tracked under CVE-2021-42574.

The second technique, to which we refer as the homoglyph variant, uses homoglyphs (characters that render to the same glyph but are represented by different Unicode values) to define adversarial identifiers. In this technique, an adversary defines an identifier such as a function name that appears visually identical to a target function, but is defined using Unicode homoglyphs. This adversarial function then performs some malicious action, then optionally calls the original function it is impersonating. When defined in upstream dependencies such as open source software, these adversarial functions can be imported into downstream software and invoked without visual indication of malicious code. Complete details, as well as recommended mitigations, can also be found in the attachment 001 Trojan Source.pdf. This vulnerability is tracked under CVE-2021-42694.

Proofs-of-concept can be found at https://github.com/nickboucher/trojan-source.

We hope that this information proves useful in building and applying defenses where applicable.

Best,
Nicholas Boucher
University of Cambridge

Attachment: 001 Trojan Source.pdf
Description:

Attachment: OpenPGP_0x5662BCEC5F1D2BEA.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature


Current thread: