Yesterday, Darcy Clarke, a software developer and former Engineering Manager on the npm CLI team, drew everyone's attention to a gap in the npm registry that he calls "manifest confusion."
What is manifest confusion?
As first reported by The Register, and as Clarke details in his blog post, an npm package's manifest is published on npmjs.com independently of the actual contents of its tarball. Because the manifest and the package are decoupled, anyone relying on the published manifest can be "confused" when the package's real dependencies and metadata differ substantially from what is displayed.
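To make the decoupling concrete, the sketch below compares two manifests for the same package: the one a registry serves as metadata and the package.json found inside the tarball. It is a minimal illustration under stated assumptions: both manifests are assumed to be already fetched and parsed into dicts, the list of compared fields is our own choice rather than anything npm defines, and the example values are hypothetical, only loosely modeled on the PoC discussed next.

```python
from __future__ import annotations

from typing import Any

# Fields compared between the two manifests. This list is an illustrative
# assumption, not an exhaustive or official set.
CHECKED_FIELDS = ("name", "version", "license", "dependencies", "scripts")

# Fields that are maps in package.json; a missing map is treated as empty.
MAP_FIELDS = {"dependencies", "scripts"}


def _field(manifest: dict[str, Any], field: str) -> Any:
    """Read a field, treating an absent map field as an empty dict."""
    default = {} if field in MAP_FIELDS else None
    return manifest.get(field, default)


def manifest_mismatches(
    registry: dict[str, Any], tarball: dict[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Return {field: (registry_value, tarball_value)} for each differing field."""
    mismatches: dict[str, tuple[Any, Any]] = {}
    for field in CHECKED_FIELDS:
        reg_value = _field(registry, field)
        tar_value = _field(tarball, field)
        if reg_value != tar_value:
            mismatches[field] = (reg_value, tar_value)
    return mismatches


if __name__ == "__main__":
    # Hypothetical manifests, loosely modeled on the PoC described below.
    registry_view = {"name": "darcyclarke-manifest-pkg", "version": "2.1.x"}
    tarball_view = {
        "name": "express",
        "version": "3.0.0",
        "license": "ISC",
        "dependencies": {"sleepover": "*"},  # version range is a placeholder
        "scripts": {"install": "node hypothetical-install.js"},  # placeholder
    }
    for field, (reg, tar) in manifest_mismatches(registry_view, tarball_view).items():
        print(f"{field}: registry={reg!r} tarball={tar!r}")
```

Any non-empty result means the two views of the package disagree, which is exactly the situation manifest confusion describes.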
As a succinct example, Clarke created a proof-of-concept npm package called `darcyclarke-manifest-pkg` that, according to the npmjs.com website, has zero dependencies and no license information listed:
But a quick look inside the (real) manifest file, package.json, bundled within the package reveals otherwise:
Upon installation, `darcyclarke-manifest-pkg` will pull in the 'sleepover' dependency, also created by Clarke, and the package is in fact ISC-licensed.
Furthermore, the first line of the manifest shows that the package calls itself 'express', and it lists a version (3.0.0) that differs from the actual version we analyzed (2.1.x).
Clarke's dummy package also runs an "install" script, something else the npm website does not reveal. Install scripts are a mechanism threat actors can leverage to run malicious code, as we have seen time and time again.
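Because install scripts live in the tarball's own package.json, a check for them has to read that file rather than the registry's metadata. The sketch below does that for a gzipped npm tarball held in memory. It assumes the common npm layout in which files sit under a single top-level directory (usually `package/`), and it only looks at the `preinstall`, `install`, and `postinstall` lifecycle scripts; the function names are our own, not part of any npm tooling.

```python
from __future__ import annotations

import io
import json
import tarfile
from typing import Any

# Lifecycle scripts npm runs automatically when a package is installed.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def read_tarball_manifest(tgz_bytes: bytes) -> dict[str, Any]:
    """Parse the top-level package.json from an npm tarball."""
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            # Match '<top-level dir>/package.json' (usually 'package/package.json'),
            # ignoring nested manifests such as those under node_modules/.
            parts = member.name.split("/")
            if member.isfile() and len(parts) == 2 and parts[1] == "package.json":
                extracted = tar.extractfile(member)
                if extracted is not None:
                    return json.load(extracted)
    raise ValueError("no top-level package.json found in tarball")


def install_hooks(manifest: dict[str, Any]) -> dict[str, str]:
    """Return the install-time lifecycle scripts declared in a manifest."""
    scripts = manifest.get("scripts") or {}
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}
```

Running `install_hooks(read_tarball_manifest(data))` on a downloaded tarball reports any install-time scripts regardless of what the registry page shows.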
Why does npm manifest confusion matter?
Manifest confusion becomes a problem in development environments that lack effective DevSecOps workflows and tooling, especially when tools blindly trust package manifests rather than inspecting the actual (possibly vulnerable or malicious) files contained within open source packages.
Some security tools, for example, may fetch and display the information shown on the npmjs.com website as is, misleading developers into believing that a package has different dependencies and licensing than it actually does.
Is manifest confusion something to worry about?
If you rely purely on the metadata contained within package manifests, or on security tools that perform manifest-based scanning, then yes, you should be worried about npm manifest confusion. This metadata is often inaccurate even when nothing malicious is going on. For example, when a genuine project is copied or forked, the new maintainer may fail to update the metadata in its manifest.
That's exactly why the Sonatype platform doesn't rely on manifests alone. This isn't a new problem, just one gaining attention now. When we built Sonatype Lifecycle and Sonatype Repository Firewall, we also created Advanced Binary Fingerprinting (ABF), a proprietary way of looking at packages that scans them as "deployed" rather than as "declared."
With ABF scanning, we examine binary fingerprints (similar to a truncated SHA-1 hash) of all of the files, not just the file names and manifests. ABF is highly accurate because it examines everything included in the application after the build, including any embedded dependencies. This means that an ABF scan will never return false positives in its report, leaving less room for manifest confusion. Sonatype data is tied to the component fingerprints of the files in which a vulnerability was discovered, so when a vulnerability is reported, it is because that component's fingerprint is in your application.
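The general idea of content fingerprinting can be illustrated with a plain per-file hash. The sketch below is not ABF, which is proprietary and also draws on structural similarity, derived coordinates, and file names; it simply maps every file in a tarball to a truncated SHA-1 of its bytes and looks those fingerprints up in a table of known components. The 16-character truncation and the `known` lookup table are assumptions made for illustration. Because the hash depends only on file contents, a renamed file still matches, and nothing in the manifest influences the result.

```python
from __future__ import annotations

import hashlib
import io
import tarfile

# Number of hex characters kept from each SHA-1 digest (arbitrary choice).
FINGERPRINT_HEX_LEN = 16


def fingerprint_tarball(tgz_bytes: bytes) -> dict[str, str]:
    """Map every regular file in a .tgz to a truncated SHA-1 of its contents."""
    fingerprints: dict[str, str] = {}
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, symlinks, etc.
            extracted = tar.extractfile(member)
            if extracted is None:
                continue
            digest = hashlib.sha1(extracted.read()).hexdigest()
            fingerprints[member.name] = digest[:FINGERPRINT_HEX_LEN]
    return fingerprints


def match_known(fingerprints: dict[str, str], known: dict[str, str]) -> dict[str, str]:
    """Return {path: component} for files whose fingerprint appears in `known`.

    `known` maps fingerprint -> component label and stands in for a
    (hypothetical) component/vulnerability database.
    """
    return {path: known[fp] for path, fp in fingerprints.items() if fp in known}
```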
Because it accurately assesses embedded dependencies, ABF produces a comprehensive software bill of materials (SBOM) that transparently represents third-party risk. ABF identifies components using cryptographic hashes of binaries, structural similarity, derived coordinates, and file names. It even detects renamed or altered components, whether they are undeclared, incorrectly labeled, or manually added to the code base.
Clarke's PoC package, `darcyclarke-manifest-pkg`, for example, is tracked as sonatype-2023-2789 and flags a security violation when scanned in Sonatype. The package information displays the most appropriate name (not 'express') and version information, consistent with the tar.gz file:
Additionally, our reports show the correct license, ISC, rather than what the npmjs.com website relays:
Should Clarke's 'sleepover' bundled dependency be pulled into your development project and contain any security vulnerabilities, violations, or malicious code detected by our automated ML/AI-powered malware detection engine, it would also be flagged and displayed clearly in Sonatype products.
What to do about manifest confusion?
Because of how Sonatype Lifecycle and Repository Firewall operate, Sonatype customers and users are protected by default against "manifest confusion."
Previous bug reports [1, 2] on the issue were apparently closed by npm without action. But that's not necessarily a cause for concern.
As Clarke points out, the lack of validation between what is actually present within an npm package and what its manifest claims can cause problems such as cache poisoning, unexpected dependency injection, execution of malicious scripts, and version downgrade attacks. Ultimately, however, the risk of manifest confusion depends on what your security tooling trusts.
If you're not using Sonatype, check your tools to make sure they're not relying on manifests. If they are, come chat with us; we're happy to help, as manifest-based scanning is never a good idea. Tools should dig deeper to rid you of any manifest confusion.
Written by Ax Sharma
Ax is a Staff Security Researcher & Malware Analyst at Sonatype with a penchant for open source software. His works and expert analyses have frequently been featured by leading media outlets including the BBC. Ax's expertise lies in security vulnerability research, reverse engineering, and cybercrime investigations. He has a passion for educating a wide range of audiences through writing and vlogs.