10th Annual State of the Software Supply Chain®
Scale of Open Source
The growth of open source is a signal for innovation within the software industry. You can observe new waves of technology being invented and adopted by measuring it.
With this growth, the engineers and innovators at large gain access to a source of innovation that is world-class and can in turn innovate faster.
The scale of open source is something that is hard to grasp intuitively and relate to a human scale, yet has a tremendous influence on how we innovate via software. At-scale effects may be unanticipated in nature and as usage grows ever wider, new risks and rewards emerge for its maintainers, users and the ecosystems they serve.
In this year’s report, we are taking a 10-year perspective on all measures. What is clear is that open source adoption has reached a multi-trillion request scale and shows no signs of slowing down. Over the decade, new challenges have appeared at the ecosystem scale, and we will explore them in depth. All our data is drawn from public sources and was collected in July 2024.
704,102
Malicious open source packages discovered by Sonatype since 2019
Figure 2.1 Open Source Adoption as Projected for 2024
Ecosystem | Total Projects | Total Project Versions | 2024 Annual Request Volume Estimate | YoY Project Growth | YoY Download Growth | Average Versions Released per Project |
---|---|---|---|---|---|---|
Java (Maven) | 67K | 18.7M | 1.5T | 7% | 36% | 28 |
JavaScript (npm) | 4.8M | 48.8M | 4.5T | 23% | 70% | 10 |
Python (PyPI) | 635K | 6.6M | 530B | 10% | 31% | 10 |
.NET (NuGet Gallery) | 664K | 10.5M | 159B | 6% | 14% | 16 |
Totals/Averages | 3.9M | 60M | 6.689T | 29% | 52% | 16 |
2024 software supply chain statistics. Figures estimated using linear regression based on downloads through July 2024.
Open Source Supply Balloons Due to Malicious Actors
The supply side of open source is an interesting metric to gauge the pace and scale of innovation that occurs in a given ecosystem. The more open source projects are published every year, the more innovation occurs in a given ecosystem.
This year, however, we observe an unusual expansion effect in one ecosystem in particular, and it was not organic in nature. This new kind of problem, packages intended to spam an ecosystem, shows that open ecosystems are liable to abuse. In this case, the act of publishing garbage also results in consumption that can be measured at scale.
Over recent years, npm has experienced a groundswell of new projects being published, not all of which have good intentions. Increasingly, the ecosystem has been a target of malicious packages of various descriptions as well as spam of various types, including packages aiming to redeem crypto rewards and packages aimed at publishing content via unorthodox means. Many ecosystems have struggled to cope with this type of increase: PyPI famously paused accepting new releases due to a deluge of malicious releases.
Not all growth is organic. We've seen an unusual uptick in packages intended to spam open ecosystems.
Figure 2.2 Open Source New Project Growth Rate Over the Past 9 Years
Open source new project growth rate over the past 9 years. 2024 data to date in July 2024
It’s also clear that the Microsoft-stewarded ecosystems (npm and NuGet) have gone through cleanup operations after large volumes of malware and spam were published into them, as is evident from concurrent and identical drops in project growth rates.
Between 2023 and 2024, the number of available open source projects grew an average of 11%. The average open source project in 2023 released 16 versions available for consumption, with specific ecosystem averages ranging from 10 to 28.
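The two averages quoted above can be reproduced from the per-ecosystem figures in Figure 2.1. The following is a minimal sketch, assuming a simple unweighted mean across the four ecosystems; the report does not state how its averages are computed, and the `ECOSYSTEMS` name and layout are illustrative only.

```python
from statistics import mean

# Per-ecosystem figures copied from Figure 2.1:
# (YoY project growth in %, average versions released per project)
ECOSYSTEMS = {
    "Java (Maven)": (7, 28),
    "JavaScript (npm)": (23, 10),
    "Python (PyPI)": (10, 10),
    ".NET (NuGet Gallery)": (6, 16),
}

# Assumption: an unweighted mean, so each ecosystem counts equally
# regardless of how many projects it hosts.
avg_project_growth = mean(growth for growth, _ in ECOSYSTEMS.values())
avg_versions = mean(versions for _, versions in ECOSYSTEMS.values())

print(f"Average YoY project growth: {avg_project_growth}%")
print(f"Average versions per project: {avg_versions}")
```

Under this assumption the growth figure comes out at 11.5%, in line with the roughly 11% quoted, and the versions figure at 16. A project-weighted mean would give different numbers, since npm hosts far more projects than the other ecosystems.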
Figure 2.3 Open Source Projects and Versions Growth
Open Source Consumption Rockets through npm
This year will see the largest single annual consumption increase we have on record: by our estimates, the volume of open source packages the world will download by the end of the year will reach 6.6 trillion requests. This above-baseline growth can be attributed to two things: spam and AI.
6.6 Trillion
Open source packages will be downloaded by the end of the year.
Figure 2.4 Cumulative Estimated Requests Per Ecosystem
Figure 2.5 Yearly Downloads Per Ecosystem
Broken down by ecosystem, it is clear that npm is the largest contributor to this growth spurt, somewhat distorted by the malware spam observed this year, followed by PyPI and Maven Central. npm has undergone the second largest request growth since 2020, which is an incredible increase in volume served, given the scale of the ecosystem.
This growth is not entirely organic but, as noted, is likely caused by a deluge of spam packages published into open source registries. The yearly download view in Figure 2.5 makes this trend clearly visible in npm. This anomaly might be causing issues with our linear regression and could lead to inflated estimates.
Individual Ecosystem Analysis
Java (Maven)
Through the first 7 months of 2024, 828 billion Java components were requested from the Maven Central Repository. This continues the strong average request growth seen so far and is expected to continue through the second half of the year, with linear regression forecasting that the ecosystem may reach nearly 1.5 trillion requests served.
Maven Central is one of the oldest open source ecosystems tracked, which can be seen from the number of versions each project has published: an average of 28. This is 75% more than the average across all ecosystems.
Java 2024 by the numbers:
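To make the forecasting step concrete, here is a minimal sketch of a linear extrapolation from seven months of data to a full year. It is not the report's actual model: the monthly figures are hypothetical (only their 828 billion total comes from the text), and fitting a straight line to cumulative requests against the month index is an assumption about how such a regression might be set up.

```python
from itertools import accumulate

# Hypothetical monthly request counts (billions) for Jan-Jul 2024.
# Only their sum, 828 billion, comes from the report.
monthly = [110, 112, 116, 118, 121, 124, 127]
cumulative = list(accumulate(monthly))
months = list(range(1, len(monthly) + 1))


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(months, cumulative)

# Extrapolate the fitted line to month 12 (end of year).
projected_year_end = intercept + slope * 12
print(f"Projected 2024 requests: {projected_year_end / 1000:.2f} trillion")
```

With these made-up inputs the projection lands a little above 1.4 trillion. A sudden spike in a single month, such as a spam campaign, would steepen the fitted slope and push the year-end estimate higher.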
- 1.5 trillion packages, estimated request volume
- 36% YoY growth estimated, an increase compared to 2023.
- 7% project growth rate
- 28 versions per project on average.
JavaScript (npm)
npm continues to be the titan of the open source ecosystems when it comes to requests served, undergoing a growth spurt this year that is a significant anomaly from the usual pattern we observe. We can’t stress enough that we believe this is because, in 2023, npm was hit by a deluge of components that could be classed as spam, all aiming to get payouts using the Tea.xyz crypto protocol. This has inflated its numbers and shows up in the massive uptick of request volume. Similarly, project counts are distorted by this spam. Although the problem is not unique to npm, the combination of a low bar to publish and a high degree of adoption makes it the perfect target for such activity.
To say npm supports a titanic volume would be an understatement. We estimate the ecosystem to serve well over 4.4 trillion requests by the end of 2024 — more than the entire volume of requests across all 4 monitored ecosystems in 2023.
JavaScript 2024 by the numbers:
- 4.5 trillion packages, projected download volume
- 70% YoY request growth
- 23% project growth
Python (PyPI)
Python is the fastest grower in both project creation and request volume. It continues to be fueled by the AI and cloud adoption boom as a favored language in both domains.
Python 2024 by the numbers:
- 537 billion packages, projected download volume
- 87% YoY request growth
- 10% project growth
.NET (NuGet Gallery)
NuGet is the chosen ecosystem of the .NET family of languages and continues to serve engineers working with the growing set of Microsoft technologies. The rate of growth in download requests has slowed significantly. This is not entirely surprising given the integrated nature of the .NET language core library.
.NET 2024 by the numbers:
- 159 billion packages, projected request volume
- 14% YoY request growth
- 6% project growth rate
Differentiating Software Vulnerabilities and Open Source Malware
To understand the risks in the software supply chain, it’s important to clarify the difference between Open Source Malware and Vulnerabilities. While the two concepts are related, they are completely different in terms of the type of risk they introduce into your organization, as well as the type of response that is required to mitigate said risk.
Software Vulnerability: A Flaw in the Code
A software vulnerability is akin to a flaw in code, much like a faulty lock on a door. Unlike malware, vulnerabilities are not intentional. Instead, they represent weaknesses in software components or projects.
Similar to how a faulty lock compromises the security of a building by allowing unauthorized access, a software vulnerability creates a gap in the software’s security perimeter. This gap becomes an entry point for intruders to exploit, gaining unapproved access to the system, application, or component.
Malware: Malicious Intent in Open Source
Malware, short for “malicious software,” poses a significant threat to open source software ecosystems. It encompasses a wide range of malicious programs, such as viruses, worms, trojans, ransomware, spyware, and adware, all designed to gain unauthorized access to information or systems. In the software supply chain, malware is most often passed off as legitimate open source components or introduced to previously legitimate projects via takeovers.
With its various forms, malware’s primary purpose is to steal data, install harmful software, gain control of a network, or compromise software or hardware. Threat actors employ diverse distribution methods, such as infected email attachments, malicious websites, or compromised software downloads.
Malware in the software supply chain is designed to target developer environments, such as continuous integration systems, and is commonly seen in ransomware attacks and sophisticated breaches. The only known cure is prevention and avoidance.
Vulnerabilities in the OSS Ecosystem
Security vulnerabilities are a fact of life: as technology evolves and ages, it also requires maintenance. New issues are discovered continually, so it’s important to acknowledge that vulnerabilities appear all the time. A good analogy is that software components age like milk, not fine wine; they don’t get better with age. They might be good for a long time, but when a vulnerability is discovered, it’s akin to spoiled milk: something that needs to be discarded quickly. (Another analogy, which you’ll see when we talk more about risk, is that software is more like steel than aluminum.)
On average, 13 Critical or High severity security vulnerabilities are discovered each year, per application.
The challenge, of course, is the scale of new security vulnerabilities being discovered in the different ecosystems, as well as the scale of issues being discovered in the software you manage.
Organizational Challenges
A few fundamental facts: last year we reported that the average Java application has about 150 open source components when counting both direct and transitive dependencies. On average, an application has 13 Critical or High severity security vulnerabilities discovered each year. The effort to remediate each issue can vary wildly, from a few minutes to a few days, depending on the size of the organization and the breaking changes needed to go from the current version to the non-vulnerable one.
Another challenge is the source of information about security vulnerabilities itself. In 2024, it has become evident that relying on free sources of information is close to negligent for any organization not specializing in intelligence aggregation.
For example, the National Vulnerability Database, the canonical catalog of known security vulnerabilities identified via the Common Vulnerabilities and Exposures (“CVE”) system, had an outage in early 2024 that caused a massive backlog of published vulnerabilities awaiting processing. At the time of writing, this backlog sits at 17,656 unprocessed issues. This meant that in Q1 of this year, almost no new security issues were made available to the community.
The volume of security vulnerabilities discovered is growing roughly linearly with the rate at which open source is being created and published. This is to be expected and is uncomfortable news for organizations seeking to manage them.
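The figures above translate into a rough yearly workload once a portfolio size is chosen. The sketch below is a back-of-the-envelope estimate only: the 13 findings per application come from the text, while the portfolio size and the per-fix effort bounds (a quarter of an hour and two eight-hour working days) are hypothetical values picked to represent "a few minutes" and "a few days".

```python
VULNS_PER_APP_PER_YEAR = 13  # Critical or High severity, from the report


def yearly_remediation_hours(app_count, min_hours_per_fix, max_hours_per_fix):
    """Return (findings, low_hours, high_hours) for a portfolio of applications."""
    findings = app_count * VULNS_PER_APP_PER_YEAR
    return findings, findings * min_hours_per_fix, findings * max_hours_per_fix


# Hypothetical portfolio: 100 applications, 0.25 h for a trivial bump,
# 16 h for an upgrade that involves breaking changes.
findings, low_hours, high_hours = yearly_remediation_hours(100, 0.25, 16)

print(f"Findings per year: {findings}")
print(f"Effort range: {low_hours:.0f} to {high_hours:.0f} hours")
```

The width of the resulting range is the point: the same number of findings can cost anything from a few working weeks to several engineer-years, depending on how many upgrades involve breaking changes.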
NATIONAL VULNERABILITY DATABASE BACKLOG
17,656
The backlog of published but unprocessed vulnerabilities at the National Vulnerability Database, at the time of writing.
Open Source Malware & Next Gen Supply Chain Attacks are Now Commonplace, Dangerous Business
The growth of downloads hides a disturbing fact: the continued extreme growth of malware, protestware, and intentionally hidden vulnerabilities passed on to users. These types of packages are published not due to carelessness, but with purely malicious intent. Using open source as a medium of transport for malware isn’t new. However, traditional scanning tools struggle to identify novel attacks such as the malicious packages we now see, otherwise known as open source malware. These tools, while effective on known malware, are incapable of finding malware that has not yet been identified.
Some have noble intentions, such as packages that protest wars around the world, while some hide extremely sinister motivations, including serious malware families and ransomware gangs that sell off their victims to the highest bidder. Every single one of them targets an often undefended prey: developers and automated build environments.
A great example of a successful malicious campaign targeting developers is the Snowflake breach of 2024, where developers were specifically targeted with malware families that stole Snowflake authorization tokens. These were later used to breach over 160 organizations.
In our YoY monitoring, at the time of writing in August 2024, we have logged 704,102 malicious open source packages, meaning that in the last year we’ve seen the number of malicious packages grow by 156%. More troublingly, an anonymous survey we conducted on more than 100k repositories shows that over 50% of unprotected instances surveyed have already fallen victim and cached a piece of malware.
A sobering finding in this year’s data is that more than 400k new pieces of malware have been introduced to the public binary repositories, with 65K of them scoring CVSS >= 7 since November 2023. All of these represent yet another facet of Persistent Risk (read more about this in our Risk chapter), and bring the total data set to more than 700k identified malicious open source components.
Figure 2.6 Next Generation Software Supply Chain Attacks (2019-2024)
704,102
Malicious packages discovered
Malware Types
As with ‘traditional’ malware, malware disguised as open source comes in many guises and types. What is not traditional about open source malware is that it is executed entirely without developer interaction. Once the package is downloaded onto a developer’s or build automation machine, it is too late to avert disaster.
Figure 2.7 Malware Types Observed
Potentially Unwanted Application (PUA) 46.4%
The largest share of the malware we observe being spread in the open source ecosystem is what we call “Potentially Unwanted Application” or PUA, which represents functionality that is present in the software but not disclosed to the end user. Examples of this include protestware, anti-work protests, and other uninvited functionalities. Though mostly innocent in practice, they represent a lack of process in acquiring packages and act as evidence of a hole in an organization’s open source defense.
Phishing 13.8%
These types of packages leverage attack methods such as dependency confusion to target organizations directly, pretending to be an internally developed package. They trick an organization’s build automation into downloading them and often drop malware as they are downloaded.
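One defensive check against this pattern is to compare internal package names with what exists on the public registry. The sketch below assumes the commonly described form of dependency confusion, in which a build resolves an internal name against a public registry. The function name, the package names, and the idea of passing in a pre-fetched snapshot of public names are all hypothetical; a real check would query the registry your builds actually use.

```python
def classify_internal_packages(internal_names, public_names):
    """Split internal package names by whether a public package shares the name.

    "colliding" names already exist publicly and deserve investigation;
    "unclaimed" names could still be registered publicly by someone else.
    """
    public = {name.lower() for name in public_names}
    colliding = sorted(n for n in internal_names if n.lower() in public)
    unclaimed = sorted(n for n in internal_names if n.lower() not in public)
    return {"colliding": colliding, "unclaimed": unclaimed}


# Hypothetical inputs: two internal packages and a snapshot of public names.
internal = ["acme-billing-core", "acme-auth-utils"]
public_snapshot = {"requests", "acme-auth-utils"}

report = classify_internal_packages(internal, public_snapshot)
print(report)
```

Both lists are useful: a collision may mean someone has already published a lookalike, while an unclaimed name is a candidate for a reserved placeholder or a scoped, private-only naming scheme.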
Data Exfiltration 13.7%
Data exfiltration packages read a number of pieces of data found on the machine, such as environment variables, authentication tokens, password files and anything else that might aid the assailant. Once collected, these files are uploaded to an external command and control server for future use.
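Because any package running in a build can read the process environment, it can help to audit which secret-looking variables are exposed there. The following is a minimal defensive sketch; the `SECRET_PATTERN` keyword list and the function name are assumptions chosen for illustration, not an exhaustive or standard rule set, and only variable names are reported, never values.

```python
import os
import re

# Assumption: variable names containing these words are likely to hold secrets.
SECRET_PATTERN = re.compile(
    r"(TOKEN|SECRET|PASSWORD|PASSWD|API_?KEY|CREDENTIAL)", re.IGNORECASE
)


def exposed_secret_names(environ=None):
    """Return sorted names of environment variables that look like secrets."""
    env = os.environ if environ is None else environ
    return sorted(name for name in env if SECRET_PATTERN.search(name))


# Hypothetical build environment used for demonstration.
sample_env = {
    "PATH": "/usr/bin",
    "HOME": "/home/builder",
    "NPM_TOKEN": "redacted",
    "AWS_SECRET_ACCESS_KEY": "redacted",
}

flagged = exposed_secret_names(sample_env)
print(flagged)
```

Running the same function against the real environment of a CI job gives a quick view of what an install-time script could collect, which can guide decisions about scoping secrets to the steps that need them.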
Security Holding Package 12.7%
These are packages that were found to be malicious, but were removed by the maintainers of the ecosystem and replaced by a holding package. Finding any of these signifies a disaster averted by a hair’s breadth, with protection owed only to the swift actions of the upstream maintainers.
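Holding packages can sometimes be spotted in a project's resolved dependencies. The sketch below assumes the npm convention of publishing holding packages under the version `0.0.1-security` and the `packages` map used by recent `package-lock.json` formats; both are assumptions to verify against your own registry and lockfile version, and the package names in the sample are invented.

```python
import json

# Assumption: npm publishes security holding packages with this version string.
HOLDING_VERSION = "0.0.1-security"


def find_holding_packages(lockfile):
    """Return names of locked packages pinned to the holding-package version."""
    hits = []
    for path, meta in lockfile.get("packages", {}).items():
        if meta.get("version") == HOLDING_VERSION:
            # Keys look like "node_modules/<name>"; keep only the package name.
            hits.append(path.split("node_modules/")[-1])
    return sorted(hits)


# Hypothetical lockfile content for demonstration.
sample_lockfile = json.loads("""
{
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "demo-app", "version": "1.0.0"},
    "node_modules/left-pad": {"version": "1.3.0"},
    "node_modules/example-removed-pkg": {"version": "0.0.1-security"}
  }
}
""")

holding = find_holding_packages(sample_lockfile)
print(holding)
```

Any hit is worth tracing back: it suggests a dependency name that was once occupied by a malicious package and that a build tried to resolve it.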
PII Exfiltration 2.8%
A form of data exfiltration that targets Personally Identifiable Information, such as personal access tokens and other personal information.
Backdoor 1.9%
A package that installs a backdoor virus onto the machine that executes it. This backdoor will allow the attacker to access the tainted machine at a later date.
Crypto Stealer / Miner 1.2%
These types of packages aim to make money fast by stealing any available cryptocurrency housed on the affected machine. This category also includes packages that drop a crypto miner that hijacks the machine’s resources to mine cryptocurrency for the hacker’s benefit.
Research Project 1.2%
Some malware is simply a research project by a researcher or a whitehat hacker. It contains malicious code but typically does not go so far as to breach the machine or steal information. Such packages are often seen during penetration tests.
Dropper 0.7%
As the name suggests, these types of packages drop an encrypted payload onto the affected machine, often a Remote Access Trojan that disappears from sight and allows hackers to return at a later date.
Other Malicious Packages 6.8%
The rest of the malicious packages discovered range from destructive ones that aim to corrupt the file system they launch on to ones that aim to tamper with the code a developer writes, often disguised as IDE or CI plugins.
Traditional malware scanning solutions are unable to detect these novel forms of attack, leaving developers and DevOps environments uniquely at risk. As the volume continues to grow, so too will the clear and present danger facing organizations.
Notable Malicious Packages
As we continue to document an overall rise in malicious attacks on open source ecosystems, the monitored 2023-2024 period has seen more professional criminal campaigns emerge. The software supply chain lends itself well to the cybercriminal ecosystem, either as an initial access vector for initial access brokers or as a means of distributing initial access malware for Advanced Persistent Threat groups. Here are several examples we’ve seen this year:
- Russia-linked 'Lumma' crypto stealer now targets Python devs
- Devs flood npm with 15,000 packages to reward themselves with Tea 'tokens'
- CVE-2024-3094: The targeted backdoor supply chain attack against XZ and liblzma
A Timeline of Attacks
We have continued to curate a timeline of known malicious packages and malware campaigns. This interactive timeline summarizes notable supply chain incidents, next-gen attacks and other incidents propagated using the software supply chain.