10th Annual State of the Software Supply Chain®

Scale of Open Source

The growth of open source is a signal for innovation within the software industry. You can observe new waves of technology being invented and adopted by measuring it.

With this growth, engineers and innovators at large gain access to a world-class source of innovation and can in turn innovate faster.

The scale of open source is something that is hard to grasp intuitively and relate to a human scale, yet has a tremendous influence on how we innovate via software. At-scale effects may be unanticipated in nature and as usage grows ever wider, new risks and rewards emerge for its maintainers, users and the ecosystems they serve.

In this year’s report, we are taking a 10-year perspective on all measures. What is clear is that open source adoption has reached a multi-trillion request scale and shows no signs of slowing down. Over the decade, new challenges have appeared on the ecosystem scale that we will deep dive into. All our data is sourced from public sources and was collected in July 2024.

704,102

Malicious open source packages discovered by Sonatype since 2019

Figure 2.1 Open Source Adoption as Projected for 2024

| Ecosystem | Total Projects | Total Project Versions | 2023 Annual Request Volume Estimate | YoY Project Growth | YoY Download Growth | Average Versions Released per Project |
| --- | --- | --- | --- | --- | --- | --- |
| Java (Maven) | 67K | 18.7M | 1.5T | 7% | 36% | 28 |
| JavaScript (npm) | 4.8M | 48.8M | 4.5T | 23% | 70% | 10 |
| Python (PyPI) | 635K | 6.6M | 530B | 10% | 31% | 10 |
| .NET (NuGet Gallery) | 664K | 10.5M | 159B | 6% | 14% | 16 |
| Totals/Averages | 3.9M | 60M | 6.689T | 29% | 52% | 16 |

2024 software supply chain statistics. Figures are estimated using linear regression based on downloads through July 2024.
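The request-volume total in Figure 2.1 can be cross-checked by summing the per-ecosystem estimates. This is a minimal sketch with the values copied from the table and expressed in billions; the variable names are illustrative only.

```python
# Estimated annual request volume per ecosystem, in billions (from Figure 2.1).
REQUESTS_BILLIONS = {
    "Java (Maven)": 1500,
    "JavaScript (npm)": 4500,
    "Python (PyPI)": 530,
    ".NET (NuGet Gallery)": 159,
}

# Sum the four ecosystems and convert billions to trillions.
total_billions = sum(REQUESTS_BILLIONS.values())
total_trillions = total_billions / 1000

# Share of the total served by npm, as a percentage.
npm_share = REQUESTS_BILLIONS["JavaScript (npm)"] / total_billions * 100

print(f"Total: {total_trillions:.3f} trillion requests")
print(f"npm share: {npm_share:.1f}%")
```

The sum reproduces the 6.689T total in the table, with npm accounting for roughly two thirds of all requests.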

Open Source Supply Balloons Due to Malicious Actors

The supply side of open source is an interesting metric to gauge the pace and scale of innovation that occurs in a given ecosystem. The more open source projects are published every year, the more innovation occurs in a given ecosystem.

This year, however, we observe an unusual expansion in one ecosystem in particular that was not organic in nature. This new kind of problem, packages intended to spam an ecosystem, shows that open ecosystems are liable to abuse. In this case, the act of publishing garbage also results in consumption that can be measured at scale.

Over recent years, npm has experienced a groundswell of new projects being published, not all of which have good intentions. Increasingly, the ecosystem has been a target for malicious packages of various descriptions as well as spam of various types, including packages aiming to redeem crypto rewards and packages that publish content via unorthodox means. Many ecosystems have struggled to cope with this type of increase: PyPI famously paused accepting new releases due to a deluge of malicious releases.

Not all growth is organic. We've seen an unusual uptick in packages intended to spam open ecosystems.

Figure 2.2 Open Source New Project Growth Rate Over the Past 9 Years

Open source new project growth rate over the past 9 years. 2024 data is to date as of July 2024.

It’s also clear that the Microsoft-stewarded ecosystems (npm and NuGet) have gone through cleanup operations due to large volumes of malware and spam being published into them, as is evident from concurrent and identical drops in project growth rates.

Between 2023 and 2024, the number of available open source projects grew an average of 11%. The average open source project in 2023 released 16 versions available for consumption, with specific ecosystem averages ranging from 10 to 28. 

Figure 2.3 Open Source Projects and Versions Growth

Open Source Consumption Rockets through npm

This year will see the largest single annual consumption increase we have on record: by our estimates, the volume of open source packages the world will download by the end of the year will sit at 6.6 trillion requests. This above-baseline growth can be attributed to two things: spam and AI.

6.6 Trillion

Open source packages will be downloaded by the end of the year.

Figure 2.4 Cumulative Estimated Requests Per Ecosystem

Figure 2.5 Yearly Downloads Per Ecosystem

Broken down by ecosystem, it’s clear that npm is the largest contributor to this growth spurt, somewhat distorted by the malware spam observed this year, followed by PyPI and Maven Central. npm has undergone its second largest request growth since 2020, which is an incredible increase in volume served, given the scale of the ecosystem.

This growth is not entirely organic but, as noted, is likely caused by a deluge of spam packages published into open source registries. The yearly download view in Figure 2.5 shows this trend clearly in npm. This anomaly may be skewing our linear regression and could lead to inflated estimates.

Individual Ecosystem Analysis

Java (Maven)

Through the first 7 months of 2024, 828 billion Java components were requested from the Maven Central Repository. This continues the strong average request growth seen in prior years and is expected to hold through the second half of the year, with linear regression forecasting that the ecosystem could reach nearly 1.5 trillion requests served.
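To make the projection concrete, here is a minimal sketch of a linear extrapolation to year end. The report does not publish its monthly series or its regression setup, so this assumes, purely for illustration, that the 828 billion requests were spread evenly across January to July. The helper `fit_line` and all variable names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# ASSUMPTION: cumulative requests (in billions) at the end of each month,
# built from an even split of the 828 billion reported for months 1-7.
MONTHS = list(range(1, 8))
CUMULATIVE_BILLIONS = [828 / 7 * month for month in MONTHS]

# Fit a line to the cumulative series and read it off at month 12.
slope, intercept = fit_line(MONTHS, CUMULATIVE_BILLIONS)
projected_year_end = slope * 12 + intercept

print(f"Projected 2024 requests: {projected_year_end:,.0f} billion")
```

Under the even-split assumption the line reaches roughly 1.42 trillion requests at month 12. The report's own figure of nearly 1.5 trillion is fitted to real download data that is not reproduced here, so the two are not expected to match exactly.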

Maven Central is one of the oldest open source ecosystems tracked, which can be seen from the number of versions each project has published: an average of 28. This is 75% more than the average of 16 across all ecosystems.

Java 2024 by the numbers:

  • 1.5 trillion packages, estimated request volume
  • 36% YoY request growth (estimated), an increase compared to 2023
  • 7% project growth rate
  • 28 versions per project on average

JavaScript (npm)

npm continues to be the titan of the open source ecosystems when it comes to requests served, undergoing a growth spurt this year that is a significant anomaly from the usual pattern we observe. We can’t stress enough that we believe this is because, in 2023, npm was hit by a deluge of components that could be classed as spam, all aiming to get payouts using the Tea.xyz crypto protocol. This has inflated npm’s numbers and shows up in the massive uptick of request volume. Similarly, project counts are distorted due to this spam. Although the problem is not unique to npm, the combination of a low bar to publish and a high degree of adoption makes it the perfect target for such activity.

To say npm supports a titanic volume would be an understatement. We estimate the ecosystem will serve well over 4.4 trillion requests by the end of 2024, more than the entire volume of requests across all 4 monitored ecosystems in 2023.

JavaScript 2024 by the numbers:

  • 4.5 trillion packages, projected download volume
  • 70% YoY request growth
  • 23% project growth

Python (PyPI)

Python is the fastest grower in both project creation and request volume. It continues to be fueled by the AI and cloud adoption boom as a favored language in both domains. 

Python 2024 by the numbers:

  • 537 billion packages, projected download volume
  • 87% YoY request growth
  • 10% project growth

.NET (NuGet Gallery)

NuGet is the chosen ecosystem of the .NET family of languages and continues to serve engineers working with the growing set of Microsoft technologies. The rate of growth in download requests has slowed down significantly. This is not entirely surprising given the integrated nature of the .NET language core library.

.NET 2024 by the numbers:

  • 159 billion packages, projected request volume
  • 14% YoY request growth
  • 6% project growth rate

Differentiating Software Vulnerabilities and Open Source Malware

To understand the risks in the software supply chain, it’s important to clarify the difference between Open Source Malware and Vulnerabilities. While the two concepts are related, they are completely different in terms of the type of risk they introduce into your organization, as well as the type of response that is required to mitigate said risk.

Software Vulnerability: A Flaw in the Code

A software vulnerability is akin to a flaw in code, much like a faulty lock on a door. Unlike malware, vulnerabilities are not intentional. Instead, they represent weaknesses in software components or projects.

Similar to how a faulty lock compromises the security of a building by allowing unauthorized access, a software vulnerability creates a gap in the software’s security perimeter. This gap becomes an entry point for intruders to exploit, gaining unapproved access to the system, application, or component.

Malware: Malicious Intent in Open Source

Malware, short for “malicious software,” poses a significant threat to open source software ecosystems. It encompasses a wide range of malicious programs, such as viruses, worms, trojans, ransomware, spyware, and adware, all designed to gain unauthorized access to information or systems. In the software supply chain, malware is most often passed off as legitimate open source components or introduced to previously legitimate projects via takeovers.

With its various forms, malware’s primary purpose is to steal data, install harmful software, gain control of a network, or compromise software or hardware. Threat actors employ diverse distribution methods, such as infected email attachments, malicious websites, or compromised software downloads.

Malware in the software supply chain is designed to target developer environments, such as continuous integration systems, and is commonly seen in ransomware attacks and sophisticated breaches. The only known cure is prevention and avoidance.

Vulnerabilities in the OSS Ecosystem

Security vulnerabilities are a fact of life: as technology evolves and ages, it requires maintenance. New issues are discovered continually, so it’s important to acknowledge that vulnerabilities can appear at any time. A good analogy is that software components age like milk, not fine wine (or, in a new analogy you’ll see when we talk more about risk, more like steel than aluminum): they don’t get better with age. They might be good for a long time, but when a vulnerability is discovered, it’s akin to spoiled milk, something that needs to be discarded quickly.

On average, 13 Critical or High severity security vulnerabilities are discovered per application each year.

The challenge, of course, is the scale of new security vulnerabilities being discovered in the different ecosystems, as well as the scale of issues being discovered in the software you manage.

Organizational Challenges

A few fundamental facts: last year we reported that the average Java application has about 150 open source components when counting both direct and transitive dependencies. On average, 13 Critical or High severity security vulnerabilities are discovered in an application each year. The effort to remediate an issue can vary wildly with the size of the organization, from a few minutes to a few days, depending on the breaking changes needed to go from the current version to the non-vulnerable one.
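The gap between direct and transitive dependencies is easy to underestimate. The following is a minimal sketch of how a component count that includes transitive dependencies can be derived from a dependency graph. The graph, the package names, and the `count_components` helper are all hypothetical and chosen only for illustration.

```python
from collections import deque


def count_components(graph, root):
    """Count distinct dependencies reachable from root (direct and transitive)."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already counted via another path
        seen.add(dep)
        queue.extend(graph.get(dep, []))
    seen.discard(root)  # ignore a cycle that leads back to the application
    return len(seen)


# HYPOTHETICAL dependency graph: each key lists its direct dependencies.
DEPENDENCIES = {
    "my-app": ["web-framework", "json-lib"],
    "web-framework": ["http-core", "json-lib", "logging-api"],
    "http-core": ["logging-api"],
}

direct = len(DEPENDENCIES["my-app"])
total = count_components(DEPENDENCIES, "my-app")

print(f"Direct dependencies: {direct}")
print(f"Direct + transitive components: {total}")
```

In this toy graph, two declared dependencies expand to four components that all need to be tracked for vulnerabilities. Real applications show the same effect at a much larger scale.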

Another challenge is the source of information about security vulnerabilities itself. In 2024, it has become evident that relying solely on free sources of information borders on negligent for any organization that does not specialize in intelligence aggregation.

For example, the National Vulnerability Database, the canonical catalog of known security vulnerabilities identified via the Common Vulnerabilities and Exposures (“CVE”) system, had an outage in early 2024 that caused a massive backlog of vulnerabilities awaiting processing. At the time of writing, this backlog of published vulnerabilities sits at 17,656 unprocessed issues. This meant that in Q1 of this year, nearly no new security issues were made available to the community.

The volume of security vulnerabilities discovered is growing roughly in proportion to the rate at which open source is being created and published. This is to be expected and is uncomfortable news for organizations seeking to manage them.

NATIONAL VULNERABILITY DATABASE BACKLOG

17,656

The backlog of published but unprocessed vulnerabilities at the National Vulnerability Database, at the time of writing.

NVD Dashboard Screenshot

Open Source Malware & Next Gen Supply Chain Attacks are Now Commonplace, Dangerous Business

The growth of downloads hides a disturbing fact — the continued extreme growth of malware, protestware, and intentionally hidden vulnerabilities passed on to the users. These types of packages are published not due to carelessness, but with purely malicious intent. Using open source as a medium of transport for malware isn’t new. However, traditional scanning tools struggle to identify novel attacks, like we now see with malicious packages, otherwise known as open source malware. These tools, while effective on known malware, are incapable of finding malware that has not yet been identified. 

We have logged 704,102 malicious open source packages — meaning in the last year, we’ve seen the number of malicious packages grow by 156% YoY.

Some have noble intentions, such as packages that protest wars around the world, while some hide extremely sinister motivations, including serious malware families and ransomware gangs that sell off their victims to the highest bidder. Every single one of them targets an often undefended prey: developers and automated build environments.

A great example of a successful malicious campaign targeting developers is the Snowflake breach of 2024, where developers were specifically targeted with malware families that stole Snowflake authorization tokens. These were later used to breach over 160 organizations.

In our YoY monitoring, at the time of writing in August 2024, we have logged 704,102 malicious open source packages, meaning that in the last year we’ve seen the number of malicious packages grow by 156% YoY. More troublingly, an anonymous survey of more than 100k repositories shows that over 50% of the unprotected instances surveyed have already fallen victim and cached a piece of malware.

A sobering finding in this year’s data is that more than 400k new pieces of malware have been introduced to the public binary repositories, with 65K of them being CVSS >= 7 since November 2023. All of these represent yet another facet of Persistent Risk (read more about this in our Risk chapter), and bring a total data set of more than 700k identified, malicious open source components. 

Figure 2.6 Next Generation Software Supply Chain Attacks (2019-2024)

704,102

Malicious packages discovered

Malware Types

As with ‘traditional’ malware, malware disguised as open source comes in many guises and types. What is not traditional with open source malware is that it is executed entirely without developer interaction. Once the package is downloaded onto the developer’s or build automation machine, it is too late to avert disaster.
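One way code can run without developer interaction is through install-time hooks. The report does not name a specific mechanism, so the following is an illustrative assumption: in npm, the `preinstall`, `install` and `postinstall` lifecycle scripts in a package's `package.json` run automatically during `npm install`. This minimal sketch flags such hooks in a manifest; the manifest contents and the `find_install_hooks` helper are hypothetical.

```python
import json

# npm lifecycle scripts that run automatically when a package is installed.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_hooks(package_json_text):
    """Return the install-time scripts declared in a package.json document."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    return {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}


# HYPOTHETICAL manifest used only to exercise the check.
EXAMPLE_MANIFEST = """
{
  "name": "example-package",
  "version": "1.0.0",
  "scripts": {
    "postinstall": "node setup.js",
    "test": "node test.js"
  }
}
"""

hooks = find_install_hooks(EXAMPLE_MANIFEST)
for name, command in hooks.items():
    print(f"{name}: {command}")
```

A hook is not malicious in itself, since many legitimate packages use one to compile native code. Its presence only indicates that installing the package executes code, which is why some teams install with `npm install --ignore-scripts` in build environments.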

Figure 2.7 Malware Types Observed

Notable Malicious Packages

As we continue to document an overall rise in malicious attacks on open source ecosystems, the monitored 2023-2024 period has seen more professional criminal campaigns emerge. The software supply chain lends itself well to the cybercriminal ecosystem, either as an initial access vector for initial access brokers or even as a means of distributing initial access malware for Advanced Persistent Threat groups. Here are several examples we’ve seen this year:

A Timeline of Attacks

We have continued to curate a timeline of known malicious packages and malware campaigns. This interactive timeline summarizes notable supply chain incidents, next-gen attacks and other incidents propagated using the software supply chain.
