Modernizing Open Source Dependency Management
In the last three State of the Software Supply Chain reports, we’ve examined the challenges of managing open source dependencies in detail. We firmly believe that understanding both the macro and micro decisions involved in selecting open source components leads to robust, streamlined, and secure software supply chains.
Open source consumers are not paying attention
Once again, 96% of all known-vulnerable downloads were avoidable: a fixed version was available, but consumers did not choose it.
The size of the “consumer” problem is concerning
This year, we again analyzed how the world consumes open source from Maven Central, across more than 400 billion downloads over the year. We compared downloads of vulnerable dependencies for which no fixed version existed against downloads of vulnerable dependencies for which a fixed version was available but not chosen.
How common are vulnerable downloads?
Of the average 37.8 billion monthly downloads from Maven Central, 3.97 billion were of vulnerable components.
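As a rough check on scale, the figures above can be combined with a few lines of arithmetic. This is a back-of-the-envelope sketch, not a number the report states, and it assumes the annual 96% avoidable share can be applied to a single month's vulnerable downloads.

```python
# Monthly averages quoted in the text for Maven Central.
monthly_downloads = 37.8e9
vulnerable_downloads = 3.97e9

# Assumption: the annual 96% "avoidable" share also applies to a single month.
avoidable_share = 0.96

vulnerable_share = vulnerable_downloads / monthly_downloads
avoidable_per_month = vulnerable_downloads * avoidable_share
annualized_downloads = monthly_downloads * 12

print(f"Vulnerable share of monthly downloads: {vulnerable_share:.1%}")
print(f"Avoidable vulnerable downloads per month: {avoidable_per_month / 1e9:.2f} billion")
print(f"Annualized download volume: {annualized_downloads / 1e9:.1f} billion")
```

This works out to roughly 10.5% of monthly downloads being vulnerable, about 3.81 billion avoidable vulnerable downloads per month, and an annualized volume of about 453.6 billion, consistent with the 400+ billion figure above.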
How common are fixed vulnerabilities?
Where no fixed version existed, consumers downloaded the vulnerable version for lack of an alternative. Yet although only a small share of vulnerable versions have no fix available, a significant portion of the vulnerable versions that do have a fix are still being downloaded.
FIGURE 3.1. CONSUMER VS MAINTAINER OF VULNERABLE DOWNLOADS
This point is echoed in the Consumption Manifesto recently published by the Open Source Security Foundation (OpenSSF). The manifesto emphasizes that organizations must take responsibility for the open source software they use, how they consume it, and how they handle the associated risks.
Why is making good choices so hard?
This bears repeating: developers face a multitude of challenges and responsibilities in their work, and dependency management often becomes an overwhelming and inefficient experience, one the development community has humorously dubbed "Dependency Hell."
Organizations expect developers to make informed choices regarding open source components for their software projects. However, a significant portion (approximately 85%) of projects hosted on Maven Central are considered inactive, with fewer than 500 monthly downloads. This proliferation of inactive projects further complicates the already challenging task of selecting the most suitable components for a given project.
Consider for a moment that the average Java application now has a whopping 148 dependencies, and that each dependency releases around 10 times a year. Developers not only contend with the initial selection and management of around 150 dependencies but are also tasked with:
- Tracking an average of 1,500 dependency changes per year per application
- Bringing the security and legal expertise needed to choose the safest versions
- Understanding ecosystem nuances
- Sifting through thousands of projects to pick the best ones
Now, imagine the scale of these decisions for enterprises with tens of thousands of developers and thousands of applications.
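The ~1,500 figure follows from multiplying dependencies by releases. The sketch below assumes the ~10 annual releases are per dependency and that applications share no dependencies; the 1,000-application portfolio is a hypothetical input, not a number from the data.

```python
# Averages quoted in the text for a typical Java application.
DEPENDENCIES_PER_APP = 148
RELEASES_PER_DEPENDENCY_PER_YEAR = 10  # assumption: the ~10 releases are per dependency

def changes_per_year(num_apps: int = 1) -> int:
    """Dependency releases a portfolio must track each year.

    Assumes every application is average and that no dependencies are shared
    between applications, so this is an upper-bound style estimate.
    """
    return num_apps * DEPENDENCIES_PER_APP * RELEASES_PER_DEPENDENCY_PER_YEAR

print(changes_per_year())       # 1480, i.e. roughly 1,500 per application
print(changes_per_year(1_000))  # 1480000 for a hypothetical 1,000-application portfolio
```

In a real enterprise, shared dependencies would reduce the number of distinct changes, but each application team still has to evaluate the changes that affect it.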
The additional workload described above is merely a fraction of what software developers already face in their day-to-day responsibilities. The immense pressure to meet industry demands for efficiency and speed often leads to inefficiencies and risks within enterprises. When developers are inundated with an overwhelming array of choices and limited resources, it not only hampers productivity but also jeopardizes the organization's success.
With all of this in mind, we also recognize that developers are making their best efforts given the circumstances. As we delve deeper into this issue, we encounter several key challenges:
Potential challenges impacting component choice
POPULARITY
When deciding which dependencies to use in a development project, popularity is often used as a proxy for quality (i.e., "everyone else is using it, so it must be safe, secure, and reliable"). Theoretically, this makes sense as more popular projects should be getting fixed faster. But they aren't. As revealed in our 2019 State of the Software Supply Chain report, the popularity of a dependency does not correlate with a faster median update time. Developers may feel safe in selecting more popular projects, but just because a dependency is popular, doesn't necessarily mean it's "better."
CLARITY
Often, developers aren't manually selecting individual versions when building software supply chains; many dependencies are already part of a project that is being used or built upon. As cited in the 2020 State of the Software Supply Chain report, 80-90% of modern applications consist of open source software. Without an SBOM and proper DevSecOps practices, developers and software engineering teams may have no way of knowing that vulnerable components are being used, pulled, or built upon.
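To make the Clarity point concrete, here is a minimal sketch of what an SBOM enables: listing components and checking them against advisory data. It assumes a CycloneDX-style JSON SBOM (only the `components` array and its `group`, `name`, and `version` fields are used) and a hard-coded, hypothetical `KNOWN_VULNERABLE` set standing in for a real vulnerability database. log4j-core 2.14.1 is included because it is affected by Log4Shell (CVE-2021-44228).

```python
import json

# A tiny CycloneDX-style SBOM. Real SBOMs carry many more fields; only the
# "components" array with group/name/version is read here.
SBOM_JSON = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"group": "org.apache.logging.log4j", "name": "log4j-core", "version": "2.14.1"},
    {"group": "com.google.guava", "name": "guava", "version": "32.1.2-jre"}
  ]
}
"""

# Hypothetical advisory data: (group, name, version) tuples known to be vulnerable.
# In practice this would come from a vulnerability database, not a hard-coded set.
KNOWN_VULNERABLE = {
    ("org.apache.logging.log4j", "log4j-core", "2.14.1"),  # affected by Log4Shell
}

def vulnerable_components(sbom_text: str) -> list[str]:
    """Return 'group:name:version' for every SBOM component on the vulnerable list."""
    sbom = json.loads(sbom_text)
    hits = []
    for comp in sbom.get("components", []):
        key = (comp.get("group", ""), comp["name"], comp["version"])
        if key in KNOWN_VULNERABLE:
            hits.append(":".join(key))
    return hits

print(vulnerable_components(SBOM_JSON))
```

Without an inventory like this, the same question ("are we using log4j-core 2.14.1 anywhere?") has to be answered by hand.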
AUTOMATION
Although there are plenty of open source automation tools, very few have security capabilities built in. As with the Clarity issue above, this automation may mask potentially vulnerable dependencies, enabling developers to unknowingly build upon projects with known vulnerabilities.
INACTIVE RELEASES
There are almost 500,000 projects within Maven Central, but only ~74,000 of them are actively used. That means roughly 85% of projects are sitting in the repository taking up space, potentially overwhelming developers with options.
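The 85% figure can be checked directly from the two counts above. The sketch uses the rounded values from the text (500,000 total, 74,000 active), so the result is approximate.

```python
TOTAL_PROJECTS = 500_000   # "almost 500,000" projects in Maven Central
ACTIVE_PROJECTS = 74_000   # "~74,000" actively used projects

inactive_share = 1 - ACTIVE_PROJECTS / TOTAL_PROJECTS
print(f"Inactive share: {inactive_share:.1%}")  # Inactive share: 85.2%
```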
Modern dependency management practices
While not surprising, the problem of dependency management didn’t magically go away in the past year. So, as we continue our quest to understand how it could be fixed, we homed in on the aspects that reflect modern dependency management, or that offer a potential answer to the 96% problem.
We focus on understanding the nuances of behavior within this domain and how they affect the way we work. This includes:
- Defining the optimal component: what actually makes a "good" open source component
- Dissecting the optimal time to upgrade an open source dependency
- Reflecting on current upgrade behavior
- Analyzing global patterns of download behavior
What makes an optimal open source component?
In the realm of dependency management, understanding the inherent risks and benefits of individual software component versions is paramount. To address this challenge, we conducted a comprehensive assessment of software artifacts and have developed a robust scoring system that evaluates components across five key dimensions.
Optimal open source component dimensions
1 - SECURITY
Uses a sophisticated method to gauge the "total risk" associated with a particular component version, considering all its vulnerabilities and common weaknesses. This scoring technique allows for a thorough comparison of different versions of the same component or across components by considering vulnerability counts, severities, and types.
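The report does not publish its security scoring formula. As an illustration only, a minimal "total risk" aggregate that responds to vulnerability count, severity, and weakness type might sum CVSS base scores, with a hypothetical extra weight for certain CWE types. The `TYPE_WEIGHT` values are assumptions, not Sonatype's method.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vulnerability:
    cvss: float  # CVSS base score, 0.0-10.0
    cwe: str     # weakness type, e.g. "CWE-502"

# Hypothetical extra weight for weakness types treated as especially dangerous
# (CWE-502: deserialization of untrusted data, CWE-94: code injection).
TYPE_WEIGHT = {"CWE-502": 1.5, "CWE-94": 1.5}

def total_risk(vulns: list[Vulnerability]) -> float:
    """Aggregate risk for one component version.

    More vulnerabilities, higher severities, and riskier weakness types all
    increase the total; a version with no known vulnerabilities scores 0.
    """
    return sum(v.cvss * TYPE_WEIGHT.get(v.cwe, 1.0) for v in vulns)

# Compare two versions of the same (invented) component.
version_a = [Vulnerability(9.8, "CWE-502")]
version_b = [Vulnerability(5.3, "CWE-79"), Vulnerability(4.3, "CWE-20")]
print(total_risk(version_a), total_risk(version_b))
```

Because every version reduces to one number, versions of the same component, or versions of different components, can be compared directly.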
2 - LICENSE
Based on License Threat Groups: Categorizes licenses into severity groups, allowing for informed decisions that align with your application's licensing requirements.
3 - AGE
Evaluates a component's age in relation to the latest version, emphasizing the benefits of staying current within the software ecosystem.
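One simple way to express age relative to the latest version is to count releases behind and days behind. This is a hypothetical sketch; the version numbers and dates are invented, and the report's actual age metric may differ.

```python
from datetime import date

def age_metrics(version: str, releases: list[tuple[str, date]]) -> tuple[int, int]:
    """Return (releases_behind, days_behind) for `version`.

    `releases` is a list of (version, release_date) pairs sorted oldest to
    newest, so the last entry is the latest version.
    """
    versions = [v for v, _ in releases]
    idx = versions.index(version)  # raises ValueError if the version is unknown
    releases_behind = len(releases) - 1 - idx
    days_behind = (releases[-1][1] - releases[idx][1]).days
    return releases_behind, days_behind

# Hypothetical release history for an invented component.
RELEASES = [
    ("1.0.0", date(2022, 1, 10)),
    ("1.1.0", date(2022, 6, 1)),
    ("2.0.0", date(2023, 3, 15)),
]

print(age_metrics("1.0.0", RELEASES))  # (2, 429)
```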
4 - POPULARITY
Analyzes download counts across repositories and other sources, enabling an understanding of component usage trends and popularity.
5 - RELEASE STABILITY
Quantifies the stability of a version by assessing its version label, factoring in development stages such as pre-release and beta, as well as double-publishing occurrences.
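A version label can be mapped to a development stage with a small parser. The sketch below assumes common Maven-style qualifiers (SNAPSHOT, alpha, beta, M for milestone, RC/CR) separated by `.`, `-`, or `_`. The `STAGE_SCORES` values are hypothetical, and double-publishing detection is not covered.

```python
import re

# Hypothetical stability scores per development stage (1.0 = final release).
STAGE_SCORES = {
    "snapshot": 0.0,
    "alpha": 0.25,
    "beta": 0.5,
    "milestone": 0.6,
    "rc": 0.8,
    "release": 1.0,
}

# Common Maven-style qualifiers mapped to a stage. Trailing digits
# (beta2, M3, RC1) are stripped from each token before lookup.
QUALIFIERS = {
    "snapshot": "snapshot",
    "alpha": "alpha", "a": "alpha",
    "beta": "beta", "b": "beta",
    "milestone": "milestone", "m": "milestone",
    "rc": "rc", "cr": "rc",
}

def release_stage(version: str) -> str:
    """Classify a version label into a development stage; unlabeled means 'release'."""
    for token in re.split(r"[.\-_]", version.lower()):
        qualifier = token.rstrip("0123456789")
        if qualifier in QUALIFIERS:
            return QUALIFIERS[qualifier]
    return "release"

def stability_score(version: str) -> float:
    return STAGE_SCORES[release_stage(version)]

for v in ["1.0.0", "2.0.0-beta2", "5.0.0-M3", "3.1.0-RC1", "1.2-SNAPSHOT", "32.1.2-jre"]:
    print(v, release_stage(v), stability_score(v))
```

Labels the parser does not recognize (such as `-jre` above) are treated as final releases, which is a simplifying assumption.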
Understanding component upgrade urgency
To better understand the cost of upgrade decisions, we looked into the concept of "upgrade urgency" to shed light on the need for a data-driven solution. Our approach is based on a component scoring algorithm that not only scores a component but also sorts its versions into distinct zones, ranging from the best (optimal) versions to the worst (reactive) versions.
Proactively managing dependencies is of utmost importance because, as the saying goes, software ages like milk, not wine. Remaining in a reactive state is not only suboptimal from a security perspective; it also puts development teams at an immediate disadvantage and penalizes them when issues arise. The observed patterns show that teams that make upgrade decisions proactively are in a significantly better position to make the most informed choices.
FIGURE 3.2. UPGRADE URGENCY
- 0 - Optimal version(s): scores within the top 10% of the best version’s score
- 1 - Proactive: scores below that top 10% band but less than 1 standard deviation below the best version’s score
- 2 - Borderline proactive: scores at least 1 but less than 2 standard deviations below the best version’s score
- 3 - Reactive: scores 2 or more standard deviations below the best version’s score
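The four zones can be expressed as a small classification function. This sketch interprets "top 10% of the best version’s score" as scoring at least 90% of the best score, uses the population standard deviation of all version scores for the component, and assumes scores are non-negative. The thresholds follow the list above, but the code is an illustration rather than the report's implementation.

```python
import statistics

def urgency_zone(score: float, all_scores: list[float]) -> int:
    """Assign an upgrade-urgency zone (0-3) to one version's score.

    `all_scores` holds the scores of every version of the same component.
    """
    best = max(all_scores)
    sd = statistics.pstdev(all_scores)  # 0.0 if the component has one version
    if score >= 0.9 * best:
        return 0  # optimal: within the top 10% of the best score
    if score > best - sd:
        return 1  # proactive: less than 1 standard deviation below the best
    if score > best - 2 * sd:
        return 2  # borderline proactive: between 1 and 2 standard deviations below
    return 3      # reactive: 2 or more standard deviations below

# Invented scores for five versions of one component.
SCORES = [100, 95, 80, 60, 40]
print([urgency_zone(s, SCORES) for s in SCORES])  # [0, 0, 1, 2, 3]
```

Here the standard deviation is about 22.4, so the proactive band ends near 77.6 and the borderline band near 55.3.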
We apply these zones to download patterns, providing insight into how software development professionals select components and make upgrade decisions.
Downloads by upgrade urgency
Maven Central is the de facto repository for Java-based open source libraries. It acts as a centralized hub, enabling developers to easily discover, access, and integrate dependencies into their projects. Its widespread adoption and comprehensive collection of artifacts have made it an essential resource for Java developers worldwide.
In our analysis, we used Maven Central to examine the download patterns of component versions. We looked at a typical month of data and determined the upgrade urgency of each download. In a perfect world, developers would only download optimal versions of their dependencies, minimizing risk and future upgrade effort.
FIGURE 3.3. VERSION DOWNLOADS BY UPGRADE URGENCY FROM MAVEN CENTRAL
| Upgrade urgency | Downloads | Share of downloads |
|---|---:|---:|
| 0 - Optimal (best) version | 18,055,476,664 | 80.6% |
| 1 - Proactive | 602,398,633 | 2.7% |
| 2 - Borderline | 2,604,054,004 | 11.6% |
| 3 - Reactive | 1,128,938,205 | 5.0% |
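The percentages in the table can be recomputed from the download counts. The sketch below sums the four rows (about 22.4 billion downloads in the month analyzed) and prints each zone's share, which matches the table to one decimal place.

```python
# Download counts per upgrade-urgency zone, copied from the table.
downloads_by_zone = {
    "0 - Optimal (best) version": 18_055_476_664,
    "1 - Proactive": 602_398_633,
    "2 - Borderline": 2_604_054_004,
    "3 - Reactive": 1_128_938_205,
}

total = sum(downloads_by_zone.values())
shares = {zone: count / total for zone, count in downloads_by_zone.items()}

print(f"Total downloads in the sampled month: {total:,}")
for zone, share in shares.items():
    print(f"{zone}: {share:.1%}")
```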
With 2.6 billion downloads classified as “borderline” and a further 1.1 billion falling into the worst (reactive) category, there is some cause for concern. After all, we live in a world where it takes just one vulnerability to "set the internet on fire."
A ton of wasted time
In prior years, we reported on how efficient timing for upgrades can yield significant cost savings. Opting for safer versions and prolonging their use can significantly reduce upgrade expenses. In a medium-sized enterprise with 20 development teams, the potential gain equates to two extra development weeks per application each year.
FIGURE 3.4. TIME SAVED WITH OPTIMAL UPGRADE DECISIONS
This year, we asked how much more time teams could save by also improving their security data. We found that when teams use better security data that reduces false positive findings by 25%, in combination with making optimal upgrade decisions, each team saved a total of 1.5 months of time per application, per year. This is a 2X boost in time saved over the optimal upgrade process alone, as described in our 8th annual State of the Software Supply Chain report last year. In other words, if you improve the security data you use and reduce false positives in tandem with upgrade efficiency, you double your gain.
In past years, we’ve looked at "optimal" as a single component or a single version: there was a right version and a wrong version. This year, we’ve established that optimal is really a range. There is still a “right” and a “wrong,” but on a spectrum. Building on this, we examined the Maven Central Repository to quantify wasted developer time. In this scenario, we define wasted time as downloading an optimal version of a component when that user instance has already downloaded another optimal version of the same component. To put this into perspective, let’s return to our car analogy. Would you buy a new car every time a new model year comes out? The optimal buying period might be when the model is refreshed, say every five years, rather than for each year's incremental changes. In our analysis, we consistently observed this kind of unnecessary upgrade, and it is frequent enough to raise concerns. The gains from such frequent upgrades are minimal, and from a lead or management perspective, work can be prioritized around whether a component is already in the optimal range rather than simply around getting the newest version.
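The wasted-download definition above translates directly into a single pass over a download log. This minimal sketch assumes a hypothetical log of `(instance_id, component, version, zone)` tuples in chronological order, where zone 0 means an optimal version; the instance, component, and version names are invented.

```python
from collections import defaultdict
from collections.abc import Iterable

def count_wasted_downloads(downloads: Iterable[tuple[str, str, str, int]]) -> int:
    """Count downloads of an optimal version made after the same instance
    already downloaded a *different* optimal version of the same component."""
    optimal_seen: dict[tuple[str, str], set[str]] = defaultdict(set)
    wasted = 0
    for instance, component, version, zone in downloads:
        if zone != 0:
            continue  # only optimal (zone 0) downloads fall under the definition
        seen = optimal_seen[(instance, component)]
        if seen - {version}:  # another optimal version is already present
            wasted += 1
        seen.add(version)
    return wasted

# Hypothetical download log: (instance_id, component, version, zone)
LOG = [
    ("team-a", "lib-x", "1.4.0", 0),
    ("team-a", "lib-x", "1.4.1", 0),  # wasted: team-a already has optimal 1.4.0
    ("team-a", "lib-x", "1.4.1", 0),  # wasted again under the same definition
    ("team-b", "lib-x", "1.4.1", 0),  # not wasted: first optimal version for team-b
    ("team-a", "lib-x", "1.2.0", 2),  # not optimal, so outside this definition
]

print(count_wasted_downloads(LOG))  # 2
```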
However, we also noted great variability in this practice. To highlight it, we call attention to anonymized data from several companies in two different industries. We analyzed two organizations in the energy industry with a similar number of Maven Central downloads and found that one organization had approximately 1.8 times as many wasted downloads as the other.
The value of an artifact repository manager
In the scenario above, much of this wasted time could be avoided simply by using a good repository manager, such as Sonatype Nexus Repository Manager. A centralized binary repository lets you proxy, collect, and manage your dependencies so that you are not constantly juggling all of these versions or, in this case, wasting time downloading an optimal version when another optimal version is already in your instance. It becomes the single source of truth for an organization’s software components and applications.
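To illustrate the proxy-and-collect role of a repository manager, here is a toy in-memory proxy: it fetches each artifact coordinate from upstream once, serves later requests from its cache, and can report which versions of a component it already holds. This is a hypothetical sketch of the concept, not how Nexus Repository Manager is implemented, and `fetch_upstream` stands in for a real remote download.

```python
from typing import Callable

class CachingProxy:
    """Toy proxy repository: fetch each artifact upstream once, then serve it locally."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        self._fetch_upstream = fetch_upstream  # stand-in for a real remote download
        self._cache: dict[str, bytes] = {}
        self.upstream_fetches = 0

    def get(self, coordinate: str) -> bytes:
        """Return the artifact for a 'group:artifact:version' coordinate."""
        if coordinate not in self._cache:
            self._cache[coordinate] = self._fetch_upstream(coordinate)
            self.upstream_fetches += 1
        return self._cache[coordinate]

    def cached_versions(self, group_artifact: str) -> list[str]:
        """Versions of 'group:artifact' already held (sorted as strings),
        so a team can check what exists before pulling yet another version."""
        return sorted(
            coord.rsplit(":", 1)[1]
            for coord in self._cache
            if coord.rsplit(":", 1)[0] == group_artifact
        )

# Usage with a fake upstream that just returns the coordinate as bytes.
proxy = CachingProxy(lambda coord: coord.encode())
for _ in range(3):
    proxy.get("com.example:lib-x:1.4.0")
proxy.get("com.example:lib-x:1.4.1")

print(proxy.upstream_fetches)                      # 2
print(proxy.cached_versions("com.example:lib-x"))  # ['1.4.0', '1.4.1']
```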
Global patterns of open source consumption behavior
In today's interconnected global landscape, regional groupings and trade agreements play a pivotal role in shaping international relations, trade dynamics, and economic cooperation. We explored whether these groupings also shape software development, in the form of component consumption patterns.
With the component score, we can assess the quality of a component. With upgrade urgency, we can assess how good the downloaded version is. Bringing it all together, we examined global download patterns, giving equal weight to component scoring and upgrade urgency and a little additional weight to the sheer quantity of downloads, because picking the right components and staying on the best versions is surely harder at scale. Below are some of the observed global trends.
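The text does not give the exact weights, so the sketch below is a hypothetical illustration of one way to combine the three factors: component score and upgrade urgency weighted equally, plus a smaller weight on log-scaled download volume. The weights (0.45, 0.45, 0.10), the 0-1 component score range, and the log scaling are all assumptions.

```python
import math

# Hypothetical weights: the text says component score and upgrade urgency get
# equal weight and download volume gets "a little additional weight".
W_COMPONENT, W_URGENCY, W_VOLUME = 0.45, 0.45, 0.10

def region_score(avg_component_score: float, avg_urgency_zone: float,
                 downloads: int, max_downloads: int) -> float:
    """Combine the three factors into a single score in [0, 1].

    avg_component_score: mean component score, assumed scaled to [0, 1]
    avg_urgency_zone:    download-weighted mean zone in [0, 3]; lower is better
    downloads:           this region's download count
    max_downloads:       largest download count of any region (must be > 0)
    """
    urgency_quality = 1 - avg_urgency_zone / 3  # zone 0 -> 1.0, zone 3 -> 0.0
    # Log scaling keeps the largest regions from dominating the volume term.
    volume = math.log1p(downloads) / math.log1p(max_downloads)
    return (W_COMPONENT * avg_component_score
            + W_URGENCY * urgency_quality
            + W_VOLUME * volume)

# Two invented regions with equal quality but different download volumes.
print(region_score(0.8, 0.5, 2_000_000, 5_000_000))
print(region_score(0.8, 0.5, 50_000, 5_000_000))
```

With equal quality, the higher-volume region scores slightly higher, reflecting the extra credit for consuming well at scale.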
Global Patterns: G7 Analysis
Global Patterns: Regional Group Analysis
Analysis: ASEAN
ASEAN (Association of Southeast Asian Nations) is a regional intergovernmental organization comprising ten member states in Southeast Asia: Brunei, Cambodia, Indonesia, Laos, Malaysia, Myanmar, the Philippines, Singapore, Thailand, and Vietnam. On average, ASEAN was the highest-scoring grouping for consumption behavior. Its members include technological hubs such as Singapore and Malaysia, as well as many nations undergoing digital transformation with heavy investment in IT infrastructure and digital innovation.
Analysis: USMCA and the EU
Tied for a close second were two economic powerhouses: North America, through the United States-Mexico-Canada Agreement (USMCA), which evolved from the North American Free Trade Agreement (NAFTA), and the European Union (EU), the collaborative bloc of European nations. Within USMCA, the US ranked highest, followed by Mexico and then Canada. Estonia ranked among the very top of the EU, which is unsurprising given its innovative approach to e-government services, worldwide recognition as a digital society, and robust digital infrastructure. Germany, a prominent software development hot spot in the EU, closely followed Estonia in its component consumption score. The country's robust economy, strong engineering tradition, and technological infrastructure make it an attractive destination for software professionals, and cities like Berlin, Munich, and Hamburg host thriving tech ecosystems that blend innovative startups, established tech companies, and research institutions. In contrast, although the difference was not vast, we did not expect to see Sweden and the Netherlands near the lower end of EU member states, given their well-regarded technology sectors and developer practices.
Analysis: Latin America and the Caribbean
Next in the rankings were the nations of Latin America and the Caribbean. Nicaragua and Bolivia were among the countries with the highest-scoring downloads, while Panama and Guatemala ranked toward the bottom of the list. Despite the sweeping digitization trends across this grouping, some of the analysis was skewed by the significantly lower download counts in many of the member countries.
Analysis: BRICS nations
Finally, we examined the BRICS nations: Brazil, Russia, India, China, and South Africa. These nations scored lower in the quality of OSS consumption than the other geographical groupings.
In conclusion, while each region has unique attributes that shape its component consumption habits, it's crucial to recognize that these rankings are merely a snapshot of each region's software landscape. They are not intended to capture the full scope of any individual country's software development maturity. As these regions navigate their own technological journeys, they are poised to engage further with open source initiatives, contributing to the global software community's growth and innovation.