Part 2: Evolution of attacks on the software supply chain
Welcome back to Sonatype's series on the shifting landscape of open source supply chain attacks.
In the previous installment, we argued that software's reliance on open source parts introduced a new supply chain – the software supply chain. We also discussed the need to build upon the lessons (still being) learned to apply the best practices to software supply chain management.
In part 2, we'll look at how bad actors (typically hackers) are attacking the software supply chain and how those attacks are evolving.
Sonatype's involvement since the inception of open source security gives us a unique perspective. As we see it, the evolution of vulnerabilities and attacks on the software supply chain falls into three distinct phases.
Phase 1: The zero-day
The source of a zero-day vulnerability is typically not malicious: a bug sits in code for a while until someone figures out how to exploit it. From the moment the vulnerability becomes known, it's a collective race to patch before bad actors can find and exploit your software.
Apache Struts
For open source, at least before Log4Shell, the most memorable zero-day was disclosed in 2013 in the Java web framework Apache Struts. By almost any standard, it was as bad as it gets in terms of ease of exploitation. Because the vulnerability existed within the Struts framework itself, it didn't matter what code you had written on top of it; it could be exploited.
You might recall this was when Anonymous was making its name. With financial firms representing many of the attack targets at this time, even the FBI began alerting field agents about potential attacks.
Heartbleed and Shellshock
The following year brought us Heartbleed and Shellshock.
2014's Heartbleed was a vulnerability in OpenSSL, the library that encrypts the bulk of HTTPS/SSL traffic across the internet. Shellshock was a vulnerability in Bash that had existed for decades.
Because of these attacks, the world began to lean in collectively. Security teams in every type of organization started to understand that open source vulnerabilities, and their widespread impact, demanded attention.
Apache Commons-Collections
As you might guess, 2015 provided no respite from zero-day attacks. This time, a high-profile vulnerability was disclosed in commons-collections, another Java library from Apache. But as the saying goes, "it only matters to people when someone dies."
That almost became an all-too-true reality when, nearly a year after the commons-collections vulnerability, Hollywood Presbyterian Medical Center was hit by a ransomware attack.
According to the LA Times, when the attackers took down the systems, hospital staff were forced to "... return to pen and paper for its record-keeping." And with no fix forthcoming from security teams or the FBI, the only option to restore the systems was paying the demanded $17,000 in bitcoin.
To understand the impact of an event like this, consider what happens to hospitals in major metropolitan areas where a major public event occurs, like Boston or New York on Marathon Day. According to Reuters, "A new study of cities hosting the largest U.S. marathons has found that the odds of dying if you have a heart attack or cardiac arrest jump 13 percent the day the race is run."
Imagine the consequences of an entire hospital outside a major metropolitan area being taken offline for a week. And what about the errors that arise when a hospital must fall back on analog systems like pen and paper instead of modern ones with built-in safety checks?
Don't dawdle on implementing better software security
On one hand, this looks like a play on fear, uncertainty, and doubt. And perhaps it was. Unfortunately, in at least two cases, one in Germany and another in the US, deaths have now been attributed to malware attacks targeting vulnerable, exploitable software. The hard reality is that if we don't get this right as an industry, software security may be the only thing standing between an attack and someone's death.
Sadly, things aren't improving. The collective leaning in that industries did a few years ago doesn't seem to have changed how organizations act. If things were getting better, there would be less fear, uncertainty, and doubt.
Even in these Phase 1-style attacks, we see no slowdown. We only have to look to last year and Log4Shell to see that things are getting worse.
The image below shows the download statistics during the early days of awareness. As expected, there is a significant spike on the left-hand side as people quickly adopted the patched versions. The good news is that consumption of the patched, non-vulnerable versions reached about 40-50%. But then it flatlined.
The following image, taken in May 2022, shows that 38% of worldwide consumption was still of the vulnerable versions. As of late 2022, this hasn't changed much, at least not enough to declare victory. About 25-30% of the world is still consuming vulnerable versions of Log4j.
It's worth repeating: a year after Log4Shell, many organizations still haven't fixed this issue. The biggest problem is that many remain entirely unaware and continue downloading known-vulnerable versions into their codebases.
Not only is this frustrating, but it's also frightening. We understand the potential impact and the actual risk to lives. We also know that technology, processes, and best practices exist that can prevent these attacks. It's hard not to be frustrated, especially when we're only at Phase 1.
Phase 2: The software supply chain
In 2017, a researcher discovered that "52% of all JavaScript npm packages could have been hacked via weak credentials." At least 14% of the affected projects had passwords like "password" or "123456," and in some cases the password was even checked into source control.
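To illustrate the kind of check involved (a hypothetical sketch, not the researcher's actual method or data), flagging trivially guessable maintainer credentials boils down to testing each account against a short list of common passwords:

```python
# Hypothetical sketch of a weak-credential scan: flag accounts whose
# password is on a short list of common choices, or equals the username.
COMMON_PASSWORDS = {"password", "123456", "qwerty", "letmein", "abc123"}

def weak_credential_report(accounts):
    """accounts: dict mapping maintainer name -> password.
    Returns the maintainers whose passwords are trivially guessable."""
    return sorted(
        name for name, pw in accounts.items()
        if pw.lower() in COMMON_PASSWORDS or pw.lower() == name.lower()
    )

# Made-up example accounts: two of the three fail the check.
sample = {"alice": "password", "bob": "S3cure!randomPhrase", "carol": "123456"}
print(weak_credential_report(sample))  # -> ['alice', 'carol']
```

A real audit would hash-compare against large breached-password corpora rather than a tiny static list, but the principle is the same: credentials this weak put every package the account can publish at risk.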
This event marked the unfortunate first signal that attackers could succeed by shifting their focus to the supply chain. A few weeks later, we were officially in the second phase, with confirmation of the first typosquatting attacks in npm and Python.
What was interesting about these two attacks is that they were not doing what you might expect. They weren't trying to steal credit card data or mine cryptocurrency. Instead, they were stealing the credentials of open source maintainers: the ability to publish components into the supply chain. Attacks had suddenly become much more sophisticated and much harder to defend against.
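Typosquatting works because a malicious name can sit one keystroke away from a legitimate one. As a minimal, illustrative sketch (the package names below are examples, and real registry tooling is far more sophisticated), a defender can flag requested packages whose names are suspiciously similar to, but not exactly, well-known ones:

```python
from difflib import SequenceMatcher

# Illustrative popular-package list; real tooling would derive this
# from registry download statistics.
POPULAR = ["cross-env", "lodash", "express", "react"]

def possible_typosquat(name, popular=POPULAR, threshold=0.85):
    """Return the popular package `name` may be impersonating, or None.
    A near-but-not-exact match above the similarity threshold is suspicious."""
    for known in popular:
        if name != known and SequenceMatcher(None, name, known).ratio() >= threshold:
            return known
    return None

print(possible_typosquat("crossenv"))  # -> 'cross-env' (one character away)
print(possible_typosquat("lodash"))    # -> None (exact match is fine)
```

The `crossenv` example mirrors a real 2017 npm typosquat of `cross-env`; similarity scoring like this is only a heuristic, which is why registries also apply name-reservation rules and manual review.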
It's clear why attackers focused on this vector. Compromising the supply chain carries a high ROI and acts as a force multiplier, letting attackers reap the benefits of the millions of users who ultimately consume these components.
A 2017 study of the 100 most influential npm maintainers came to the same conclusion: get to 15 or 20 of the right maintainers and you gain access to nearly 50% of the components in npm. Conversely, if you can find a way to tamper with one of the most popular packages in npm, you can impact millions of downloads.
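The concentration the study measured can be sketched with a toy computation (the maintainer-to-package map below is made up; the study used real npm registry metadata): count how many distinct packages a chosen set of maintainer accounts can publish to.

```python
# Toy model of maintainer concentration: what fraction of all packages
# is reachable through a given set of maintainer accounts?
maintainers = {
    "m1": {"pkg-a", "pkg-b", "pkg-c"},
    "m2": {"pkg-c", "pkg-d"},
    "m3": {"pkg-e"},
}

def coverage(selected):
    """Fraction of all known packages publishable via the selected accounts."""
    all_pkgs = set().union(*maintainers.values())
    reached = set().union(*(maintainers[m] for m in selected))
    return len(reached) / len(all_pkgs)

print(coverage(["m1", "m2"]))  # -> 0.8 (4 of the 5 packages)
```

Run over the real registry, this kind of set-cover arithmetic is what yields the "15 or 20 accounts reach nearly half of npm" result, and it explains why maintainer credentials became the prize.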
Over the next four years, attacks of this type continued to increase and have yet to slow down. According to the 2021 State of the Software Supply Chain report, at least seventeen major attacks utilized this approach. Putting it all together, we see an average year-over-year growth rate of 742% for attacks on the software supply chain.
And although we've collectively been raising the alarm every week since 2017, it is pretty clear that the attacks on the supply chain have evolved to become much more sophisticated.
Phase 3: The corporate developer
When it comes to the sophistication of an attack, look no further than Phase 3. And based on our analysis, we are still in the early days of this phase as it continues to expand.
One of the first identified attacks on development infrastructure was against an unpatched Jenkins instance in 2018, which attackers used to mine three and a half million dollars' worth of cryptocurrency. In 2021, this happened again through a continuous integration (CI) server, this time via a known vulnerability in Confluence.
For one of the most striking attacks involving a Jenkins server, look no further than the Verkada data breach. The attack exposed feeds from 150,000 security cameras in hospitals, schools, police stations, and, at one point, a Tesla plant.
This type of attack is not isolated to CI servers, though. In 2021, Codecov, a tool that provides code coverage scanning, disclosed an exploit that resulted in attacks on "hundreds of networks."
What made the Codecov attack worse than most wasn't just that it went undetected for nearly two months; it was how deeply the tool was integrated into the software supply chains of effectively all of its customers.
As Sonatype reported, "... an error in Codecov's Docker image creation process enabled the actors to extract sensitive credentials and modify the Bash Uploader script." As is typical, development infrastructure holds the keys to the kingdom, making it the perfect target for attackers. While there's far more here than we can cover in a blog post, the theme is consistent across all of these attacks.
As you can see from the image below and what we've covered above, attackers aren't only stealing credentials and passwords. In some cases, they're stealing money, dropping backdoors, and tampering with the tools through virus-like approaches.
But just when it seems like you've seen everything, 2021 brought something entirely new: dependency confusion, yet another step in the evolution of attacks.
From our research, dependency confusion arose when an attacker figured out the name of a package a company was using internally. Armed with that name, the attacker could publish an identically named package with a higher version number to one of the public repositories. Given the automation in much developer tooling, this malicious version is then pulled from the public repository instead of the internal one. And with that, the attacker can access just about anything they want.
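The failure mode can be sketched as a resolver that naively picks the highest version across every configured index, regardless of where it lives (a deliberate simplification; the package name and version data below are hypothetical):

```python
# Simplified model of the resolution behavior dependency confusion abuses:
# a package name exists in both a private and a public index, and a naive
# resolver picks whichever index offers the higher version number.
private_index = {"acme-internal-utils": ["1.0.0", "1.2.0"]}
public_index = {"acme-internal-utils": ["99.0.0"]}  # attacker-published

def naive_resolve(name):
    """Pick the highest version seen across all indexes, ignoring origin."""
    candidates = []
    for index in (private_index, public_index):
        for version in index.get(name, []):
            candidates.append(tuple(int(p) for p in version.split(".")))
    best = max(candidates)  # highest version wins, wherever it came from
    return ".".join(str(p) for p in best)

print(naive_resolve("acme-internal-utils"))  # -> '99.0.0', the attacker's package
```

The standard mitigation is to make origin part of the decision: scope internal names to a private registry (npm scoped packages, or an explicit allowlist of which index may serve which names) so that an internal package name can never resolve from a public source, no matter how high the attacker's version number is.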
The interesting thing about dependency confusion is that we saw signs of it in the lead-up to the disclosure. Using the advanced tooling Sonatype has built to detect malware in real time, our systems were already picking up and isolating some of the components the white-hat researcher was distributing. We even talked to the researcher to better understand what they were doing.
Our concern grew as we considered the overall potential impact on the software supply chain. Time and again, attackers follow close behind research or disclosure. In some cases, they may already be exploiting the vulnerability.
Consider the timeline in the image below. In February 2021, researchers disclosed dependency confusion. Within a week, the number of exploits – essentially copycats – exploded by about 7,000% over the baseline.
Most of these were other researchers chasing bug bounties. However, we found real attackers amongst the noise. Almost immediately, we could see the prototype research code repurposed in actual attacks, and over the next four weeks the numbers continued to explode, to over 10,000 instances.
This type of attack follows the same goals as those before it. From fetching bash history and shadow password files to dropping backdoors, there was now a new front to defend.
Pulling back to consider the overall evolution: we first discussed attacks on developer infrastructure in 2017, when we could count them on one hand. By the end of that first period, we had seen about 216 attacks that we could name.
The following year, however, we saw an explosion of 430%, up to almost 1,200. The year after that, we were closing in on 12,000, another 650% year-over-year increase. And in mid-2022, Sonatype's proprietary systems detected over 95,000 malicious packages.
Cybercriminals are after more than you might think
Each attack we've discussed represents the potential to get into development infrastructure, drop backdoors, steal data and credentials, and take anything of value your organization may have.
Even with human lives on the line, a concern since Phase 1, we haven't seen the collective leaning in that one might expect. If we had, organizations would be setting up their infrastructure to defend against these attacks.
It's important to remember that attack methods are constantly evolving and that constant vigilance is needed to stay on top of shifting approaches. In the next installment, we'll discuss where responsibility lies and the actions you can begin taking today. Spoiler alert: it's not just printing out a list of components and passing it off to customers.
Editor's note: This article was co-written by Jeff Wayman. Connect with him on LinkedIn.
Written by Brian Fox
Brian Fox is a software developer, innovator and entrepreneur. He is an active contributor within the open source development community, most prominently as a member of the Apache Software Foundation and former Chair of the Apache Maven project. As the CTO and co-founder of Sonatype, he is focused on building a platform for developers and DevOps professionals to build high-quality, secure applications with open source components.