Sonatype Repository Firewall, Sonatype's automated malware detection platform, has flagged multiple dependency confusion packages on the PyPI registry today, all uploaded by the same user.
On January 23, 2022, PyPI user arturlebedev began flooding the PyPI registry with 1,275 packages, as observed by Sonatype.
The list of these packages is too large to include in a single article, but some notable package names appear to target popular open source projects and companies:
- Sagepay: imitates the well-known British payment processor.
- Apple-py-music: named after a Python-based Apple Music project.
- Google-themed projects such as xgoogle-cloud-storage, xgoogle-cloud-core, and others.
- AadhaarCrypt: named after a legitimate API project that "[lets] users store Aadhaar Card information online in [a] secure way." Note that Aadhaar itself refers to India's biometric national identification system.
- Xsetuptools: a near-identical name to Python's setuptools packaging library.
- Openbabel-python: named after Open Babel, a chemical toolbox used by chemists and scientists.
- OpenRobotics: named after the Mountain View-based nonprofit.
- Xpip: an alias used within the Xon.sh project.
- Airflow-*: imitates Apache Airflow-related project plugins and scripts.
- Xcryptography: an apparent transitive dependency used by some projects.
- librat: a name referenced in many OSS projects.
Sonatype reported all 1,275 packages to PyPI admins today, and they removed the packages within an hour of our report.
What's inside the packages?
Nearly all of the 1,200+ packages are identical in structure. Given their huge volume, their contents, and the fact that all of them were dumped on PyPI on the same day, we believe these components were generated automatically by a script after parsing the names of well-known companies and existing open source projects.
The PyPI pages for these packages seen by Sonatype were typically blank, with no description of the package. Interestingly, we also did not see an ethical hacking disclaimer, such as a message asserting "these were created for research purposes," either on the page or within any of the packages.
Inside the package, the setup.py file is simplistic and contains the package's name and version information.
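As a rough illustration, assume a setup.py of the shape described, passing nothing but `name` and `version`. The `SAMPLE_SETUP_PY` string below is a hypothetical stand-in whose name and version are placeholders, not values from the flagged packages. The helper `setup_keywords` (also hypothetical) uses Python's standard `ast` module to list the literal arguments passed to `setup()` without executing the file, which is one way a reviewer might confirm that a setup.py carries nothing beyond metadata. A clean setup.py says nothing about the rest of the package, though: in this case the payload sat in `__init__.py`.

```python
import ast

# Hypothetical stand-in for a minimal setup.py of the kind described above.
# The name and version are placeholders, not values from the flagged packages.
SAMPLE_SETUP_PY = '''
from setuptools import setup

setup(
    name="example-package",
    version="1.0.0",
)
'''


def setup_keywords(source: str) -> dict:
    """Return the keyword arguments passed to setup() in a setup.py source.

    The source is parsed with ast and never executed, so inspecting an
    untrusted file this way does not run any of its code.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Match a bare call to setup(...); setuptools.setup(...) is not handled here.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "setup"
        ):
            found = {}
            for kw in node.keywords:
                key = kw.arg if kw.arg is not None else "**"  # **kwargs unpacking
                try:
                    found[key] = ast.literal_eval(kw.value)
                except (ValueError, TypeError):
                    # Variables, function calls, etc. cannot be evaluated safely.
                    found[key] = "<non-literal>"
            return found
    return {}


if __name__ == "__main__":
    print(setup_keywords(SAMPLE_SETUP_PY))
```

Running the script prints `{'name': 'example-package', 'version': '1.0.0'}`. Parsing rather than importing is the key design choice here, since executing an untrusted setup.py is exactly what a malicious package wants.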
The __init__.py file is responsible for exfiltrating fingerprinting information from any system where one of these components is installed.
Upon installation, the package collects the system's username, computer name, and IP address, and attempts to upload this information via both HTTP and DNS to the following domains:
For DNS: .sub.deliverycontent[.]online
For HTTP: www.deliverycontent[.]online
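Defenders who want to check whether any machine contacted this infrastructure could match hostnames from DNS or proxy logs against the indicator. The sketch below assumes the logs have already been reduced to a list of hostnames; `refang`, `matches_ioc`, and `flag_hosts` are hypothetical helpers, and the `abc123` label in the sample is invented, since the report does not specify the exact subdomain format used for DNS exfiltration. The indicator is kept defanged in the source and re-fanged at runtime.

```python
from typing import Iterable, List

# Indicator domain from this report, kept defanged and re-fanged at runtime.
IOC_DOMAINS = ["deliverycontent[.]online"]


def refang(domain: str) -> str:
    """Turn a defanged indicator such as example[.]com back into example.com."""
    return domain.replace("[.]", ".").lower()


def matches_ioc(hostname: str, iocs: Iterable[str] = IOC_DOMAINS) -> bool:
    """Return True if hostname is an indicator domain or any subdomain of one."""
    # Normalize case and drop the trailing root dot that DNS logs often include.
    host = hostname.rstrip(".").lower()
    for ioc in iocs:
        domain = refang(ioc)
        # The "." prefix keeps lookalikes such as notdeliverycontent[.]online from matching.
        if host == domain or host.endswith("." + domain):
            return True
    return False


def flag_hosts(hostnames: Iterable[str]) -> List[str]:
    """Return the hostnames from a log that match any indicator domain."""
    return [h for h in hostnames if matches_ioc(h)]


if __name__ == "__main__":
    # Hypothetical log entries; the "abc123" label is invented for illustration.
    sample_log = [
        "pypi.org",
        refang("www.deliverycontent[.]online"),
        refang("abc123.sub.deliverycontent[.]online") + ".",
        refang("deliverycontent[.]online.example.net"),
    ]
    print(flag_hosts(sample_log))
```

Matching on a dot-prefixed suffix, rather than a plain string suffix, covers both the HTTP host and any DNS subdomain while leaving unrelated domains that merely end in the same characters alone.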
Therefore, we believe these 1,200+ packages, like many dependency confusion copycats caught in the past, are also proof-of-concept (PoC) candidates, aimed at checking whether any of these relatively well-known organizations and projects still suffer from the weakness. That said, an explicit note somewhere within the packages attesting to the ethical nature of the research would have helped.
Dependency confusion: Year in review
The dependency (or namespace) confusion technique gained mainstream popularity in 2021, when researcher Alex Birsan used it to successfully hack over 35 big tech firms and walk away with over $130,000 in bug bounties. Within hours of the news story hitting the wire, Sonatype's automated malware detection systems caught hundreds of copycat packages published by third-party bug bounty hunters who wanted to imitate Birsan's trick and win bounties of their own. And the saga hasn't stopped since.
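The trick works because some installers, when configured with more than one package index, pool candidates from all of them and pick the highest version. pip's `--extra-index-url` option is the commonly cited example: it does not give the private index priority over PyPI. The toy resolver below is a simplified sketch of that behavior, not pip's actual resolution algorithm; the package name `acme-internal-utils` and both index contents are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    version: Tuple[int, ...]
    index: str  # which index the candidate came from


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a simple dotted version such as '1.2.0' (no pre-release tags)."""
    return tuple(int(part) for part in text.split("."))


def resolve(name: str, indexes: Dict[str, Dict[str, List[str]]]) -> Candidate:
    """Toy resolver: pool candidates from every index and take the highest version.

    No index is preferred over another, which is the behavior dependency
    confusion relies on.
    """
    candidates = [
        Candidate(name, parse_version(v), index_name)
        for index_name, packages in indexes.items()
        for v in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"no candidates for {name!r}")
    return max(candidates, key=lambda c: c.version)


if __name__ == "__main__":
    # Hypothetical indexes: a company-internal package name that an attacker
    # has also registered on the public index with an inflated version number.
    indexes = {
        "internal": {"acme-internal-utils": ["1.2.0", "1.3.0"]},
        "public": {"acme-internal-utils": ["99.0.0"]},
    }
    chosen = resolve("acme-internal-utils", indexes)
    print(chosen.index, chosen.version)
```

Running it prints `public (99, 0, 0)`: the attacker's public upload wins simply by claiming a higher version number.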
What may have started out as a research project by a bug bounty hunter was soon abused by threat actors targeting popular companies to exfiltrate sensitive files such as .bash_history and /etc/shadow, as Sonatype previously discovered.
Next in line were open source "vigilantes," who hijacked the technique for their own cause.
Last year, a pseudonymous user, RemindSupplyChainRisks, polluted the PyPI and npm registries with over 5,000 packages in an attempt to educate the wider community about security threats to open source repositories.
This week's incident marks the second time since then that the PyPI registry has been flooded with a noticeably large number of suspicious components, all published in a single day. In fact, the 1,275 packages from arturlebedev alone made up 60% of all packages published to PyPI on January 23, 2022.
Since the inception of our automated malware detection systems, Sonatype has caught upwards of 40,000 suspicious, malicious, or dependency confusion research packages on npm and PyPI.
Users of Sonatype Repository Firewall can rest easy knowing that such PoC candidates would automatically be blocked from reaching their development builds.
Sonatype Repository Firewall instances will automatically quarantine any suspicious components detected by our automated malware detection systems while a researcher reviews them manually, thereby keeping your software supply chain protected from the start.
Users of Sonatype Nexus Repository can additionally download Sonatype's "dependency/namespace confusion checker" script from GitHub to check if they have artifacts with the same name between repositories and to determine if they have been impacted by a dependency confusion attack in the past.
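The idea behind such a check, finding package names that exist in both an internal and a public repository, can be sketched in a few lines. This is not Sonatype's script, whose internals are not described here; the function names and sample inventories are hypothetical. Names are normalized following PEP 503 (case-insensitive, with runs of `-`, `_`, and `.` treated as equivalent) so that near-identical spellings still collide.

```python
import re
from typing import Iterable, List


def normalize(name: str) -> str:
    """Normalize a Python package name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def shared_names(internal: Iterable[str], public: Iterable[str]) -> List[str]:
    """Return normalized internal package names that also exist in the public repository.

    Comparing normalized names means Acme_Utils and acme-utils count as the
    same project.
    """
    public_set = {normalize(n) for n in public}
    return sorted({normalize(n) for n in internal} & public_set)


if __name__ == "__main__":
    # Hypothetical inventories; in practice these would come from repository listings.
    internal_repo = ["Acme_Utils", "acme-billing", "requests"]
    public_repo = ["acme-utils", "requests", "numpy"]
    print(shared_names(internal_repo, public_repo))
```

Each shared name still needs human review: an overlap such as `requests` may simply be a proxied public package, whereas a name that should exist only internally, such as `acme-utils` in the sample, turning up on the public index is the pattern dependency confusion exploits.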
Written by Ax Sharma
Ax is a Staff Security Researcher & Malware Analyst at Sonatype with a penchant for open source software. His work and expert analyses have frequently been featured by leading media outlets including the BBC. Ax's expertise lies in security vulnerability research, reverse engineering, and cybercrime investigations. He has a passion for educating a wide range of audiences through writing and vlogs.