News and Notes from the Makers of Nexus | Sonatype Blog

Software packages, do we even need them?

Written by Michael Prescott | May 01, 2023

Apple's Swift software development ecosystem is undergoing a big change that restructures how Swift code packages are created and distributed. This move echoes a similar change in the Go ecosystem a few years earlier, and it reveals an essential design trade-off at the heart of package management: Do we even need packages at all?

What is a package manager?

To answer this, let's look at a question that every major software ecosystem has to answer: How do we share and reuse software?

In the dark ages of open source, using third-party packages required fishing around on the internet for download sites. This could mean wasting time, using old versions, or borrowing from dead projects. Thankfully, this landscape has changed completely.

Today, developers share packages within the walls of their organizations through private repositories. Massive public repositories like Maven Central or the Python Package Index (PyPI) let developers share their open source software contributions with developers around the world.

What makes all this possible is the humble package manager, a type of utility that lets developers bundle up, describe, publish, and consume software declaratively. Package managers like Maven for Java or npm for JavaScript remove the pain of finding, downloading, and installing packages.

Every package gets a unique name (the so-called "package coordinates"), allowing developers to list which ones they need while the package manager automatically finds and downloads them. Freed from worrying about where to get a package (and all its dependencies), developers can simply declare what they want and get on with the coding.
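To make the "declare what we want" idea concrete, here is a minimal sketch of how a package manager might expand a declared dependency into everything that has to be downloaded. The `INDEX` dictionary, the `name:version` coordinate format, and the package names are all invented for illustration; real package managers use their own coordinate schemes and also handle version ranges and conflicts.

```python
# Hypothetical in-memory index: package coordinates -> direct dependencies.
# Every name and version here is made up for illustration.
INDEX = {
    "app-framework:2.1.0": ["http-client:1.4.2", "json-codec:3.0.1"],
    "http-client:1.4.2": ["tls-utils:0.9.0"],
    "json-codec:3.0.1": [],
    "tls-utils:0.9.0": [],
}


def resolve(declared, index=INDEX):
    """Return every package needed, dependencies first, each listed once."""
    ordered = []
    seen = set()

    def visit(coords):
        if coords in seen:
            return
        if coords not in index:
            raise KeyError(f"unknown package: {coords}")
        seen.add(coords)
        # Walk the dependencies before recording the package itself.
        for dep in index[coords]:
            visit(dep)
        ordered.append(coords)

    for coords in declared:
        visit(coords)
    return ordered


if __name__ == "__main__":
    for coords in resolve(["app-framework:2.1.0"]):
        print(coords)
```

The developer declares a single line, `app-framework:2.1.0`, and the resolver works out that three more packages come along with it.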

DevOps teams also benefit from package managers

By caching third-party packages where they enter the organization, DevOps teams can provide their developers with fast, repeatable access to every dependency that an application requires.
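As a rough sketch of that caching idea, the class below serves a package from a local cache and only goes upstream on a miss. `CachingProxy` and `fake_upstream` are hypothetical names, and the upstream call is a stand-in for a real network request; an actual repository manager also deals with storage, expiry, and authentication.

```python
class CachingProxy:
    """Serve packages from a local cache, fetching upstream only on a miss."""

    def __init__(self, fetch_upstream):
        self._fetch_upstream = fetch_upstream  # callable: coordinates -> bytes
        self._cache = {}
        self.upstream_requests = 0

    def get(self, coords):
        if coords not in self._cache:
            # First request for this package: fetch it once and keep it.
            self.upstream_requests += 1
            self._cache[coords] = self._fetch_upstream(coords)
        return self._cache[coords]


def fake_upstream(coords):
    # Stand-in for a download from a public repository.
    return f"contents of {coords}".encode()


if __name__ == "__main__":
    proxy = CachingProxy(fake_upstream)
    proxy.get("json-codec:3.0.1")
    proxy.get("json-codec:3.0.1")
    print(proxy.upstream_requests)
```

However many developers and build agents ask for the same package, the public repository is contacted only once.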

In an age of rapidly increasing software supply chain attacks, it's important to have a single point where open source dependencies enter the organization. That single entry point makes monitoring and automated policy enforcement practical.

No two package managers are alike

Despite their similarities, no two package managers are the same. A healthy spirit of experimentation and innovation has produced a bevy of useful improvements.

For example, many package managers now have a built-in search mechanism (Maven being a notable exception). This makes it possible to find and download packages from the command-line interface (CLI) or standardized REST endpoints.

Meanwhile, Docker's protocol for downloading and publishing container images includes built-in support for de-duplicating the layers shared between many images. Without this, copies of base images would crush repositories with terabytes of redundant data.
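The de-duplication works because each layer is addressed by a digest of its content, so identical layers collapse into one stored blob. The sketch below illustrates the principle only; `LayerStore` is a hypothetical class, not the registry's actual API, and the layer contents are placeholder bytes.

```python
import hashlib


class LayerStore:
    """Toy content-addressed store: identical layers are kept only once."""

    def __init__(self):
        self._blobs = {}  # digest -> layer bytes

    def put(self, layer):
        digest = "sha256:" + hashlib.sha256(layer).hexdigest()
        # setdefault leaves an existing blob untouched, so duplicates cost nothing.
        self._blobs.setdefault(digest, layer)
        return digest

    def push_image(self, layers):
        """Store an image's layers and return its list of layer digests."""
        return [self.put(layer) for layer in layers]

    def stored_bytes(self):
        return sum(len(blob) for blob in self._blobs.values())


if __name__ == "__main__":
    store = LayerStore()
    base = b"base" * 1000  # placeholder for a large shared base layer
    store.push_image([base, b"app-one"])
    store.push_image([base, b"app-two"])
    print(store.stored_bytes())
```

Two images that share a base layer pay for that layer once, which is the effect that keeps repositories from filling with redundant copies.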

However, one trend that has proven less promising is getting dependencies straight from the git source repositories where they are written.

Packages fresh from the oven

Building a central package repository like Maven Central is an expensive and time-consuming proposition. The widespread availability of public source control systems like GitHub offers an appealing alternative.

After all, why not skip having a centralized repository completely?

Originally, Swift and Go both embraced this idea in the design of their package managers. Rather than requiring a central repository for package binaries, they could resolve package coordinates as tags in any public git source repository.
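In practice, resolving a version against git comes down to reading the repository's tags and picking the one that satisfies the request. The sketch below assumes tags that look like `1.2.3` or `v1.2.3` and a simple "latest within a major version" rule; both the tag list and the rule are simplifications chosen for illustration, not the exact behavior of either package manager.

```python
import re

# Matches tags such as "1.2.3" or "v1.2.3"; anything else is ignored.
TAG_PATTERN = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag):
    """Return (major, minor, patch) for a version tag, or None."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def latest_matching_tag(tags, major):
    """Pick the highest version tag within the requested major version."""
    candidates = []
    for tag in tags:
        version = parse_tag(tag)
        if version is not None and version[0] == major:
            candidates.append((version, tag))
    if not candidates:
        return None
    # Tuples compare numerically, so (1, 10, 0) ranks above (1, 9, 3).
    return max(candidates)[1]


if __name__ == "__main__":
    tags = ["v1.2.0", "v1.10.0", "v2.0.0", "nightly", "v1.9.3"]
    print(latest_matching_tag(tags, major=1))
```

Note that nothing in this lookup guarantees the tag still points at the same commit it did yesterday, which matters later in the story.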

This has a few advantages:

  • Nobody has to pay the cost in effort and time to build a centralized binary repository

  • Developers can push software into the ecosystem as fast as they can commit it

This is a tempting accelerator for a new ecosystem where a critical mass of reusable packages is so important for adoption in the development mainstream.

But if that's the case, why has the Swift community proposed an alternative approach?

Package management in the enterprise

In a mature, widely used ecosystem, new challenges that favor binary packages over git-based distribution become apparent.

  • Efficiency - Pulling from git usually means taking the entire history of the source code, since a default clone fetches every commit. Most projects need only the very latest data, usually a tiny percentage of the total project data. At scale, this adds up to real costs in time and bandwidth.

  • Insulation - If a package developer decides to throw in the towel and close their project one day, that repository is gone, even if it was a valuable resource to the community. There is then no way to source even the old versions from it.

  • Immutability - Repeatable builds rely on package immutability. Since git tags can be moved or deleted, package developers can modify or re-tag existing versions after release. For a large application with many dependencies, a silently changed dependency can add significant unplanned work to any change.

  • Traceability - When an application's dependencies are mutable, it's hard to say exactly what packages were used to build it. This lack of traceability makes it far more difficult to comply with new regulations like the software bill of materials (SBOM) mandate.

  • Security - git repositories can be, and have been, infiltrated by bad actors. We've seen old contributor accounts taken over, bringing projects back to life in the hands of malware developers. When downstream consumers pull straight from git, even old versions can be changed or replaced with new malware masquerading as an old, stable version.
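One common defense against movable tags is to record the commit each tag pointed at when it was first resolved, then compare on later builds. The sketch below shows that comparison under simple assumptions: `pinned` and `current` are plain dictionaries mapping a tag to a commit identifier, and the short commit strings are placeholders rather than real hashes.

```python
def find_moved_tags(pinned, current):
    """Return tags whose commit no longer matches the pinned one.

    Each result maps a tag to (pinned_commit, current_commit); the current
    commit is None when the tag has disappeared from the repository.
    """
    problems = {}
    for tag, pinned_commit in pinned.items():
        actual = current.get(tag)
        if actual != pinned_commit:
            problems[tag] = (pinned_commit, actual)
    return problems


if __name__ == "__main__":
    # Placeholder commit identifiers, recorded at first resolution.
    pinned = {"1.0.0": "a1b2c3", "1.1.0": "d4e5f6", "1.2.0": "778899"}
    # What the repository reports today: 1.1.0 was re-tagged, 1.2.0 is gone.
    current = {"1.0.0": "a1b2c3", "1.1.0": "0f0f0f"}
    for tag, (before, after) in find_moved_tags(pinned, current).items():
        print(tag, before, after)
```

A check like this detects the problem, but it cannot bring back the original content; that still requires a copy that nobody upstream can alter.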

Immutable means reliable

Switching to immutable binary packages addresses these risks. Since they can't be changed, they can be cached in a local repository manager for fast, repeatable builds that let developers work quickly. Teams are insulated from disruptions in the supply chain of dependencies because whatever versions are in use can be readily sourced from private caches or the centralized repository.

When every package is uniquely identified, reliable software composition analysis becomes possible. Teams can easily comply with requirements for a software bill of materials, and when advisories or recalls are necessary, identifying which application versions are affected is a simple process.
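With exact component lists on record, answering "which releases are affected?" becomes a lookup. The sketch below assumes each application version has a recorded set of package coordinates; the `boms` data, the application names, and the `name:version` format are invented for illustration and do not follow any particular SBOM standard.

```python
def affected_applications(boms, advisory_coords):
    """Return the application versions that contain the flagged package."""
    return sorted(
        app for app, components in boms.items() if advisory_coords in components
    )


if __name__ == "__main__":
    # Hypothetical bills of materials: application version -> exact components.
    boms = {
        "billing:4.2": {"json-codec:3.0.1", "tls-utils:0.9.0"},
        "billing:4.3": {"json-codec:3.0.2", "tls-utils:0.9.0"},
        "portal:1.0": {"json-codec:3.0.1"},
    }
    print(affected_applications(boms, "json-codec:3.0.1"))
```

The lookup is only trustworthy because each coordinate refers to exactly one immutable artifact; if a version could change after the fact, the recorded list would no longer describe what actually shipped.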

Finally, malicious open source attacks on developers and development infrastructure can be stopped at the repository, before they even enter the software development life cycle.

The future is binary

Using binary packages for software distribution has important benefits for a mature software ecosystem, across all stages of the software development life cycle. As more ecosystems take root and grow, I hope we’ll see even more bold innovations while fully utilizing the hard-won lessons from existing package managers.

* * *

Want to talk shop about package management, using DevOps at scale, or anything open source? Come join us over on the Sonatype Community.
