Public Service Announcement: Your build is leaking (and how to stop it)
By Tim OBrien
Use Maven, Gradle, or Ivy, or any other tool that depends on a remote repository (which is just about every build tool these days)? If you do, there's a good chance that your builds are constantly leaking information about your projects, and if you don't take some simple measures to protect yourself, external actors can learn a lot about your internal projects.
The Department of Super Secret Projects (The SSP)
To illustrate this problem, let's take a fictional government agency, the Department of Super Secret Projects, the SSP. Now everything the SSP does is, by definition, super secret. From the super secret propulsion system for a new sub to the super secret space-based laser project, this department is working on the kinds of projects that only a handful of people are working on and even fewer people are aware of.
Now, imagine that the SSP uses Maven (or Gradle or Ivy) to build its space laser project. The project is a large multi-module Maven project with a groupId of "gov.ssp". They integrate a few open source projects, and they've even gone to great lengths to make sure that every open source library being used has been vetted. They are focused on making sure that everything that comes into the organization is not only secure, but super secure. Everything's locked down, or so they think.
Maybe they run a repository manager, maybe they don't, but one thing is certain: they haven't done anything to configure artifact resolution. Whenever someone runs a build on a clean machine, the build tries to resolve internal artifacts against remote repositories. In other words, when you run the build for "gov.ssp.spacelaser:spacelaser-control" and it tries to resolve "gov.ssp.spacelaser:spacelaser-target-subsystem", there's a chance that this artifact request ends up going to a remote repository.
"Cool, looks like the SSP is working on a Space Laser Project"
Let's say you are the administrator of a remote repository. Maybe you have an open source project that has set up a remote repository instance at http://www.example.org/maven2/. Someone at the SSP wanted to integrate one of your SNAPSHOT builds into a development version, so they added your repository to a build.
Because the build at the SSP doesn't do anything to control artifact resolution, you'll see requests in your access logs for artifacts with coordinates like "gov.ssp.spacelaser:spacelaser-target-subsystem". But, really, it's not just that one artifactId; you end up seeing a collection of artifact coordinates for almost every project at the SSP.
You see the problem here, right? If you do, you should also understand that without the proper configuration this is probably happening to you right now. If you have large multi-module builds with internal dependencies, and you use remote repositories without controlling artifact resolution, you could be leaking information. Whether you work for a government, a large financial institution, a university, or an organization that deals with sensitive data, you could be unwittingly sharing internal project information with random strangers.
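To make the leak concrete, here is a minimal sketch of how a set of Maven coordinates turns into the request path a remote repository would log. It assumes the standard Maven 2 repository layout, where dots in the groupId become path separators. The function name and the version number "1.0" are made up for illustration; the scenario above only names the groupId and artifactId.

```python
def repository_path(group_id: str, artifact_id: str, version: str, ext: str = "pom") -> str:
    """Build the request path for an artifact in the standard Maven 2 layout."""
    # Dots in the groupId become directory separators.
    group_path = group_id.replace(".", "/")
    return f"/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.{ext}"


# Hypothetical version; only the groupId and artifactId come from the example.
path = repository_path("gov.ssp.spacelaser", "spacelaser-target-subsystem", "1.0")
print("http://www.example.org/maven2" + path)
# prints http://www.example.org/maven2/gov/ssp/spacelaser/spacelaser-target-subsystem/1.0/spacelaser-target-subsystem-1.0.pom
```

Anyone reading that URL in an access log learns the organization, the project, and the subsystem name without ever serving a single byte.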
Now that you understand the problem, let's fix it.
Step #1: Use Nexus
The first step to protecting yourself against this problem is to install a repository manager. When you hit a remote repository directly you have no control over how artifacts are resolved. Without a repository manager, if an artifact isn't in the local repository, a tool like Maven will interrogate a list of remote repositories.
Going back to our fictional government project, someone building the "gov.ssp.spacelaser:spacelaser-control" project tries to run a build. A required internal dependency isn't present in the local repository, and a request is made to a remote repository. Information leaked. With Nexus, that request goes to a local server instead of a remote repository. This is the first step to addressing the leak.
Step #2: Use a Repository Group (and think about the order)
Next, your developers should be hitting a repository group. A repository group combines several repositories behind one URL; it holds an ordered list of references to other repositories. When Nexus receives a request for an artifact, it will iterate through this list and attempt to resolve the artifact against each repository.
The issue here is that, if a remote repository is listed first, Nexus will attempt to resolve the internal "gov.ssp.spacelaser:spacelaser-control" artifact against that remote repository. As a first line of defense, make sure that your internal hosted repositories are listed first. This will prevent requests for these artifacts from cascading down to the remote repositories.
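The effect of ordering can be illustrated with a toy model. This is not how Nexus is implemented; it is a minimal sketch that assumes a group simply asks each member in turn and stops at the first hit. The Repository class, the repository names, and the requests_seen list (standing in for an access log) are all hypothetical.

```python
class Repository:
    """Toy repository that records every coordinate it is asked for."""

    def __init__(self, name, remote, artifacts=()):
        self.name = name
        self.remote = remote
        self.artifacts = set(artifacts)
        self.requests_seen = []  # stands in for the repository's access log

    def resolve(self, coordinate):
        self.requests_seen.append(coordinate)
        return coordinate in self.artifacts


def resolve_in_group(members, coordinate):
    """Ask each member in order and stop at the first one that has the artifact."""
    for repo in members:
        if repo.resolve(coordinate):
            return repo.name
    return None


INTERNAL = "gov.ssp.spacelaser:spacelaser-control"


def leaked(hosted_first):
    """Return True if the remote proxy ever saw the internal coordinate."""
    hosted = Repository("releases", remote=False, artifacts=[INTERNAL])
    proxy = Repository("example-org-proxy", remote=True)
    members = [hosted, proxy] if hosted_first else [proxy, hosted]
    resolve_in_group(members, INTERNAL)
    return INTERNAL in proxy.requests_seen


print(leaked(hosted_first=False))  # True
print(leaked(hosted_first=True))   # False
```

With the proxy listed first, the internal coordinate lands in the remote repository's log even though the artifact was available locally all along.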
But, there's one more step to make certain that your organization isn't leaking information....
Step #3: Define a Route for Internal Artifacts
The previous step of putting your hosted repositories in front of your remote repositories will go a long way toward making sure that internal artifact coordinates don't end up in a remote repository's access logs, but there's still a chance that missing or unresolved artifacts can end up making it out to a remote repository. If someone attempts to resolve an artifact that hasn't been published yet, or if someone adds a typo to a dependency, information can still leak out to remote repositories.
Repository Routes can be used to prevent any internal artifacts from showing up on a remote repository. SSP should create an inclusive route for ".*/gov/ssp/.*" and this route should be attached to internal hosted repositories.
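As a rough illustration of what an inclusive route does, the sketch below applies the ".*/gov/ssp/.*" pattern to request paths and narrows the list of repositories a group may consult. It is a simplified model, not Nexus code: it assumes the pattern must match the whole request path, that paths start with a slash, and the repository names other than Releases and Snapshots are hypothetical.

```python
import re

# The route pattern from the text, compiled as a regular expression.
INTERNAL_ROUTE = re.compile(r".*/gov/ssp/.*")

# Hypothetical group membership; the two hosted repositories come first.
HOSTED = ["releases", "snapshots"]
GROUP_MEMBERS = ["releases", "snapshots", "central-proxy", "example-org-proxy"]


def candidate_repositories(request_path):
    """Return the group members that may be consulted for this path."""
    if INTERNAL_ROUTE.fullmatch(request_path):
        # Inclusive route: matching requests may only go to hosted repositories.
        return [repo for repo in GROUP_MEMBERS if repo in HOSTED]
    return list(GROUP_MEMBERS)


# A typo in an internal coordinate ("spacelazer") still stays inside.
typo = "/gov/ssp/spacelazer/spacelazer-control/1.0/spacelazer-control-1.0.pom"
print(candidate_repositories(typo))
# ['releases', 'snapshots']

# Open source dependencies are unaffected by the route.
external = "/org/apache/commons/commons-lang3/3.12.0/commons-lang3-3.12.0.pom"
print(candidate_repositories(external))
# ['releases', 'snapshots', 'central-proxy', 'example-org-proxy']
```

The point of the route is visible in the first call: the artifact does not exist anywhere, but the request can no longer fall through to a proxy repository.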
Do this, and there's no risk that a request to Nexus will result in your organization leaking information to a remote repository.
Here's what our documentation states: "The first route is an inclusive route, it is provided as an example of a custom route an organization might use to make sure that internally generated artifacts are resolved from the Releases and Snapshots repositories. If your organization’s group IDs all start with com.somecompany, and if you deploy internally generated artifacts to the Releases and Snapshots repositories, this Route will make sure that Nexus doesn’t waste time trying to resolve these artifacts from public Maven repositories like the Maven Central Repository or the Apache Snapshots repository."
Conclusion: Everyone should do this...
It doesn't really matter if you are working on a sensitive project at some large bank or government agency; this should be something that everyone does in the course of setting up a Nexus instance. Define at least one route for your internal artifacts, and run a few tests to confirm that you are not sharing more than you intended.
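One possible way to run such a test is to collect the paths that were actually requested from remote repositories during a clean build (for example, from an outbound proxy log) and check them for your internal groupIds. The sketch below assumes you already have those paths as plain strings; the function name, the sample paths, and the way you gather them are hypothetical.

```python
def leaked_paths(requested_paths, internal_group_prefixes):
    """Return the requested paths that fall under an internal groupId prefix."""
    # "gov.ssp" becomes "/gov/ssp/" in the Maven 2 repository layout.
    needles = ["/" + prefix.replace(".", "/") + "/" for prefix in internal_group_prefixes]
    return [path for path in requested_paths if any(n in path for n in needles)]


# Hypothetical sample of paths requested from remote repositories.
outbound = [
    "/maven2/org/apache/commons/commons-lang3/3.12.0/commons-lang3-3.12.0.jar",
    "/maven2/gov/ssp/spacelaser/spacelaser-target-subsystem/maven-metadata.xml",
    "/maven2/junit/junit/4.13.2/junit-4.13.2.pom",
]

for path in leaked_paths(outbound, ["gov.ssp"]):
    print("LEAK:", path)
# prints LEAK: /maven2/gov/ssp/spacelaser/spacelaser-target-subsystem/maven-metadata.xml
```

An empty result after a build on a clean machine is a reasonable sign that the group ordering and route are doing their job.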
Written by Tim OBrien
Tim is a Software Architect with experience in all aspects of software development from project inception to developing scalable production architectures for large-scale systems during critical, high-risk events such as Black Friday. He has helped many organizations ranging from small startups to Fortune 100 companies take a more strategic approach to adopting and evaluating technology and managing the risks associated with change.