Whenever I speak to someone doing Java development, I always ask if they are using a repository manager. Repository managers are still an emerging technology, but I've noticed a consistent trend: more and more developers view a repository manager as an essential part of development infrastructure. This certainly wasn't the case just two years ago, and I think that the big motivator behind this trend is that the quality and stability of Maven Central has improved remarkably because of the efforts of people like Brian Fox and others who are focused on making the service more stable.
Another reason why we've seen more adoption is that most developers understand the benefits of using a tool like Maven for automatic dependency management. In 2005, it was common to see projects store binary JARs alongside their source code. In 2010, you rely on the repository and the metadata it contains. If you use a library like Guice, you'll add a dependency on the artifact and let your build tool take care of the details. To do otherwise would be to commit yourself to the manual work of updating JARs and testing dependencies each time a new version of an external library is released.
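To make "rely on the repository and the metadata it contains" a little more concrete, here is a minimal sketch of how a build tool can turn a dependency's coordinates into a location in a repository that uses the standard Maven 2 layout. The class and method names (RepositoryPath, toPath) are made up for this illustration, and the Guice version is just an example; this is not how Maven itself is implemented.

```java
// Hypothetical helper, for illustration only: maps Maven coordinates
// (groupId, artifactId, version) to the relative path of the JAR in a
// repository that follows the standard Maven 2 layout.
final class RepositoryPath {

    static String toPath(String groupId, String artifactId, String version) {
        // Dots in the groupId become directory separators.
        String groupPath = groupId.replace('.', '/');
        // Layout: <group path>/<artifactId>/<version>/<artifactId>-<version>.jar
        return groupPath + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        // Guice coordinates; version 2.0 is picked purely as an example.
        System.out.println(toPath("com.google.inject", "guice", "2.0"));
        // prints com/google/inject/guice/2.0/guice-2.0.jar
    }
}
```

The same relative path applies whether the artifact is fetched from a public repository, from a repository manager, or from the local repository under ~/.m2/repository, which is why a repository manager can sit between your builds and the public repositories without any change to how dependencies are declared.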
Despite the increasing prevalence of repository managers, I still stumble upon workgroups and organizations that haven't heard of repository management. When you ask if they are using a repository manager, they might think you are referring to Subversion or source control. This series of posts is a high-level overview of the main benefits of repository management. If you are trying to convince someone to start using a repository manager, the next few blog posts are for you.
Repository Management: The Big Picture
Compare the diagram shown above with the diagram shown below. In the next few posts, I am going to walk through the specific benefits of using a repository manager. I'm going to talk about:
- How a repository manager changes the development cycle
- How continuous integration is used to continuously publish internal build artifacts
- How a repository manager simplifies the process of building and deploying systems to production
- How a repository manager can act as a gateway between vendors and external partners
When You Don't Use a Repository Manager
Before I get started on the benefits of repository management, I want to talk about the realities you face when you don't use a repository manager. Here are some common anti-patterns:
- All of your developers download artifacts directly from public repositories. A new developer starts on a Monday. That developer will spend an hour downloading a massive library of dependencies from Maven Central. Worse, if Maven Central happens to be down that day, they will be out of luck entirely.
- Proprietary or vendor libraries are passed around from developer to developer. If you don't use a repository manager, how do you distribute the Oracle JDBC driver? Maybe you place it in a shared file system and tell people to download it and install it in ~/.m2/repository. More likely, developers just pass this JAR around as an email attachment with some ad-hoc instructions.
- JARs are checked into source control. If you don't use a tool like Maven, which knows how to download artifacts from a remote repository, you might be following the very common pattern of checking binary dependencies and libraries into source control. I've seen many instances of companies creating ad-hoc JAR repositories and checking these repositories into source control, only to version and branch these static binary files with every release.
- The source control repository is used to store everything from source code to binary builds. Because there is no repository designed to store binaries, developers start to use tools like Subversion to keep track of binaries. As time passes, the Subversion repository becomes an ad-hoc file system for files that have no business in an SCM.
- The continuous integration server depends on public repositories. When you change your build or add a new dependency, your CI system downloads dependencies from the public repo. It depends on the availability of this public resource to run builds.
- Production deployments have to run the entire build, from start to finish, to generate binaries for deployment. When a build is tested and then ultimately pushed to production, the build and deployment scripts check out the source code, run the build, and deploy the resulting binaries to production systems.
- Sharing source code with external partners means granting them access to your SCM. Since there is no established mechanism for publishing source or binary artifacts, the only way to share code with partners is to either send an archive of your source, or provide them with direct access to your SCM.
The general theme in all of these anti-patterns is that either your systems depend on public resources, or they all depend on the SCM system as a central collaboration point. In the next few posts, I'm going to detail how using a repository manager provides a solution for each of these issues. I'll go into why each of these anti-patterns is a bad idea, and how you can use Maven, Nexus, and Hudson together to solve these problems and create a more efficient software development effort.
Stay tuned for the next post: Caching and Collaborating.