In the previous post in this series, I discussed three compelling ways in which a repository manager can benefit the development cycle. It proxies artifacts locally, it is optimized to store binary artifacts, and it facilitates a new level of collaboration and agility that isn't possible when your SCM is the only way for workgroups to collaborate. In this post, I'm going to talk about how a repository manager works in concert with a continuous integration server like Hudson or Bamboo.
First, the how, what, and when of a continuous integration server. Continuous integration (CI) servers are an established fact of modern development infrastructure. A CI server, for the most part, waits and watches. It keeps a vigilant eye on your source control system and jumps into action every time it sees a code change. When code changes, your CI system is usually configured to run the entire build, execute all of your unit and integration tests, and send an email to every developer if it identifies a defect or a failed test.
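To make that cycle concrete, here is a minimal sketch of the watch, build, notify loop. The `CiServer` class, its `run_build` and `notify` callables, and the revision strings are hypothetical stand-ins, not the API of Hudson, Bamboo, or any other real CI server; real servers add scheduling, build queues, and plugins on top of this basic idea.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CiServer:
    """Toy model of a CI server's watch, build, notify cycle (hypothetical API)."""
    run_build: Callable[[str], bool]   # builds and tests a revision; True on success
    notify: Callable[[str], None]      # stands in for the "build broke" email
    last_seen: Optional[str] = None    # last SCM revision that was built
    history: List[str] = field(default_factory=list)

    def poll(self, head_revision: str) -> None:
        """Called on a schedule with the SCM's current head revision."""
        if head_revision == self.last_seen:
            return  # no new commits since the last poll: keep waiting
        self.last_seen = head_revision
        ok = self.run_build(head_revision)
        self.history.append(f"{head_revision}:{'ok' if ok else 'FAILED'}")
        if not ok:
            self.notify(f"Build of revision {head_revision} failed")


# Simulated polls: revision "r3" contains the bad commit.
sent: List[str] = []
ci = CiServer(run_build=lambda rev: rev != "r3", notify=sent.append)
for head in ["r1", "r1", "r2", "r3", "r3"]:
    ci.poll(head)

print(ci.history)  # ['r1:ok', 'r2:ok', 'r3:FAILED']
print(sent)        # ['Build of revision r3 failed']
```

Note that repeated polls of an unchanged revision do nothing; the server only builds, and only emails, when it sees something new.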
It does this so that you will have an easier time identifying where a particular problem was introduced to the source code. If John checks in some bad code, the CI system starts a build immediately, and when that build finishes about 30 minutes later, everyone in the group receives an email with the subject header "John just broke the build". It is a great way to identify errors, and it is also a great way to motivate developers to test locally before committing to source control, since no one likes to be the reason for a build failure email.
Running a CI server is more than "just a good idea". Once your system reaches a certain level of complexity, you can't scale development without committing to continuous integration and testing. Without continuous integration, you end up having to put all development on hold each time you want to perform a release. If building, testing, and deploying your system isn't done regularly and isn't well rehearsed, integration becomes a time-consuming nightmare of manual testing and ad hoc builds that often produce inconsistent results. This is especially true if your development effort spans multiple systems and multiple development workgroups. You run a CI system because building, testing, and deploying your system should be automatic: it should be as trivial as pressing a button.
The concept of a CI server is only slightly more established than that of a repository manager, and very often an organization identifies the need for a CI server before it identifies the need for a repository manager. If you are coding a complex system, there is a very good chance that you are already running a CI server. The most popular servers out there are Hudson, Bamboo, and CruiseControl. While the connection between CI servers and repository managers might not be immediately obvious, when used together they open up some new possibilities for the way you develop your systems.
When you have a system to continuously build your code, you also have a system that can continuously publish SNAPSHOT artifacts to a repository manager to enable a more granular approach to development. What do I mean by "a more granular approach to development"? To answer that question, let's take a look at a complex multi-module project using the example of the eCommerce group from the previous post in this series.
Assume you have a new programmer starting tomorrow. Instead of throwing them at the entire forty thousand lines of code, you would like to give that developer a small, easy-to-digest task. You want this developer to add support for PayPal's Adaptive Payments API to your eCommerce system. That's it. You don't want them to be distracted by the overwhelming scope of the project, and you certainly can't afford for them to take a three-month voyage through your project's code before they start contributing to the effort. Deadlines are tight, and you don't have enough people on your team. It is important that new hires start programming as soon as they walk in the door.
Without a repository manager hooked up to a continuous integration server, if you try to check out just the ecom-paypal project, the build is going to fail: Maven will try to download ecom-paypal's SNAPSHOT dependencies from a repository, but no one has ever published them there. In the case of the ecom-paypal project, assume that the dependency graph looks like this.
When you have a repository manager and a continuous integration server, you can configure your continuous integration server to publish SNAPSHOT artifacts (in-progress SNAPSHOT binaries) to your repository manager. This lets you check out just a single, isolated portion of a much larger multi-module project.
Without a repository manager, trying to build version 1.3-SNAPSHOT of ecom-paypal in isolation fails, because you are forced to check out the entire codebase and build and install all of the dependencies in your local repository first. With a repository manager, SNAPSHOT artifacts are continuously published because Hudson checks your SCM every few minutes and builds the latest code. When you run the ecom-paypal module's build in isolation, Maven downloads the most recent SNAPSHOT of each dependency.
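How does Maven know which SNAPSHOT is "the most recent"? When a SNAPSHOT is deployed to a remote repository, the file name has `SNAPSHOT` replaced by a timestamp and a build number, and a `maven-metadata.xml` file in that version's directory records the newest deployment. The sketch below reconstructs the newest file name from such metadata. The `ecom-core` artifact, the `com.example.ecom` group ID, and the timestamp values are made up for illustration, and this is a simplification of what Maven does, not its actual resolution code.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata, shaped like the maven-metadata.xml a repository manager
# serves from the ecom-core/1.3-SNAPSHOT/ directory. All values are invented.
METADATA = """<metadata>
  <groupId>com.example.ecom</groupId>
  <artifactId>ecom-core</artifactId>
  <version>1.3-SNAPSHOT</version>
  <versioning>
    <snapshot>
      <timestamp>20100415.093012</timestamp>
      <buildNumber>7</buildNumber>
    </snapshot>
    <lastUpdated>20100415093012</lastUpdated>
  </versioning>
</metadata>"""


def latest_snapshot_file(metadata_xml: str, extension: str = "jar") -> str:
    """Return the file name of the newest deployed build of a SNAPSHOT version."""
    root = ET.fromstring(metadata_xml)
    artifact_id = root.findtext("artifactId")
    version = root.findtext("version")
    if not version.endswith("-SNAPSHOT"):
        raise ValueError(f"{version} is not a SNAPSHOT version")
    timestamp = root.findtext("versioning/snapshot/timestamp")
    build_number = root.findtext("versioning/snapshot/buildNumber")
    base = version[: -len("SNAPSHOT")]  # "1.3-SNAPSHOT" -> "1.3-"
    return f"{artifact_id}-{base}{timestamp}-{build_number}.{extension}"


print(latest_snapshot_file(METADATA))  # ecom-core-1.3-20100415.093012-7.jar
```

Every time the CI server deploys a new build, the build number and timestamp in the metadata advance, so a developer building ecom-paypal in isolation picks up the latest binaries without touching the dependency's source.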
Without a repository manager, your new developer is going to have to download the entire codebase and run a large time-consuming build. With a repository manager you can work on specific components of a larger multi-module project. This ability to divide and conquer your codebase comes in very handy when you need a consultant to take a look at a specific problem, or when you need to look at a coding problem in isolation.
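To put a rough shape on that difference, the sketch below walks a module dependency graph and computes which modules must be built from source locally in order to build ecom-paypal. The module names other than ecom-paypal, and the graph itself, are hypothetical and are not the actual layout of the eCommerce project. Passing the set of modules the CI server has already published to the repository manager stops the walk at those modules, since Maven downloads them instead of building them.

```python
from collections import deque
from typing import AbstractSet, Dict, List, Set

# Hypothetical module graph: each module maps to the modules it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "ecom-paypal": ["ecom-payment-api", "ecom-model"],
    "ecom-payment-api": ["ecom-model", "ecom-util"],
    "ecom-model": ["ecom-util"],
    "ecom-util": [],
    "ecom-catalog": ["ecom-model"],
    "ecom-web": ["ecom-payment-api", "ecom-catalog"],
}


def modules_to_build(target: str,
                     graph: Dict[str, List[str]],
                     published: AbstractSet[str] = frozenset()) -> Set[str]:
    """Modules that must be built from source to build `target`.

    A dependency in `published` is downloaded rather than built, so the
    breadth-first walk does not descend into it. (This assumes a published
    module's own dependencies are published too.)
    """
    needed = {target}
    queue = deque([target])
    while queue:
        for dep in graph[queue.popleft()]:
            if dep not in needed and dep not in published:
                needed.add(dep)
                queue.append(dep)
    return needed


# No repository manager: every transitive dependency must be built locally.
print(sorted(modules_to_build("ecom-paypal", DEPENDS_ON)))
# ['ecom-model', 'ecom-payment-api', 'ecom-paypal', 'ecom-util']

# CI server has published SNAPSHOTs of everything else: build one module.
snapshots = set(DEPENDS_ON) - {"ecom-paypal"}
print(sorted(modules_to_build("ecom-paypal", DEPENDS_ON, snapshots)))
# ['ecom-paypal']
```

In a real multi-module build the "no repository manager" case is usually worse than this minimum, because the dependencies typically come from checking out and building the whole reactor.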
When you continuously publish build artifacts to a repository manager, you move away from the single monolithic project build and toward a project layout and architecture that lends itself to modularization.
In tomorrow's post: How a Repository Manager decouples deployments from source code, and what that means for developer operations.