While many developers have adopted Maven as a build tool, most have yet to understand the importance of maintaining an artifact repository manager both to proxy remote repositories and to manage and distribute software artifacts.
This document defines artifact repositories and repository management, providing context for developers interested in learning how to use Sonatype Nexus Repository to achieve a more efficient development cycle.
An artifact is something created during the software build and deployment process. Artifacts are usually binary packages, such as JAR, WAR, or ZIP files, generated when the code is compiled, but they can also be metadata, logs, or any other components required for successful deployment.
Artifacts play an important role in the software development life cycle (SDLC), especially in continuous integration and continuous deployment (CI/CD) practices. In CI/CD, build artifacts are automatically created, stored, and shared to support faster and more reliable software releases. Build artifacts and other artifact types must be managed to ensure the quality, security, and stability of software projects.
Maven developers are familiar with the concept of an artifact repository: a collection of binary software artifacts and metadata stored in a defined directory structure, which is used by clients such as Maven or Ivy to retrieve binaries during a build process. In the case of the Maven repository, the primary type of binary artifact is a JAR file containing Java bytecode, but there is no limit to what type of artifact can be stored in a Maven repository. For example, one could just as easily deploy documentation archives, source archives, or Ruby libraries. A Maven repository provides a platform for the storage, retrieval, and management of binary software artifacts and metadata.
In Maven, every software artifact is described by an XML document called a Project Object Model (POM). This POM contains information that describes a project and lists its dependencies - the binary software artifacts which a given component depends upon for successful compilation or execution. When Maven downloads a dependency from a repository, it also downloads that dependency's POM. Given a dependency's POM, Maven can then download any other libraries which are required by that dependency. The ability to automatically calculate a project's dependencies and transitive dependencies is made possible by the standard and structure set by the Maven repository.
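The transitive resolution described above can be sketched as a graph walk. This is a minimal illustration, not Maven's actual resolver: the class name TransitiveResolver and the in-memory map standing in for downloaded POMs are assumptions made for the example, and real resolution also deals with version conflicts, scopes, and exclusions, which are ignored here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: walks a dependency graph the way a POM-driven tool might.
final class TransitiveResolver {
    // Maps an artifact name to the direct dependencies listed in its POM.
    // In a real build, each lookup would mean downloading and parsing a POM.
    private final Map<String, List<String>> directDependencies;

    TransitiveResolver(Map<String, List<String>> directDependencies) {
        this.directDependencies = directDependencies;
    }

    // Breadth-first walk: returns every artifact reachable from root,
    // excluding root itself, in the order it was first discovered.
    Set<String> resolve(String root) {
        Set<String> resolved = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(root);
        while (!pending.isEmpty()) {
            String current = pending.remove();
            for (String dep : directDependencies.getOrDefault(current, List.of())) {
                // Set.add returns false for artifacts already seen, which
                // also keeps the walk from looping on cyclic graphs.
                if (!dep.equals(root) && resolved.add(dep)) {
                    pending.add(dep);
                }
            }
        }
        return resolved;
    }
}
```

Given a map in which a hypothetical "app" depends on "web" and "lang", and "web" depends on "lang" and "logging", resolving "app" yields web, lang, and logging, each listed once.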
Maven and other tools, such as Ivy, interact with an artifact repository to search for binary software artifacts, model the projects they manage, and retrieve software artifacts on demand. When you download and install Maven without any customization, it will retrieve artifacts from the Central repository, which serves millions of Maven users every single day.
While you can configure Maven to retrieve binary software artifacts from a collection of mirrors, the best practice is to install an artifact repository manager such as Sonatype Nexus Repository, which can proxy the Central repository and cache artifacts retrieved from a remote repository on a server in your own network. In addition to Central, a number of major organizations, such as Red Hat and Oracle, maintain separate repositories.
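The proxy-and-cache idea above can be illustrated with a small sketch. It does not show how Sonatype Nexus Repository is implemented: the class CachingProxy, its in-memory map, and the Function standing in for the remote repository are all assumptions chosen to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a caching proxy in front of a remote repository.
final class CachingProxy {
    private final Map<String, byte[]> cache = new HashMap<>();
    // Stand-in for a network call to the remote repository.
    private final Function<String, byte[]> remote;
    private int remoteFetches = 0;

    CachingProxy(Function<String, byte[]> remote) {
        this.remote = remote;
    }

    // Serves the artifact from the local cache when possible and
    // contacts the remote repository only on the first request.
    byte[] get(String path) {
        return cache.computeIfAbsent(path, p -> {
            remoteFetches++;
            return remote.apply(p);
        });
    }

    // Number of requests that actually reached the remote repository.
    int remoteFetches() {
        return remoteFetches;
    }
}
```

Requesting the same path twice results in a single remote fetch; the second request is answered from the cache, which is the behavior that saves bandwidth and build time on a local network.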
While this might seem like a simple, obvious mechanism for distributing artifacts, the Java platform existed for several years before the Maven project created a formal attempt at artifact management. Until the advent of the Maven repository in 2002, a project's dependencies were gathered in a manual, ad-hoc process and were often distributed with the source code for an open source project.
As applications grew more and more complex, and as software teams developed a need for more complex dependency management capabilities for larger enterprise applications, Maven's ability to automatically retrieve dependencies and model dependencies between components became an essential part of software development.
A repository stores two types of artifacts: releases and snapshots. Release repositories are for stable, static release artifacts, and snapshot repositories are frequently updated repositories that store binary software artifacts from projects under constant development. While it is possible to create an artifact repository that serves both release and snapshot artifacts, repositories are usually segmented into release or snapshot repositories serving different consumers and maintaining different standards and procedures for deploying artifacts.
A release repository and a snapshot repository serve different purposes, similar to the distinction between production and staging networks. The release repository is the equivalent of a production network, providing stable and reliable software. The snapshot repository is like a staging or testing network, offering a platform for experimentation and iteration.
While there is a higher level of procedure and ceremony associated with deploying to a release repository, snapshot artifact repositories are updated frequently, without regard for stability and repeatability.
A release artifact is an artifact which was created by a specific versioned release. For example, consider the 1.2.0 release of the commons-lang library stored in the Central repository. This release artifact, commons-lang-1.2.0.jar, and the associated POM, commons-lang-1.2.0.pom, are static objects which will never change in the Central repository.
Released artifacts are considered to be solid, stable, and perpetual in order to guarantee that builds which depend upon them are repeatable over time. The released JAR artifact is associated with a PGP signature and with MD5 and SHA checksums, which can be used to verify both the authenticity and integrity of the binary software artifact.
Snapshot artifacts are artifacts generated during the development of a software project. A snapshot artifact has both a version number such as "1.3.0" or "1.3" and a timestamp in its name. For example, a snapshot artifact for commons-lang 1.3.0 might have the name commons-lang-1.3.0-20090314.182342-1.jar, and the associated POM, MD5, and SHA hashes would have similar names. To facilitate collaboration during the development of software components, Maven and other clients know how to interrogate the metadata associated with a snapshot artifact and always retrieve the latest version of a snapshot dependency from a repository.
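As an illustration of the integrity check, the sketch below computes a digest with the standard java.security.MessageDigest class and compares it with a published checksum. The class and method names are hypothetical, and the example covers checksums only: verifying a PGP signature, which is what establishes authenticity, needs a separate library and is not shown.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for comparing an artifact against a published checksum.
final class ChecksumVerifier {
    // Returns the lowercase hex digest of content for an algorithm name
    // understood by MessageDigest, such as "MD5" or "SHA-1".
    static String hexDigest(String algorithm, byte[] content) {
        try {
            byte[] hash = MessageDigest.getInstance(algorithm).digest(content);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Unsupported algorithm: " + algorithm, e);
        }
    }

    // True when the computed digest equals the published one, ignoring
    // letter case and surrounding whitespace in the published value.
    static boolean matches(String algorithm, byte[] content, String published) {
        return hexDigest(algorithm, content).equalsIgnoreCase(published.trim());
    }
}
```

In practice, the content would be the bytes of the downloaded JAR and the published value would be read from the checksum file stored next to it in the repository.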
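The timestamped naming scheme above can be picked apart with a regular expression. The following is a sketch under assumptions: the class SnapshotNames and its pattern are hypothetical, the pattern is inferred from the single example name in the text, and real clients select the latest snapshot from repository metadata rather than by comparing file names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for timestamped snapshot file names.
final class SnapshotNames {
    // <base>-<yyyyMMdd.HHmmss>-<buildNumber>.<extension>
    private static final Pattern TIMESTAMPED =
        Pattern.compile("^(.+)-(\\d{8}\\.\\d{6})-(\\d+)\\.([A-Za-z0-9]+)$");

    static final class Snapshot {
        final String baseName;   // artifactId and version, e.g. commons-lang-1.3.0
        final String timestamp;  // e.g. 20090314.182342
        final int buildNumber;
        final String extension;

        Snapshot(String baseName, String timestamp, int buildNumber, String extension) {
            this.baseName = baseName;
            this.timestamp = timestamp;
            this.buildNumber = buildNumber;
            this.extension = extension;
        }
    }

    // Returns the parsed parts, or empty when the name carries no timestamp.
    static Optional<Snapshot> parse(String fileName) {
        Matcher m = TIMESTAMPED.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new Snapshot(
            m.group(1), m.group(2), Integer.parseInt(m.group(3)), m.group(4)));
    }

    // Picks the newest snapshot: timestamps in this fixed-width format sort
    // correctly as plain strings, and the build number breaks ties.
    static Optional<Snapshot> latest(List<String> fileNames) {
        return fileNames.stream()
            .map(SnapshotNames::parse)
            .flatMap(Optional::stream)
            .max(Comparator.comparing((Snapshot s) -> s.timestamp)
                .thenComparingInt(s -> s.buildNumber));
    }
}
```

Parsing commons-lang-1.3.0-20090314.182342-1.jar gives the base name commons-lang-1.3.0, the timestamp 20090314.182342, build number 1, and the extension jar, while a release name without a timestamp produces an empty result.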
Repositories and tools like Maven know about a set of coordinates including the following components: groupId, artifactId, version, and packaging. This set of coordinates is often referred to as a GAV coordinate, which is short for "Group, Artifact, Version coordinate." The GAV coordinate standard is the foundation for Maven's dependency management and artifact management capabilities. The four elements of this coordinate are described below.
A group identifier groups a set of artifacts into a logical group. Groups are often designed to reflect the organization under which a particular software component is being produced. For example, software components being produced by the Maven project at the Apache Software Foundation are available under the groupId org.apache.maven.
An artifactId is an identifier for a software component. An artifact can represent an application or a library; for example, if you were creating a simple web application, your project might have the artifactId "simple-webapp," and if you were creating a simple library, your artifactId might be "simple-library." The combination of groupId and artifactId must be unique for a project.
The version of a project follows the established convention of Major, Minor, and Point release versions, similar to the SemVer convention. For example, if your simple-library artifact has a major release version of 1, a minor release version of 2, and a point or patch release version of 3, your version would be 1.2.3. Versions can also have alphanumeric qualifiers which are often used to denote release status. An example of such a qualifier would be a version like "1.2.3-BETA" where BETA signals a stage of testing meaningful to consumers of a software component.
Maven was initially created to handle JAR files, but a Maven artifact repository is completely agnostic when it comes to the type of artifact it is managing. Packaging can be anything that describes a binary software format, including ZIP, SWC, SWF, NAR, WAR, EAR, and SAR.
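The version convention described above, with its optional qualifier, can be made concrete with a small parser. This is a simplified sketch: the class SimpleVersion is hypothetical, it treats missing minor or point numbers as zero (so that "1.3" is accepted), and Maven's own version comparison rules are more elaborate and are not reproduced here.

```java
// Hypothetical value type for versions of the form major.minor.point[-QUALIFIER].
final class SimpleVersion {
    final int major;
    final int minor;
    final int patch;
    final String qualifier; // empty when the version has no qualifier

    private SimpleVersion(int major, int minor, int patch, String qualifier) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
        this.qualifier = qualifier;
    }

    // Parses strings such as "1.2.3", "1.3", or "1.2.3-BETA".
    static SimpleVersion parse(String text) {
        String numbers = text;
        String qualifier = "";
        int dash = text.indexOf('-');
        if (dash >= 0) {
            // Everything after the first dash is treated as the qualifier.
            numbers = text.substring(0, dash);
            qualifier = text.substring(dash + 1);
        }
        String[] parts = numbers.split("\\.");
        if (parts.length > 3) {
            throw new IllegalArgumentException("Too many version components: " + text);
        }
        int[] values = new int[3]; // components that are absent default to 0
        for (int i = 0; i < parts.length; i++) {
            values[i] = Integer.parseInt(parts[i]);
        }
        return new SimpleVersion(values[0], values[1], values[2], qualifier);
    }
}
```

Parsing "1.2.3-BETA" yields major 1, minor 2, point 3, and the qualifier BETA; parsing "1.3" yields 1, 3, and 0 with an empty qualifier.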
Tools which are designed to interact with the Maven repository translate these coordinates into a URL which corresponds to a location in a Maven repository. If a tool such as Maven is looking for version 1.2.0 of the commons-lang JAR in the group org.apache.commons, this request is translated into:
.../org/apache/commons/commons-lang/1.2.0/commons-lang-1.2.0.jar
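The translation from coordinates to a repository location can be sketched in a few lines. The class and method names are hypothetical, and the sketch assumes the packaging value doubles as the file extension, which holds for the JAR example above but not for every packaging type; it returns only the path relative to the repository root, since the base URL is elided in the text.

```java
// Hypothetical helper that maps coordinates to a relative repository path.
final class RepositoryLayout {
    // Dots in the groupId become directory separators; the file name is
    // built from the artifactId, the version, and the packaging.
    static String toPath(String groupId, String artifactId, String version, String packaging) {
        return groupId.replace('.', '/')
            + "/" + artifactId
            + "/" + version
            + "/" + artifactId + "-" + version + "." + packaging;
    }
}
```

Called with org.apache.commons, commons-lang, 1.2.0, and jar, the method returns org/apache/commons/commons-lang/1.2.0/commons-lang-1.2.0.jar, which matches the path shown above.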
You can look at Central as an example of how Maven repositories operate. Below are some of the properties of release repositories such as the Central repository.
All software artifacts added to Central require proper metadata including a Project Object Model (POM) for each artifact which describes the artifact itself, and any dependencies it might have.
Once published to the Central repository, an artifact and the metadata describing it never change. This property of release repositories guarantees that projects which depend on releases will be repeatable and stable over time. While new software artifacts are being published to Central every day, once an artifact is assigned a release number on Central, there is a strict policy against modifying the contents of a software artifact after a release.
Central is a public resource, and it is currently used by the millions of developers who have adopted Maven and other tools that understand how to interact with the Maven repository structure. There is a series of mirrors for the Central repository which are constantly synchronized with Central. Users are encouraged to query Central for project metadata and cryptographic hashes, and to retrieve the actual software artifacts from one of Central's many mirrors. Tools like Sonatype Nexus Repository are designed to retrieve metadata from Central and artifact binaries from mirrors.
The Central repository contains cryptographic hashes and PGP signatures which can be used to verify the authenticity and integrity of software artifacts served from Central or one of the many mirrors of Central.