News and Notes from the Makers of Nexus | Sonatype Blog

Lessons Learned Again #npmgate

Written by Manfred Moser | March 28, 2016

The recent events in the world of JavaScript developers and npm definitely caused a storm on Twitter and across the internet in general. If you want to find out more about the kik package, the trademark threats, the unpublishing of the left-pad package and the resulting breakage of packages and builds everywhere, check out this recap.

No matter what your personal opinion is about who is at fault and what should be done about it, there are definitely a few lessons to be learned. Some will be new to you and old news to others, but together they constitute a valuable step forward. If we had followed these ideas, #npmgate might never have happened in the first place.

Don't fall for lawyer threats immediately  

In general, you can assume that most people do not understand patent, copyright or trademark law. This is exactly why companies like Kik can threaten legal action, even if their claim is most likely unfounded. Both the developer of the kik package and npm Inc. fell victim to this bullying tactic. A trademarked company name does not give you the exclusive right to use those letters in every context; the other use would have to be easy to confuse with yours. Just ask Apple how they prevent supermarkets from selling an apple. Or ask Cisco about IOS, the name of their operating system. And don't confuse that with Apple's iOS. And then compare kik.com with kik.de.

With these complexities on the trademark side, not to mention things like onboarding, user support and so on, you can probably understand that:

A public repository is a big responsibility

For years now, Sonatype has been running the Central Repository, the largest repository for Maven and the wider Java ecosystem. We manage billions of downloads, millions of components and many thousands of contributors. We supply lots of documentation and have strict guidelines and terms of service. Over the years we have learned many lessons, and we have long enforced requirements such as signatures, minimal metadata and proof of namespace ownership. You have to be very careful with your actions when managing the repository and always look out for the best interests of the community of users.

One of the aspects that definitely helps a lot is that:

Release components are immutable

The users of a repository rely on the fact that any component retrieved with a certain identifier is the same, no matter whether they retrieved it two years ago or yesterday, or will do so a year from now. Unpublishing, as npm allows it, always struck me as a bad idea that runs against this immutability concept. Without immutability you cannot, for example, guarantee that a build of your project running today produces the same output as it did a week ago; it might not even work. Not to mention what happens a year from now.

The npm community suffered from something even worse, though: a package was actually deleted. This breaks immutability and also opens the door to new packages with the same name but completely different characteristics. The npm team is now frantically working on plugging that hole.

The same immutability principle applies to your own application releases. If the output of your build or release process is different, it should be a new component, e.g. with a new version number. It should never overwrite an existing component, neither in the file system nor in your own in-house repository.
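To make the immutability rule concrete, here is a minimal Python sketch of a release store that accepts each name and version exactly once. The class `ImmutableReleaseStore` and its methods are hypothetical and do not reflect how npm, the Central Repository or Nexus Repository are actually implemented. The sketch assumes that deleting a release removes its content but keeps the coordinates reserved, so that different content can never reappear under a deleted name and version.

```python
import hashlib


class ImmutableReleaseStore:
    """Hypothetical in-memory store illustrating immutable releases.

    Each (name, version) pair can be published exactly once. Deleting a
    release removes its content but keeps a checksum as a tombstone, so the
    same coordinates can never be reused for different content.
    """

    def __init__(self):
        self._content = {}    # (name, version) -> artifact bytes
        self._checksums = {}  # (name, version) -> SHA-256 digest, kept forever

    def publish(self, name, version, content):
        key = (name, version)
        if key in self._checksums:
            # Already published, or published and later deleted (tombstoned).
            raise ValueError(f"{name}@{version} was already used; publish a new version")
        self._content[key] = content
        self._checksums[key] = hashlib.sha256(content).hexdigest()

    def fetch(self, name, version):
        key = (name, version)
        if key not in self._content:
            raise KeyError(f"{name}@{version} is not available")
        content = self._content[key]
        # Guard against silent modification of the stored bytes.
        if hashlib.sha256(content).hexdigest() != self._checksums[key]:
            raise RuntimeError(f"{name}@{version} does not match its recorded checksum")
        return content

    def delete(self, name, version):
        # Drop the bytes but leave the checksum in place as a tombstone.
        self._content.pop((name, version), None)


store = ImmutableReleaseStore()
store.publish("left-pad", "1.0.0", b"module.exports = leftPad;")
store.delete("left-pad", "1.0.0")
try:
    store.publish("left-pad", "1.0.0", b"something else entirely")
except ValueError as err:
    print(err)  # left-pad@1.0.0 was already used; publish a new version
```

With this rule in place, a build that pins a given name and version either receives exactly the bytes it received before or fails loudly; it can never silently pick up different content.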

Namespace separation helps

The npm public registry does not enforce the use of namespaces, which npm calls scopes. In its early days the Central Repository did not use namespaces either, but using and enforcing them as part of the deployment process and user validation has been a tremendous help for users. Without them, administering the repository and onboarding new users would not be possible at the scale of the Central Repository. There, the groupId follows a naming convention based on reverse domain names. E.g. if I want to publish to com.foo, I have to provide proof that I own the foo.com domain name.
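The reverse domain name convention is easy to express in code. The following minimal Python sketch maps a groupId to the domain it implies and checks it against a set of domains whose ownership has already been verified. The function names are hypothetical, and treating sub-namespaces such as com.foo.utils as covered by proof of owning foo.com is an assumption made for illustration, not a statement of the Central Repository's exact rules.

```python
from __future__ import annotations


def group_id_to_domain(group_id: str) -> str:
    """Map a reverse-domain groupId such as 'com.foo' to the domain 'foo.com'."""
    return ".".join(reversed(group_id.split(".")))


def may_publish(group_id: str, verified_domains: set[str]) -> bool:
    """Return True if group_id, or a parent namespace of it, maps to a verified domain.

    Assumption for illustration: proof of owning foo.com also covers
    sub-namespaces such as com.foo.utils.
    """
    parts = group_id.split(".")
    # Try the full groupId first, then successively shorter parent namespaces.
    for end in range(len(parts), 0, -1):
        if group_id_to_domain(".".join(parts[:end])) in verified_domains:
            return True
    return False


verified = {"foo.com"}
print(may_publish("com.foo", verified))        # True
print(may_publish("com.foo.utils", verified))  # True
print(may_publish("com.bar", verified))        # False
```

Because ownership is tied to something externally verifiable, the repository can decide who may publish under a namespace without manually vetting every new name.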

Usage of libraries comes at a cost

Every developer knows that you don't want to reinvent the wheel, and that you should use libraries instead. We follow the Unix principles and stand on each other's shoulders; that is what makes it possible to create today's complex applications at all. After all, you would not tell someone who wants to learn to drive a car to first learn how to weld a frame so they can build their own car…

Just like driving a rusty car with bad welding creates problems, using components of unknown quality can be disastrous. Developing applications requires understanding the characteristics of all the components involved. Java developers using Maven have known that for a long time and have access to tools like the dependency hierarchy view in M2Eclipse.

Developers in the npm or Python ecosystems are not yet as lucky, but they essentially have the same problem. The good news is that we are working on helping them as well.

First and foremost, however, this needs to be driven by a shift in mindset. Your application is not just the code you write, but everything you ship and use as part of running it. And you are responsible for it all. So work towards understanding all the parts inside and being able to generate a full bill of materials.
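Generating a bill of materials boils down to collecting every component reachable from your application, not just the direct dependencies you declared. The following minimal Python sketch does a breadth-first walk over a hypothetical dependency graph given as a plain dictionary; the component names are made up, and real resolvers also have to mediate between conflicting versions of the same component, which this sketch ignores.

```python
from collections import deque


def bill_of_materials(root, graph):
    """Return every component reachable from root, excluding root, sorted.

    graph maps a 'name@version' string to the list of its direct
    dependencies. Components shared by several parents, and dependency
    cycles, are visited only once.
    """
    seen = {root}
    queue = deque(graph.get(root, []))
    while queue:
        component = queue.popleft()
        if component in seen:
            continue  # already recorded via another path
        seen.add(component)
        queue.extend(graph.get(component, []))
    seen.discard(root)
    return sorted(seen)


# Hypothetical application with one transitive dependency it never declared.
graph = {
    "my-app@1.0.0": ["web-framework@4.2.0", "logger@1.2.0"],
    "web-framework@4.2.0": ["logger@1.2.0", "left-pad@1.0.0"],
    "logger@1.2.0": [],
}
print(bill_of_materials("my-app@1.0.0", graph))
# ['left-pad@1.0.0', 'logger@1.2.0', 'web-framework@4.2.0']
```

The application only declares two dependencies, yet it ships three components; the undeclared one is exactly the kind of part that breaks a build when it disappears upstream.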

And don't tell me it gets easier with Docker. You just have one large container … with lots more stuff inside. You might want to talk to Twistlock to find out what they are up to.

Use a repository manager

A repository manager essentially lets you run your own in-house repositories or registries as well as proxy public ones. Find out more in my white paper Concepts and Benefits of Repository Management. Using a repository manager has long been a well-known best practice in the Maven ecosystem, and users of Gradle and other tools are slowly waking up to that fact as well.

Beyond that, any tool that relies directly on public repositories is potentially vulnerable to upstream deletions and outages, and suffers from the network overhead and wasted time of repeated downloads. That includes npm, NuGet, Bower, Docker and others.
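The core of the proxy behaviour can be sketched in a few lines of Python: serve a component from the local cache if it is there, otherwise fetch it once from upstream and keep it. `ProxyRepository` and the `upstream_fetch` callable are hypothetical stand-ins; a real repository manager such as Nexus Repository handles far more, such as metadata, authentication and cache expiry. Caching release components indefinitely is only safe because, as argued above, releases should be immutable.

```python
from typing import Callable, Dict


class ProxyRepository:
    """Hypothetical caching proxy in front of a public registry.

    upstream_fetch returns the bytes for a path, or raises if the upstream
    registry is down or the component has been deleted.
    """

    def __init__(self, upstream_fetch: Callable[[str], bytes]):
        self._upstream_fetch = upstream_fetch
        self._cache: Dict[str, bytes] = {}
        self.upstream_calls = 0  # number of network round trips made

    def get(self, path: str) -> bytes:
        if path in self._cache:
            # Served locally: no download, unaffected by upstream outages or deletions.
            return self._cache[path]
        self.upstream_calls += 1
        content = self._upstream_fetch(path)  # may raise on outage or deletion
        self._cache[path] = content
        return content


# Simulated upstream registry that later loses a package.
upstream = {"left-pad-1.0.0.tgz": b"tarball bytes"}
proxy = ProxyRepository(lambda path: upstream[path])

proxy.get("left-pad-1.0.0.tgz")     # first build: downloaded and cached
del upstream["left-pad-1.0.0.tgz"]  # the package is unpublished upstream
print(proxy.get("left-pad-1.0.0.tgz"), proxy.upstream_calls)  # b'tarball bytes' 1
```

Every later build is served from the cache, so the second request neither costs a download nor fails when the package vanishes upstream.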

Supply chain management

At this stage it is undeniable that the software industry is growing up and that supply chain management methodologies have become crucial. A mantra like "Use fewer, better components from trusted suppliers" applies to a car manufacturer outsourcing seat belts and other parts. But it also applies to a software developer using database abstraction layers and persistence frameworks, logging and web frameworks, or security and encryption libraries.

Sonatype Nexus Repository can be your warehouse of components, and it is free and easy to install and run.

Next steps

So is this it? Are we done, with everything under control? Not by a long shot. npm Inc. already has plans to improve the situation. And many more tools remain to be created to manage large dependency trees in various ecosystems and to mature their repository formats and processes. And of course, there are trust- and security-related complexities that should be tackled as well. Exciting times!