#unpublish Happens — How to protect your Node.js Investment

Abstract:

Igor Soarez / Principal Engineer / YLD, May 2016: At the end of the day npm is a registry, it’s a tool that comes bundled with Node.js. But it has certain tradeoffs.
What are the tradeoffs of relying on npm for building and deploying? How do we approach this problem? What are the best guidelines?
You can’t rely on the community to support your business. Open source developers do not work for your company, and you can’t support your production environment on a public package repository.
You need to check for security. You need to guarantee that the version that’s deployed to production is consistently the same. Because the registry is an external dependency you need to mitigate this risk. The solution cannot be maintained by a third party that does not provide an SLA; you need to be able to rely on whatever you use to deploy. Our goal is to propose an architecture that solves this problem.

Talk Transcription:

I’m Igor. I’ve been doing Node for more than four years now. I’ve helped all kinds of companies, from startups to enterprises, adopt Node to build frameworks, microservices, basic web development, the works.

I work for YLD. YLD is a consulting company, based in London. We’ve been doing this for around two and a half years now. And YLD was founded by the creators of Nodejitsu. Nodejitsu, for those who don’t know, was the reference platform as a service for Node applications. They’ve now been acquired. Back then, it was Nodejitsu and the founders of YLD running the public NPM registry. We are founding members of the Node.js Foundation. This is the foundation that covers the Node.js project.

The way we in the industry deliver software has been changing. We see this list of changes as the future of engineering. Bill Scott, who is an advisor at our company, made these changes back at PayPal. And it’s all about breaking work into small, talented, independent teams. And always keeping an eye on customer value. And we see Node as a great tool to facilitate this kind of change.

The Node community has come a long way. It has grown incredibly fast, and by many measures Node is the fastest-growing and most significant development platform right now. It has 4 million users and we have seen its growth double year after year, especially in enterprise, where adoption is up 400 percent since last year. Node is everywhere.

And not that long ago, Node was just a side project by Ryan Dahl. This image is from a video of Ryan introducing Node at JSConf Europe. This brilliant man took JavaScript, a language that didn’t have any I/O, put it together with libuv, an asynchronous I/O library, and took JavaScript to the server. He wanted event-driven, non-blocking I/O. He made Node lightweight and highly performant even under extreme load. And this wasn’t that long ago.

Today it’s been deployed by a lot of enterprises. But in a lot of ways, Node is still often seen as immature, especially by other, older communities. And I think a big reason for that is that JavaScript is a popular language. This is the programming language ranking by RedMonk. They are a leading industry analyst firm based in London. And JavaScript is actually the most popular language.

It started on browsers but now you can run JavaScript everywhere. And a big part of the Node community is made up of web developers. Node uses JavaScript, browsers run JavaScript, but JavaScript on the server isn’t the same as JavaScript on the client. And too often these web developers don’t have any sort of engineering background. We believe that there’s a difference between engineers and developers. And with a strong focus on engineering, we at YLD like to consider all aspects of our chosen toolset, from security through to its build pipeline. And enough consideration is not always given to this in the community.

It’s great that Node lowers the entry barrier for new backend developers. But it also explains why the community sometimes seems a bit out of touch with established engineering practices. This is something we work very hard to help our customers with at YLD.

But Node keeps getting adopted and enterprise use is still growing, and for good reasons. Node is a simple, powerful, and flexible tool. It simplifies I/O without compromising performance. But it’s important to remember that not all use cases are fit for Node. […] sure that your organization is adopting Node for the right use case.

One of the best use cases we often find in the enterprise for Node is what we like to call dedicated backends, also commonly referred to as backends for frontends. Node is great for giving application developers a piece of land on the server side. And what I mean by that is it gives the application or client team flexibility to build the support they need for the platform they’re targeting, without having to constantly go back and forth with the API or backend teams. So it creates flexibility and speed and allows you to deliver value to your customer quicker. All of this without compromising security or performance. But that doesn’t mean that adopting Node is free from risk.

We have yet to find an enterprise Node.js project that does not have dependencies. Node has a vibrant community. It has a very rich ecosystem of public packages and naturally you want to make the most of that. And in Node, the development culture promotes breaking problems into many small modules that do one thing and one thing well. So in comparison with other technologies, Node projects on average will have many more dependencies. And this is one of Node’s greatest strengths but potentially also one of its weak points if you’re not careful.

So dealing with this increased amount of dependencies can be a struggle. And this is where strong engineering practices will protect you and your investment.

All of these public packages are shared on NPM. NPM is a public registry. Anyone can publish and it’s free to use. NPM is the official package manager for Node. It’s also the public registry. And it’s also a startup. The NPM client is bundled with Node. So when you install Node, you get NPM. But while Node is governed under a foundation, made up of the companies that have a vested interest in it, the public NPM registry is run and governed by a private company called NPM Inc. So the public registry, where everyone doing Node is sharing their work, is controlled by a private company. It’s not under the control of the foundation.

When you install Node, you get NPM, and the NPM configuration will by default point to the public registry run by NPM Inc. If for whatever reason the public NPM registry is not available, that can halt development for a whole team. This is what you’ll see if you’re trying to install a Node module while the public registry is unavailable. In this instance we’re trying to install the joi package. But instead of joi all you get is frustration.

The public registry can go down, and has been down, for a variety of reasons. Even if it isn’t down, it can be slow. And the last thing you want is your team frequently idling, waiting for the free and public registry.

Much worse than development is building and deploying. Each time a build is triggered, you need to run NPM install. And if NPM install fails, the whole build fails. And that will delay the release cycle. If you’re running NPM install when deploying, the result can be much worse. In continuous deployment and/or integration pipelines where capacity is reduced during deployment, a failed install could put unnecessary strain on your available production instances.

And remember, if you’re using NPM, the frightening truth is you are relying on strangers. You’re using their packages. Using the public registry can leave you at the mercy of any number of package authors. Recently NPM was in the press for breaking the internet. Sites like WordPress.com were affected. This developer decided to remove this tiny package that was a dependency for hundreds of other packages, and that broke thousands of projects. Following this incident, removing published packages is now disallowed by the public registry. But there are still many ways that open source contributions can affect your project.

If you think about security, all you need is one bad module. Again, the Node way is to have many small dependencies that do one thing and one thing well. But if, on average, the likelihood of a specific version of a Node module having a security issue is 0.5 percent, then for a project with 30 dependencies there’s roughly a 14 percent chance (1 - 0.995^30, assuming the issues are independent) that your project is packing a security vulnerability. And that is quite scary. Even if it isn’t a security issue, if it’s just a bug or an incompatible update, that’s bad enough. And with so many dependencies, you don’t want to be stuck on specific versions and have to update every single one of them manually. But with so many updates to public NPM packages happening every day, there is a big chance that from the time you […] and you test it and deploy it, you’ll get different software. And better working and faster software is great, but that could also be breaking changes or bugs. Unpredictable change is an unacceptable risk.
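The estimate above can be checked with a short calculation. This is a minimal sketch that assumes each dependency carries the same, independent probability of a security issue (an assumption made here for illustration; the function name is hypothetical). Under that assumption the chance of at least one issue among n dependencies is 1 - (1 - p)^n, which for p = 0.5 percent and n = 30 comes out just under 14 percent, close to the simple product 30 × 0.5 percent = 15 percent.

```javascript
// Probability that at least one of n dependencies has a security issue,
// assuming every dependency has the same independent probability p.
// (Independence is an assumption for illustration, not a measured fact.)
function chanceOfAtLeastOneIssue(p, n) {
  return 1 - Math.pow(1 - p, n);
}

const exact = chanceOfAtLeastOneIssue(0.005, 30);
const linearEstimate = 0.005 * 30; // simple upper-bound approximation

console.log('exact:  ' + (exact * 100).toFixed(1) + '%');
console.log('linear: ' + (linearEstimate * 100).toFixed(1) + '%');
```

The gap between the two numbers grows with the number of dependencies, and transitive dependencies push n well beyond the count listed in package.json.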

In a Node project, dependencies are specified with ranges. There are many ways to specify these ranges. You can refer to a specific version. You can allow any version above a certain version. If you use the tilde, that permits new patches. If you use a caret, that also allows new minor versions, but not major ones. And the guidelines and the default way to specify them have been a matter of much debate and have changed a number of times.
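To make the range rules concrete, here is a simplified sketch of how exact, `>=`, tilde, and caret ranges could be checked. It only handles plain `x.y.z` versions and is not the implementation npm uses (npm relies on the much more complete `semver` package); all function names are hypothetical.

```javascript
// Simplified range check for plain "x.y.z" versions only: no prerelease
// tags, no wildcards, no compound ranges. Illustrative sketch, not npm's code.
function parseVersion(version) {
  return version.trim().split('.').map(Number);
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a, b) {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

function satisfies(version, range) {
  if (range.startsWith('>=')) {
    return compareVersions(version, range.slice(2)) >= 0;
  }
  if (range.startsWith('~')) {
    // Tilde: major and minor are fixed, the patch number may increase.
    const base = range.slice(1);
    const [major, minor] = parseVersion(base);
    const [vMajor, vMinor] = parseVersion(version);
    return vMajor === major && vMinor === minor &&
      compareVersions(version, base) >= 0;
  }
  if (range.startsWith('^')) {
    // Caret: the left-most non-zero part is fixed, later parts may increase.
    const base = range.slice(1);
    const b = parseVersion(base);
    const v = parseVersion(version);
    if (compareVersions(version, base) < 0) return false;
    if (b[0] > 0) return v[0] === b[0];
    if (b[1] > 0) return v[0] === 0 && v[1] === b[1];
    return v[0] === 0 && v[1] === 0 && v[2] === b[2];
  }
  return compareVersions(version, range) === 0; // exact version
}

console.log(satisfies('1.2.9', '~1.2.3')); // new patch: allowed by tilde
console.log(satisfies('1.3.0', '~1.2.3')); // new minor: rejected by tilde
console.log(satisfies('1.3.0', '^1.2.3')); // new minor: allowed by caret
console.log(satisfies('2.0.0', '^1.2.3')); // new major: rejected by caret
```

Note the special case for `0.x` versions: a caret range on `0.2.3` does not accept `0.3.0`, because before 1.0.0 a minor bump is treated as potentially breaking.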

NPM recommends using semver for package versioning. This is a set of rules that tries to separate compatible from incompatible changes. But the reality is that it’s completely up to the package author to honor these rules. And this can be problematic if the package you depend on does not follow semver and a new version of the package has breaking changes. And even if the package author follows semver, there’s still a chance that a bug might get introduced in a compatible version.

And it’s hard enough already to select dependencies. There are all of these things that you need to consider, like license, release frequency, whether the authors are responsive to open issues, test coverage, et cetera. And a high number of dependencies aggravates this issue.

So if you’ve seen all of these issues, what can you do about it? What’s the lesson here? The first and most important rule is don’t run NPM install when deploying. It’s amazing the amount of times we see this. If the registry isn’t available, or for whatever reason NPM install fails, then you might not have enough provisioned servers when you need them. Instead, you should try to deploy a prebuilt artifact.

You also need to stay on top of the software you depend upon. Even when you’re building those artifacts, you don’t want to depend on the public registry for that.

So the solution is to keep a cache, to keep a copy of every dependency under your control. And in the past, it has been a recommendation for the Node community that applications should check their dependencies into source control, essentially bundling or vendoring dependencies into your application.

But that can get pretty messy because of native modules, that is, NPM packages with C and C++ code and makefiles and not just JavaScript in them. When you install these, they need to be compiled. And that will target a specific architecture that will be different from developer machine to developer machine. And it will be different from the environment you’re targeting. So every time you check out the project, you need to run NPM rebuild, and you need to be quite selective with the files that you check into source control. And that can be a nightmare. Not to mention that diffs can get unpleasant to read. And the whole version control system will turn into a pain to deal with.

A better solution is to use a private package repository: a caching proxy to the public registry. Every time you add a new dependency, you cache it persistently under your control. Artifactory is what the majority of our clients use. It has support for NPM, and you can configure it to be a caching proxy to the public registry. Once you start using your dependencies, they get cached, and if NPM is not available, if a package is pulled, if any trouble comes, you’re safe. You have what you need to keep on developing and building and deploying. And it also allows you to publish your own packages to your private package repository. Which is nice.

It’s really simple to set up. All you need to do is configure NPM. Change this configuration file and set a different endpoint for the registry. And I wanted to do a demo of this, but I thought it would be cooler to do it on a plane over here.

So I’m filming this with my phone. And I have no connection on my laptop. I’m on the hapi project, which is a popular web development framework for Node. I removed all of the dependencies. Checking the npm configuration for the registry, we’ll see that it’s pointing to Artifactory running locally on Docker. Then we run NPM install and, even though the public registry isn’t available, because all of the modules are cached in Artifactory, you’re still able to install.
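The registry setting checked in the demo lives in npm's `.npmrc` configuration file as a `registry=` line, and `npm config get registry` prints the value in effect. As a rough illustration, the sketch below reads that setting from `.npmrc` text and falls back to the public registry when it is absent. The function name, the local host and port, and the repository key `npm-remote` are assumptions for illustration, not details from the talk.

```javascript
const PUBLIC_REGISTRY = 'https://registry.npmjs.org/';

// Returns the registry URL configured in the given .npmrc text,
// or the public registry when no `registry` key is present.
function registryFromNpmrc(text) {
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments (.npmrc comments start with ';' or '#').
    if (line === '' || line.startsWith('#') || line.startsWith(';')) continue;
    const eq = line.indexOf('=');
    if (eq === -1) continue;
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    if (key === 'registry') return value;
  }
  return PUBLIC_REGISTRY;
}

// Hypothetical .npmrc pointing at an Artifactory npm repository on localhost.
const exampleNpmrc = [
  '; project-level npm configuration',
  'registry=http://localhost:8081/artifactory/api/npm/npm-remote/',
].join('\n');

console.log(registryFromNpmrc(exampleNpmrc)); // the private caching proxy
console.log(registryFromNpmrc(''));           // falls back to the public registry
```

Keeping this file in the project root means every developer and every build agent resolves packages through the same proxy without extra setup.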

So that takes care of the availability problem. But what about updates? Between cutting and testing […] there will be updates to your dependencies. And even if you don’t use version ranges, even if you pin your project dependencies to specific versions, that only solves the problem for direct dependencies. It doesn’t solve the problem for transitive dependencies.

And a solution for this is to use NPM shrinkwrap. If you run npm shrinkwrap on your project, that creates a file that lists the names and specific versions of all the NPM modules that are currently installed. This file, which you should check into version control, will be used by NPM for subsequent installs. When NPM install runs again, it will actually ignore the dependency specification in package.json and install these specific versions instead.
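To see what such a file pins, the sketch below walks a nested, shrinkwrap-style dependency tree and lists every name and version, including transitive ones. The package names and versions are made up for illustration, the function name is hypothetical, and the nested `dependencies` layout is assumed to match what `npm shrinkwrap` generated at the time of the talk.

```javascript
// Walks an npm-shrinkwrap.json-style tree and returns every pinned
// "name@version" pair, including nested (transitive) dependencies.
function listPinned(node, out = []) {
  const deps = node.dependencies || {};
  for (const name of Object.keys(deps).sort()) {
    out.push(name + '@' + deps[name].version);
    listPinned(deps[name], out); // descend into this package's own dependencies
  }
  return out;
}

// Made-up example data; real files also carry fields such as `resolved`.
const shrinkwrap = {
  name: 'my-app',
  version: '1.0.0',
  dependencies: {
    'pkg-a': {
      version: '2.1.0',
      dependencies: {
        'pkg-c': { version: '0.4.1' },
      },
    },
    'pkg-b': { version: '1.0.3' },
  },
};

console.log(listPinned(shrinkwrap));
```

Here `pkg-c` never appears in the application's own package.json, yet its exact version is still recorded, which is the gap that pinning only direct dependencies leaves open.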

A module by Mozilla called npm-lockdown goes even further: it lists not only the name and version but also saves the checksums, and it verifies them when you run NPM install again. So even if a specific version gets overwritten, if anything changes, npm will fail loudly and you’ll know that a package has changed.
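The idea of checksum verification can be sketched in a few lines. This is not npm-lockdown's actual code; it is an illustrative sketch using Node's built-in `crypto` module with SHA-1 (the hash behind the registry's `shasum` field), and the function names and sample bytes are hypothetical.

```javascript
const crypto = require('crypto');

// Hex-encoded SHA-1 digest of a package tarball's bytes.
function sha1Hex(buffer) {
  return crypto.createHash('sha1').update(buffer).digest('hex');
}

// Throws when the tarball contents no longer match the recorded checksum.
function verifyTarball(name, version, buffer, recordedChecksum) {
  const actual = sha1Hex(buffer);
  if (actual !== recordedChecksum) {
    throw new Error(
      name + '@' + version + ' changed: expected ' + recordedChecksum +
      ', got ' + actual
    );
  }
  return true;
}

// Record the checksum once, when the dependency is first locked down.
const originalBytes = Buffer.from('original tarball bytes');
const recorded = sha1Hex(originalBytes);

verifyTarball('pkg-a', '1.0.0', originalBytes, recorded); // passes silently

try {
  // Same name and version, different contents: this must fail loudly.
  verifyTarball('pkg-a', '1.0.0', Buffer.from('tampered bytes'), recorded);
} catch (err) {
  console.log(err.message);
}
```

The version number alone cannot detect a republished or tampered tarball; the content hash can.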

It’s even easier if you’re building containers; then none of this is a problem. If the result of your build system is an image, then all of the dependencies are bundled in that artifact. And that artifact can be passed around from environment to environment without any of these issues with changing dependencies.

But if everything’s frozen, if you lock down all of your dependencies, how do you get new software? How do you manage updates to your dependencies? Running NPM outdated helps you keep track of outdated dependencies. It tells you how far behind you are.

A package listed in yellow means there’s a new version of that module but your dependency specification doesn’t allow for an upgrade. Red (not sure if you can see it, but ejs in the middle is in red) means that you should run NPM install again so you get a newer compatible version of a module.
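The color rule can be expressed in terms of the `current`, `wanted`, and `latest` columns that `npm outdated` reports for each package. The sketch below is an illustration of that rule, not npm's own code; the function names and sample entries are hypothetical, and only plain `x.y.z` versions are handled.

```javascript
// Negative if a < b, zero if equal, positive if a > b (plain x.y.z only).
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// current: installed version; wanted: newest version the declared range
// allows; latest: newest version published to the registry.
function classify(entry) {
  if (compareVersions(entry.wanted, entry.current) > 0) {
    return 'red'; // a newer compatible version exists: install again
  }
  if (compareVersions(entry.latest, entry.current) > 0) {
    return 'yellow'; // newer version exists, but the range excludes it
  }
  return 'up-to-date';
}

// Made-up entries for illustration.
console.log(classify({ current: '2.3.1', wanted: '2.3.4', latest: '2.3.4' }));
console.log(classify({ current: '1.4.0', wanted: '1.4.0', latest: '2.0.0' }));
```

Red entries can be fixed by reinstalling within the existing range; yellow entries require a deliberate change to the range in package.json, and usually a look at the changelog.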

And to minimize security surprises, it’s a very good idea to run Snyk in your build pipeline. Snyk is a free development tool that gathers your dependencies by name and version and queries that list against a public vulnerability database. So it can warn you about security issues in dependencies, so you know to upgrade or downgrade to a safer version or to just move away from that dependency entirely.

We’ve also been working on a cool tool. Called disclosure. Which I want to show you now. Okay.

So if you run disclosure on a project, it will also take a look at your dependency list and it will try to give you an overview if it works. Do I have internet? Okay.

So I’m running this on the hapi project, and it lists all of the dependencies along with the licenses you’re exposed to. It gives you a rough idea of the reliability of those dependencies. It tries to look at things like the number of lines of code, whether it has tests or not, and whether the package or the version has been deprecated or is outdated. And we want to make it look at other metrics like the number of open issues on GitHub and how frequently they get closed, and how frequently the source gets updated. It will also tell you, delegating to Snyk, if there are any security issues, and it will try to give a score to your dependencies. There is another example running, I guess. Express.

And we want to make this open source. So I’ll do that now. There we go. So we look forward to people using this and getting contributions. All right.

And to conclude, don’t run NPM install on deploys, keep copies of all of your dependencies, know what you’re exposed to, you can use disclosure for that, run vulnerability scans, and try to narrow down dependency versions. That is all. Thank you.