How Bintray Saved my Marriage: Scaling WebJars with Bintray

Abstract:

James Ward, Principal platform evangelist / Salesforce, May 2016: The WebJars project uses the Bintray APIs to automate the deployment of NPM and Bower packages to Maven Central. This session will walk through the application architecture and the Bintray API basics. Code examples will be plentiful & functional (Scala).

Talk Transcription:

This talk is how Bintray Saved my Marriage: Scaling WebJars with Bintray. This is going to be a little bit of WebJars and my journey with that and then go through some of the Bintray APIs so you can get an idea for how to use the Bintray APIs.

So first off a little bit about WebJars, what WebJars is. WebJars are web libraries, so that’s like CSS and JavaScript libraries like Bootstrap and jQuery and that sort of thing, but packaged up into jar files and then deployed on Maven Central so you can specify them in your Java package manager and consume those web libraries. So it just makes it really easy to pull those in as dependencies. What I’ve seen in a lot of projects before WebJars is that you would go take, like, a jQuery JavaScript file and copy and paste it into your project, but then you wouldn’t know what version you were actually using, and if there were transitive dependencies in that library those wouldn’t get automatically pulled in for you. So it was kinda painful to deal with JavaScript libraries in Java projects.

So I’m gonna show you WebJars. This is the WebJars.org list of the classic WebJars. We’ll talk in a minute about why these are classic WebJars, but there’s quite a few different libraries in here. So these are all libraries that are deployed on Maven Central and you can go pull these into your Java package manager. So if you use Maven you can click that and then you can copy and paste that dependency definition there and that will pull in the transitive dependencies of the library, if there are any, and then you can start using that.

There’s different ways to use the WebJars in different Java web frameworks. How you actually use it depends on the web framework, but essentially what you’ve done is you’ve added a static resource to your classpath, and so as long as your web framework can expose a resource on the classpath out through HTTP, then you can use WebJars.

So that’s our classic WebJars. Let’s go back here.

[Audience Question]

Some frameworks that make it really easy to use WebJars are Spring Boot, of course, Play Framework, and anything that’s now Servlet 3.0 or higher has basically automatic support for WebJars. So essentially anything can work with WebJars but some frameworks make it easier than others.

So as you saw there’s a ton of WebJars in there, so how did they get there? Well, the way that they got there was there is a project for every single classic WebJar, and then for each of these, let’s look at like three.js, there is a POM file. The POM is the standard Maven build definition, and this build definition describes all the required metadata and then has some additional config for RequireJS. And then, the important part is that when I run the build on a WebJar it will pull down the source code for that web library, from wherever that comes from, and then package it into the jar file and then deploy it on Maven Central. So that’s how we initially started deploying WebJars, just by having a Maven build for every single one.

Okay. So, we got tons and tons of classic WebJars. I’ll show you some stats in a minute. But we’ve had over eight million downloads of all the WebJars. So that’s quite a few downloads. What it did was create this burden of success, so we’re pretty successful with WebJars. It became a pretty standard way for people to consume web libraries in their Java projects.

But with over a thousand classic WebJars and about 15 minutes to create each of those WebJars, that was about 250 hours of my free time. This wasn’t something my employer was sponsoring me to do, so this was nights and weekends. So 250 hours just to create those thousand WebJars. I didn’t necessarily do all of that time because I did have some contributors helping out. And then, on top of the classic WebJar artifacts, there’s also more than one version for many of the artifacts, so the total is that there are 3600 classic WebJar versions. So different versions across those thousand classic WebJars. And creating a new version of an existing artifact takes about five minutes for me to create and deploy, so that’s about 216 hours. If you do the math, I took out the thousand versions that were created implicitly when I created the WebJar. So only 2600 new versions have been created, but five minutes each for those.
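
The arithmetic behind those two totals can be checked with a few lines of Scala. This is only a worked restatement of the figures quoted in the talk; the object and value names are made up for illustration.

```scala
object WebJarEffort {
  // Figures quoted in the talk.
  val classicWebJars = 1000
  val totalVersions = 3600
  val minutesPerNewWebJar = 15
  val minutesPerNewVersion = 5

  // Hours spent creating the initial WebJars: 1000 * 15 minutes.
  val creationHours: Double = classicWebJars * minutesPerNewWebJar / 60.0

  // Versions beyond the first one that came with each WebJar.
  val extraVersions: Int = totalVersions - classicWebJars

  // Hours spent on those additional versions: 2600 * 5 minutes.
  val versionHours: Double = extraVersions * minutesPerNewVersion / 60.0
}
```

The second total comes out just under 217 hours, which the talk rounds down to 216.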

So lots and lots of time, lots of nights and weekends, and this is where it relates to the title about how I saved my marriage: I was really spending a lot of my free time on nights and weekends just creating WebJars and it’s really, not meaningful, it is very meaningless, monotonous work. And so obviously my wife wasn’t happy when we would wake up Saturday morning and she’d want to go get brunch and I’d be like, no sorry, I got to work on WebJars. That happened actually a lot. So we needed to come up with a way to deal with this. Because the number of JavaScript libraries is growing, like, astronomically. So the NPM repository, when I looked a couple of days ago, had 278,000 packages and is getting almost 500 new packages a day. So obviously, I can’t sustain this project in my free time.

So automation was the only way to fix this. So what I needed to do was automate the deployment, creation and deployment of WebJars and there’s this really good xkcd comic about automation. And so I put it off for a long time because I knew there would be a significant time investment up front to do the automation but that it would pay off, hopefully pay off, down the road and it definitely has.

So in order to automate this, some things that were key were that I wanted the WebJars to be in jCenter and wanted them to be on Maven Central so that Java developers can easily consume them. One of the options that we talked about early on was setting up a different repository, that would be like a proxy repository, and then we would make it so whenever someone requested something from the repository, if it didn’t exist already, it would go automatically create it. And somebody actually ended up building essentially that. But that didn’t work with what I wanted for WebJars because I wanted the libraries to be in jCenter and Maven Central so that no one had to add an additional repository.

So the other key to automation was that I need metadata about the packages. To deploy to Bintray and Maven Central I need things like where the source code exists and what the license is. And so in order to get that metadata and to get the JavaScript libraries, there were really two good options out there: the NPM and Bower repositories. So we’ll talk about that in a second, but one of the things we talked about early on was should we try to sync all of the NPM or all of the Bower repository into Bintray or should we just do it on demand. And because there’s just so many packages now on NPM, I decided not to sync all of it and to make it on demand. So a user has to actually go in and say, I need this artifact and I need this version of this artifact in order to use it in my project. There’s obviously a trade-off there. But I didn’t want to put terabytes of useless JavaScript libraries into Bintray and Maven Central.

Okay so Bintray and WebJars. So what we do is we deploy artifacts from NPM and Bower into Bintray and then sync them to Maven Central and they’re deployed on demand.

So let me show you a quick little demo of how this works. So this is on WebJars. Anyone can go there and say add, like, an NPM WebJar and we can go in and find, like, Bootstrap, Twitter Bootstrap is deployed there and we can pick a version and hit deploy and, I haven’t tried this one so we’ll see if this one actually works. Some of them don’t work because there’s sometimes missing metadata in the NPM and Bower repositories. So there are quite a few libraries that I’m just like we can’t deploy those because we’re missing like the license or the license is not in the correct form or various other reasons. The source repository is not defined, that sort of thing.

But what you see it’s doing is going through all the steps: pull the library out of the NPM repository, assemble it into a WebJar, then deploy it onto Bintray and then sync it over to Maven Central. So it usually takes a minute, but then there we go, now it says deployed, so now that library is available and you can start using it in your Java build. You just have to specify the group ID, artifact ID, and version.
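
To make the group ID, artifact ID, and version concrete, here is a small Scala sketch of how WebJar coordinates are put together. The group IDs are the ones used by the three WebJar catalogs; the types and the `gav` helper are assumptions made for this illustration and are not taken from the WebJars code base.

```scala
object WebJarCoordinates {
  // The three WebJar catalogs and the Maven group ID each one publishes under.
  sealed trait Catalog { def groupId: String }
  case object Classic extends Catalog { val groupId = "org.webjars" }
  case object Npm extends Catalog { val groupId = "org.webjars.npm" }
  case object Bower extends Catalog { val groupId = "org.webjars.bower" }

  final case class Coordinates(catalog: Catalog, artifactId: String, version: String) {
    // "group:artifact:version" notation, as used by Gradle and similar tools.
    def gav: String = s"${catalog.groupId}:$artifactId:$version"
  }
}
```

For example, `WebJarCoordinates.Coordinates(WebJarCoordinates.Npm, "bootstrap", "3.3.6").gav` yields `org.webjars.npm:bootstrap:3.3.6`; the version number here is just a placeholder.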

That all worked, so let’s go. So this has been really successful. I launched the Bower and NPM support, I think, about a year ago, and since then users have deployed tons of artifacts and versions. Obviously nowhere near the number that is actually in NPM or Bower, but this is just on demand so people are deploying them as they need them. And obviously most Java developers are not using every library that’s out there in the NPM ecosystem, but there are some that they’re using quite a bit.

So here’s what we do for the deployment architecture. We fetch the metadata from the upstream repository, we transform the metadata into the right form that we need, we create the artifact, we deploy it to Bintray, and sync it to Maven Central. So that’s how we’re using Bintray. Just this way to move the artifact into a place and then sync it to Maven Central. But also gives us a nice management console that we can use to manage all the WebJars that are in the system.

So I’ll show you that. If we go into Bintray. This is the repository in Bintray. Bintray.com slash WebJars slash Maven and you can see there’s 531 pages of data here so lots of different artifacts deployed and we can, like, go into this Bower AngularJS library and we can go see the files that are associated, the versions. We can see the status of it when it was synced to Maven Central. So really nice to be able to come here and do this. There are times where like a deployment will fail halfway through because of some bug in my code and so I can come in and manage that and delete the version, if I need to, and start over. I’ve had to do that a couple of times and then go fix the bug on my side.

But you can see the metadata is all here for finding the website of this library and that sort of thing, license. So that management console gives us a nice way to manage everything, resync it to Maven Central. Maven Central deployment is not the most reliable thing out there and breaks quite a bit and so there are definitely times where I have to come in here and resync a library to Maven Central because it failed because Maven Central deployment was broken at the moment when somebody tried to do a deployment.

So that is the basic architecture of Bintray. So any questions before I dive into the Bintray API and some code and stuff.

Yeah. Go ahead.

[Audience Question]

I don’t have the statistics on that, we can definitely look ‘em up and see though. Maven Central has their statistics and Bintray has the statistics there so we definitely could look at the statistics and see but I haven’t done it — I haven’t gone and looked at that yet. Now that jCenter is the default with Gradle, right, so I’m sure that more of these downloads are going to jCenter but I’m not sure on the numbers. Cool. Other questions about the deployment architecture or what WebJars are? Okay.

Okay so to do all of this I had to use the Bintray API which is, actually, a really fantastic API. It really preserves a lot of the norms of REST APIs so it was real easy to learn and use. So that was nice. So let’s take a look at the REST APIs and walk through just some of the basics of the REST API.

So first let’s take a look at, like, get repositories. So if I want to get the repositories for my subject I can just say GET slash repos and give it a subject. In this case the subject would be WebJars. It comes back and says Maven is my repo, and once we have a repository then we can create packages in that repository. So there’s some APIs for packages. So get packages. An interesting one here that I use is create package. Whenever we create a new WebJar artifact we create a package for it, and you’ll see that there’s some metadata that we give Bintray when we call the REST API that then gets used in Bintray. So things like licenses and that sort of thing.
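
The endpoints mentioned here can be sketched as plain URL builders in Scala. The base URL and the exact path shapes are assumptions based on the public Bintray REST documentation as remembered, not something shown in the talk, and the object name is hypothetical.

```scala
object BintrayPaths {
  // Assumed base URL of the REST API.
  val base = "https://bintray.com/api/v1"

  // GET: lists the repositories that belong to a subject (user or organization).
  def repos(subject: String): String =
    s"$base/repos/$subject"

  // POST: creates a package inside a repository.
  def packages(subject: String, repo: String): String =
    s"$base/packages/$subject/$repo"

  // POST: creates a version of an existing package.
  def versions(subject: String, repo: String, pkg: String): String =
    s"${packages(subject, repo)}/$pkg/versions"
}
```

With the subject `webjars` and the repo `maven`, `BintrayPaths.packages("webjars", "maven")` gives the URL a create-package request would be posted to under these assumptions.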

Licenses have been, actually, the most difficult part of this whole process because NPM and Bower don’t have very good standards on how you define licenses, and then they started doing really crazy things like allowing you to combine licenses. You could combine licenses with either an ‘and’ or an ‘or’ operator, with as many ‘and’ or ‘or’ operators as you want, and with parentheses, which makes absolutely no sense legally, to like ‘and’ together different licenses. So there was just some really weird stuff to deal with in dealing with licenses. That was one of the most challenging parts of the whole thing, and the place where I have to spend the most time with maintenance on this project is just dealing with, like, oh this package won’t deploy because the license is not in a readable form. So that’s been a challenge.
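
As an illustration of what such combined license strings look like, the following Scala sketch flattens an expression into the individual license identifiers it mentions. It is a deliberately naive stand-in, not the approach used in WebJars: it ignores how the operators and parentheses group the licenses, and the function name is hypothetical.

```scala
object LicenseExpressions {
  private val operators = Set("AND", "OR")

  // Flattens an expression such as "(MIT OR Apache-2.0) AND BSD-3-Clause"
  // into the license ids it mentions, ignoring operator grouping.
  def licenseIds(expression: String): List[String] =
    expression
      .replace("(", " ")
      .replace(")", " ")
      .split("\\s+")
      .toList
      .filter(_.nonEmpty)
      .filterNot(token => operators.contains(token.toUpperCase))
      .distinct
}
```

A real implementation would have to decide what an ‘and’ of two licenses means for the single license list a repository expects, which is exactly the difficulty described above.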

Anyway, so the REST APIs here are really straightforward. Just standard REST, you know, just using the HTTP verbs to do the things that you would expect, like PATCH to update a package, which I don’t have to do since we just do one-way, immutable version deployment. So once we create a package and we want to add a new version, we can create a version. You can see that there are standard fields for doing that: the name of the version and some other metadata about it.

Okay so that’s the REST API. Bintray.com slash docs slash API. Super easy. Authentication is super easy. I have an auth token I send with every request and that’s how I authorize.
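
Sending credentials with every request can be sketched as building an HTTP Basic `Authorization` header. Whether the WebJars code uses Basic authentication with a user name and API key is an assumption here; the talk only says that a token is sent with each request. The object and parameter names are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

object BasicAuth {
  // Builds the header name and value for HTTP Basic authentication.
  def header(username: String, apiKey: String): (String, String) = {
    val raw = s"$username:$apiKey"
    val encoded = Base64.getEncoder.encodeToString(raw.getBytes(StandardCharsets.UTF_8))
    "Authorization" -> s"Basic $encoded"
  }
}
```

Most HTTP clients can also be given the user name and key directly and will build this header themselves.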

Okay so let’s dive into the code. Are there any questions about the REST API? Yeah go ahead.

[Audience Question]

So WebJars.org is a Scala Play application, and this all runs within that application, so I just use the REST APIs directly. I could fork a process and call the CLI if I wanted to, but I figured it was easier just to use the REST API directly rather than fork processes. So this is all automated, so when you click that deploy button it is calling Scala which is calling the REST API. Does that answer your question?

[Audience]

Okay so I want to go through some of the code so you can see how I implemented that API. So this is a Scala class that I have as a wrapper around the Bintray REST APIs. There is some configuration that I need: Like my user name and password, my GPG passphrase for doing signing of artifacts, and then my Maven Central credentials and those come from configuration. I’m just using environment variables as the provider for that configuration and then this app runs on Heroku and so I set those environment variables in Heroku config.
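
A minimal sketch of reading that kind of configuration from environment variables follows. The variable names and the `Settings` type are hypothetical, chosen only for illustration, and only three of the settings mentioned in the talk are modeled.

```scala
object BintrayConfig {
  final case class Settings(username: String, password: String, gpgPassphrase: String)

  // Hypothetical variable names; the real application may use different ones.
  private val names = List("BINTRAY_USERNAME", "BINTRAY_PASSWORD", "BINTRAY_GPG_PASSPHRASE")

  // Returns either the list of missing variables or the complete settings.
  def load(env: Map[String, String]): Either[List[String], Settings] = {
    val missing = names.filterNot(env.contains)
    if (missing.nonEmpty) Left(missing)
    else Right(Settings(env(names(0)), env(names(1)), env(names(2))))
  }
}
```

In an application this would be called as `BintrayConfig.load(sys.env)`; taking the map as a parameter keeps the function easy to test without touching the real environment.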

So let’s take a look at, like, the create package method here. Create package takes some parameters: subject, repo, name, description, labels, licenses, vcsURL, website URL, issue tracker URL, GitHub repo, and returns a Future of JsValue. A future in Scala is like an asynchronous callback. It allows us to have a handle to something that will produce a value in the future.

So let’s go take a look at what happens in here. First I create a JSON object. This is using Play’s JSON library, which I think is a pretty nice Scala JSON API. And so I’m just assembling a JSON object with the fields that I need, and one way to clean this up is I probably could use a case class and convert the case class to a JSON object. It would be a little bit cleaner than what I’m doing here. And then I use the Play web service client library, which is an HTTP client. We’re giving it a URL here, and that’s just the same thing that the docs told us to use to create a new package. We get the subject, which is WebJars, and the repo, which is the name of the WebJar. So for an NPM WebJar it’s org dot WebJars dot NPM colon and then, like, Bootstrap, which is the artifact ID. So then I do an HTTP POST of that JSON and then I do a flatMap here on the result.
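
The case class idea mentioned here might look roughly like the sketch below. It renders the JSON by hand with the standard library only so that it stays self-contained; the talk uses Play’s JSON library instead. The field names (`name`, `desc`, `labels`, `licenses`, `vcs_url`) are assumptions about the create-package request body, and only a subset of the parameters is modeled.

```scala
object CreatePackageBody {
  final case class NewPackage(name: String, desc: String, labels: Seq[String],
                              licenses: Seq[String], vcsUrl: String)

  // Quotes a string for JSON; only backslashes and double quotes are escaped here.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  private def array(items: Seq[String]): String =
    items.map(quote).mkString("[", ",", "]")

  // Renders the request body as a compact JSON object.
  def toJson(p: NewPackage): String =
    Seq(
      "name" -> quote(p.name),
      "desc" -> quote(p.desc),
      "labels" -> array(p.labels),
      "licenses" -> array(p.licenses),
      "vcs_url" -> quote(p.vcsUrl)
    ).map { case (key, value) => quote(key) + ":" + value }.mkString("{", ",", "}")
}
```

With a JSON library the hand-written rendering would normally be replaced by a derived serializer for the case class.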

So the flatMap allows me to provide some information about the success or failure of this operation. If the HTTP request itself fails then the future would be in a failed state rather than a success state. But there’s some semantic information that comes back in the HTTP response about whether or not the operation actually succeeded on the backend. And so that’s why I’m using flatMap here, so that I can return a successful future or a failed future based on the response code. So I check the response code, and a successful response code from the Bintray API for this call is Created. So it’s saying, all right, we’ve created this new package in Bintray. And so now I’m saying, all right, to me that is a success, so I say Future dot successful. And then the value that gets produced from that future is response dot json, the JSON body.
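
The status-code check described here can be sketched with plain Scala futures. `Response` and `UnexpectedStatus` are stand-ins invented for this example; the real code works with the Play web service client’s response type and returns the parsed JSON body rather than a string.

```scala
import scala.concurrent.{ExecutionContext, Future}

object StatusHandling {
  // Stand-in for an HTTP client's response type.
  final case class Response(status: Int, body: String)

  final case class UnexpectedStatus(status: Int, body: String)
    extends Exception(s"Unexpected status $status: $body")

  // Succeeds with the body only when the server answered 201 Created;
  // any other status turns into a failed future.
  def createdBody(response: Future[Response])(implicit ec: ExecutionContext): Future[String] =
    response.flatMap { r =>
      if (r.status == 201) Future.successful(r.body)
      else Future.failed(UnexpectedStatus(r.status, r.body))
    }
}
```

A plain `map` could not do this, because `map` can only transform a successful value, while `flatMap` lets the function choose between a successful and a failed future.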

[Audience Question]

That’s a good question. If there’s wrappers around it.

[Audience Question]

Yeah. Cool. So I chose to write it myself because I wanted it to be reactive, even though for this case it really doesn’t matter. A lot of times the REST API wrappers that people produce for Java aren’t reactive; they’re using a blocking HTTP client underneath the covers, something like Apache Commons HttpClient. I wanted mine to be reactive. Because everything needs to be reactive in my world. Whether or not it actually needs to be. In this case it doesn’t even need to be but […] that’s why I chose to write this directly.

[Audience Question]

Yeah, so I think that it’s great that the wrappers are out there. This API is so easy you don’t really need one. As you can tell, it’s pretty darn easy to use. The REST API here.

Okay so a few other methods, and you can see some other examples here. Like get package returns a Future of JsValue, and that’s just doing an HTTP GET and then again checking the status code and setting the status of the future to successful or failed based on the status code.

And then things like delete package. This one only gets used for the tests that I have, which we’ll take a look at in a second. Create version, upload Maven artifact, publish version, sign version, and sync to Maven Central.

Now here’s the really fun one I mentioned earlier. Convert licenses. So there’s like a hundred different variations for how people in NPM and Bower specify their licenses and like 80% of the time they are not compliant with any standard. Cause there isn’t one, cause it’s the JavaScript world. So I have to do a lot of work to transform the way they specify their license into what I think they intended, based on what the license actually is.

So one of the things that gets specified is a URL, and we have to go get the data from the URL they specified, and then there’s this license utility that I use, which is a microservice. I’ll show you. So I have this microservice, it’s oss dash license dash detector dot herokuapp dot com. And what you do is you post the contents of a license to it and then it returns back to you what it thinks the license is. So it does some, like, fuzzy matching on the text of a license against a list of known licenses and tries to tell you what the license is.
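
The talk does not say how the detector’s fuzzy matching works. As one possible illustration, the Scala sketch below scores a license text against known reference texts by word overlap (Jaccard similarity) and accepts the best match above a threshold. The technique, the threshold, and all names are assumptions, not the service’s actual algorithm.

```scala
object LicenseGuess {
  // Lower-cased set of alphanumeric words in a text.
  private def words(text: String): Set[String] =
    text.toLowerCase.split("[^a-z0-9]+").filter(_.nonEmpty).toSet

  // Jaccard similarity: shared words divided by all distinct words.
  def similarity(a: String, b: String): Double = {
    val (wa, wb) = (words(a), words(b))
    val all = wa | wb
    if (all.isEmpty) 0.0 else (wa & wb).size.toDouble / all.size
  }

  // Picks the best-scoring known license, if its score clears the threshold.
  def detect(text: String, known: Map[String, String], threshold: Double = 0.8): Option[String] =
    known.toSeq
      .map { case (name, reference) => (name, similarity(text, reference)) }
      .filter { case (_, score) => score >= threshold }
      .reduceOption((best, next) => if (next._2 > best._2) next else best)
      .map { case (name, _) => name }
}
```

Because punctuation and case are ignored, small formatting differences do not affect the score, while extra text appended to a license lowers it.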

It doesn’t always succeed because sometimes people, like, do crazy things in their licenses like add in gibberish at the bottom for some reason. So it doesn’t always work. We try our best to figure out what license they were licensing their stuff under. And this was originally part of the same code base, but at some point I refactored that part of the code out into a microservice. Cause it was something that needed to scale separately, and had its own memory requirements and performance requirements that were separate from this whole app. And so that was a pretty easy candidate to refactor out into a microservice. So that’s part of the license detection strategy, and then there’s this whole other thing that we needed to do, which is to take the list of licenses that they’ve specified and convert them into the Bintray licenses.

So there’s a list of Bintray accepted licenses and we need to go through and figure out how the licenses that the user specifies, in their package dot json or whatever, align with Bintray licenses. So there’s some funky stuff that we have to do in that. So for instance sometimes people will specify OFL dash one dot one as their license when in fact the format for specifying that license, or what Bintray expects, is OpenFont dash one dot one. And so we have to do some transformations between what users specify and what Bintray will accept. But that’s working pretty well now and we’re getting fairly few cases where we can’t convert the license. So maybe I can convert this to some machine learning thing at some point.
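
A minimal sketch of that kind of name translation in Scala follows. Only the OFL-1.1 mapping comes from the talk; the other alias and the short accepted list are placeholders standing in for Bintray’s real list, and the names are hypothetical.

```scala
object BintrayLicenses {
  // A tiny excerpt standing in for the repository's accepted license names.
  val accepted: Set[String] = Set("MIT", "Apache-2.0", "OpenFont-1.1")

  // Upstream spellings mapped to accepted names; only the first entry is from the talk.
  private val aliases: Map[String, String] = Map(
    "OFL-1.1" -> "OpenFont-1.1",
    "Apache 2.0" -> "Apache-2.0"
  )

  // Returns the accepted name, or an error message when no mapping is known.
  def convert(upstream: String): Either[String, String] = {
    val trimmed = upstream.trim
    val candidate = aliases.getOrElse(trimmed, trimmed)
    if (accepted.contains(candidate)) Right(candidate)
    else Left(s"Unrecognized license: $trimmed")
  }
}
```

Returning an `Either` keeps the failure case explicit, which matches the situation described here where some packages cannot be deployed because their license cannot be converted.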

But anyway, that’s the wrapper I have to convert a package from NPM or Bower to a package for Bintray and Maven Central. Questions about that wrapper? Oh and here is the list of accepted licenses there. Anyway, questions?

[Audience Question] How long did that take?

The license part or just this whole wrapper.

[Audience Question] How much time do you think you’ve spent subject to this negative work over the year?

The basics of the wrapper were really simple, like a couple hours, but the license part I’ve sunk, like, I don’t know, 50, 100 hours into getting license stuff figured out. So you probably saw the comic here from xkcd, but I don’t think I’m fully reaping the rewards yet of automation. I’ve definitely sunk a lot of time into, particularly, license detection. But the Bintray stuff was easy. Taking the JavaScript world and trying to fit it into this more constructed, more safe world, that’s been difficult.

[Audience Statement] Never take things from the real world and try to apply it to a JavaScript world.

Yeah exactly. Exactly.

Okay so I’ll show you real quick the Bintray spec. This is my test that I wrote while I was building that wrapper, just to test that everything works. So this is specs2, I think I’m using specs2, to do my test here, and so this is pretty easy. You know, if we want to test that create package works, we’re going to call create package with a bunch of test values, and then we’re going to check the created field on the create result, that’s a JSON query on the JSON object that comes back, and make sure that it’s a valid date object.

Pretty simple to write these tests, and the only kind of tricky part was that for continuous integration I needed some amount of test coverage of this stuff that worked without providing any Bintray credentials. Because this is running on Travis CI, a public continuous integration service, and there were some issues with, like, letting my credentials leak out through there.

So that’s the test. I can run it and see, all right, it’s gonna do a bunch of stuff against Bintray and we’ll make sure that everything works. So that’s just in the sbt console, running the Bintray spec. I have a bunch of tests. I have like hundreds of tests that test all the license conversion stuff separately from this, and so that’s where a lot of the work Josh mentioned has gone, into the license conversion stuff. It should be going out and running those tests depending on how quickly the internet is here. So we’ll see if that finishes.

We can see it will run those tests. This time I’ve specified my Bintray credentials in my environment so it’s actually doing the whole suite of tests and it can test everything. Okay, we’ll let that run.

[Audience]

I probably didn’t specify a license, just like all the JavaScript people in the world. A lot of JavaScript libraries don’t even specify a license. I’m probably like them as well. Now you got me curious, we should. That brings up a good point, all that source code is up on GitHub slash WebJars slash WebJars.

[Audience]

Do I have a license in here?

[Audience]

No license. It’s. Yeah. I’m horrible. I’m a horrible human. Relinquish me to the JavaScript world.

Okay anyway there’s my test. My test has run and everything’s passing. Okay so there we go we are running the tests, that’s all good and let’s go back to. So that was our Scala wrapper.

So now at some point we need to take all these methods and put them all together into a process that actually works, so here’s essentially the process that we go through when somebody deploys a new library. So we create the WebJar from the metadata on NPM or Bower and then we create the package, create the version, publish the POM, publish the jar, publish the source jar, publish the javadoc jar, publish... do I have that in twice? I do, okay. That happens once. I should check my code, though, to make sure. And then, sign the version, publish the version, so do a Bintray publish to make it public on Bintray, and then sync to Maven Central. So all that stuff happens automatically. So let’s go check out that code and I’ll show you where we do all that. This is one place where I’m not real happy with my code base. I have quite a bit of duplication between the Bower one and the NPM one and I need to do some refactoring in there. But I’ll walk through this one, the NPM one.

So this is what gets called when somebody does this, and there’s two different ways I can run this. I can either run it in memory in the same process that’s serving the web requests or I can do it in a different process, and so in production I actually run it in a separate process and I’ll show you how I do that in a sec. But what we do is we call release, we give it some parameters and then we go through and create a WebJar. You see this code’s kinda cluttered because I want the user to see some update messages as the process is working. So this is actually pushing those update messages out through a cloud service called Pusher, and it’s essentially just like a messaging broker that supports web sockets. So it’s pushing these messages out from this process that’s not running in my web process, it’s running somewhere else. So it’s pushing these update messages out so that the user can see what’s going on, and more importantly see where it failed.

So okay, so we figure out some information from the NPM repo, we create a POM file, we create a tar gz of the artifact, and then we create the WebJar where we assemble everything together in the right format, and then here’s the Bintray part. So now that we’ve got our package, we convert the licenses, so we make sure that these licenses are compatible with Bintray. We create the package. If the package is already there then we just get the package. Then we create the version, upload the Maven artifact which is the POM, upload the Maven artifact which is the jar, and then we create an empty jar for the sources and the javadoc ones, so you see the dash sources and dash javadoc just get an empty jar there. So we upload those. Then we sign the whole artifact, the version, we publish the version, and then once that’s all done we say that’s okay. And then we sync to Maven Central and then we’re done. So that’s the whole process that we go through every time someone publishes a new artifact and version.
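
The Bintray part of that sequence can be sketched as a chain of futures. The `Bintray` trait below is a hypothetical interface invented for this illustration, with one method per call named in the talk; the real wrapper takes more parameters (subject, repo, file contents, credentials) and returns JSON. Treating any failure of create package as "the package already exists" is also a simplification.

```scala
import scala.concurrent.{ExecutionContext, Future}

object ReleaseSketch {
  // Hypothetical interface; each method stands for one Bintray call from the talk.
  trait Bintray {
    def createPackage(name: String): Future[Unit]
    def getPackage(name: String): Future[Unit]
    def createVersion(name: String, version: String): Future[Unit]
    def upload(name: String, version: String, file: String): Future[Unit]
    def signVersion(name: String, version: String): Future[Unit]
    def publishVersion(name: String, version: String): Future[Unit]
    def syncToMavenCentral(name: String, version: String): Future[Unit]
  }

  def release(b: Bintray, name: String, version: String)(implicit ec: ExecutionContext): Future[Unit] = {
    val base = s"$name-$version"
    val files = List(s"$base.pom", s"$base.jar", s"$base-sources.jar", s"$base-javadoc.jar")
    for {
      // Fall back to fetching the package when creating it fails.
      _ <- b.createPackage(name).recoverWith { case _ => b.getPackage(name) }
      _ <- b.createVersion(name, version)
      // Upload the files one after another, in order.
      _ <- files.foldLeft(Future.successful(())) { (done, file) =>
             done.flatMap(_ => b.upload(name, version, file))
           }
      _ <- b.signVersion(name, version)
      _ <- b.publishVersion(name, version)
      _ <- b.syncToMavenCentral(name, version)
    } yield ()
  }
}
```

Because each step is a `flatMap` on the previous one, a failure at any point stops the chain and the resulting future carries the error, which is what makes it possible to report where a deployment failed.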

So let me show you how we run that in production. This runs on Heroku, as I said. We can do Heroku run, only I can do this cause this requires me to be authenticated to Heroku. And then I say NPM pub and then give it an artifact. Let’s do lodash. Let’s go find a version of lodash to publish here. Let’s go to our NPM WebJars. And let’s go to lodash. So here’s my lodash. You can see here there are a bunch of different versions, and if we go look up lodash, find the git repo for lodash, we’re trying to find a version that doesn’t actually exist yet so we can verify that it actually works. But, let’s see, 4.12 was just published. Perfect. Okay so let’s go try to publish 4.12.0.

Here we can say 4.12.0. This is actually what’s happening when somebody clicks that deploy button in the web UI: it’s doing this Heroku run, which is spinning up an instance of this app on a new server and then going through that whole publish process. If I can get the command right. Let’s see, what’s my command name. My Procfile down here. Ah, pub NPM, not NPM pub. Let’s try that again. So pub NPM. Okay so now this is gonna run and do that whole publishing process, and then we should see in a minute that package up and running on Bintray. While that’s running let’s go over to Bintray and let’s go to that lodash library and we should see. Search for it here, lodash. Somewhere is lodash. I got a rating on one of those. That’s great. Okay lodash. Okay so here’s that. You can see in the versions that are listed there that 4.12 is already there, so I guess we’re gonna republish it but that’s okay.

We definitely could add one that’s not there, but it’s nice to be able to go through and explore the different versions that are there. Let’s go see if it’s still working. I didn’t make it output to standard out in this case so I’m not actually seeing the output there. If I had given it a Pusher key then it would do the web socket thing through Pusher, and that’s how we see the output in the web UI. It should finish running here, but this is a version that already exists. That was weird that it already existed. I wonder if they, like, re-pushed their tags or something weird like that. Who knows, the JavaScript world is crazy. Never know what’s going on. But this is what’s happening when you click that button. It spins up the instance, runs this command and does the publish through that whole process, and then once it’s all done it’ll be like, yay, all right, it’s available now, we can use it.

Ok so, got plenty of time. Okay so that is, hey done. Sweet so we finished. So that’s how we use Bintray with WebJars, so we got some time for questions.