Keith Kreissl, Mac Heller-Ogden, Zlatko Sisic, Deep Mistry – Cars.com, May 2016: Explore the advantages of an immutable infrastructure and how it is achieved with Artifactory’s binary storage, including a Docker registry, a hosted repository for the Alpine Package Manager, and a hosted NPM repository. Join Cars.com on our inaugural journey down this path by witnessing how we re-engineered our rendering engine into a Node application within a Docker container. All of this is possible through a build pipeline consisting of Jenkins and Artifactory.
[Keith] Hello everyone. How are you guys doing today? You guys are having fun out here in Napa?
[Keith] Yeah, all right. My name is Keith Kreissl. I’m a principal developer at Cars.com. And with me today I have Zlatko Sisic, Mac Heller-Ogden, and Deep Mistry, all senior developers. And there’s a couple of things we want to get out of the way before we start this talk. Number one: we do not sell cars at cars.com. We also do not take a percentage of sales on cars.com. And most important, we do not get discounts on cars. So if anybody in here wanted any of that, unfortunately I can’t get it for you. But we hope you stay and enjoy the talk anyway.
So what do we do at cars.com? Cars.com is the leading online destination for car shoppers and owners. We offer credible and easy-to-understand information to help consumers research, price, and find new and used vehicles and quality service and repair providers. Now that I’ve got my marketing obligation out of the way, let’s talk about what the technologists do at cars.com.
Let’s take a look at some of these metrics. We are a company that was founded in 1998, and back then the most obvious choice for handling metrics like these was your typical J2EE stack: an HTTP server, then an app server, then Java, of course, running on — insert the name of whatever Linux host you like — and all of it communicating with a database. But over time, as you can see, we’ve moved up to some pretty significant numbers. And with that, we had to beef up our infrastructure, right? All of which can be described with this pretty architecture diagram.
Now, I don’t expect anybody to be able to read this, especially the ones in the front. So just take my word for it. We have a rendering tier, a composite tier, a services tier, and then a data store tier. Over all those years, we’ve had many architectural decision makers come in and leave. And what did they leave us? Well, they left us with two flavors of an HTTP server: IBM and Oracle. Two flavors of an app server: WebLogic and WebSphere. Two flavors of a Java runtime — Oracle and IBM — and in different versions. Yes, we are still running 1.5 because we have an IBM portal; if anybody wants to talk about that, we can. We’re also using Java 7, and we’re looking at doing a little Java 8. And when you look at the OS, we have Oracle Linux and Red Hat Linux, and I’m pretty sure some of our Linux engineers have some IBM AIX boxes running out there.
Now you might be saying, that’s not too terrible for one environment. But we have five environments: dev, future testing, integration, performance testing, and prod. So take all of that and multiply it by five. Okay, still not too bad. But what about all of the applications we have? We have about 10 web apps and about 100 services running at any given time. And really, what happens is that it turns into something more like this.
So at any time you’re like, hey, where’s the problem? I don’t know, because this is what we have to deal with. We have to manage all those systems, all those VMs. And I know, you’re like, hey Keith, guess what, the Chef guys are here, why don’t you go talk to them, they have a solution for you. And yes, we do use Chef. Chef has helped us out tremendously, in fact, in procuring all of our VMs. However, we have to talk about the people that make all the magic happen, right? That’s the developers. That’s every one of us in this room using a laptop to write all the code. So now, as you can see, we have laptops running either NGINX or Apache. I have an app server that’s going to be Jetty, Tomcat, or Spring Boot — the new one, the one everyone’s loving. Then I have different versions of Java: is it Oracle, is it OpenJDK? And let’s not even talk about the OS we’re running on, which is going to be either OS X or Windows.
So you can see that the problem keeps compounding. And now that I’ve become the lead of our DevOps team, the most common thing I hear every day is: hey, it works on my machine, why doesn’t it work over in X environment? And it’s mind-numbingly frustrating because, as you can see, good luck figuring out where the problem is.
Enter immutable infrastructure. We at Cars.com are striving for the ultimate encapsulation of our application, the runtime environment, and the OS, so that we can have the utmost confidence that the artifact we create on our machine — or on an automated build machine — is the same one that goes through every single environment and the same one that runs in production.
We’re looking to achieve an infrastructure that’s predictable, scalable, and has automatic recovery. We really want to promote that same artifact. I don’t want to have to figure out which part changed; I want to be able to say, this ran here, it’s gonna run over there. I also want to be able to handle rises in traffic. I don’t want to have to go procure more VMs and run all the Chef commands. I want it to just work. And, of course, what happens if disaster strikes? What if something just doesn’t work? I want to be able to roll back efficiently. So how am I going to do this — what’s gonna do it for me?
Well, Docker, of course. Right? It’s the magic bullet for everything. So, why Docker? Well, it gives us nice state isolation. I can put all my components within one image and make sure it’s okay in every single environment. I can build it autonomously. I can get automated deployments, where I’m moving this image that runs as a container with an app, a runtime, and an OS all inside of it. And if any one of those pieces has changed, it’s not a big deal, right? I just rebuild my image and run it in a different environment. I have fast recovery: if something goes wrong — say a container starts going haywire, or maybe I don’t like what’s happening in the container — I can just spin up a previous one that was running.
One thing that’s really big for us is blue/green and canary releases. With this, I want to be able to try new features out. I want to say, hey I want 10 percent of my traffic to see this and the rest of the 90 percent to see the other one. How can I do this efficiently? Well just spin up a couple of containers, add it to the load balancer, and boom, there you go.
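The mechanics Keith describes — weighting traffic between current and canary containers at the load balancer — could be sketched with an nginx upstream block like this (a hypothetical example: nginx, the addresses, ports, and weights are all illustrative assumptions, not Cars.com’s actual setup):

```nginx
# Hypothetical canary setup: two containers for the current release,
# one canary container, weighted so the canary sees roughly 10% of traffic.
upstream rendering {
    server 10.0.0.10:8080 weight=45;  # current release
    server 10.0.0.10:8081 weight=45;  # current release
    server 10.0.0.10:8082 weight=10;  # canary — about 10% of requests
}

server {
    listen 80;
    location / {
        proxy_pass http://rendering;
    }
}
```

Rolling the canary back is then just removing that one server line (or stopping the container) and reloading the load balancer.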
Also, I don’t know if you guys have this problem, but say you’re trying to change your technology, change your language. If you run Java and you wonder, hey, is Node going to work? We used to spend a lot of time on discovery to make sure, hey, is this going to perform just as well as the Java? Well, with this approach, we really don’t have to do that. You can just spin up a couple of containers, put them side by side, and see which one performs better. Then I just shut down the one that’s not performing.
And finally, it’s hot. Right? It’s sexy. It’s all over the place right now. Everybody wants to do it. They talked about it this morning. We were just at another presentation with Riot Games. They were doing it too. It seems to be the future.
So, now that you know all this about Docker, how cool it is and how it’s going to help us, the next thing is we really have to figure out what our use case is — how we can actually do this. So, we decided to take a look at our rendering engine. Our rendering engine was your typical J2EE application. It ran, it was nice, it worked. And then we decided, you know what, let’s rewrite this in Node. And not only that — let’s take the runtime and the environment and stick it all inside of a Docker container. So this is what we want to do: we’re going to take this application, give it a runtime, and stick it right inside of Docker.
But then when we started doing this, we started realizing that we have a couple other new challenges. And to explain those new challenges and how we were able to solve them, I’d like to invite Zlatko up here to give you more background on that.
[Zlatko] All right, Keith. Thank you for providing some insight into our complex infrastructure. I’m pretty sure we’re not the only ones here with that issue. At least we are taking steps to simplify it, a few steps at a time, and we’re doing a pretty good job of it at the moment.
So, what do we have here? We have a new rendering engine written in Node; we want to Dockerize it and deploy it to production as quickly as we can. How did we start? We have a team of developers cranking out new lines of code — simplifying the application, refactoring, building, creating new Node and NPM modules. And we have to ship it all with Docker.
And what we noticed with our usual set of tools — very basic things, from bash scripts to tar and zip and so on — was that every distribution we tried was lacking some of those tools by default. So regardless of which one we used, we still had to go and tweak it a little to make it work with our environments and our requirements. In the end, we really wanted something very lightweight, very functional, and reusable by all teams.
So after evaluation and some POCs, we figured that Alpine Linux was the best choice for us. Alpine Linux provides some great utilities out of the box. It’s very small — about five megabytes in size. You can use the Alpine package manager to install additional libraries, which are provided by the APK repository. Our base image is about 12 megabytes in size, which we then use to layer out additional base images and create different runtimes.
So now we have a Java runtime. We have different versions of Node runtimes. We have Tomcat instances. We have different web servers if we need them, just in case. And they are all very, very small. As you can see, Tomcat went from 357 megabytes down to 181. And on Node you have two different types of images, latest and slim — ours went from about 300 down to 55 megabytes. Our entire image, with the application, is about 80 megabytes in size.
So how did we achieve it? We used best practices, like everybody else — the ones you can find online: consolidating commands by chaining them together, installing binaries from sources, then removing those sources and cleaning up all the temporary files in the same step. One of the things we also did to enhance security on those images was to downgrade the user privileges for the runtime — we don’t want to run as root. Yes, many people say, why don’t you just run it as root, it’s only a container. Well, there are additional privileges that could be exploited in case of a security breach. So if we lower the privileges and run as a specific user, we also have to make sure that user can execute the runtime, so we had to change permissions on those binaries. And we noticed that by carefully rearranging commands — when we switch to the user, when we extract binaries, and so on — we were able to take those image sizes down even further.
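The practices Zlatko lists — chained commands, same-layer cleanup, and a non-root runtime user — might look roughly like this in a Dockerfile (a minimal sketch; the package names, user, and paths are assumptions, not Cars.com’s actual image):

```dockerfile
FROM alpine:3.3

# Install the runtime, create an unprivileged user, and clean up in a
# single chained RUN so no intermediate files persist in any layer.
RUN apk add --no-cache nodejs \
 && adduser -D -u 1000 app \
 && rm -rf /tmp/* /var/cache/apk/*

# Drop root: the runtime executes as an unprivileged user, so the
# binaries it needs must be readable and executable by that user.
USER app
WORKDIR /home/app
CMD ["node", "server.js"]
```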
So, having the runtime and the actual image with the application, we had to figure out where to store them. We have the application, all those different Node modules, the Docker images. On the other side we have Java developers building out new applications — JAR files, WAR files. We also have some static files. And, as Keith already mentioned, our Linux admins use different flavors of Linux, and every one of those uses a different package manager. So we wanted to consolidate all that. Instead of having five or six different applications to maintain, is there anything we can use as a single solution? And Artifactory came to the rescue. It really turned out to be a one-stop shop for us. It provided everything we need.
We were able to store Docker images. We were able to use an NPM repository, a Maven repository. And it integrates great with Jenkins. One of the really great features is the ability to define multiple repositories and combine them. What we’re doing here is creating local repositories for all our artifacts, separating out snapshots and releases. We’re also creating remote repositories, which cache all the external artifacts we fetch from the internet. And then we have the ability to create a virtual repository for each type of artifact, where we combine, or layer, those local and remote repositories. So when we configure, for example, our Docker clients, our local Maven instances, or our Java environments, all we have to do is point to Artifactory and let Artifactory decide which repository the artifact is resolved from.
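In practice, pointing a client at a virtual repository is a one-line configuration change. For example, an npm client might resolve everything through a virtual npm repository like this (the hostname and repository name are made up for illustration):

```ini
# .npmrc — all installs resolve through one virtual repository, which
# layers the local (snapshot/release) repos and the remote internet caches
registry=https://artifactory.example.com/api/npm/npm-virtual/
```

Whether a given package comes from a local repository or a cached remote is then Artifactory’s decision, not the client’s.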
Additionally, we utilize the metadata support. We tag images and use the tags in the build process, and we use the REST API to retrieve them. With some of those basic approaches covered, I’d like to invite Mac to tell us more about some custom scripting that we developed.
[Mac] Thank you Z for explaining how we set some of this up.
With all this context, the next big question we faced was how we were going to handle our builds and deployments. We have all these artifacts, we’ve figured out where to store them, we’ve optimized them — great. Now how the heck are we actually going to use them? I think that’s a problem everybody’s looking at. And there are a lot of solutions out there right now, but all of them are in the early stages of development. That’s the problem we faced as we looked at that landscape: you could actually see disclaimers on these projects’ homepages — for example, Docker’s forum had a statement like, initial development has not settled, expect big changes. Right? And each one of the solutions we looked at also only solved part of the problem.
What we really needed was something that could serve our immediate needs without locking us into a specific tool yet. Something that could buy us some time. Something really simple. Because we didn’t have a lot of needs in order to roll out this initial Node application: we’ve got one application we want to containerize to prove to the organization we can do this. And what do we need? We need to handle some scaling, do a few dynamic things with ports, and handle rolling deployments. To get started, that’s probably about enough. Also, we’re really trying to achieve immutable infrastructure, so it’s all well and good that containers can encapsulate application state, but we want immutability in the thing that handles the builds and deployments too. At least to some degree, we want the same thing running on local developer machines that’s running on the machines that handle the builds and deployments.
And finally, it would be really nice to have a tool that provided a very simple, declarative syntax — some configuration file that I can store with my application source, that gets built into the artifact, so all that build and runtime logic actually ships with the image.
All those things would be really nice, so I thought to myself: what kind of tool could help me with these really basic needs? And I thought back in time, and I happened to remember there was this little tool — the original build tool — called Make. I looked around, and you know what, this should be a Makefile. And everybody kind of knew it. That was good enough for right now. We needed to fill a gap right now, and Make was the guy.
So, you know, I just sat down for a couple of hours and wrote a Makefile. I don’t expect you guys to be able to read everything there — I had to size it down to a four-point font. This is actually an older version; believe it or not, it’s about twice as big now, and that is a problem we’re going to have to work on. But looking at this Makefile, I was thinking: I want this on my local developers’ machines, and I want this on the machine that handles the builds and the deploys. I want this everywhere. But distributing this everywhere is a problem, for sure. It’s way too big and you’re going to get drift — and that’s the opposite of immutability, right?
So we found a way to wrap it in a simple little entry point and distribute it a bit more cleanly. And now the file that actually lives in your application source looks a little more like this. This is a very simple example, but it gives us exactly that declarative configuration I was talking about. It’s deceptively simple. It just has a name. That’s our artifact repository set on that registry parameter there. Some port bindings — it has a little bit of its own syntax for port bindings, so it may not look familiar if you’re used to docker run port bindings. Then some environment variables, some volumes. The actual one we use for our rendering engine is a little more complicated, but not a whole lot. It’s pretty basic, right?
That file is called powertrain.mk, and it sits in your project root.
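The slide itself isn’t legible here, but based on Mac’s description a minimal powertrain.mk might look something like the following (the variable names and syntax are guesses for illustration — powertrain’s real syntax may differ):

```make
# powertrain.mk — hypothetical declarative config, stored in the project root
name     := rendering
registry := artifactory.example.com/docker-local
version  := 1.0.0

# powertrain's own host:container port-binding syntax (per the talk,
# it doesn't mirror docker run's -p flag exactly)
ports    := 8080:3000

# environment variables and volumes passed through to the container
env      := NODE_ENV=production
volumes  := /var/log/rendering:/home/app/logs
```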
So I’m going to give you a really high-level peek at what the actual CLI for this tool looks like, which will give you a little context for the demo we’re going to do in a bit. There’s a powertrain build command. Now, with this command — along with the rest I’m going to show you — you’ll notice you don’t specify any flags or any information about what you’re running, because that’s all in the configuration. That’s the whole point: powertrain build just does a build, and those parameters are filled in from the configuration file.
There’s a powertrain publish command, which will tag your image with the information in that file and then push it into the registry.
And finally the deploy command, which does a few things. It’s a composite task that goes onto the Docker host and removes any exited containers — making sure that, in the case of Docker Swarm for example, we’re able to allocate ports the way we want to. It pulls down the image, because the assumption is you built the image on a different box than the one you’re deploying to, so the image needs to be pulled from the registry. It runs that image. And then it stops any containers that were there before it. It can also be parameterized on the command line with how many instances you want, and there are dozens of other runtime options you can add. So now that you have a little insight into this makeshift tool — pun intended — that we created for our builds and deploys, I’m going to invite Deep Mistry up here to talk more about our CI/CD pipeline and how we’re using this in Jenkins.
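As a sketch, the composite deploy task Mac walks through could be expressed as Make targets like these (hypothetical — not powertrain’s actual implementation; the docker filters assume containers carry an app label, which is an assumption of this sketch):

```make
# Composite deploy: clean up, pull, run the new version, stop the old.
deploy: clean-exited pull run stop-previous

clean-exited:
	# remove exited containers so their ports can be reallocated
	docker ps -aq --filter status=exited | xargs -r docker rm

pull:
	# the image was built on another box, so fetch it from the registry
	docker pull $(registry)/$(name):$(version)

run:
	docker run -d --name $(name)-$(version) --label app=$(name) \
		$(registry)/$(name):$(version)

stop-previous:
	# stop containers for this app created before the one just started
	docker ps -q --filter "label=app=$(name)" \
		--filter "before=$(name)-$(version)" | xargs -r docker stop
```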
[Deep] All right. Thanks, Mac, for that introduction on how we’re building and deploying these new […] applications that we have.
So let me walk you through how we deploy this to production.
Let’s talk about CI/CD and some of the tools we use. We obviously use the usual suspects. We use Jenkins as our CI server, Bitbucket as our source control, and, as you heard Z mention earlier, Artifactory as our one-stop shop for all our registry needs: the Docker registry, the Alpine APK repository, and all of our Maven and Gradle artifacts. We also use Docker in our CI/CD — not just to run our application container, but we use Docker containers to do automated testing during the CI/CD pipeline. One thing you don’t see up there is Chef. We use Chef for […] our Jenkins slaves, and also the Docker hosts themselves.
Talking about the continuous integration workflow: typically a developer makes a change to the codebase, commits the code, and pushes it to source control. Once the code is pushed to Bitbucket, it triggers a webhook to Jenkins, which starts one of the build jobs. What that build job does is pull down the repo from Bitbucket — and, as Mac showed you in the slide earlier, the repo doesn’t just contain the source, it also has the Dockerfile as well as our powertrain config files. So now Jenkins uses the same powertrain build commands on the CI servers. As you can see, it can do powertrain build, and that builds our Docker image.
With powertrain you can also override some of the variables you specify in the config file. In this case, on our CI server we actually use a version variable, just to give the image a different version than what’s in the source files. This way we can track that version in Artifactory. We use something like the combination of the Jenkins build number and the git hash of the commit that triggered this build. So Jenkins builds this image, and once the image is built, it spins up the actual application container. It’s a web app, so it spins up that web app on a Docker host and then also spins up a test container to run some automated testing against the application — just to make sure everything is fine and that the changes we’ve made aren’t breaking any of the core functionality. Once this testing completes, and if it’s successful, we go ahead and push the image into Artifactory.
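The version override Deep describes can be sketched as a small shell step (a hypothetical sketch — the exact variable names and tag format Cars.com uses aren’t stated in the talk; BUILD_NUMBER and GIT_COMMIT are standard Jenkins environment variables):

```shell
#!/bin/sh
# Compose an image version tag from the Jenkins build number and the
# short git hash of the triggering commit, so every image in Artifactory
# is traceable back to both the build and the commit.
BUILD_NUMBER="${BUILD_NUMBER:-42}"
GIT_COMMIT="${GIT_COMMIT:-9fceb02d0ae598e95dc970b74767f19372d61af8}"
SHORT_HASH=$(printf '%s' "$GIT_COMMIT" | cut -c1-8)
VERSION="${BUILD_NUMBER}-${SHORT_HASH}"
echo "$VERSION"   # e.g. 42-9fceb02d
```

Jenkins would then pass this value through as powertrain’s version variable, overriding whatever is in powertrain.mk.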
Now, this is also a pipeline. Once the build is complete, it triggers our deploy jobs into one of our environments; by default we deploy every commit into our dev environment. What the deploy job basically does — in Jenkins, we configure all our Docker hosts […] onto the Jenkins masters as SSH slaves — is SSH onto our Docker host and run the same powertrain scripts that you can run on your local laptop.
So, as you can see, it can do powertrain deploy with the same version that the build job just built the image with. One of the cool things we can do with powertrain is extract the config out of the image itself — out of the built image. Keep in mind, when Jenkins was building the image, it had pulled down the repo, so it already knew what the powertrain config and powertrain files looked like. But on the Docker host, we don’t have any of the source; we just have the image. So what we can do is use powertrain’s extract config command, which extracts all the config files we’ve stored in the image for powertrain for that environment. Then you just specify which config files — which run configuration — you want to start your container with. This is pretty powerful. With this, you can run your container in pretty much any run configuration you want, from your local environment or across multiple environments.
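One way such an extract-config step could work — purely a sketch, since powertrain’s internals aren’t shown; the target name and file path are assumptions — is to create a stopped container from the image and copy the config file back out of it:

```make
extract-config:
	# create (but don't start) a container from the image, copy the
	# powertrain config out of it, then discard the container
	docker create --name $(name)-cfg $(registry)/$(name):$(version)
	docker cp $(name)-cfg:/home/app/powertrain.mk ./powertrain.mk
	docker rm $(name)-cfg
```

This way the Docker host needs only the image from the registry, never a source checkout.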
So this is how we deploy to our various environments. As Keith mentioned earlier, we have five different environments, and this is how we achieve immutability for the application as it moves across environments.
So we’re just going to try and show you what our pipeline is like. Mac’s going to help me show you a little bit.
[Mac] So I’m just going to put a commit on this repo here in order to kick off a build. This is the actual primary repo. I’m going to add a line to the readme and go ahead and add and commit that. All of our plat […]. Of course all our developers have to have a […] ID, so I guess I have to put that in there too.
[Deep] All right, so this push should go ahead and kick off one of our build jobs.
[Mac] Here’s the push, and now we can go ahead […]. Come over to our browser here — I’m on the pipeline page for rendering — and we’ll refresh. Take it away.
[Deep] So yeah, it’s actually running the same powertrain commands you just saw in the slides. It’s building the application image. One thing to note — for people who are familiar with Docker — is that we’re actually building the application and running its unit tests within the Dockerfile, as Docker instructions in the Dockerfile. So as part of the Docker build process we’re building the application and doing the unit testing as well. The reason we do that is we want to make sure the application is built and tested in the environment it’s eventually going to run in. This is the same image that’s going to run in the container on our Docker host, so we should build and test on that same image. That’s what it’s doing right now.
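The build-and-test-inside-the-image pattern Deep describes could look roughly like this in a Dockerfile (the file layout and commands are hypothetical — the point is that a failing test instruction fails the docker build, so no image is ever produced for a broken commit):

```dockerfile
FROM alpine:3.3
RUN apk add --no-cache nodejs

WORKDIR /home/app

# install dependencies first so this layer caches between commits
COPY package.json ./
RUN npm install

COPY . .

# unit tests run as a build instruction: if they fail, `docker build`
# fails and nothing gets tagged or pushed
RUN npm test

CMD ["node", "server.js"]
```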
While these are getting built, we can take a glance at our Docker registry in Artifactory. Oops. As we mentioned earlier, we have a bunch of base images that we created, and all of our application images are available here in Artifactory as well. The WiFi’s a little slow here. So, there you go. We pretty much store every commit, so as you saw in the CI workflow, every commit gets built into a Docker image. We can go to the Bitbucket server, look at the commit history, and see which build is associated with which commit. So if you want to roll back, you can quickly go to Artifactory, pull that image down, and deploy it.
One of the cool things from Artifactory that we really used early on is the Docker info tab. When we started building the application, we started with the usual base images like Ubuntu or Debian, and they were obviously a bit bigger. Within Artifactory we can see every layer and how much space each layer is using. That really helped us troubleshoot, improve our Dockerfiles, and reduce some of our base image sizes.
As Z mentioned earlier, the application image is only 88 megabytes; earlier, with a different base image, it was almost 500 megabytes. And since we’re creating and publishing an image for every commit, it really helps to reduce the size even further. So yeah, that’s a very good tool you get from Artifactory.
So we’re going to go back to our pipeline. Hopefully that build is done by now.
[Deep] You just have to refresh it. Oops, it’s still running. The reason it’s taking a while is, as I mentioned earlier, it’s also doing some testing — automated testing. It’s spinning up the test container and running all the automated tests against it. This is now going to trigger the deploy to dev; every build automatically triggers a deploy to our dev environment. And from there, if you want to deploy to our future-testing environment, we have a push-button deploy rule where the QAs, if they want to test a particular build, just move that build into the […] environment by clicking the button over there.
[Mac] That Jenkins job might be backed up right now. We have 70,000 […].
[Deep] Yeah. Hopefully. So —
[Mac] Do you want me to push the button?
[Deep] You can just push it, that’s fine.
So basically the idea is: you build your application image, whether on your local system or on a CI server, and you should be able to move that image up to any environment you want. With powertrain you can really do that, because you can run the container with various run configurations. Maybe dev needs different service dependencies or database dependencies — you shouldn’t have to change the application just because you want different dependencies for an environment. So, yeah, I want to conclude that this is how we’re achieving immutability within our applications. And just —
[Deep] You wanna just, you want to look at the build queue. Like I don’t know how big it is. There we go.
[Mac] Oh yeah, the build queue’s backed up.
Yeah. As Mac mentioned, there are developers committing all the time, so that’s one thing we’re looking at: scaling Jenkins itself. We just have one master right now; we’re looking into using multiple masters and clustering it a little bit.
So to conclude, I’d like to invite Keith back to talk a little more about the future of Docker at cars.com and how we want to take it forward.
[Keith] Yeah, it seems like we’re going to have to put the emails in the CI/CD pipeline for developers not to touch it while we’re doing the demo.
Well, anyways. Now that you’ve seen that we have this brand new tool, this great conceptual idea of immutable infrastructure, and of course Docker and this hunger to learn and really expand on it — we’re taking smaller steps. We’re really trying to iterate on this process. So many things are changing in this space that we want to see what’s out there before we jump on the next technology. However, when we find that we need a certain new piece of functionality, that’s when we’ll add it back into the CI/CD pipeline.
So let’s talk about the future of Docker at cars.com. When I started this talk, I said we have 10 web apps and about 100 services, all in Java. We definitely want to move all of those into Docker containers and have a whole Docker ecosystem of our applications talking to each other.
But I want to take it further. I really want to take the idea that it shouldn’t matter what’s running inside of a container. If your container can take a web request and respond accordingly, then it doesn’t matter what’s in there. I can put whatever I want in there. And that opens you up to so many more tools for any kind of problem. Maybe you want to do Node. Maybe you want to do Scala. Maybe […] is your language of choice. You should be allowed to use those and not be constrained by the infrastructure you have in place. If you start building out your containers and images from this point of view, then you should have no problem spinning something up, tearing it down, and trying new things. Which is really exciting for all of us, especially as technologists, to see where this can take us.
And as we get to the end and conclude: we really thank everybody in here for listening to us, and we hope you’ve learned or taken some stuff away from what we’re doing — or if you have any pointers for us, we’d love to hear them. From this point on, I’d like to open it up for questions; I’ll bring everybody up here and we can take it from there.