Best practices in Docker continuous delivery

Abstract:

Carl Quinn / Software Architect at Riot Games, May 2016: This talk briefly covers the basics of Docker containers and images, and then delves into detail on best practices in building and deploying with Docker using JFrog Artifactory. Docker introduces a whole new way of packaging and deploying applications. As is common with new technologies, there are many knobs to turn and approaches to take when using it in production. We’ll go over how to think about and organize your services into containers, and then talk about the various options in building images and their trade-offs. Finally, we’ll discuss strategies for image publishing and distribution.

 

Talk Transcription:

Good. So today my talk is on Docker best practices, continuous delivery — the slide is not exactly right. The title is not quite right. So a little more about the build and delivery of Docker containers and images.

So I’ve been doing build tools and continuous delivery tools for quite a long time now. Docker came along, we jumped on it, and I started playing with it a couple of years ago. A lot of the practices that you learn around clean and hermetic management of artifacts still apply to Docker containers. But there are a few extra little twists and some more interesting details that I’ll talk about, and I’ll share some of the experience I’ve had in the last couple of years building out a production deployment of Docker containers and clustering for Riot Games.

So the agenda today: I’ll talk a bit about a simple introduction to containers and images for Docker. I don’t know how many people here are already familiar with Docker and containers and building images and all that. How many people are not super familiar with it? A few. Okay. So I’ll go through the first section pretty fast then, just to set the vocabulary. And we’ll talk a little bit about organizing containers, just some philosophy around what’s the best way to build a container, what to put in it, what not to put in it. How to craft images.

This is where I’ll spend a little more time: what to put into the image, what to actually build, and how to construct it. Some different tips there. Oops sorry, I was going to finish this guy. Then we’ll go a little bit more into delivery of images: how to get them, manage your repos and set things up that way. Then finally the distribution: how to get all of your images out to production in a way that’s efficient, and some things you might want to do and keep track of.

So containers and images. The Docker container analogy is really great with all the shipping containers and whales and things. We tend to use the term container for a lot of different things, and a lot of times we mean image when we say container. That’s okay as long as people understand the context of it. But sometimes, when we’re talking about things in detail, we want to be a little more precise. So we’ll go into some of the details, specifically which part is the container and which part is the image.

So containers encapsulate runtime state. In Docker the container represents the scope, the sphere, that a process or a set of processes has access to in the operating system. Docker works with the kernel to set up the environment for your container at runtime. And it can control access to certain resources like CPU, memory, block IO, and network IO, so that a particular container is limited in how much it can grab from the entire machine, right. So you can use it to constrain, contain, and prevent a particular container from taking over a machine. When Docker first came out it was very limited in what it could do, and it’s getting more and more robust in the number of things it can actually manage and control through cgroups.
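As a rough illustration of those limits, here is a minimal Python sketch that assembles a `docker run` command line with resource constraints. The image name `myorg/myservice:1.4.2` and the helper name are made up for the example. `--memory` and `--blkio-weight` are standard `docker run` flags; `--cpus` exists in newer Docker releases than the ones current at the time of this talk, so treat its use here as an assumption about your Docker version.

```python
def docker_run_args(image, memory=None, cpus=None, blkio_weight=None):
    """Build a `docker run` argument list with optional cgroup-backed limits."""
    args = ["docker", "run", "--detach"]
    if memory is not None:
        args += ["--memory", memory]                    # hard memory cap, e.g. "512m"
    if cpus is not None:
        args += ["--cpus", str(cpus)]                   # fractional CPUs (newer Docker)
    if blkio_weight is not None:
        args += ["--blkio-weight", str(blkio_weight)]   # relative block IO weight
    args.append(image)
    return args


# Hypothetical image name, for illustration only.
print(" ".join(docker_run_args("myorg/myservice:1.4.2", memory="512m", cpus=1.5)))
```

The function only builds the argument list; handing it to something like `subprocess.run` is left out so the sketch runs without Docker installed.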

Another cool thing about containers is that they’re functional. If you build a container and you design it really well, sort of like a microservice, you can think of it as a mathematical function. So once you’ve set up your container, you’ve configured it, you’ve given it some environment variables, maybe some arguments, and you do the docker run to start the container, it’s running. It’s kind of a black box. Files really don’t go in and out; it’s using the filesystem a little bit for storage. But the basic idea is that it’s a function in the sense of the network: data comes in on the network, ports are hit and queries are made, and data goes out on the network. So it’s a nice black box that’s functional.

So containers are started from images. The image is effectively a snapshot of a disk of an operating system, or some small subset of an operating system, with files added to it. In the process of building an image you start with some kind of base, and you start a container on top of that base, so that container is actually running, and your Dockerfile, which you use to build the next image, tells Docker what to put in there. So you can add files to that running container, you can copy files in, you can run commands in there, like apt-get or yum update, to get the files in there that you want, and then when you’re finished you snapshot it and save that image out to disk again. So now you have the next layer on top. So images are like onions, like ogres: they’re built in layers. Each layer references the layer below it, to say I’m built from the guy below me, and that guy’s built from the guy below him, all the way down to some kind of root image that’s bootstrapped from a loaded OS.

So this concept of a chain is good to keep in mind, because when you’re loading your images into an artifact repository later on, as we’ll see, all those layers have to get pushed in there and managed too. But when you’re working with your particular container and the service you’re trying to run, you really only care about the top image that you’re referencing; all of the other ones come along with it based on these little back pointers.
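The chain of back pointers can be pictured with a small Python model. This is only a toy data structure, not how Docker stores layers on disk, and the layer names are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Layer:
    name: str
    parent: Optional["Layer"] = None  # back pointer to the layer below


def chain(top):
    """Walk the back pointers from the top layer down to the root."""
    names = []
    node = top
    while node is not None:
        names.append(node.name)
        node = node.parent
    return names


# Hypothetical three-layer image: root OS, installed packages, app files.
root = Layer("debian-base")
deps = Layer("apt-get install", parent=root)
app = Layer("copy app files", parent=deps)

print(chain(app))
```

Referencing only `app` is enough to recover every layer underneath it, which is why a registry has to hold the whole chain.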

And once you’ve built your image you have to put it somewhere. So you use the docker push command to identify the image you want to push, and you push it into the registry. That image, that layer, goes into the registry and all the other layers go in as well, if they’re not already in the registry to start with. The Docker API for saving images understands each layer as a hash of the contents of that layer, and the push command interacts with the registry to understand which of those binary hashes are already there. If the layer’s already present it doesn’t have to push it again, so it can be efficient that way. So if you, for instance, pull down an OS image, add a file to it, then push your resulting image back up, it’s just gonna push the delta, that small layer on top. Those are the ideas to keep in mind when you’re organizing things.
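To make the "only push the delta" behavior concrete, here is a small Python simulation of a content-addressed push. `ToyRegistry` and `push` are invented for this sketch and do not mirror the real registry API; the only idea carried over from the talk is that a layer is identified by a hash of its contents and is skipped if the registry already has it.

```python
import hashlib


def layer_digest(content: bytes) -> str:
    """Identify a layer by the SHA-256 hash of its contents."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


class ToyRegistry:
    """In-memory stand-in for a registry: digest -> layer bytes."""

    def __init__(self):
        self.blobs = {}

    def has(self, digest):
        return digest in self.blobs

    def upload(self, digest, content):
        self.blobs[digest] = content


def push(registry, layers):
    """Upload only the layers the registry lacks; return how many were sent."""
    uploaded = 0
    for content in layers:
        digest = layer_digest(content)
        if not registry.has(digest):
            registry.upload(digest, content)
            uploaded += 1
    return uploaded


registry = ToyRegistry()
base = [b"os files", b"os updates"]
first = push(registry, base)                            # both base layers are new
second = push(registry, base + [b"my one added file"])  # only the top layer is new
print(first, second)
```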

And then when you’re ready to run and create a container you run it by referencing an image that you want to run so you can pull the image down ahead of time to have it on the machine or you can just say run directly and Docker will pull it sort of implicitly. But in either case it does a pull for you. So the pull is the action of bringing the images down from the registry, put them on the local disk so they’re cached by Docker and then the run starts a container from that image. So you can start many containers from the same image and have them running on the same machine and they’re just sort of ephemeral snapshots — runtime environments based on that image as a starting point.

Okay, any questions I guess I can take questions as we go. We’ll see how the time goes. So I went pretty fast through that part.

So now we’ll talk little bit about organizing containers. I know when Docker first came out people thought. Oh. Oh yeah. Go ahead.

[Audience Question]

Yeah so the question, I should repeat, is how does the Docker image compare to an AMI? So an AMI, I’m pretty sure, ends up being a complete snapshot of the final state. In the case of a Docker image, the image can reproduce the entire state because it references all the things it needs, all the way through the chain. There are actually ways of exporting that image as a single tarball if you want to. But typically, the way you work with it in the Docker ecosystem, you just keep the layers separate and all the tools take care of grabbing the dependencies for you on the fly. It makes things really efficient and we’ll see that as we go along.

So when Docker first came out everybody thought, well, Docker is just like a virtual machine. It’s just like a VM image, why is it any different than that? I’m gonna put my entire giant application, entire machine, into a Docker container. So that’s one way to do it. And other people said, no, no, this should be like microservices. We should put tiny little things, one process in every container. And so there are these two camps that have been battling it out a little bit. So here’s more of a conceptual diagram of that. On the one side you have a little microservice that has pretty much one process in it, definitely one service, and a thin amount of operating system files that it needs. Not a lot, because usually it’s a small runtime: maybe it’s a Go program that just needs […], or maybe it’s Java but you can run just a JRE, a headless JRE, in there and not have a whole lot of OS around it.

On the other hand you might have a case where you do want a whole appliance. Like you want to deliver something that’s completely self-contained. There are pros and cons, but in this case, in the appliance world, you can put multiple services in there, and you might need to run something like systemd to take care of supervising your processes and restarting them. In this case the appliance approach treats your container as kind of a self-healing, self-running black box.

Now, in the microservice world it’s really nice: the containers are lean and fast and small, but they really depend on orchestration on the outside. So if you’re going to deploy 20 instances of one microservice, 30 of another, 40 of another, and hook them all together, you need something to actually do that hooking together, right, so you need some kind of platform underneath. Maybe it’s just Docker Compose putting things together one time, or maybe it’s Swarm or Kubernetes or Mesos, but you need some kind of orchestration tool to do that. So as long as you’re working in that environment and you’ve got that available, this is the best choice. And this is really, I think, the future of Docker, the future of DevOps with containers: delivering things fast and small and replacing them. This is the way we eventually want to go.

But there’s this stage right now where the schedulers are competing in kind of a crazy way; there are sort of cluster wars going on. Different technologies are trying to solve this problem, so there’s no clear winner yet. There’s no clear way to deliver an appliance, whether you’re gonna deliver your appliance with Kubernetes or you’re going to deliver it with a Docker Compose file. So I think it’s okay to deliver an appliance container for these certain use cases, if you want to deliver something that’s self-contained, that you can hand to a customer and say here’s a Docker image that will run all my stuff. And this […] how Artifactory runs itself with nginx together and Java, it all just works. That’s a nice use case, so I think that’s fair in this time frame. Hopefully these cluster wars will settle down and we’ll have a more universal way of describing a configuration, or a topology, of a stack that we want to deploy. So I think that will be coming.

So any questions on that little opinion piece?

No. Okay. We’ll talk a little about crafting images.

[Audience Question] One of the things we’re kind of dealing with is how we deal with events using different microservices and the best way to handle control when one is completely dependent on another service.

Right. So events between them?

[Audience] Yeah

Yeah so I think that’s part of the whole ecosystem of bringing up a cluster: you need a number of infrastructure pieces that are part of the clustering system. Some of them might provide it, some might not, but you need some kind of way of doing service discovery. And you also need to make sure when you’re designing your microservice apps that they’re really 12-factor, right. So they need to be able to come up and, one of the factors, I can’t remember which, has to do with being resilient to a downstream dependency not being there. So becoming passive, waiting for the downstream dependencies to come back. So the order of instantiation of your services shouldn’t matter. Otherwise it becomes so complex for an orchestrator to even deal with starting a sequence of microservices. It’s better if they’re all robust and self-contained and it can just bring them up and they glue themselves together automatically. But you do need some kind of service discovery, which could be DNS based, or it can be […] based, or you can have your own microservice that does discovery, like Netflix with Eureka.
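One common way to get that "become passive and wait" behavior is a retry loop with backoff around whatever check tells you the dependency is reachable. The Python sketch below is a generic illustration under that assumption; `wait_for_dependency` and its parameters are made up for the example and are not from the talk or from any particular framework.

```python
import time


def wait_for_dependency(check, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call check() until it returns True, backing off exponentially between tries.

    Returns True as soon as the dependency is reachable, False if every
    attempt fails. `sleep` is injectable so the function can be tested
    without actually waiting.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return False


# Simulated dependency that comes up on the third probe.
results = iter([False, False, True])
delays = []
ok = wait_for_dependency(lambda: next(results), sleep=delays.append)
print(ok, delays)
```

A service written this way can be started in any order relative to its dependencies, which is the property the orchestrator needs.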

Okay. Crafting images. So Docker’s web page and documentation have a nice page of tips on Dockerfile best practices, and a lot of the ones I’m mentioning are there. So that’s a good resource to go to. And I’ll put the link up on the screen later on.

But I think the ultimate goal is to keep your images lean and mean. All right, so keep them small. You don’t want to put things into the images that you don’t need. So why is that? I think the three main reasons you’ll see are security, speed, and size. Letting stuff into the containers that you don’t know about or don’t need, just because it’s quick, is fine when you’re doing development. Definitely don’t worry about it when you’re building stuff on your desktop; just grab the base you want. But at some point you’re going to go to production. That’s when you should be thinking, let’s start trimming it down, pick the best base to use, and pick which of the packages we really need to add and which ones we don’t. And do some work in getting that pruned out.

So the three whys. Security. Having more things in your container at runtime than you need increases the chances that some of those things have problems. So just like on stage this morning, with the scary old libraries and the bad jars and the scary malware in there, there’s gonna be stuff that may end up leaking into your container that you’ll identify as a vulnerability, maybe with Xray. It says look, there’s a thing inside your container, we know that this is suspect, there’s a security flaw in it and you need to replace it. As a developer you might know, I’m not actually using that component. But the AppSec guys say that you have it in your container. At a company-wide level or a tooling automation level, you’ll see these risky components inside your containers and you won’t know if they’re being used or not, because they’re there. So the best thing to do is just to not put them there in the first place. Trim down your images so that they only reference the things you actually need, so that you’re not churning through a lot of security findings. Make it easier for the security guys to find the real problems.

The other why is speed. The bigger and fatter your images are, and the less organized they are, the longer they’re going to take to move over the network. If you’re doing international, global deploys there may be significant latency in deployment around the world. This may just be overloading your internal networks because you’re moving too much stuff around. Or maybe it’s cost, if you’re paying for WAN links that you now have to upgrade because you’re shoving fat stuff through there that doesn’t really need to be there. So that’s another reason to keep that in mind.

The last one is sort of related to speed, but it’s space. On the actual Docker host, the Docker daemon’s not that great at dealing with lots and lots of layers, and it can get clogged up pretty easily if you don’t do periodic garbage collection. Some of the cluster managers, like Kubernetes, have a built-in garbage collector, and probably Mesos as well. They can tell when images aren’t being used on the host anymore and clean them up. But if you’re running something simpler and you just want to do simple deploys with Docker Compose, and you don’t have automation around that, it’s pretty easy to fill the disk up over time. Especially with build nodes: we’ve had this where every few days the build node will go down. We don’t know why. Well, it’s full of images. So cleaning that up is something that you’ll need to deal with, but having lots of fat images that use up too much space just makes the problem worse.
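The core of such a garbage collector is just a set difference: images on disk minus images that some container still references. The Python sketch below shows only that selection step on made-up data; it does not call Docker, and the image names are hypothetical. In practice you would feed it the output of your own tooling and then remove the result with the Docker CLI or your cluster manager.

```python
def unused_images(local_images, running_containers):
    """Return locally cached images that no running container was started from."""
    in_use = {container["image"] for container in running_containers}
    return sorted(image for image in local_images if image not in in_use)


# Hypothetical state of a build node.
local_images = ["app:41", "app:42", "app:43", "tools:7"]
running_containers = [{"name": "app-1", "image": "app:43"}]

print(unused_images(local_images, running_containers))
```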

So how do you solve these problems? It’s all about the base. All your base. So first, there are a few rules about selecting a base image. A base image is really important. You want to get an official base. Earlier this morning they talked about putting your bases in Artifactory and I think that’s fine. That’s a good idea too. We haven’t gone that far yet. We’re still grabbing official bases from Docker Hub. And I think that’s okay as long as you’re really conscious about only grabbing the official ones.

So if you want CentOS, get CentOS from CentOS, right. Don’t get some guy’s copy of CentOS that might be a little bit neater and trimmer, and he put some cool stuff in it, but you don’t really know where it is. Maybe you tested it once and you don’t know what it’s gonna be like six months from now when you pull the next version of it. So you just want the standard ones. You do want to lock to a specific version and there are a couple of techniques for doing that. If you’re just building directly from Docker Hub, you want to pin down to the tag that’s the complete version, so that you know what you started with and you effectively have a version chain you can follow.

Now Docker unfortunately doesn’t have immutable tags. They’re always mutable: something that’s tagged version 7.2 might still be 7.2 next month but could be different bits. But you do the best you can and pin to the finest-grained version tag that’s available. If you are using Artifactory or some other local internal repo as a cache, you can control that snapshot yourself. So you can pull down the official version, the latest one you want, and lock it down, maybe add your own tag to say this is the one we used on this day. And then build from there. Always build from one that you know about.
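A simple guard you could put in a build pipeline is a check that every base image reference carries an explicit tag other than `latest`. The Python sketch below is one possible version of that check. The function names are made up, and it deliberately ignores digest-style references (`name@sha256:...`), which is a simplifying assumption of this example.

```python
def split_image_ref(ref):
    """Split 'name[:tag]' into (name, tag); tag is None when absent.

    A colon that appears before the last slash belongs to a registry
    host:port (for example 'localhost:5000/centos'), not to a tag.
    """
    last_slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > last_slash:
        return ref[:colon], ref[colon + 1:]
    return ref, None


def is_pinned(ref):
    """True when the reference names an explicit tag other than 'latest'."""
    _, tag = split_image_ref(ref)
    return tag is not None and tag != "latest"


for ref in ["centos:7.2.1511", "centos", "centos:latest", "localhost:5000/centos"]:
    print(ref, is_pinned(ref))
```

Because tags are mutable, passing this check does not guarantee identical bits on the next pull; it only rules out the references that are certain to drift.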

And then the standard tip about staying near latest. Always try to keep up, and continually update and rebuild against newer versions to make sure you have all the security updates and performance fixes and everything. And also choose the best base to start from. Don’t always just grab the biggest Ubuntu or CentOS you can find that has everything you need. It makes it easy, because all the stuff you might ever need on a real host machine is just gonna be in the container. But you probably don’t need all of it.

So you might want to look at some of the suggested bases. I gave a talk like this internally a couple of years ago when we were first starting to work with Docker, and we created a list of recommended bases internally. And when I reviewed this the other day I noticed they’re a lot smaller than they were two years ago. So I think the official builds of standard distros are getting smaller and smaller, because the guys building them are aware they don’t need to put everything and the kitchen sink into the base. So they’re making them leaner. These choices are all really good. If you’re a CentOS or Red Hat shop, CentOS is really good. We’ve typically used Debian for a long time. What we thought was a small […] had everything. But it turns out that the newest Ubuntu is smaller than Debian, so those are really good choices.

Now Alpine is great if you really don’t need a lot of OS at all. So this is the base image that we use for all of our Go tools that we built. So a lot of our internal infrastructure runtime is built in Go and then we just build an Alpine based container and deploy it. So we end up with a Go binary that’s maybe 10 megabytes and Alpine is another five so we have these images that are 15 to 20 megabytes for the entire image, for the whole runtime. So that’s pretty good.

And also some people have played around with just building from scratch. So you can say I just want to build from scratch, no base image at all, and you have to put every file in there that you need. You can do it with some languages, like Go, where you can build statically and pull everything into your binary. There are often other files that are missing. You will often need your CA certs, some things like that, and it’s a bit of an exploration to figure out what you need at runtime that you’ve missed. But if you’re really, really trying to make something tiny, that’s the way to start.

Yeah. Question.

[Audience Question]

Right, so like you’re saying, the language providers might have a base. I think different languages will be different in this case. With Go, there really is no runtime other than […], because Go’s compiled, so you have a binary that’s native code. You don’t really need a Go runtime. Maybe if the Go guys had provided one, like, here’s a stripped-down Debian that has just glibc and the other libraries that Go needs, that might be a good choice. I’m not aware of that. Other languages, like Java, need the JVM. So having a JVM running on top of Alpine, if it’s coming from an official distro, maybe from Oracle, that would be very interesting. Very compelling to look into.

[Audience Question]

No that’s true, that’s a good point. You may end up with different roots that way. You have to trade that off. I will mention that in a couple slides. So I suggest generally building a small number of root bases, or I call them your own base. So this is where you might grab the particular Debian or CentOS. Just do a really simple build, do the update and add any kind of small things that your company might need. Everybody in your company. And then save that as one image. And then put that in your prod repo area for everybody else to use as the base and go from there. So this is one good technique for sort of locking down, and having a well-known root image that everybody can reference.

Maybe have two. Maybe some guys like Debian and some guys like CentOS-based ones, because different teams use different tools. So then you can have two. But two is better than having 25 because everyone did their own at different times and they’re all a little bit different. You don’t want images that are equivalent but binary different, because that’s just a waste. And as you mentioned, this is the slide. I don’t know how easy that is to read. The idea is that when you’re building Docker images, they reference their parent, so at the end of the day you have something that looks like a tree. The container images that you’re actually running your applications in are at the top, and they reference maybe a base image that has some library things in it, and then another one which is your company’s blessed image, which then references the official image. They form this tree. If you can consolidate to a small number of roots and shared intermediate bases, you end up having a lot fewer copies that are similar but binary different. So that’s your point about reducing it to a small number: if you can build everything on top of the base distro that you like and prefer, then that’s great.
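The tree on the slide can be sketched as a simple child-to-parent map, and counting distinct roots tells you how consolidated your images are. The image names and tags in this Python sketch are invented for illustration; only the shape (apps on a shared base, on a company base, on an official image) comes from the talk.

```python
def root_of(image, parent):
    """Follow parent links from an image down to its root base image."""
    while parent.get(image) is not None:
        image = parent[image]
    return image


# Hypothetical image tree: child -> parent (None marks an official root).
parent = {
    "app-a:3": "java-base:8.1",
    "app-b:7": "java-base:8.1",
    "app-c:2": "company-base:7.2.1511.1",
    "java-base:8.1": "company-base:7.2.1511.1",
    "company-base:7.2.1511.1": "centos:7.2.1511",
    "centos:7.2.1511": None,
}

roots = {root_of(image, parent) for image in parent}
print(roots)
```

A single entry in `roots` means every application shares the same bottom layers, so hosts and registries only store those layers once.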

[Audience Question]

I talk a little about some things we can ask questions at the end. I’ll try and go through and then if I haven’t covered everything.

[Audience] That’s very interesting.

Yeah okay. Was there another question in the back? Yeah.

[Audience Question]

Yeah. Right. Okay. So keeping track of the versions of things. What my team has typically done and this practice is sort of common around our company but it’s not set in stone is to just put suffixes on it. So like if you’re gonna grab CentOS, you know, 7.2.1511 and you’re gonna build it and add some stuff to it then we’re gonna call that 7.2.1511 dot one. So that’s our extra thing on top of that. And then as you build the layers up above you just, I think, just build incremental build number suffixes at the end to keep track of what you’ve done. And sort of, you know, the history of which version of Java on top of which version of CentOS, with which version of some other thing, it can get really complex. So you don’t have to encode everything into it as long as you determine the sequence and uniqueness of it. I think that’s sufficient.
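The suffix scheme described here is easy to automate. The Python sketch below computes the next tag by appending an incrementing build number to the upstream version. The function name is made up, and it assumes your own tags are always the upstream version, a dot, and an integer, which is one reading of the convention rather than a rule from the talk.

```python
def next_tag(upstream_version, existing_tags):
    """Return upstream_version plus the next unused build-number suffix."""
    prefix = upstream_version + "."
    builds = [
        int(tag[len(prefix):])
        for tag in existing_tags
        if tag.startswith(prefix) and tag[len(prefix):].isdigit()
    ]
    # First build on a new upstream version gets suffix 1.
    return prefix + str(max(builds, default=0) + 1)


print(next_tag("7.2.1511", []))
print(next_tag("7.2.1511", ["7.2.1511.1", "7.2.1511.2"]))
```

The tag does not encode what changed, only sequence and uniqueness, which matches the point that you do not have to encode everything into the version.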

[Audience Question]

Yeah. Okay I’ll talk a little bit about that going forward. I have suggestions but no, like, definitive answer but —

[Audience] Why?

JFrog’s got some nice new tools that we can talk about later on.

Another trick to reduce the number of tiny little layers that don’t add much value is to squish the commands in your Dockerfile together. Each RUN command in the Dockerfile, or each ADD command, ends up being a layer. So if you do add this, add this, add this, you end up with twenty layers, and that just adds more and more indirection. If they’re small, well, you think they’re not very big, but there’s a bunch of metadata for each layer, and each layer ends up requiring a transaction to go get it across the network. And if you’re doing things in kind of a haphazard sequence in your Dockerfile, everybody’s Dockerfile’s gonna be a little different and all the layers are going to be a little different. So the recommendation from the Docker guys is that your basic stuff should be sorted, you know, alphabetically, and squished together into a small number of commands. If you think about it, we’re now building our images with shell scripts. In some ways it seems like we’ve gone backwards. But it is very flexible and simple, and the Dockerfile is so linear that this is really a nice, reasonable bite size to deal with.
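As a sketch of that recommendation, the Python helper below renders a single RUN instruction from a package list, sorted and de-duplicated, so that two teams asking for the same packages produce the same text. The helper name is invented, and the apt-get form with the cleanup step is an assumption modeled on the pattern in Docker’s Dockerfile best-practices page; adapt it for yum or another package manager.

```python
def combined_install(packages):
    """Render one RUN instruction installing all packages, sorted and de-duplicated."""
    pkgs = sorted(set(packages))
    lines = ["RUN apt-get update && apt-get install -y \\"]
    lines += [f"    {pkg} \\" for pkg in pkgs]
    # Cleaning the apt cache in the same RUN keeps it out of the layer.
    lines.append(" && rm -rf /var/lib/apt/lists/*")
    return "\n".join(lines)


print(combined_install(["git", "curl", "ca-certificates", "git"]))
```

One RUN instruction means one layer, instead of one layer per package.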

Okay so now that we know how to craft our images and put them together, we’ll have to deliver them. That was very creative with the clip art, don’t you think?

Yes, one question?

[Audience Question]

Right, yeah, so transferring the work. I think they approach the problem differently […], Chef, Puppet. They’re approaching a slightly different problem: they’re trying to take a very large system, like an entire host machine that’s maybe running many apps with many shared libraries, and converge it into a new state. Whereas with Docker, you’re always building images from the ground up. So you never have to worry about mutating any files or changing things. Also, you’re only running one thing in that image, which is why you want the container to be small. You don’t have to worry about conflicts because there’s only one use of it. So I think because it’s a much simpler problem the solution can be much simpler, and really it’s more like a Makefile. It’s just saying start from here and add these things. So you can trace it back from zero bytes all the way to the stuff you put on there. And you know what’s gonna be on there is exactly what you said. You don’t have to worry about mutating anything. So I think you can learn from what you’ve done with Chef and Puppet and how you’ve built your systems, but you probably won’t get much value from trying to reuse those files.

So delivering images. How are you going to deliver the image in a way where it can be shared? So not deploying, just the delivery part. Everybody here at this conference knows you need to continuously integrate, build things fast and get them out the door. But part of this, as was mentioned just a bit ago, is how do you keep track of who depends on what. And there is not really any magic solution for that now. Some tools like Xray can do this, you can also use Artifactory and keep track of metadata and dependencies, or you can even create your Jenkins builds to actually follow the chain of dependencies for your base images, right. So when you find a vulnerability, or you just decide we need to move to the newest version of Debian or CentOS, everybody who’s depending on it needs to get rebuilt.

So you want that to be automated. You can either do that with a chain of Jenkins jobs, or you can do that with Artifactory, looking for tags and triggering your builds again, but it’s important to set some process up around that, so you can find all the apps that depend on the thing that you realized needs to be changed. And you want that to just be automatic. If the team that’s in charge of the base image says, oh we have a vulnerability, we have a patch, let’s apply the patch and rebuild, it should be automatic all the way out, so the app teams, the dev teams, get a new version of their image automatically, without having to be involved if they don’t want to be involved.
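Whatever tool triggers the builds, the underlying question is "which images sit on top of the one that changed, and in what order do they rebuild?" The Python sketch below answers it for a child-to-parent map with a breadth-first walk. The image names are hypothetical, and a real pipeline would get this map from its own metadata (Jenkins job configuration, repository properties, or similar) rather than a hard-coded dictionary.

```python
from collections import defaultdict, deque


def rebuild_order(parent, changed):
    """List every image built on `changed`, parents before their children."""
    children = defaultdict(list)
    for image, base in parent.items():
        if base is not None:
            children[base].append(image)

    order, queue = [], deque([changed])
    while queue:
        base = queue.popleft()
        for child in sorted(children[base]):
            order.append(child)   # rebuild this image...
            queue.append(child)   # ...then anything built on top of it
    return order


# Hypothetical dependency map: child -> parent.
parent = {
    "app-a": "java-base",
    "app-b": "java-base",
    "app-c": "company-base",
    "java-base": "company-base",
    "company-base": "centos",
    "centos": None,
}

print(rebuild_order(parent, "company-base"))
```

Because each image has exactly one parent, the breadth-first order guarantees an intermediate base is rebuilt before the applications that depend on it.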

So then, where do you put the images when you’ve built them? There are a few choices. People here probably already know these. There’s the open source registry that’s been around and has been updated since Docker first put it out. They now have an enterprise product, the Trusted Registry, which is a good choice. But I like Artifactory, and this is the right place to say I like Artifactory. For me...

I’m not even getting paid to say that, although I should be, right?

For me, the JFrog guys know how to build binary repositories that are efficient and scale. And a Docker image is just another kind of artifact. Even though it’s got some weirdness, some funniness with its API, at the end of the day it’s just some binary artifact that you can identify by its hash, and put metadata on, and move around. And Artifactory does a really good job with all that. If you use Artifactory, it’s a great Docker repo. If you’re starting from scratch, you’re a small company or you’re just some guy who wants to set up something small, the open source one is fine. And if you’re a shop that’s already getting some of the enterprise products from Docker, then maybe their Trusted Registry would be a good choice.

So the first thing you want to do is decide where your source of truth is going to be for your registry. Your repository. Artifactory. And make sure that has an appropriate level of availability for you. At least have some kind of resilient backing store, so something with maybe RDS and S3 if you’re in Amazon or have a good link to Amazon. A resilient file system and a resilient database. And if you need it to be available to all your developers around the clock, then you want to use the HA version of Artifactory, which will give you enough redundancy and active failover that you’re not going to lose your master repository when things are going crazy and you need to push a new thing into production. So you want a nice robust source of truth in your master server.

So this is what we have at Riot. We have one set up in our main data center that we run in Las Vegas, and that’s where we keep all our masters and our main copy of stuff. We’ve actually organized our repos sort of around the use case. We have an Artifactory cluster just for Docker because that’s kind of the use case for that. We have other Artifactories we use for other purposes and we’ve divided it up that way. I don’t think that’s necessary; that kinda depends on the size and scale of your teams and who’s running what. So that worked for us because my team, within our larger team, kinda does the clustering and the Docker stuff, so we were running the one that we wanted to just do Docker with.

There’s one thing that we’ve discovered the hard way. This is kinda funny. It’s a good idea to divide up your repository into sort of a dev section and a prod section. The dev section is just the default place where stuff goes when it’s produced in the build, and it’ll have a lot of churn: developers are building things, CI is running fast. And it will be full of things that you don’t necessarily need to use. They may be just broken. They may be totally fine but completely replaced, you know, an hour later when the next CI run rebuilds, so it’s just gonna be noisy and churny. And you want to carve out a little area, a separate little logical repo, where your prod stuff goes. And so once you’ve vetted things and you’ve run through testing, you can then have a promotion process that copies things over (10 minutes, okay) into the prod repo. So this will give you some benefits later on. It makes it easier for people to find things, and then when we get to distribution it makes that a lot simpler too.

Yeah question.

[Audience Question]

So yeah. I would say you would use the Artifactory APIs to put a label on it and then promote it to the other logical repo within Artifactory, so it’s zero copy. So generally you don’t need to rebuild it. It’s up to a […] and you don’t need to upload it, so it’s good.
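As a rough sketch of what that promotion call could look like: Artifactory has a Docker promotion REST endpoint, and the snippet below only builds the request without sending it. The host name, the repo keys (`docker-dev`, `docker-prod`), and the image name are hypothetical placeholders, not the setup described in the talk, and the payload fields should be checked against the REST docs for your Artifactory version.

```python
import json
import urllib.request


def build_promote_request(base_url, source_repo, target_repo, image, tag, copy=True):
    """Build (but do not send) a Docker promotion request.

    No image layers are rebuilt or re-uploaded; the server relinks the
    existing artifact into the target repo.
    """
    payload = {
        "targetRepo": target_repo,   # logical prod repo (hypothetical key)
        "dockerRepository": image,   # image name inside the source repo
        "tag": tag,
        "copy": copy,                # True keeps the image in the dev repo too
    }
    url = f"{base_url}/api/docker/{source_repo}/v2/promote"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values for illustration only.
req = build_promote_request(
    "https://artifactory.example.com/artifactory",
    "docker-dev",
    "docker-prod",
    "myteam/myservice",
    "1.4.2",
)
print(req.get_method(), req.full_url)
```

To actually send it you would add credentials and pass `req` to `urllib.request.urlopen`; that part is left out here on purpose.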

There’s another thing that we found out the hard way too. Team permissions. We manage the central repository for lots of different teams. This is really hard to read. But we figured out that, in case people are not aware of this feature in Artifactory, the permission object is really powerful. And that’s really all you need in order to create the representation of a team. So we can assign a permission object for a Docker org, sort of the Docker image organization, with wildcards here (that’s definitely hard to read), and then add the users to that. You can add them with scripting. In that case I made a team. That D team. And I put myself in it and we have permissions to do everything we want to in our repo. There’s a group object in Artifactory which is really handy for integrating with LDAP, but if you’re not doing it with LDAP then you actually don’t need it. And it saves a step, makes things less complicated. So for us that’s something we learned.
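To make the "permission object as team" idea concrete, here is a minimal sketch that builds the JSON body for one permission target. The field names follow my understanding of Artifactory's permission target JSON and the single-letter action codes (`r` read, `n` annotate, `w` deploy, `d` delete) and should be verified against your version's REST documentation. The team name, repo key, and user names are hypothetical.

```python
import json


def team_permission_target(team, repo_key, members, actions=("r", "n", "w", "d")):
    """Describe one team as one permission target, with users attached
    directly and no group object involved."""
    return {
        "name": f"{team}-docker",
        "repositories": [repo_key],
        # Wildcard over the team's Docker org path inside the repo.
        "includesPattern": f"{team}/**",
        "excludesPattern": "",
        "principals": {
            "users": {user: list(actions) for user in members},
        },
    }


# Hypothetical team and members for illustration only.
target = team_permission_target("myteam", "docker-dev", ["alice", "bob"])
print(json.dumps(target, indent=2))
```

Because the function is plain data in and data out, it is easy to drive from a script that loops over a list of teams, which matches the "add them with scripting" point above.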

Yeah. Yeah go ahead.

[Audience Question]

I’m just gonna mention it right here. I don’t. Yeah.

[Audience Question]

The permissions that are available in Artifactory are what we leverage. So I really didn’t get a good shot of it in here, but the team members can have different permissions: you can manage, you can delete, overwrite, deploy. There are a few different permissions that are built into Artifactory and we just use those, so that’s fine. We definitely have like dozens of different teams, and each team has all the engineers and members of the team that have access to the repo, and they pretty much all have read-write, full access to it. And then we have tooling that is sort of more role based. And the tooling then ends up being read only. So our actual cluster deploying tool has read-only access to all the repos. So it can deploy.

I’m gonna go a little faster and then we can talk, get more questions later on.

It’s important to clean things up. I don’t have a perfect solution for it. This problem crops up everywhere I go, and you kind of come up with some basic ideas. So splitting the repo into two helps a lot, because your dev repo has a lot more churn and you need to be a little more aggressive about cleaning things up. And you can do things like detect if a particular image was never promoted, never referenced, and completely superseded by a new one. That can be deleted right away. You can set up a nice TTL for things, so once they’re unused for a certain period of time you can clean them up. And if you’ve segregated images into prod then, okay, your prod TTL can be much longer. You can cross-reference against production to make sure that things that are still running still have their image available. So, these are not complete solutions but just things to check.
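Those checks can be written down as a small decision function. This is only a sketch of the rules just listed; the `Image` fields and the TTL values are assumptions made up for illustration, and in practice the flags would come from repository metadata and from whatever knows what is running in production.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Image:
    name: str
    repo: str                 # "dev" or "prod"
    last_used: datetime
    promoted: bool = False    # was it ever promoted to prod?
    referenced: bool = False  # does any deployment still point at it?
    superseded: bool = False  # has a newer build replaced it?


# Hypothetical TTLs: aggressive for dev, much longer for prod.
DEV_TTL = timedelta(days=14)
PROD_TTL = timedelta(days=180)


def should_delete(img, now):
    """Apply the cleanup rules in order of priority."""
    if img.referenced:
        return False  # still running somewhere: always keep
    if img.repo == "dev":
        if not img.promoted and img.superseded:
            return True  # never promoted, unreferenced, replaced: delete now
        return now - img.last_used > DEV_TTL
    return now - img.last_used > PROD_TTL


now = datetime(2016, 5, 1)
images = [
    Image("svc:101", "dev", now - timedelta(days=1), superseded=True),
    Image("svc:102", "dev", now - timedelta(days=1)),
    Image("svc:90", "prod", now - timedelta(days=400), promoted=True, referenced=True),
    Image("svc:80", "prod", now - timedelta(days=400), promoted=True),
]
to_delete = [img.name for img in images if should_delete(img, now)]
print(to_delete)
```

Checking `referenced` first is deliberate: an image that production still depends on should survive no matter how old it is.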

[Audience Question]

All the metadata is available inside Artifactory so that you can find out all this information. So you can either write a user plug-in to do it or, what I’ve done in the past, write a Jenkins job that gathers that metadata. So I think at Netflix the one we did was a combination of a user plug-in that gathered the metadata and produced a dump, and then we pulled it into Jenkins, and examined it, then did a bunch of deletes with a command line tool. So the CLI. So now Artifactory has got its own really nice CLI. You can probably do a lot more of the work in the user plug-ins to find the candidates for deletion and tag them, and then have the tool go and get rid of them. Yeah.

[Audience Question]

Well okay there you go. Very good. Okay so the Artifactory plugin can give you an expiration. That’s very good. And then I guess you would supersede it if you promoted something, right, and you want it to last a little bit longer.

[Audience Question]

So let me finish talking about distribution and that might answer your question. So I’ll talk now about how you distribute the images to prod. So remember, what I recommended, and what we’re doing, is one master source of truth. Really that’s the source of truth for the metadata. In our case we have lots of global data centers that we push our images out to, all over the world. We have our little clusters running and they need to get to the images. So we do push replication. I’ll show you the details of that, but the idea is we want copies of the images we’re gonna need in prod to be pushed out into prod and be available, so that when we’re bringing up nodes and Docker needs to run, or a machine goes down and we need to restart the same containers on a new machine, we want that to be fast.

So we want to reduce the latency for a redeploy, or for when the cluster manager needs to move the image around and start a container somewhere. And we also want to reduce the WAN bandwidth, so we don’t want to be pulling images over the international network continuously. We want to push it once and have it cached on the remote.

We use the Artifactory multi-push replication, so whenever an image gets pushed into the master we replicate it out with sort of a trigger-based push. Works really well. One of the mistakes that we made, I’ll show that in a second. This is just really handy: you can also have a cron-based one. In case some of our pushes were missed or dropped in the network, the cron-based one will do a sweep every night or every few nights to make sure that all the images get pushed over. So that’s just a handy feature that’s built into Artifactory. And we push out, we’re now up to seven remotes and we’re going to double that, so all over the world we’re pushing.
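Artifactory does this sweep for you, so the following is not its implementation, just a sketch of the idea behind a periodic reconciliation pass: compare what the master holds with what each remote holds and re-push whatever is missing. The remote names and image identifiers are made up for illustration.

```python
def missing_on_remote(master_images, remote_images):
    """Return images present on the master but absent on one remote."""
    return sorted(set(master_images) - set(remote_images))


def sweep(master_images, remotes):
    """Map each remote name to the list of images it still needs."""
    return {
        name: missing_on_remote(master_images, have)
        for name, have in remotes.items()
    }


# Hypothetical state after an event-based push was dropped on the way to one remote.
master = ["svc:1", "svc:2", "svc:3"]
remotes = {
    "remote-a": ["svc:1", "svc:2", "svc:3"],
    "remote-b": ["svc:1", "svc:3"],
}
plan = sweep(master, remotes)
print(plan)
```

The event-based push keeps latency low in the normal case, and a pass like this one is what catches the occasional push that never arrived.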

So this is a key thing. One of the reasons that we’re going back and re-architecting the way we built our master is that we didn’t do the segregation initially. So we’re replicating all of this, dev and prod together, all over the world, which is clogging our pipes. And we recommend only replicating that, the prod part, all over the world. It’s just so slow to push all that stuff when we want to bring up a new small datacenter, maybe in GCE, for a team that needs a copy of everything. It’s gonna take a while to move those big piles of stuff over that they don’t really need. So having it lean and mean makes it more agile and quick to bring up new repositories where you need them. In our case we didn’t think about it. We brought it up, it was working, and then, with our own success, everybody started using our repo and it got really full. So we’re running now like 125,000 images and we can’t tell what’s what. So we’re going through the exercise of cleaning this up.

And that’s it. Questions?