How did we end up with containers?
Last week we looked at Maven and the design of build tools. The key takeaways were:
- It’s best to focus on building whole artifacts rather than on compiling individual files.
- With artifacts in mind, dependency management comes into scope.
- Using a fixed set of versioned dependencies means you’re using the same tested artifact as everyone else, and not putting together a never-before-seen combination and hoping nobody made a mistake.
Today, I want to look at containers. The thesis here is pretty much that containers bring these same benefits to everyone, no matter what tools or language you’re using.
The further benefits of artifacts
I’ve already brought up how building an artifact instead of an executable is beneficial because it puts all users in the same configuration. Containers have essentially the same advantage. When distributing a container, the user is getting exactly the same versions the developer used. This means that whatever testing the developers have done should apply equally to the actual software the user is running. This dramatically improves the ability to troubleshoot and provide support.
For containers deployed to servers, this comes with added advantages to the deployment process. For one, the build process actually becomes less important. If you’re building a container artifact, testing it, and then deploying it, it matters less if your build process is held together with spit and chewing gum. As long as someone can work the magic and get a container spit out, and it works, then it works.
Obviously, a more reliable build process is better, but I’m exaggerating for effect a bit here. My point is simply that how you get containers is less important. Contrast this with a typical deployment method prior to the “devops” era. Your server has some global state, and you run a script that brings down the old service, mutates the server state to be ready for the new service, and brings it up. Hopefully. You know, unless something went wrong in there. Hope you can roll things back somehow? If you even have a rollback script, hopefully it’s tested? Of course, it can’t really have been tested any better than the deployment script that just failed, so who knows what happens next. What fun.
With the container approach, many of these problem points happen largely at build time for the container, instead of at deployment time in production. So now, when they appear, they don’t hurt. A build failed, or it produced a broken container; that’s all. We’re able to deal with the problem long before anything touches a production system. And even if something does fail when starting the service in production, “just start the old container up again” is a rollback mechanism much more likely to succeed. (Obviously, we haven’t addressed other inherently stateful things like schema migrations here, but at least we’ve reduced the scope of where problems can appear.)
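To make that rollback story concrete, here’s a minimal sketch of the flow. It’s just an illustration, not a real deployment script: the image name, container name, port, and /health endpoint are all hypothetical placeholders. The shape is simply “start the new container, and if it doesn’t come up, start the old one again.”

```python
# Minimal deploy-with-rollback sketch. Image name, container name, port,
# and the /health endpoint are hypothetical placeholders.
import subprocess
import time
import urllib.request

IMAGE = "registry.example.com/myapp"  # hypothetical image
NAME = "myapp"
PORT = 8080

def start(tag: str) -> None:
    # Remove whatever is currently running under this name, then start
    # the requested image version.
    subprocess.run(["docker", "rm", "-f", NAME], check=False)
    subprocess.run(
        ["docker", "run", "-d", "--name", NAME,
         "-p", f"{PORT}:{PORT}", f"{IMAGE}:{tag}"],
        check=True,
    )

def healthy() -> bool:
    time.sleep(5)  # crude: give the service a moment to come up
    try:
        return urllib.request.urlopen(f"http://localhost:{PORT}/health").status == 200
    except OSError:
        return False

def deploy(new_tag: str, old_tag: str) -> None:
    start(new_tag)
    if not healthy():
        # Rollback is just "start the old container up again".
        start(old_tag)
```

Notice that nothing here cares how the container was built; the artifact is the unit we test, ship, and roll back to.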
Decoupling dependencies
Once again, we look at dependencies as a major focus for containers. Containers obviously bundle their exact (non-external service) dependencies within the container itself. But the even more important part is the second-order effects of this. We end up in a very Maven-like situation, regardless of what tools we’re using.
Most package managers (outside of a few, usually language-specific, exceptions) don’t permit more than one version of a package to be installed at once. But because each container is a separate project, each project’s dependencies end up isolated from every other project’s.
No longer do you need to ensure you get the same environment in all your applications, all your servers, and all your development machines. All these things can become decoupled from each other. That is a LOT of decoupling.
Programmatic documentation, and security
Containers typically offer an isolated-by-default experience. This is profoundly beneficial to documentation, because we don’t get working code unless we’ve accurately characterized our dependencies. Instead of finding just any old thing lying around in /usr/include or /usr/lib, we only get precisely what we asked for. This keeps the documented list of necessary dependencies accurate.
(I’ve shipped a lot of code with what I thought was good documentation, until a user informed me that a script needed wget or something and this apparently wasn’t installed by default. While it’s not containers exactly, Vagrant was huge for helping me actually verify things like this. And make things more precise… it turns out -headless is helpful in avoiding unnecessary dependencies for Java in Debian-derivatives.)
This default isolation also helps further document behavior. If you want to expose a port to the outside world, that needs to be documented in the container metadata; otherwise, nothing will be able to get in. Likewise, if your container needs access to external resources, you need a mechanism to pass in those configurations. This also helps discourage building these assumptions into the application.
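As a quick sketch of what that looks like in practice, here’s a tiny service that takes its port and an external resource location from the environment instead of hard-coding them. The variable names are hypothetical; the point is that the container runtime (e.g. docker run -e DATABASE_URL=… -p 8080:8080 …) becomes the place where these assumptions are spelled out.

```python
# Sketch: keep environment-specific assumptions out of the application.
# PORT and DATABASE_URL are hypothetical names supplied by the container runtime.
import os
from http.server import HTTPServer, BaseHTTPRequestHandler

PORT = int(os.environ.get("PORT", "8080"))
DATABASE_URL = os.environ["DATABASE_URL"]  # fail fast if it wasn't passed in

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok\n")

if __name__ == "__main__":
    # Bind on all interfaces; whether the port is actually reachable from
    # outside is decided by the container's declared port mapping.
    HTTPServer(("0.0.0.0", PORT), Handler).serve_forever()
```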
Something else I don’t see commented upon often: I think containers offer the best security model we’ve had in a long time. Previously, besides the standard DAC permission model, we’ve had tools like SELinux to more stringently lock down the permissions of the code we run. But this has had adoption problems, in part because it’s complicated.
And things that are complicated, besides having trouble getting adopted in the first place, are also things that are easy to get wrong.
Containers offer essentially the same benefit, but in a completely obvious way. Instead of trying to label files appropriately and ensure an application does not open files it’s not labeled to access, we can confine the application inside a container that doesn’t even see files it’s not allowed to look at. The security benefit is near-identical, but the design is such that it’s obvious how everything works, and what’s wrong when something doesn’t. (There’s not some complicated security policy denying you access; you just didn’t map it into the container, duh!)
So are containers great?
It’s not enough to compare containers to traditional sysadmining and say they’re better in comparison. If we want to learn something about design, the important part is why they’re better.
Here’s the thing: other than the security model, everything that makes containers good is really a workaround for our other tools doing things badly. Partly, this is a long way of saying: yes, containers are great, and that security model bit is nothing to scoff at. But consider a better world:
- There’s nothing about building native code (or any code) that prevents us from building artifacts. We’d need a build tool that’s designed to produce dpkg/rpm/other packages the way Maven is designed to produce jars, but that’s completely possible without containers.
- There’s nothing about our libraries or compilers that requires us to dump everything into a pile of global state in /usr/include and /usr/lib. We could absolutely have our build tools be packaging-aware.
- There’s nothing about our libraries that requires us to hard-code magic paths to /usr/share; they’re perfectly capable of figuring out where they were loaded from and finding their data files in the appropriate relative locations. (See the sketch after this list.)
- There’s nothing about our distro’s package managers that requires there to be only the global system state and no other local installations whatsoever.
- There’s nothing about native packages that requires them to be non-relocatable; they could simply be installed to /usr/pkg/arch/name/version/ by default rather than having a required installation location.
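Here’s a minimal sketch of that relative-data-files point: a library locating its data next to wherever it was actually loaded from, so it works regardless of the install location. The data/ directory and file name are hypothetical.

```python
# Sketch: find data files relative to where this module was loaded from,
# instead of hard-coding a path like /usr/share/myapp/. The data/ layout
# here is hypothetical.
from pathlib import Path

# Directory containing this module, wherever it happens to be installed.
HERE = Path(__file__).resolve().parent

def load_template(name: str) -> str:
    # e.g. <install location>/data/greeting.txt instead of /usr/share/myapp/greeting.txt
    return (HERE / "data" / name).read_text()
```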
In short, there’s no reason why apt and dnf or whatever couldn’t work in a way similar to how npm or bundle do, except that Linux distributions seem to be uninterested.
(And occasionally throw a specious fit about “bundling” when features like this are proposed. Or are unwilling to understand why the ability to install multiple versions of a package is necessary.)
I’ll have to stop here before I write a rant.
End notes
In the past, I’ve had a few discussions with people who didn’t understand why people are excited about containers. I hope for those people this pair of essays was illuminating. There are certainly problems with (e.g.) Docker, but the underlying idea has a lot of merit. For people who already got why containers were a thing, I hope I managed to bring some interesting design aspects to your attention.
In the somewhat distant future, I might continue this line of thinking and do a case study about the design of Kubernetes. (It seemed obviously The Future even before I saw Kelsey Hightower do some amazing canary deployment demo I can’t find the youtube video for anymore.) But next week, I’ll get back to some more practical software design advice.