Deconstructing the "Unix philosophy"
I’m still working on figuring out how to organize my thoughts about designing software with composition in mind. This might be one of those cases where I have to start writing up a few more concrete essays before the more general ideas start to become clear to me. If that’s the case, I may have to write up some things about category theory before I’ve really fully motivated why you should have any interest in category theory. (Which reminds me, today I discovered there’s a new journal called Compositionality getting started, and it specifically calls out category theory as a tool for thinking about designs that compose. Glad to see this idea is starting to become more well-known.)
But for the category theory-averse: no worries, none today.
A few weeks ago I took a look at how compilers are designed. The big idea there was a very simple composition primitive: just pipelining functions together. For compilers, the design process is a bit more top-down. You want to build something big, like a compiler, but that’s far too big to understand. So you break things down until you’re tasked with something more manageable, like building a parser, and now the problem is small enough to just go solve it on its own.
What I’d like to look at today is a design that works the other way around, for all its similarities. It enables us to break problems down into small ones just the same, but now we want to break things down into small problems that are already solved. Our focus instead is on this question: how do we design such a system? How can we end up with the right pieces around so the small problems are, in fact, already solved?
The “Unix philosophy”
I’m going to pick what I think was the best formulation of the Unix philosophy, which Wikipedia credits to Salus:
- Write programs that do one thing and do it well.
- Write programs to work together.
- Write programs to handle text streams, because that is a universal interface.
Since I’ve already brought up composition, it’s probably no surprise to you that this philosophy is entirely about composition. We want manageable pieces, uniform mechanisms for putting pieces together, and we also want to make sure as many of these pieces work together as possible. Basically, “make Legos.”
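The classic demonstration of this is a handful of tiny, single-purpose filters glued together with text streams. Here's a sketch in the spirit of McIlroy's famous word-frequency pipeline (`input.txt` is just a placeholder):

```sh
# top ten most frequent words in a document, assembled from single-purpose tools
tr -cs '[:alpha:]' '\n' < input.txt \
  | tr '[:upper:]' '[:lower:]' \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -n 10
```

No one tool here knows anything about "word frequency"; the task emerges entirely from composition.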
One of the reasons I like the above breakdown is that it’s hard to really take issue with it. We can maybe start to question what “do one thing” means, and there’s the particular choice of text streams as a universal interface, but beyond that it’s completely unobjectionable.
Other formulations start to get into more direct guidance (also from the above Wikipedia link):
Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”. Expect the output of every program to become the input to another, as yet unknown, program. Don’t clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don’t insist on interactive input.
Sometimes this advice might be reasonable and sometimes not.
There’s been much controversy over whether the GNU core utilities have “violated” this philosophy by having so many flags and the like. After all, does `tar` need a flag for every compression format? Why can’t we just use a separate decompressor and a pipe, like The Unix Philosophy obviously intended?
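To make the comparison concrete (the archive name here is made up), the flag-based and pipeline-based approaches look like this:

```sh
# built-in decompression, one flag per format
tar zxf archive.tar.gz

# the same job as a pipeline of single-purpose tools
gunzip -c archive.tar.gz | tar xf -
```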
But this criticism largely misses the point. It might be nice to have very small and simple utilities, but once you’ve released them to the public, they’ve become system boundaries, and now you can’t change them in backwards-incompatible ways. The old exercise of figuring out how to remove unnecessary features and make things simpler now means refactoring your toolbox; have fun convincing everyone that it’s fine for their shell scripts to stop working next month.
The only time you’re really “violating” the philosophy is when you’re writing shell utilities that don’t work like anything else. I mean, what is your deal, `dd`?
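For anyone who hasn’t had the pleasure: `dd` ignores the usual dash-prefixed flag conventions entirely, in favor of `key=value` operands. (The image and device names below are placeholders, and `status=progress` is GNU-specific.)

```sh
# no dashes in sight: options are operand=value pairs, unlike any other common utility
dd if=disk.img of=/dev/sdX bs=4M status=progress
```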
But the risk you take when you go too far in trying to “do only one thing” is that you can end up falling into a programmer-centric “feature-oriented” mindset, and not the user-centric “task-oriented” mindset.
Nobody should ever really be criticizing `tar zxvf` for doing what everyone reaches for `tar` to do most of the time: decompress a `.tar.gz` file. Taking the most common case that people have to deal with and requiring them to reach for a more complicated combination of tools, instead of just directly addressing the problem, isn’t good design either. Of course, it might be nice if we didn’t have to memorize flags like `zxvf` too.
“Everything is a file”
Although people frequently think of it as part of the Unix philosophy, the notion that “everything is a file” is actually a separate idea. Likely, it gained some prominence because it was embraced heavily in the design of Plan 9. While the Unix philosophy is about composition, this idea has a narrower scope.
Part of the Unix philosophy was that utilities should consume and/or produce streams of text. The benefit of this is that more utilities can be used in conjunction with each other. To compose pieces into a greater whole, those pieces need to fit together.
But “everything is a file” is very much disjoint from this. This is less about composition, and more about “re-use.” The idea here is that we have one common interface, with a shared set of operations (open/close/read/write), that we can use to interact with all sorts of things. The motivation behind “everything is a file” is simply that we have such a large set of utilities for operating on this interface that anything you can jam into it gets all those utilities for free. And most importantly, that means you don’t need to learn a different set of utilities. You can bring more existing knowledge to bear.
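As a small illustration (assuming a Linux system with `procfs` mounted), everyday text tools apply unchanged to kernel state that happens to look like files:

```sh
# ordinary utilities get kernel-provided "files" for free
grep 'model name' /proc/cpuinfo | sort -u   # CPU model, deduplicated
head -n 3 /proc/meminfo                     # the first few memory counters
```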
There’s nothing about the Unix philosophy that requires us to in any way embrace the notion that “everything is a file.”
We can have non-file interfaces just fine; all we have to do is build utilities that make use of them in a conventional way.
While Plan 9 can claim some kind of ideological purity because it used a `/net` file system to expose the network to applications, we’re perfectly capable of accomplishing some of the same things with `netcat` on any POSIX system today.
It’s not as critical to making the shell useful.
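A rough sketch of the same idea with no filesystem involved at all (`nc` option behavior varies a bit between implementations, and the host here is a placeholder):

```sh
# a pipeline talks to the network directly; no /net filesystem required
printf 'HEAD / HTTP/1.0\r\nHost: example.org\r\n\r\n' | nc example.org 80 | head -n 1
```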
Oops, that shouldn’t happen
And treating everything like a file is not without its drawbacks, either.
Back in 2016 we got treated to another semi-regular open source community flame-war blow-up, in the form of “`rm -rf /` bricked my computer.” The root of the trouble (after we got over everyone’s favorite init system) turned out to be “efivars,” a filesystem for exposing system firmware variables. Buggy firmware meant deleting some variables could leave the system in an unbootable, unrecoverable state.
The kernel developer who originally designed efivars weighed in:
With hindsight, it should absolutely not have been a filesystem. There’s very little metadata associated with EFI variables and the convenience of exposing this in a way that can be handled using read and write is huge, but real-world firmware turns out to be fragile enough that this was a mistake.
And eventually a work-around was put into the kernel to protect buggy motherboards from users who thought they were just deleting some files they weren’t going to need.
Well, it’s the easiest way to represent a tree
Meanwhile, Linux continues to use `sysfs` to communicate information to user-space; this, after `procfs` turned out to be somewhat ill-suited to having such interfaces dumped into it.
But this also seems rather suspect.
The primary rationale for the design of `sysfs` appears to be that the kernel needs to communicate tree-like data back to user-space.
The filesystem seemed like the natural way to do this… objects are directories, keys are files, values are file contents, and arrays are just numbered subdirectories. Anyone who has read my previous article lamenting poor support for just data in mainstream languages is probably spotting something suspicious here. Are we just abusing the filesystem to represent tree-like data because we don’t have the facilities to just… actually communicate that data as a tree? Well, yeah, probably.
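A hedged illustration of that encoding in practice (assuming a Linux machine with a network interface named `eth0`; your device names will differ):

```sh
# "objects" are directories, "keys" are files, "values" are file contents
ls /sys/class/net/eth0/             # the attributes of one network device
cat /sys/class/net/eth0/mtu         # a single scalar value
ls /sys/class/net/eth0/statistics/  # a nested "object" as a subdirectory
```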
And this shows up.
Actually using `/sys` directly is usually complicated enough to get right that we still use tools like `lspci` instead of ever bothering to look at the filesystem ourselves.
Who cares if it’s a filesystem then?
And as a filesystem, you start having issues if you need to make atomic, transactional changes to multiple files at once. Good luck.
Going beyond text streams
When Microsoft finally became embarrassed enough to do something about how awful it was to administrate Windows machines, they came up with a rather brilliant solution. PowerShell does a pretty decent job of replicating the good parts of the Unix philosophy.
It certainly has its downsides: instead of just any old programs, you usually need to write a special “cmdlet” to make operations available within PowerShell. This means it’s not quite as universal, though it still has the ability to run any process and pass in command line arguments all the same.
But PowerShell largely supports the same kind of compositional design, just with streams of objects instead of streams of text. It interoperates acceptably with text streams by having some implicit conversions (one of which is almost always in use: the one that displays results to the console).
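A small sketch of what that buys you (the property names come from .NET’s `Process` objects): filtering and sorting happen on typed properties, not on parsed columns of text.

```powershell
# a pipeline of objects: no awk/cut-style text surgery required
Get-Process |
  Where-Object { $_.WorkingSet64 -gt 100MB } |
  Sort-Object WorkingSet64 -Descending |
  Select-Object -First 5 Name, Id, WorkingSet64
```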
By and large, I’d say that PowerShell is a pretty tremendous success. It tries to shoot for a somewhat more complicated design by necessity (it has to work with Windows APIs as they already exist, after all), but it manages to deliver the extra features that make that complexity tolerable. PowerShell can actually work with tree-like data, albeit through the lens of objects.
Meanwhile the “everything is a file” philosophy is nowhere in sight. Since everything is an object instead, you end up liberated from having just the one, single file interface. Instead, you can work with objects that implement almost any interface. More complexity, but more powerful, too, and sometimes that can be a worthwhile trade.
I’d take other issues with the design of PowerShell, but I think it actually nailed it as far as “Unix philosophy: the good parts” is concerned, given that extra complexity was necessary for compatibility reasons. Composition is a hugely useful tool in designing a system like this. Re-use is a secondary goal. “Everything is a file” doesn’t benefit our designs nearly as much as the more fundamental ability to compose together a suite of useful tools.
Composition is about types
To compose things together, we need to know how they fit together, and that’s all about types. We can start with the type of the basic piece: perhaps a process, with an environment, command line arguments, stdin, stdout, stderr, and a return code. And that’s the basic unit we want to manipulate.
Then we can start figuring out all the ways we might want to compose such a thing. Usually, though not always, we want to be able to create a bigger value of that type out of smaller values of that type. Sometimes, we want to glue together pieces of different types.
With typical shells, that goes so far beyond pipes (`|`) that it’s a bit mind-boggling, really. You can use the return codes to do logic with `&&` and `||`. You can capture output as variables with `$(cmd)`. You can redirect to files, or create temporary pipe “files” with `<(cmd)` and `>(cmd)`, so that data can flow between commands and “files” without ever being written to disk. And so on.
# paste takes filenames as arguments
$ paste <(echo -e "a\nb\nc") <(echo -e "q\nw\ne")
a q
b w
c e
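And these operators all compose with each other, too. A hedged sketch (the directory and file names are made up) mixing a few of them:

```sh
# $( ) captures output, && and || branch on exit status, | still pipes
backup="backup-$(date +%F).tar.gz"
tar czf "$backup" ./docs \
  && echo "created $backup ($(tar tzf "$backup" | wc -l) entries)" \
  || echo "backup failed"
```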
Once we have figured out all the different ways things can be put together, we just need to start trying to use such a system. Over time, we’ll build up a standard library of all the pre-made little parts we should have commonly available. (Don’t be afraid to refactor: before these APIs became system boundaries, an agile-like iterative refinement process was a big part of the early development of the Unix shell.) In the end, when putting bigger pieces together, we’ll frequently find we already have the smaller pieces we go looking for.
When we think about composition in this way, in terms of the composition operators and the types they accept and produce, it’s no wonder I emphasize tree-like data as an underrated concept in programming languages. Almost any composition is going to look like a tree (or at least a DAG, and maybe sometimes a graph, though more rarely). This is one of the most basic structures we need to be able to represent.
Instead, despite the Unix philosophy giving us nifty little composition-oriented shells for many decades, we have mainstream languages like Java that didn’t get the facilities to compose together small operations in a similar way until 2014 with the streams API.
Sometimes I wonder about us, you know?
End Notes
- Some people respond better or worse to Torvalds’… style. But here are some of his comments about “everything is a file.”