On 'function coloring'
A modern classic is Bob Nystrom’s 2015 blog post “What Color Is Your Function?” Let’s talk about it, because it’s about async, and that’s been our theme for a bit.
The argument
The “color” of a function is a metaphor for segmenting functions into two camps: async and normal functions. The idea is that async functions can call any function, but from a normal function, you can’t call an async function. This creates an artificial divide between functions, complicating… everything.
I’m sympathetic to this argument, because I encountered a similar problem when I was first learning Haskell many years ago. I’ve always leaned towards “printf debugging,” so of course, as soon as I needed to get some insight into what was going wrong with a function I was writing, I wanted to add a print.
But adding a print makes that function do I/O.
So now that function’s type changes, because it needs to return IO a instead of a.
And all functions that call it need to be changed, because those are now doing I/O, too.
And all functions that call those functions, and so on.
It’s infectious.
I understand the skepticism about programming this way. The infectiousness is legitimately a problem. It’s just… this particular problem is also a “newbie” problem with Haskell. Eventually, the problem goes away. Or at least, it seemed to for me.
My theory on why this problem eventually stopped being much of a problem is:
- Over time, as I became a less neophyte Haskell programmer, I adopted a better testing methodology for figuring out what’s going on. I stopped relying on a “run it on some inputs, and oops, maybe add some prints to figure out what’s going on?” strategy, and instead wrote tests against smaller units. There’s nothing more easily testable than a pure function. So I no longer experienced the “pure function suddenly needs to become impure” problem anywhere near as much.
- Over time, I developed a better appreciation for separating I/O from the rest of the program logic. Thinking of programs in terms of a relatively small amount of I/O code, supported by the “meat” of pure (or at least purer) functions, helps alleviate this problem. You tend to go into writing a function having already paid more attention to whether it will involve I/O at all. This division has other design benefits, too. We’ve discussed some of these previously on this blog.
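As a small illustration (in TypeScript, since that’s where this post is headed anyway), the split might look like this. The names here are hypothetical, just to show the shape of the design:

```typescript
// Pure "meat": trivially testable, no I/O anywhere.
function summarize(values: number[]): string {
  const total = values.reduce((a, b) => a + b, 0);
  return `count=${values.length} total=${total}`;
}

// Thin I/O shell: the only part that touches the outside world.
async function run(fetchValues: () => Promise<number[]>): Promise<void> {
  const values = await fetchValues(); // I/O lives here
  console.log(summarize(values));    // pure logic does the real work
}
```

Only run needs to care about effects; summarize never grows an IO-ish type just because you want to test or debug it.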
But even though the “coloring” problem with monadic I/O is much reduced once you mature as a Haskell programmer, it does still linger in corners.
Haskell has map, and then also needs mapM to “do map, but monadically.” This repeats for almost every generally useful higher-order function. Say hello to filterM and foldM and zipWithM and… and these are just for lists.
So it’s obviously not perfect. While monads are Haskell’s thing, these kinds of problems are a pretty integral part of why researchers are looking for other ways to manage effects in a purely functional setting.
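The same duplication shows up outside Haskell, too. A TypeScript sketch of the map/mapM split (doubleLater is a hypothetical async version of double): map with a synchronous callback composes directly, but map with an async callback hands you an array of promises, and you need an extra combinator to collapse it:

```typescript
const double = (n: number): number => n * 2;
const doubleLater = async (n: number): Promise<number> => n * 2;

const xs = [1, 2, 3];

// The synchronous version composes directly:
const ys: number[] = xs.map(double); // [2, 4, 6]

// The async version produces Promise<number>[] instead of number[],
// so Promise.all has to play the role of mapM:
async function mapAsync<A, B>(f: (a: A) => Promise<B>, as: A[]): Promise<B[]> {
  return Promise.all(as.map(f));
}
```

Promise.all is the ad hoc mapM here, and there’s no built-in analogue at all for filter or reduce.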
So is this a problem for async?
Let’s rewind a bit.
An async function is just one that returns a Promise<T> instead of a T. So… isn’t this just a function that returns a different type? Does every function have a color, if we’re distinguishing between those that return String and those that return Integer?
Well, not really. The problem here is all about composition. There are a few different kinds of composition problems:
- The usually stated problem is that you can’t usefully call one type of function from another. Sure, Promise<T> might just be another type, but if what you wanted from it was T, you’ve got a problem. You can only await it from an async function. You can’t write a perfect function from Promise<T> -> T. That’s a greater restriction than the function “just returning different types.”
- Function combinators (higher-order functions) may no longer do something useful with the different types. If filter wants something that returns Boolean and you’ve got a function that returns Promise<Boolean>, you’ve got a problem.
- These types start nesting. You may not be looking at T vs Promise<T> but instead facing List<T> versus List<Promise<T>>. You may have even wanted Promise<List<T>>, but again, it didn’t work out that way.
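The combinator problem is sharper than it sounds, because in JavaScript it can fail silently: a Promise is a truthy object, so filter with an async predicate keeps every element. A sketch (isLongAsync is a hypothetical async version of the predicate):

```typescript
const words = ["async", "io", "await"];

const isLong = (w: string): boolean => w.length > 2;
const isLongAsync = async (w: string): Promise<boolean> => w.length > 2;

// Works as intended:
const ok = words.filter(isLong); // ["async", "await"]

// Every Promise is a truthy object, so nothing gets filtered out.
// (Depending on your lib typings this may even type-check, since
// filter's predicate return type is loosely specified.)
const broken = words.filter(isLongAsync as unknown as (w: string) => boolean);

// One fix: evaluate all predicates first, then filter on the results.
async function filterAsync<A>(p: (a: A) => Promise<boolean>, as: A[]): Promise<A[]> {
  const keep = await Promise.all(as.map(p));
  return as.filter((_, i) => keep[i]);
}
```

Note that filterAsync also runs head-first into the nesting problem: internally it has to turn Promise<boolean>[] into a Promise<boolean[]> before it can do anything useful.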
Escape hatches
Much of this argument rests on the inability to call async functions from a non-async context.
That is, we can go from T to Promise<T>, but without being in an async function, there’s no easy way to go from Promise<T> back to T.
But there are actually a couple of ways to do exactly that.
The less interesting version is that you always have the option to just block. You can call a task runner on that future, and synchronously block until that specific future completes. This isn’t the greatest approach, because we probably don’t want to block, but it is an option.
Another interesting escape hatch comes from a lack of absolute purity. Unless we’re writing Haskell, we probably always have a bit of global state about, such as a (perhaps thread-pool based) task runner. If our non-async function doesn’t have to return a value based on the results of the async function, then it can happily give the global task runner a new async task to complete later. Thus, the non-async function can nevertheless cause async work to happen later.
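In JavaScript this escape hatch is simply calling the async function without awaiting it; the event loop is the ambient task runner. A sketch, with a hypothetical sendMetric task standing in for real I/O:

```typescript
const log: string[] = [];

// Hypothetical async task: pretend this writes to a server.
async function sendMetric(name: string): Promise<void> {
  await Promise.resolve(); // stand-in for real I/O
  log.push(name);
}

// A perfectly ordinary, non-async function. It can't get a value
// *back* from sendMetric, but it can still cause the work to happen
// later, by handing it to the event loop and moving on:
function recordClick(): void {
  sendMetric("click").catch((err) => console.error(err));
}
```

The catch handler matters: a fire-and-forget task that rejects with no handler becomes an unhandled rejection.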
And there’s always the trick Go uses behind the scenes in its green threading runtime. If you’re going to block, first signal the task pool executor to create a new task-running thread, because this one is about to go out of commission for a while.
Functions have color anyway
But one last thing bugs me about this function coloring argument. There does appear to be a small obstacle to just composing any old function together. But the trade-off here is one of forcing the issue vs potentially silently doing the wrong thing.
Two weeks ago we looked at a short function that tries to perform two concurrent queries:
async function example() {
  async function branch1() {
    return compute1(await query1());
  }
  async function branch2() {
    return compute2(await query2());
  }
  const [x, y] = await futures_join(branch1(), branch2());
  return compute3(x, y);
}
How do we write this in Go, using green threads instead of promises?
The first thing we need to do is remember that query1 and query2 potentially block. If we forget, we would naturally write code that will silently do those queries sequentially instead of concurrently.
And since we want to aggressively call e.g. compute1 when its query comes back, we still have to create functions like branch1. Then we can go each branch, and wait for the two separate results. (I’ll ignore the added fuss about channels vs return values here.)
So this looks a lot like await.
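Concretely, a sketch of that Go version. The queries and computes are stand-ins, since the real ones aren’t shown here:

```go
package main

import "fmt"

// Stand-ins for the real (potentially blocking) queries and computations.
func query1() int           { return 1 }
func query2() int           { return 10 }
func compute1(n int) int    { return n * 2 }
func compute2(n int) int    { return n + 1 }
func compute3(x, y int) int { return x + y }

func example() int {
	c1 := make(chan int)
	c2 := make(chan int)
	// Each "branch" gets its own goroutine, so the potentially
	// blocking queries overlap instead of running sequentially.
	go func() { c1 <- compute1(query1()) }()
	go func() { c2 <- compute2(query2()) }()
	// Receiving from both channels is the moral equivalent of
	// await futures_join(branch1(), branch2()).
	x, y := <-c1, <-c2
	return compute3(x, y)
}

func main() {
	fmt.Println(example())
}
```

The goroutine-per-branch structure mirrors the async branch functions almost exactly; the channel receives are just await wearing a different syntax.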
So we’re definitely not talking about anything simpler here overall. Green threads aren’t easier than async functions. The function colors don’t go away; it’s just that exposing them in types creates an occasional hindrance.
Exposed asynchrony
But in the end, the function coloring argument has unavoidable legitimacy: you can’t perfectly call async functions from non-async ones, and this creates the problems described above.
The next question is: are those problems worth it? Just because an approach has drawbacks, doesn’t mean other approaches don’t have worse problems.
Last week, we talked about different concurrency models. One of my points is simply this: the world is asynchronous. When the CPU communicates with the disk, that’s asynchronous. When it talks to the GPU or NIC, that’s asynchronous. When it communicates with other machines, or even when your program is just communicating with other processes and threads on the local machine, that’s all asynchronous.
And yet we get this programming model where everything is synchronous.
And then we get these arguments that tell us, oh no, there’s this function coloring problem, let’s definitely just still continue to pretend everything is synchronous.
I’m not sure I buy it. Exposing asynchrony in our programming model does create a distinction between two different kinds of functions. But that distinction always existed, it just wasn’t visible. It was just always handled on our behalf.
For OS threading, that handling was blocking. For green threading, that handling is hoping that a true blocking case doesn’t sneak in somehow, because this is all handled behind the scenes.
Promises and async/await make it explicit. They do so with very small runtimes, and a relatively minimal “function coloring” problem. (Especially minimal if you buy the argument that I/O should be more segmented off from the rest of a program.)
It’s absolutely debatable whether making something explicit is better or not. One is not always universally superior. But when making it implicit means that we’re writing programs in a synchronous world-view that’s entirely at odds with how the world actually works, then I become skeptical that implicit is better.
Maybe it’s better if our mental model for how our program works is more in line with how the world actually works.