The Perils of Implementing Functor in Java

I’ve recently been spending a lot of time re-imagining the core constructs I would want from the Java programming language to consider it truly fit for enjoyable functional programming. One of the most glaringly missing interfaces in Java 8 is something like Haskell's Functor type class - a theoretical super-interface of Function, Iterable and friends, Optional, and many, many other parameterized types - and its corresponding #fmap function, a conceptual implementation of which in Java might look like:

public interface Functor<A> {
    <B> Functor<B> fmap(Function<? super A, ? extends B> f);
}

The implementation is straightforward enough: any parameterized type that implements Functor must provide an implementation of #fmap that can take a function from its parameter to a different parameter, and produce a new Functor around that new parameter. An example for Optional might be something like:

public <B> Optional<B> fmap(Function<? super A, ? extends B> f) {
    return map(f);
}

That’s right - Optional already supports this operation. It’s Optional#map, and, in fact, fmap is really just a generic #map that isn’t coupled to a particular concrete type like Optional or Stream. However, although both Stream and Optional (and again, conceptually many others) define their own respective #map operations with effectively identical signatures, there is no common interface that they conform to that guarantees this capability.
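To make the shared shape concrete, here's a small sketch (class name mine) showing Optional#map and Stream#map doing the same conceptual work with no common interface to abstract over:

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NoCommonMapInterface {
    public static void main(String[] args) {
        // Optional#map and Stream#map have effectively identical shapes...
        Optional<Integer> length = Optional.of("hello").map(String::length);
        List<Integer> lengths = Stream.of("a", "bb", "ccc")
                .map(String::length)
                .collect(Collectors.toList());

        // ...but with no shared super-interface, a generic method of the form
        // <A, B> F<B> mapAny(F<A> fa, Function<A, B> f) cannot be written.
        System.out.println(length);  // Optional[5]
        System.out.println(lengths); // [1, 2, 3]
    }
}
```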

This is where a Functor interface comes in handy - once this core interface is defined ("core" albeit only in the conceptual sense, since Java doesn’t support ad-hoc polymorphism, fundamentally excluding core constructs like Stream and Optional from Functor membership without byte-code manipulation), more useful concepts like BiFunctors (for things like tuples) and ProFunctors (for things like functions of arity 2 and up) can emerge.

Looking again at the function #fmap takes, it can be seen to be covariant from A to B: it expects the current Functor’s parameter A, and produces some other parameter B that will be wrapped by a new Functor. The important point about variance in this example is that the type unknown to the current Functor is B, not A (remember, A is the Functor’s own parameter - it’s known). This isn’t a problem, because #fmap’s function is in charge of producing the unknown B type at each call site - the method itself is parametrically polymorphic. This will be important later, so keep it in mind.
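Optional demonstrates this nicely; a minimal sketch (class name mine) showing B being chosen independently at each call site while A stays fixed:

```java
import java.util.Optional;

public class CallSiteInference {
    public static void main(String[] args) {
        // A (here Integer) is fixed by the receiver - it's already known.
        Optional<Integer> known = Optional.of(42);

        // B is unknown to the receiver and is inferred independently at each
        // call site from the mapping function that produces it.
        Optional<String> asString = known.map(i -> "value: " + i); // B = String
        Optional<Double> halved   = known.map(i -> i / 2.0);       // B = Double

        System.out.println(asString); // Optional[value: 42]
        System.out.println(halved);   // Optional[21.0]
    }
}
```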

Now, the concept of Functors - and more specifically, #fmap - exists in many other languages, although - as is usually the case - Haskell arguably implements it best. This is #fmap’s beautiful signature in Haskell:

fmap :: Functor f => (a -> b) -> f a -> f b

{- and some examples: -}

instance Functor Maybe where  
  fmap _ Nothing = Nothing
  fmap f (Just a) = Just (f a)

instance Functor [] where  
  fmap _ [] = []
  fmap f xs = map f xs

Briefly explained[1], this signature states that fmap takes a function that produces some value of type b from some value of type a, takes a value of type f a - some Functor f parameterized with a - and produces a value of the same Functor f, now parameterized with b. This is apparent from both Maybe (anemically implemented as Optional in Java) and [] (Java's closest counterpart being perhaps LinkedList, although this is a terribly generous stretch).

I mentioned above that another example of a Functor is Java 8’s Function interface - but Function has two parameters. How can this be reconciled with Functor, which has one parameter? Which parameter is it? Traditionally (which is to say, I’ve never seen an implementation that diverged from this, but I can’t speak in proofs), a Function<A, B> (where A is the input and B is the output) is a Functor<B> - so it is a Functor over its return type, not its argument. This actually makes a lot of sense in practical usage, but what does this mean for #fmap? It might be implemented as follows (using the interface MonadicFunction, since, of course, Function belongs to the JDK and its signature therefore cannot be changed):

public interface MonadicFunction<A, B> extends Functor<B> {
    B apply(A a);

    @Override
    default <C> MonadicFunction<A, C> fmap(Function<? super B, ? extends C> fn) {
        return a -> fn.apply(apply(a));
    }
}

Seems fine, right? It certainly compiles, and it acts as a form of function composition: we started with two functions, and now have one that executes both in succession. However, we have a problem.
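As a quick sanity check of that composition behavior, here's a self-contained sketch (the Functor parent is omitted so the snippet compiles on its own; names are illustrative):

```java
import java.util.function.Function;

// Standalone copy of MonadicFunction, minus the Functor parent, so this
// snippet stands alone.
interface MonadicFunction<A, B> {
    B apply(A a);

    default <C> MonadicFunction<A, C> fmap(Function<? super B, ? extends C> fn) {
        return a -> fn.apply(apply(a));
    }
}

public class FmapAsComposition {
    public static void main(String[] args) {
        MonadicFunction<Integer, String> show = i -> "n=" + i;

        // Two functions in, one function out: show runs, then String::length.
        MonadicFunction<Integer, Integer> showThenLength = show.fmap(String::length);
        System.out.println(showThenLength.apply(123)); // 5, the length of "n=123"
    }
}
```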

Look again at Haskell’s fmap. Now look at our Functor's #fmap. Now back at Haskell’s. See the difference?

The arguments are reversed.

Because fmap in Haskell is a function, and not a method, it needs to receive both the function to resolve B from A and the Functor instance it operates on. This is a key distinction between the way functions receive arguments and the way methods receive arguments: with functions, the ordering of the proverbial "subject" of the function and its arguments can be defined arbitrarily, since the line between them is blurred (they’re both just arguments, after all). With objects and methods, however, the method needs to be invoked on a particular object, and that object needs to be resolvable deterministically (this is referred to as method dispatch, and is a separate can of worms for a future blog post). In Java, the object a method is invoked on is implicitly resolved (by preceding the ., as in a.toString()), which is where the symmetry between Haskell’s fmap and Functor#fmap in Java begins to break down.

Since in the Java implementation #fmap is defined on the Functor interface, the Functor itself is the implied first argument to #fmap, and the mapping function is the second argument (or only explicit argument, as it were). In Haskell, the mapping function is the first argument and the Functor is the second.

If it’s not yet obvious how this is a problem, let’s look at how a function is an instance of Functor in Haskell:

{- (->) is "function" -}
instance Functor ((->) r) where  
  fmap = (.)

What is that (.)? It’s the compose operator, and looks like this:

(.) :: (b -> c) -> (a -> b) -> a -> c
f . g = \x -> f (g x)  

It’s possibly the most commonly used operator in Haskell[2], due to function composition being such a fundamental operation in functional languages (arguably the core building block to code reuse).

Here's an example of (.) in action:

doSomeMath :: Num a => a -> a  
doSomeMath x = ((+3) . (*2)) x  

What does that function do? Does it double the result of adding 3 to its argument? Or does it double the argument and add 3 to the result? It does the latter, because (.) applies its right-hand function first - composition reads right to left. This feels normal when reading code, but it throws a giant wrench into the symmetry between our #fmap in Java, implemented on MonadicFunction, and fmap in Haskell, implemented for (->) as an alias for (.).

Here’s why: because Haskell’s fmap receives the mapping function as the first argument and the Functor instance as the second argument, and because the function being mapped over is the Functor, fmap can be implemented as simply an alias for (.) and everything works. In a mathematical notation of functions f, g, and their composition f(g(x)), this makes the function to the left of (.) f, and the function to the right of (.) g.

However, because Functor#fmap's mapping function is covariant from Functor<A> to Functor<B> (and therefore from MonadicFunction<A, B> to MonadicFunction<A, C>), and because we're already starting with MonadicFunction<A, B>, the reverse compositional operation must be performed to maintain type compatibility.

So for a MonadicFunction f, f.fmap(g) is actually evaluated as g(f(x)), whereas in Haskell, fmap f g is expectedly evaluated as f(g(x)). This is an unfortunate and frustrating asymmetry that will likely cause confusion at first, due to the previously established convention of function composition evaluating right to left, and the fact that, as a method on Functor, #fmap inherently reads left to right.
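To see the mirroring concretely, a small sketch (self-contained, with the Functor parent omitted; names are illustrative):

```java
import java.util.function.Function;

public class ReversedComposition {
    // Standalone copy of MonadicFunction (Functor parent omitted for brevity).
    interface MonadicFunction<A, B> {
        B apply(A a);

        default <C> MonadicFunction<A, C> fmap(Function<? super B, ? extends C> fn) {
            return a -> fn.apply(apply(a));
        }
    }

    public static void main(String[] args) {
        MonadicFunction<Integer, Integer> f = x -> x * 2;

        // Java reads left to right: the receiver f runs first, then the mapped
        // function, so f.fmap(g).apply(5) is g(f(5)) = (5 * 2) + 3 = 13.
        // Haskell spells the same computation in the mirrored order - in
        // fmap (+3) (*2) $ 5, the mapping function comes textually first.
        System.out.println(f.fmap(x -> x + 3).apply(5)); // 13
    }
}
```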

We can’t even logically decouple #fmap from Functor and make it a standalone Method object (arguably the closest thing to a first-class function we can get in Java, and most other OO languages), leaving Functor as a marker interface, without coming up with some way of unifying the general interfaces and delegating to subtype-specific implementations - which is, of course, the whole purpose of Functor and #fmap in the first place. And here, our mental stack overflows.

So I suppose the only option is to acknowledge the reverse composition of #fmap in Java - or else, wait for the human brain to support tail-call elimination. Who knows, at this rate, it might even come before HotSpot’s.

[1] Less-briefly explained, Haskell’s fmap is actually much more powerful and useful than the best we can do in Java, due to Java’s lack of higher-kinded types: given two type parameters, T and V, V cannot itself be parameterized with T to form V<T>. This is a [serious] limitation of Java's support for parametric polymorphism, although not a fatal one here, thanks to covariant return types (Java allows increased return type specificity in subtype implementations without clashing: i.e. Optional#fmap returning Optional<B> more specifically than Functor<B>, Stream#fmap returning Stream<B> more specifically than Functor<B>, etc.).
[2] According to the statistics I just pulled out of thin air.

A tale of type variance and the asymmetry between method dispatch and function invocation.