Playing With Pointers

One step closer to reflective equilibrium.

Tree Traversal in O(1) Space

I’ve been reading some texts recently, and came across a very interesting way to traverse a graph, called pointer reversal. The idea is this — instead of maintaining an explicit stack (of the places you’ve visited), you try to store the relevant information in the nodes themselves. One approach that works on directed graphs with two (outgoing) arcs per node is called the Deutsch-Schorr-Waite algorithm. This was later extended by Thorelli to work for directed graphs with an unknown number of (outgoing) arcs per node.

The Deutsch-Schorr-Waite algorithm does not require adding any extra fields to the nodes. Adding fields to the node structure sort of spoils the fun — you could, for instance, have a pointer pointing to the parent of every node making stackless traversal a cup of tea. And you haven’t really saved any space! Thorelli’s approach, however, requires adding a integer to each node that can hold at least the maximum number of outgoing arcs a node can possibly have. But it can traverse graphs with more than two outgoing arcs per node. And you do save space here — if you have three outgoing arcs (maximum) per node, you could do with just a two bit integer. On x86, that’s the last two bits of a pointer.

Note that both the approaches are quite a bit more complicated than a vanilla DFS. Nevertheless, there are applications where such characteristics (of consuming only constant space) becomes important. One of the more visible ones is a GC. A GC has to traverse a (potentially) very large and complicated graph of objects, and it can’t use up too much memory (or call-stack space) when collecting — a time when memory is already a scarce resource.

I like implementing things, and while I did not implement the full algorithm proposed by Thorelli, I did implement one which traverses a tree in constant space. What more, it does not need the extra bits Thorelli’s algorithm requires.

The code is here (I believe in making code talk instead of giving long, winded descriptions): https://github.com/sanjoy/Snippets/blob/master/Traversal.cc.

The snippets repo is an idea that I recently got — I often code small things when I’m bored or want to try something out. Having a public “snippets” repo where I upload all that smelt good.

Haskell’s Fixed Point Combinator

About this post

This is one of those posts I write to ensure I actually understand a concept. Read at your own peril. Some knowledge of Haskell is required.

The Problem

One Hello World problem that is sometimes used to demonstrate a programming language is generating the Fibonacci series. The nth Fibonacci number is given as:

fib 0 = 1
fib 1 = 1
fib x = fib (x - 1) + fib (x - 2)

This, as it turns out, is one way to define the nth Fibonacci number in Haskell. Save this code as Fib.hs, start the GHC REPL as ghci Fib.hs and you’re done — fib 5 evaluates to 8 and fib 7 evaluates to 21.

However, as cute this may be, it isn’t elegant enough. Fibonacci is essentially a series, and while mathematicians will tell you that a series is essentially a function (and that “is essentially” is a transitive relation) this definition does not, yet, sit quite well. Of course, you say, something like this can be done:

fibSeries = map f [0..]
            where
              f 0 = 1
              f 1 = 1
              f x = f (x - 1) + f (x - 2)

Not only is this approach a hack but is also quite slow. One nice, canonical, way to define fibSeries is this:

fibSeries = 1:1:zipWith (+) (tail fibSeries) fibSeries

Note the recursion here — the series of Fibonacci numbers remains the same if we take a list of two 1s, and append it with the elementwise sum of the list of Fibonacci numbers and the list of Fibonacci numbers less the first element. Let this sink in.

Enter the fixed point combinator.

There happens to be another, more elegant, way of expressing fibSeries, using fix, Haskell’s fixed point combinator. The above definition of fibSeries refers to itself in its definition. Let’s say you don’t want this, perhaps for pure intellectual pleasure (like here :P) or for deeper reasons — untyped lambda calculus, for instance, does not have named values, and so you cannot write an expression that refers to itself. In such cases, we use a fixed point combinator.

The first interesting thing about fix is its type: :t fix in ghci (fix is in Data.Function) will tell you its type is (a->a)->a. This should strike as counter-intuitive — how can a function possibly take a, say, Integer->Integer (let’s call this function foo) and return an Integer? It needs some value to apply foo to, right? The short answer is that fix calculates foo‘s fixed point. A slightly longer answer follows, which ends at TL;DR.

The fixed point of foo is the value v such that foo v equals v. It is fixed in the sense that no matter how many times we apply foo on the fixed point of foo, it remains fixed. One way to calculate the fixed point of foo is to apply it to infinite number of times. This works because a type is a complete partial order and that foo is (Scott) continuous. In general the person defining foo needs to ensure this, but, thankfully, in Haskell, all functions you can possibly define are Scott Continuous. Relevant theory is here. Keeping this way of looking at fix in mind (infinite applications of foo on undefined), you could implement fix as:

fix x = applyInfinitely x undefined
  where
    applyInfinitely f x = f (applyInfinitely f x)

There are other ways to define fix, of course. In an untyped lambda calculus, you can define it as fix f = \x -> f (x x)) (\x -> f (x x) (this is the famed Y combinator). If you’re well-versed with types, you could implement the Y combinator in Haskell too; but then you’ll have to depend on recursion at the type level. I like to think of fix as something that encapsulates recursion, allowing you to write expressions that don’t have any explicit recursion, but are recursive nevertheless.

TL;DR

Coming back to the meaty aspects of fix, how do you write the function in whose fixed point is the value you want to compute? In this case we want fix to produce a [Integer]. This means the function we will write (foo) needs to have the type [Integer]->[Integer]. Which means it will take an [Integer] and return an [Integer]. But what [Integer] does it take? The same [Integer] it is supposed to return! As a first attempt, with this knowledge, you could write:

fibSeries = fix (\x->x)

I mean, since fix‘s argument (we’ll call it foo consistently) is somehow magically the same [Integer] foo is supposed to return, what could be more straightforward? However, something smells wrong. Both because of the word magically and the fact that this definition of fibSeries simply does not contain any information specific to the Fibonacci series (well, except the name fibSeries, but I think that is expecting too much of GHC :D). One thing I had not mentioned is this: when we say “fixed point” we really mean “least fixed point”. Deep in his or her heart, everyone who plays with types knows that a type is a poset with a bottom (). The ordering in this poset is essentially information based — A > BA has more information than B. So, in the poset of Integers, 1 and 2 are incomparable, since both of them contain an equal amount of information. And then you have ⊥, which contains the least information of all. What is the least value (in terms of the “least” defined above) that can be the fixed point of \x->x (the identity function)? , of course! Since (\x->x) ⊥ is and no doubt, contains the least information of all by definition!

A more intuitive way of looking at this is that when constructing foo, you need to make sure it adds some information. You need to embed some extra information into foo to ensure that, given the value you’re supposed to return, you also do some identity preserving operation on it, assuming that it is the same value you’re supposed to return. This is not as hard or tricky as it sounds. For instance, calculating fibSeries using fix is a direct extension to the self referencing definition of fibSeries:

fibSeries = fix (\x->1:1:zipWith (+) (tail x) x)

Another cute example is using fix to define a factorial function:

factorial = fix (\f x->if x == 0 then 1 else x * f (x - 1))

By the way, why won’t this

fibSeries = fix (\x->1:zipWith (+) (tail x) x)

work?