Playing With Pointers

One step closer to reflective equilibrium.

Macros In Haskell

Haskell too has macros. The subsystem is called Template Haskell. It is a Lisp-like AST generation framework that comes as a GHC extension (that is, it isn’t a part of the language as of now). This blog post is a (hopefully) simple introduction to TH, and how it can possibly be used. As usual, it is less of an instructive guide, and more like a chronicle of my own personal adventure. Moderate competence in Haskell is required.

Unlike Lisp’s macro system, TH is typed — like everything else in Haskell. You have typed constructors like FunD (function declaration) and InfixP which are used the build the AST of whatever form you wish to splice into your regular code. You can find the current TH user manual at [1].

The AST constructors are in the module Language.Haskell.TH, which looks quite overwhelming at the first glance, and, as such, writing TH expressions from scratch can get quite difficult. Thankfully, we also have a nice runQ function which allows us to write TH expressions without really grokking everything in the module. I’ll try to demonstrate this by getting TH to generate a small (read useless / toy) string matcher.

Objective

We’ll try to generate a function with the type String->Bool. The function’s job will be to check if the String it receives belongs to a list of Strings we provide at compile time. To make things a little more interesting, we’ll allow the strings we have at compile time to have a regex-like "?" in them, which can match any character. As an example, if the list of Strings is ["asdf", "a?foo", "P"], the generated function should match "asdf", "P", "amfoo" but not "afoo". We don’t address the issue of how to match a literal "?". Heck, out end product boils down to ten lines of code.

Once we’re done writing our TH code, we should have a function generateMatcher, which we shall invoke from the top level of a source file, like:

  generateMatcher "generatedF" ["asdf", "a?foo", "P"]

It will then generate the matcher, with the name generatedF and the behaviour we’re just described. It can then be called like a regular function. So a source file can look like:

  generateMatcher "generatedF" ["asdf", "a?foo", "P"]

  main = do
    str < - getLine
    putStrLn $ show $ generatedF str

Starting Off

The first thing to think about is how the compiled code should look like. Given the above problem instance, we could generate something like

    generatedF ('a':'s':'d':'f':[]) = True
    generatedF ('a':_:'f':'o':'o':[]) = True
    generatedF ('P':[]) = True
    generatedF _ = False
  

Haskell’s pattern matcher is, AFAIK, intelligent enough to convert the cases into a tree structure and quickly jump to the correct clause. Now the second question: how should the AST look like? Here we take the help of the runQ function. First we start GHCi as ghci -XTemplateHaskell (since TH is not enabled by default). Once in GHCi, we import the required module by :m + Language.Haskell.TH. The scene is set! We can now see the AST for generatedF ('a':'s':'d':'f':[]) = True. For that, we evaluate runQ [d|generatedF ('a':'s':'d':'f':[]) = True|]. The [d| ... |] is used to quote a declaration. Detailed information (about this, and generally about TH) is in [1]. We will now see some (rather incomprehensible) output:

[FunD generatedF [Clause [InfixP (LitP (CharL 'a'))
GHC.Types.: (InfixP (LitP (CharL 's')) GHC.Types.:
(InfixP (LitP (CharL 'd')) GHC.Types.:
(InfixP (LitP (CharL 'f')) GHC.Types.:
(ConP GHC.Types.[] []))))] (NormalB
(ConE GHC.Bool.True)) []]]

A little indentation, of course, goes a long way in making things clearer (when does it not? :D):

[FunD
 generatedF [Clause
             [InfixP (LitP (CharL 'a'))
              GHC.Types.:
              (InfixP (LitP (CharL 's'))
               GHC.Types.:
               (InfixP (LitP (CharL 'd'))
                GHC.Types.:
                (InfixP (LitP (CharL 'f'))
                 GHC.Types.:
                 (ConP GHC.Types.[] []))))]
             (NormalB (ConE GHC.Bool.True)) []]]
  

I usually use guesswork from this point, leaving it to Haskell’s type system to correct my incompetence. Haskell’s AST has the convention of suffixing most data constructors with a letter which hints at what type it belongs to. We can guess (and purely guess, I have not looked up the documentation) that InfixP is an infix pattern, a LitP is a literal pattern, a NormalB is a normal body, a ConE is a constructor expression, a FunD is a function declaration and so forth. The AST for the match-anything clause is simpler:

  [FunD
   generatedF [Clause [WildP]
               (NormalB (ConE GHC.Bool.False)) []]]

You can guess what a WildP is, especially since it derives from a _.

The core idea

The basic idea behind TH is that allowing the user to splice in a bit of custom-generated AST into an otherwise normal Haskell program can lead to all kinds of awesomeness. A splicable AST has to be typed as a Q Exp, a Q Type or a Q [Dec]. The last one stands for a list of declarations, which is the one we need right now. To splice in an expression as an AST, we place it inside a $( ... ). To keep things clean, top level declarations don’t need the $( ... ) nesting — anything of the type Q [Dec] at the top level of a source file is automatically interpreted correctly. Q is a Monad, which is sometimes needed to generate unique identifiers, like Lisp’s gensym. We don’t need that right now.

Getting back to the point, we need a function that takes the specified input, and outputs the AST of the code we want to generate. Since we need to generate something of the type Q [Dec], and we will be given the name of the function to generate, and a list of Strings, there is at least one line we can write:

  generateMatcher :: String -> [String] -> Q [Dec]

We already have seen the basic structure of the AST. We now write some code to generate it. We will generate only one Dec (i.e. the [Dec] in Q [Dec] will only have one element), corresponding to the one function we want to declare. The Dec will be a FunD (just like in the AST runQ showed us), with multiple Clauses, each representing one pattern (which itself corresponds to one String we were given at compile time).

One thing though — for some reason I don’t really understand, while runQ prints them, things in the GHC module (like GHC.Bool.True, GHC.Types.:) don’t really work in a generated AST. A token in TH is essentially a Name, which we can get using mkName :: String -> Name or by using the quote (') operator. So one way to get around the problem is to represent things like GHC.Bool.True as Names — this seem to work well in practice. The quote operator does not seem to work for symbols — we replace things like GHC.Bool.True with 'True and things like : with mkName ":". We also need to use mkName for the functions’ name, which needs to be a (surprise) Name.

Okay, phew! Time to see some code that actually works. With the little Haskell-foo I have, I wrote the following piece of code:

generateMatcher name strs =
  return [FunD (mkName name) (map thMagic strs ++ baseCase)]
  where
    thMagic str = Clause [transform str] (NormalB (ConE 'True)) []
    baseCase = [Clause [WildP] (NormalB (ConE 'False)) []]
    transform [] = ConP (mkName "[]") []
    transform ('?':rest) =
      let colon = mkName ":" in (InfixP WildP colon $ transform rest)
    transform (x:rest) =
      InfixP (LitP (CharL x)) (mkName ":") $ transform rest

Ten lines, as promised. I used return to lift the [Dec] into the Q monad. Other than, most of it is a direct translation of the input we get into something that looks like the output runQ dumped on me. The code is in my snippets repo, as usual. TH does not, as of now, allow you to declare an expression and splice it in the same file. You need to have, say, a FooTH.hs where you have all the functions for generating the AST and a Foo.hs where you actually use them.

TL;DR

This was a long and meandering post. In short, here are the two basic points I want to make:

  • TH is awesome. It can save a lot of time, if used properly.
  • Don’t beat yourself up trying to write ASTs from scratch, play around with runQ, steal and modify.

[1] http://www.haskell.org/ghc/docs/latest/html/users_guide/template-haskell.html

Threading A Graph

One operation that moving garbage collectors (among other things) have to concern themselves with is updating of references. Once you move object X from position P0 to P1, you’ll also need to update all references to X accordingly. To be able to do this, you first need to somehow maintain a list of words that point to a specific object. There are obvious ways of this, of course — maintaining a hash-table that maps a object to a list of words that store references to it is one. There is a more elegant way to do this, however; if the set of objects you’re working with satisfies the following constraints:

  • Each object has a (header or otherwise) pointer-sized field which stores some information that can be differentiated from a pointer. On a system that is aligns words to two or four byte boundaries, we could set the LSB of non-pointer data stored in the field. We’ll call this field info.
  • Pointers to objects always point to the beginning of some object. No pointing to a field stored somewhere in the middle.

The algorithm is extremely simple and elegant. All you have to do is this: every time you see a slot (a field, if you like) S point to an object O, change Os info field to point to slot S and set S to whatever value (pointer or non-pointer) info held before it was set to point to S. If you try this out with pen and paper (I’m too lazy to draw a diagram here), you’ll see that this simple action, when applied to every slot in the graph, transforms the predicate “Slots S0, S1 and S2 point to object O” to “O‘s info slot contains a pointer to S2, S2 contains a pointer to S1, S1 contains a pointer to S0 and S0 contains the data O‘s info field originally held”. Since we can always distinguish a data word stored in info from a pointer, we can traverse this list and get a list of the slots. The slots can then be filled with new location of the object.

This is one of the main concepts behind Jonker’s compaction algorithm, except that it intelligently divides the compaction into two phases of updating forward pointers and updating backward pointers while moving objects.

I wrote a simple and tiny implementation of threading a graph, which you can find at https://github.com/sanjoy/Snippets/blob/master/Thread.cc. Note how the main threading algorithm only takes thirteen lines of C++.