Support for writing GHC passes in Coq

16-Apr-2011: Now branching from the official ghc base git repo
19-Mar-2011: Things should be working again; I did my own darcs-to-git on the base repo
12-Mar-2011: The “base” repository has moved from darcs to git
10-Mar-2011: The repository names have changed; see below

In order to implement heterogeneous metaprogramming for GHC, I have produced a complete formalization of  System FC1 in Coq. The extracted Haskell code links directly into GHC and works at the level of Core expressions.

There are four distinct representations for a given chunk of Haskell code: Core, Weak, Strong and Proof. Think of them like a pipeline.

  •  HaskCore is a Coq representation which extracts exactly to GHC.CoreSyn, on-the-nose. This can be very cumbersome to manipulate in Coq (due to the fact that Coq functions must be total and recursion must be structurally-decreasing), so we try to move to another representation as quickly as possible.

  •  HaskWeak is a slightly more Coq-friendly version of HaskCore; constructs which are treated differently by the typing rules (applications of the “function space tycon” versus other tycons) or have their own introduction forms (type-lambda/coercion-lambda) each get a separate alternative in the data type. Variables are represented as a type which wraps a CoreVar and is indexed by the variable's sort (expression variable, type variable, coercion variable).

  •  HaskStrong is a very-dependently-typed version of HaskWeak; any value of type HaskStrong is a valid, well-Haskell-typed Haskell term. Type variables and coercion variables are represented using PHOAS since we must perform substitution on types. Value variables are represented in first-order style, and any type with decidable equality and an infinite supply of fresh values may be used to represent variables (since we do not currently need to perform substitution on values).

  •  HaskProof is an explicit natural-deduction proof tree (see  NaturalDeduction), in System FC1, of the well-typedness of a given Haskell term. The Haskell term may, of course, be recovered from the deduction. Types and coercions are represented in the same way as in HaskStrong, but value variables are represented Gentzen style with explicit structural operations – neither DeBrujin/first-order nor HOAS/PHOAS.

Conversions in both directions between all adjacent representation pairs are provided. You can also turn a HaskProof into a LaTeX document showing the actual proof tree (here's  an example). The emitted LaTeX code requires trfrac.sty and mathpartir. Keep in mind that for anything more than a trivial program the proof trees will get quite large (here's an  example) and may require expanding your TeX limits.

Some parts of the HaskCore-to-HaskWeak translation are written in Haskell, and therefore no more trustworthy (from a type-system-provided guarantee perspective) than the rest of GHC. From HaskWeak onward everything is in Coq.

Invoking GHC with the new flag -fcoqpass will run all the top-level bindings through the function coqPassCoreToCore defined in Extraction.v immediately after the desugarer; the bindings returned by that function are then sent through the rest of the compiler. Invoking with the new flag -ddump-proofs will pass those bindings instead to coqPassCoreToString and dump the string it returns. Finally, -ddump-coqpass will dump the core expression returned by coqPassCoreToCore.

Getting It

The gitweb is here

git clone http://git.megacz.com/coq-hetmet.git

There is a branch which has the extracted Haskell code “baked in”; use this branch if you don't have the Coq compiler installed on your local machine:

git clone -b coq-extraction-baked-in http://git.megacz.com/coq-hetmet.git

You can find nice pretty-printed PDFs (the scripts use weird unicode characters) of the key parts of the formalization here.

Trying It

If you're in a hurry, paste these commands into a shell window and come back after lunch:

git clone http://git.megacz.com/ghc-hetmet.git
git clone -b coq-extraction-baked-in \
   http://git.megacz.com/coq-hetmet.git ghc-hetmet/compiler/coq
touch ghc-hetmet/compiler/coq/build/CoqPass.hs

cd ghc-hetmet
chmod +x sync-all
git clone http://git.megacz.com/ghc-base.git libraries/base
./sync-all -r http://darcs.haskell.org/ get
perl boot
./configure

echo 'GhcStage1HcOpts = -O0 -fasm' >> mk/build.mk
echo 'GhcStage2HcOpts = -O0 -fasm' >> mk/build.mk

cp mk/build.mk.sample mk/build.mk
make BuildFlavour=quickest inplace/bin/ghc-stage2

echo -e 'main = putStrLn "Hello, World"'      > HelloWorld.hs
inplace/bin/ghc-stage2 -ddump-proofs -ddump-to-file HelloWorld.hs

mv HelloWorld.coqpass HelloWorld.tex

pdflatex HelloWorld.tex

… and you should get  this

The echo commands are very important; the compiler will crash if built using optimizations. I suspect this is because Coq needs to circumvent the Haskell type system in order to extract dependent types, and the GHC optimizer uses typing information to justify certain optimizations.

Organization

All of the code for Coq-in-GHC resides in files whose name starts with Hask. In general:

  • Prelude.v and General.v contains the code which is global to everything.

  • HaskKinds.v contains the Coq code for Kinds (same across all four representations)

  • HaskXXXX.v contains the Coq code for representation XXXX

  • HaskXXXXTypes.v contains the Coq code for the types and coercions of representation XXXX. There is no HaskProofTypes because HaskProof uses the HaskStrong representation for types and coercions.

  • HaskXXXXToYYYY.v contains code to convert from representation XXXX to YYYY.

The HaskXXXXToYYYY files are rarely (goal: never) imported by any other file; they are the leaves of the dependency tree. I try to keep all of the other files as clean and readable as possible, and let the HaskXXXXToYYYY get messy and convoluted at their expense.

Necessary Coq Patches

See this.