Alex Aiken
CS 264
Spring, 1995

Code Generation via Conversion to Continuation-Passing Style
------------------------------------------------------------

I. The Problem with Stacks

In C or FORTRAN, all procedure calls can be stack allocated.  This
remains true even in a language such as Pascal, where procedure
definitions can be nested, so long as one does not allow procedures to
be returned as the result of procedures.  In a language in which
functions are first class (i.e., can be passed as arguments and
returned as results) and function definitions can be nested, it is no
longer clear how to stack allocate all values.  The following ML
example illustrates why:

    (let fun f x = let fun g y = x in g end
     in f 3
     end) 5

Now, the function "g" is returned as a result of evaluating "f x".
Note that even though "g" is defined lexically inside the scope of
"f", the value returned by the outer let-expression is actually "g" in
an environment where x = 3.  The result of the entire expression is,
of course, 3.  The important point of this example is that a value
referring to "g" has greater extent than the function "f" that creates
it, so the calls to "f" and "g" cannot be stack allocated.  In LISP,
this situation has been named (not very mnemonically) the "upward
funarg problem".

The standard implementation of escaping functions is to create a
*closure*.  Formally, a closure captures the function and the
environment in which it was defined.  Operationally, a closure is a
pair consisting of a pointer to the code for the function and a
pointer to the activation record defining the variables free in the
function definition.  A more refined implementation is to make a
closure a flat record, with the first element being the code pointer
and the remaining elements the environment variables.  For example, in
the execution of the code above the sequence of events might be:

    evaluate (...) 5
        evaluate f 3
            create activation record for f, slot x = 3
            evaluate body of f
                define function g with free variable x.  Since g will
                escape, create closure C with a pointer to g and a
                pointer to x.
            return C
            pop AR for f
        evaluate 5 (returns 5)
        invoke C with y = 5
            evaluate body of g
    end

Note that neither the environment containing "x" nor the closure for
"g" can be allocated in "f"'s activation record; these values must be
allocated on the heap.

Compiling languages with functions that escape can be painful.  To do
a good job, one must distinguish between functions that may escape
(e.g., g) and those that cannot escape (e.g., f), because more
efficient implementations are possible for the latter than for the
former.  In addition, there are many choices to be made in the
representation of a closure and its environment.  For example, it may
be desirable to avoid allocating the closure at all if the value will
be used immediately; in this case, it is better to store the two
values of the closure in separate registers.

Traditionally, the decisions about the representations of closures and
environments have been made during code generation.  This design has
at least two disadvantages: (1) it makes the code generator
complicated and therefore more difficult to debug and retarget, and
(2) it prevents optimizations from being applied to the choice of
closure and environment representations, because the optimizer works
on a higher-level form in which these representation decisions are
hidden.  A consistent theme of modern compiler technology is:

    MAKE RUNTIME REPRESENTATIONS EXPLICIT IN THE COMPILER'S
    INTERMEDIATE FORM.

That is, if one is going to use closures, have a representation for
closures in the intermediate language.
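To make the upward funarg concrete, here is the ML example transcribed
into Python (a sketch; the names g_code and f_explicit are mine), first
as an ordinary escaping closure and then with the closure represented
explicitly as a code-pointer/environment pair:

```python
# The ML example: f returns g, which captures f's argument x, so x
# must outlive f's activation record.
def f(x):
    def g(y):
        return x          # x is free in g; g escapes as f's result
    return g

assert f(3)(5) == 3       # the whole expression evaluates to 3

# The same computation with the closure made explicit as a pair:
# a code pointer plus the captured free variable.
def g_code(env_x, y):
    return env_x

def f_explicit(x):
    return (g_code, x)    # "flat" closure: code first, environment after

code, env = f_explicit(3)
assert code(env, 5) == 3
```

Note that the pair built by f_explicit lives on the heap (as all Python
values do), exactly because it outlives the call that created it.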
Making all representation decisions explicit makes code generation
very simple (it's just a change of syntax from the intermediate form
to object code) and, most importantly, exposes the runtime
representations to the compiler, where program analysis and
optimization can be used to make improvements.

II. Conversion to Lambda Calculus

The first step in the Appel/Jim compiler for ML is to translate ML
into a version of the lambda calculus.  The reason for this
translation is that ML is a big language, with many constructs.  The
translation reduces the number of constructs drastically, while
preserving the semantics of the language.  Thus, the translation
simply reduces the number of cases that subsequent phases must
consider.  For example:

    fn x => e          ==>  \x.e
    let x = e in e'    ==>  (\x.e') e

III. Continuation-Passing Style

In the case of closures, the runtime property we are making explicit
is control flow.  That is, during execution control jumps around from
point to point in the assembly code.  In C or Fortran, the order is
easy to compute from the program text.  In a language like ML, the
order is not so easy to see, because a closure may be created, passed
around in a data structure, and then invoked at some place far removed
from the context where it was defined.

Continuation-passing style makes this arbitrary control flow explicit.
The idea is simple: every function takes an additional argument, the
address of the instruction it should jump to when the function
completes.  In very concrete terms, one can think of a programming
language in which the completion of every function involves a "goto"
to jump to the next instruction to be executed.  The difference is
that the continuation is an *argument* to the function, so the
function may jump to different locations in different evaluations.
What have we just described?  The return address of a function
invocation!  Normally this is a "hidden" argument, known only to the
code generator, but CPS makes it explicit.
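Both ideas above can be sanity-checked in a few lines of Python (a
sketch; add_cps is a name I've made up): the let-translation from
section II, and a function whose "return address" is an explicit
argument:

```python
# Section II: let x = e in e'  ==>  (\x.e') e
v_let = (lambda x: x + 1)(3)    # the translated form of: let x = 3 in x + 1
assert v_let == 4

# Section III: the continuation k is an explicit return-address argument.
def add_cps(a, b, k):
    return k(a + b)             # "return" = jump to k with the result

assert add_cps(1, 2, lambda v: v * 10) == 30
```

The caller decides where add_cps "returns" by choosing k; different
call sites can pass different continuations.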
We'll consider a little lambda-calculus language for our presentation
of CPS:

    e ::= x | e1 e2 | \x.e1 | if e1 e2 e3 | e1 + e2 | i

CPS conversion takes an expression e and produces a function that
takes a continuation k.  The idea is that we should evaluate e, and
then continue with the execution of k.  But what happens to the value
that e produces?  The answer is that it is the input to the
continuation.  That is:

    CPS(e,k) = k e

This definition tells us what we want: evaluate e and pass its result
to the rest of the computation k.  Of course, we also want to
recursively convert subexpressions of e.  We will allow ourselves a
little bit of syntactic sugar in the target language to help make
things clear.  We add the expression let x = e1 in e2.  The rule is
that e1 must be a syntactic value---either an abstraction \z.e or an
integer.  The standard conversion algorithm is:

    CPS(x,k)           = k x
    CPS(\x.e,k)        = let y = \(x,k').CPS(e,k') in k y
    CPS(e1 e2,k)       = CPS(e1, \f.CPS(e2, \x.f(x,k)))
    CPS(if e1 e2 e3,k) = CPS(e1, \b.if b = 0 then CPS(e2,k) else CPS(e3,k))
    CPS(e1 + e2,k)     = CPS(e1, \a.CPS(e2, \b.let c = a + b in k c))
    CPS(i,k)           = k i

Note that as a result of the conversion, every function takes an extra
continuation argument k'.  This extra argument is exactly the return
address of the call---where the function should return when it is
done.  Note also that the CPS conversion doesn't get rid of the
primitive operations such as conditionals, addition, and function
application, but those primitives now appear in a very restricted
form: their arguments are always variables.  In addition, lets always
bind syntactic values; nothing needs to be evaluated.  In operational
terms, this representation forces all intermediate values to be given
names, which can be conveniently thought of as registers or memory
locations.  Because of their restricted form, code generation is very
easy for CPS-converted terms.
Here is an example:

    (\x.x + 1) 3

    CPS((\x.x + 1) 3, k)
    = CPS(\x.x + 1, \f.CPS(3, \x.f(x,k)))
    = let g = \(x,k').CPS(x + 1, k') in (\f.CPS(3, \x.f(x,k))) g
    = let g = \(x,k').CPS(x + 1, k') in CPS(3, \x.g(x,k))
    = let g = \(x,k').CPS(x + 1, k') in (\x.g(x,k)) 3
    = let g = \(x,k').CPS(x + 1, k') in g(3,k)
    = let g = \(x,k').CPS(x, \a.CPS(1, \b.let c = a + b in k' c)) in g(3,k)
    = let g = \(x,k').(\a.CPS(1, \b.let c = a + b in k' c)) x in g(3,k)
    = let g = \(x,k').CPS(1, \b.let c = x + b in k' c) in g(3,k)
    = let g = \(x,k').(\b.let c = x + b in k' c) 1 in g(3,k)
    = let g = \(x,k').(let c = x + 1 in k' c) in g(3,k)

This CPS term would convert into the following pseudo-machine code:

    (* k contains the address of the current continuation on entry;
       the result of a function is always stored at address a *)

           g := LABEL
           (* code for g(3,k) *)
           store 3 as first argument of g
           store k as second argument of g
           jump LABEL
           ...
    LABEL: (* code for \(x,k').let c = x + 1 in k' c *)
           a := first argument + 1
           jump to second argument

The CPS conversion given above is a sugared version of the traditional
one in the literature.  The vanilla CPS conversion does not use let
expressions, which, while operationally equivalent, makes the code
quite a bit harder to read.  Appel also uses a different syntax, which
is also hard to read but more suited for implementation as a datatype
in a compiler.

IV. Closure Conversion

Although CPS conversion makes call/return addresses for functions
explicit, it hasn't done anything to help with generating code for
closures.  A function with free variables must be able to access those
free variables at runtime.  The idea behind closure conversion is to
make the environment explicit by passing it as an extra argument to
each function.  This basic idea can be improved in the following way.

Def.  A function f is "known" if the only uses of f are as f x; i.e.,
f always appears in function position.  Known functions can have all
of their uses statically identified, and they do not escape.
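The final term of the derivation can be checked directly in Python (a
sketch; the identity function plays the role of the top-level
continuation k):

```python
# g = \(x,k').let c = x + 1 in k' c, invoked as g(3,k)
def g(x, k):
    c = x + 1             # the let binds the named intermediate value
    return k(c)           # "jump to the return address" with the result

assert g(3, lambda v: v) == 4
```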
Thus, at every call site of a known function, the free variables of
that function are also in scope.  To see this, consider the expression

    let f = ... in .... f x ....

Clearly, all the free variables in the body of f are still accessible
inside the body of the let, modulo values that are shadowed by
bindings of the same name.

Two handy constructs are used heavily in closure conversion: record
and select.  record(x1,...,xn) makes a record of n things and may be
bound in a CPS-let.  select(i,m) selects the ith component of record m
and may also be bound in a CPS-let.  Records are used heavily to
represent the closure value of a function.

To make closures explicit, every function is modified as follows:

- For an unknown function f, an extra closure argument is added to the
  definition of f.  That is,

      let f = \(x,k).e in ...

  is converted to

      let f = <\(c,x,k).e,v1,...,vn> in ...

  where v1,...,vn are the free variables of the function.  Every use
  f(x,k) of an unknown function (i.e., application of a variable not
  syntactically bound to a function body) is replaced by

      f.0(f,x,k)

  By storing the code pointer at offset 0 inside the closure, we don't
  have to know how big the record is to find the pointer.  References
  to free variables in the body of f are replaced by appropriate
  selections from the closure record.

- For a known function f, as many additional arguments are added to
  the function as it has free variables.  That is,

      let f = \(x,k).e

  is converted to

      let f = \(x,k,v1,...,vn).e

  where v1,...,vn are the free variables of the function.  Every use
  f(x,k) of a known function is replaced by

      f(x,k,x1,...,xn)

  where x1,...,xn are the names for the free variables of f in the
  current context.  Note that x1,...,xn will in general be different
  names, because all definitions and uses are converted to have no
  free variables.

V. Optimizations and Simplifications

Once CPS conversion and closure conversion are performed, a large
number of optimizations/simplifications can be usefully performed.
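Both conversions can be sketched in Python, using tuples as records
(the continuation arguments are omitted for brevity, and the helper
names make_f and f_known are mine):

```python
# Unknown function: let f = \y. x + y, with free variable x, becomes a
# closure record <code, x>, and the call f(y) becomes f.0(f, y).  The
# code pointer sits at offset 0, so callers need not know the record's
# size.
def f_code(c, y):
    x = c[1]              # select(1, c): fetch the free variable
    return x + y

def make_f(x):
    return (f_code, x)    # record(f_code, x)

f = make_f(10)
assert f[0](f, 5) == 15   # f.0(f, y)

# Known function: the free variable is passed as an extra argument
# instead, so no record need be allocated.
def f_known(y, x):
    return x + y

assert f_known(5, 10) == 15
```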
The most important are:

1. Unnest all nested function definitions.  Since there are no longer
   any free variables, this is trivially correct.

2. Replace every

       let x = record(...,vi,...) in
       .....
       let y = select(i,x) in e

   by

       let x = record(...,vi,...) in
       .....
       e[vi/y]

3. Beta-reduce any function that is called only once or whose body is
   small.

4. Perform eta-reduction:

       f(x,y) = g(x,y)  =>  replace all uses of f by g

5. Perform register allocation.  Here the idea is that an expression
   requires a number of registers equal to the number of free
   variables it refers to.  If it has more free variables than the
   number of available registers, then some free variables must be
   "spilled" by packing them into a record and referencing them using
   selects (note that these selects cannot be optimized using rule
   2!).

VI. Register Allocation

The code generator is easy except for register allocation.  In
principle, this phase could also be performed on the CPS and
closure-converted code by renaming all variables to names from a
fixed set r1,...,rn (the names of actual machine registers).  The
principal ideas here are:

1. Register targeting: arrange for a function to expect an argument in
   a particular register where all calls can place it.  Successful
   register targeting makes a function call as cheap as a jump
   instruction.

2. Register anti-targeting: if an expression will call function f and
   f expects a particular argument in register i, avoid putting
   anything else in register i, to avoid a potential move instruction
   prior to the function call.
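Two of the simplifications can be illustrated concretely in Python (a
semantic sketch of rules 2 and 4 on values, not a term rewriter; the
names are mine):

```python
# Rule 2: a select from a record built in the same scope can be
# replaced by the variable stored in that field.
vi = 42
x = (1, vi, 3)            # let x = record(1, vi, 3)
y = x[1]                  # let y = select(1, x)
assert y == vi            # ...so every use of y can be rewritten to vi

# Rule 4 (eta-reduction): f merely forwards its arguments to g, so
# every use of f can be replaced by g.
def g(a, b):
    return a * b

def f(a, b):
    return g(a, b)

assert f(3, 4) == g(3, 4)
```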