TLDI 2007 - Modular Information Hiding and Type-Safe Linking for C

What's the Deal with Dependent Types?

Conor McBride, University of Nottingham

The ACM SIGPLAN Workshop on Types in Language Design and Implementation (TLDI 2007)
Nice, France, January 16, 2007

Abstract

In this talk, I shall identify the key ways in which dependently typed programming confounds our common expectations, and then transcends them. I shall expose the language design issues where dependent types necessitate a new deal. I hope to persuade you that this is not such a big deal, but that it is a good deal. I shall discuss the impact of these issues on the evolution of the Epigram language, initiated by myself and James McKinna, and still in progress with help from Thorsten Altenkirch and our colleagues in Nottingham and St Andrews.

It is standard to identify the syntax of a programming language, say lambda-terms or pattern-matching equations, then equip that syntax with an operational semantics. We might then try to identify a system of types and type annotations which strikes an appropriate balance between liberty, security and bureaucracy. What happens if we follow this approach, adopting a dependent type system?

We can start by writing down a lambda-calculus with a suitable operational semantics and a dependent type system. I shall show you what we have now done for Epigram, structuring the syntax of terms specifically to clarify the flow of type information and reduce annotation.

Typechecking involves checking equality of normal forms of types, not just syntactic equality, so we need an operational semantics which preserves type information and computes under binders. This certainly requires more caution than we are used to, but it would be hasty to presume that there must be bad consequences for the run time operational semantics. In fact, the run time story is better than usual: as well as erasing types which play no part in compuation, we can erase values which are uniquely determined.

Once we have our lambda-calculus, can we encode the rest? I am pleased to report that we cannot readily do so. We are used to Chruch-style encodings of data, representing distinct choices of control flow. For example, it is fine to encode the type Bool so that case analysis is effectively given by

caseBool : Bool -> forall P : Type. P -> P -> P
caseBool b = b

Such an encoding can never account for an analysis which actually discovers the value, as well as choosing between branches:

caseBool = forall b : Bool, P : Bool -> Type. P true -> P false -> P b

The benefit is the ability to distinguish values which require distinct treatment. The price is that we must introduce datatypes directly, doing in theory what we do in practice. It is not a major effort to add data. I shall sketch what we now do in Epigram, introducing a whole system of datatypes at a single stroke, together with native support for datatype generic programming.

How are we to program with these data? The traditional case construct makes sense if we can consider the possible patterns for an individual value, independently of other values. This cannot fail to be the case in simply typed languages, but it cannot hope to be the case with dependent types. What happens is much better! Inspecting one value can give you information about other values for free, guaranteed by the typechecker! For example, when you inspect a vector (a list indexed by its length), you learn not only whether it is empty but also whether its length is zero.

Our old type systems were never strong enough to give us `information for free', so the possibility has not figured in language design until now. When we write programs by pattern matching with dependent types, our patterns show us both the information we explicitly test and the information we receive in return. Consequently, patterns can contain arbitrary terms, not just constructor forms, and a conventional rewriting semantics does not quite work. Fortunately, it is easy to fix: these non-constructor forms are always guaranteed to match by the typechecker, so they can and must be ignored operationally. We need `already know' patterns to complement `don't care' patterns. If we update our treatment of pattern matching to reflect this interaction with the type system, we can make sure that compiled code does not go wrong by working harder than is necessary.

So, without changing anything really fundamental, we can have a language of programs which supports an efficient run time operational semantics, an effective partial evaluation semantics and a system for checking dependent types. The distinctions between term and type, dynamic and static, explicit and implicit, have moved out of alignment with each other, but have not disappeared altogether.

Having gone to the trouble of establishing this relatively ordinary programming language, I shall spend the rest of my talk attempting to persuade you not to program in it. Rather, you should get your computer to program in it for you, and work in a much higher-level language where types have a much stronger part to play. Epigram has an ordinary core language, executable without typechecking but straightforwardly checkable. Epigram source code, by contrast, relies on typechecking to deliver its translation to the core. In effect, the presence of types supports program inference!

This approach is not at all peculiar to Epigram. Advances in overloading mechanisms (especially Haskell's type classes) and datatype-generic programming techniques are increasing the extent to which types in high-level languages are driving the construction of code in low-level languages. The opportunity which presents itself is to take the expressive power of types as the basis for programming language design, rather than a stroke of good luck. Crucially, the idea that types should act solely as discriminators of good programs from bad is a chain from which we are now free. I hope to encourage you to face your freedom with exhilaration, rather than fear.

START Conference Manager (V2.53.7)

Maintainer: rrgerber@softconf.com