Typing Ad Hoc Data
Kathleen Fisher - Invited Talk
The ACM SIGPLAN Workshop on Types in Language Design and Implementation (TLDI 2007)
Nice, France, January 16, 2007
Abstract
Traditionally, types describe the internal data manipulated by programs. To accommodate the variety of desired data structures, language designers and type theorists have developed a wide variety of types and type constructors. But not all useful data is in programs; in fact, enormous amounts of it sit on disks or stream by on wires in a dizzying array of encodings and formats. It turns out that many of the types developed for internal data can be used to describe external data: tuples, records, unions, options, and lists come to mind as obvious examples. Perhaps more surprisingly, recursive types, singletons, functions, parametric polymorphism, and dependent types are relevant as well. Using types to describe external data leads naturally to the insight that we can reuse the same type to define an internal data structure and to generate parsing and printing functions to map between the two representations. The PADS project has exploited this idea, building data description languages based on the type structure of C (PADS/C) and on ML (PADS/ML) and exploring the theoretical basis for such languages with the Data Description Calculus (DDC). Continuing the analogy, it turns out that other concepts from the types world are also relevant to ad hoc data processing, including generic programming, type inference, type isomorphisms, and subtyping.
In this talk, I will describe the domain of ad hoc data processing and explain how types enable precise descriptions of such data. I will then explore the question of type inference, describing quantitative techniques we are currently developing to construct a description of ad hoc data given example instances.
Many people have worked on the PADS project, including Robert Gruber, Yitzhak Mandelbaum, and David Walker.