This page documents cc_type.h and cc_type.cc, the files which make up the type representation module in Elsa. You should look at cc_type.h as you read this file.
The broad intent of cc_type is to represent types in a way that's easy to understand and manipulate. The abstract syntax of C/C++ type declarations is very much not amenable to either, because it shares type specifiers among declarators in a declaration, and because declarators are inside-out (see the description of Declarator in cc.ast.html). The type checker has the responsibility of interpreting the C/C++ syntax, and constructing corresponding Type objects so that subsequent analyses don't have to deal with the actual syntax.
Another goal of the type representation is general independence from syntax. Ideally, cc_type would not refer to cc.ast at all, but I haven't quite been able to achieve that. Nevertheless, it remains a goal to minimize the knowledge cc_type has about cc.ast. Types are concepts that exist independently from any particular syntax used to describe them, and I want the module dependencies to reflect that basic fact to the extent possible.
One major decision I made for this module was to translate typedefs away entirely. This means I don't have a node for some kind of arbitrary "named type", which would have to be "unwrapped" every place a type is inspected. The potential disadvantage is that I won't use the programmer's names when referring to types in error messages, but there's an easy solution to that too: every constructed type could have a field saying what name, if any, the programmer had been using as an alias for this type. That's not implemented, but I'm confident it would eliminate any advantage to retaining typedefs. (Update: It has recently been implemented, see Type::typedefAliases.)
Types are divided into two major classes: atomic types and constructed types. Atomic types (my terminology) are types atop which type constructors (like "*" or "[]") might be applied, but which themselves do not have any constructors, so cannot be further "deconstructed". They consist of built-in types like "int", enums, structs, classes and unions; I regard the aggregation of structs/classes/unions to form an atomic wrapper around their members. Each kind of atomic type is explained in Section 2.
Constructed types, by constrast, are whatever is built on top of atomic types by applying type constructors such as "*". Each type constructor is explained in Section 3.
AtomicType is the root of the atomic type hierarchy. It just contains methods to figure out what kind of type it actually is.
SimpleType is represents simple types built-in to C/C++, like "int". cc_flags.h contains the definition of SimpleTypeId. Note that there is no attempt to break down primitive types along dimensions like "signed" or "long", since the set of types is not orthogonal w.r.t. those dimensions (e.g. there is no "signed float" or "long char"). If an analysis wants (e.g.) an "isSigned" function, it should provide it itself, and make the associated decisions about what "isSigned(ST_FLOAT)" returns.
There are some SimpleTypeId codes that do not correspond to primitive C/C++ types, but are used instead by the front-end for various intermediate communication purposes. For example, ST_ERROR represents the type of an expression that contained an error, and is used to try to reduce the amount of error cascading. There are also some ids used to help implement section 13.6 of the C++ standard. Analyses generally won't have to concern themselves with such oddball values, since they shouldn't be visible outside the type checker implementation.
CompoundType is certainly the most complex of all the type classes. It represents a class, struct, or union. Storage of class members is done by inheriting a Scope (cc_scope.h). Thus, CompoundType itself is mostly concerned with representing the inheritance hierarchy.
To get the name lookup semantics right, we must be able to tell when names are ambiguous and also know the inheritance path used to arrive at any base. This is complicated by the presence of virtual inheritance. Each class contains its own, independent representation of its particular inheritance DAG; this representation is a graph of BaseClassSubobj objects. A base class subobject corresponds one-to-one to some particular (contiguous) range of byte offsets in the final object's layout.
CompoundType knows how to print out its DAG to Dot format; see CompoundType::renderSubobjHierarchy. For example, in/std/3.4.5.cc yields 3.4.5.png.
EnumType represents an enum. It has a list of enumerator values, which are also added to the Env (cc_env.h) during type checking.
Type is the root of the constructed type hierarchy. Like AtomicType, it has methods to find out which kind of type a particular object is (roll-my-own RTTI). It also has a variety of query methods (like isVoid()), and entry points for printing types.
CVAtomicType is always the leaf of a constructed type. It just wraps an AtomicType, but possibly with const or volatile qualifiers.
PointerType represents a pointer or reference type, possibly qualified with const or volatile.
FunctionType represents a function type. The parameters are represented as a list of Variable (variable.h) objects. Nonstatic member functions have the isMember flag set, in which case their first parameter is called "this" and has type "C cv &", where "C" is the name of the class of which the function is a member, and cv is optional const/volatile flags. FunctionType also has a flag to indicate if it accepts a variable number of arguments, and a way (ExnSpec) to represent exception specifications.
Finally, function templates have a list of template parameters. As I think about it, it's kind of strange to imagine a function type being templatized, so maybe I should have put that list someplace else (Variable?). On the other hand, maybe it will work to treat templates as having polymorphic types. Until the template implementation matues, it's not clear what the best strategy is.
ArrayType represents an array type. Types with no size specified have size NO_SIZE.
PointerToMemberType represents a C++ pointer-to-member. It says which class' member it thinks it points at, the type of the referred-to thing, and some optional const/volatile qualifiers. This is used for both pointers to both member functions and member data.
Note that creating a Pointer to a Function that happens to have isMember be true is not the same thing as a PointerToMember to a Function (without isMember). The former is a concept that I think would be useful, but does not exist in C++. The latter is C++'s pointer-to-member (to a function), and has the "heterogeneous array" semantics.
The template design is still somewhat incomplete. I'd like to have a pass that can fully instantiate templates, and so some of this is looking forward to the existence of such a pass.
TypeVariable is used for template functions and classes to stand for a type which is unknown because it's a parameter to the template. It should point at its corresponding Variable, rather than just having a StringRef name...
TemplateParams is a list of template parameters. I pulled this out into its own class for reasons I now don't remember...
ClassTemplateInfo is intended to contain information about template instantiations. It's not used right now.
When the type checker (cc_tcheck.cc) constructs types, it actually does so via the TypeFactory interface. This is to make it possible for someone to build annotations on top of my Types, instead of going in and mucking about in cc_type.h. It has several core functions that must be defined by a derived class, and a variety of other functions with default implementations when such an implementation is "obvious" in terms of the core functions.
The present form of TypeFactory is driven by the existence of one project that has such an annotation system. As new analyses arise that may need to customize the way types are built, I'll add new entry points to TypeFactory and modify cc_tcheck.cc to use them.
I consider the presence of the SourceLoc parameter in most of the TypeFactory functions to be a wart on the design. I don't even know what that location should be, if the type is constructed outside the context of interpreting a type denotation in the AST. The base Elsa front end itself always ignores that value, and passes SL_UNKNOWN whenever there isn't a location handy. I'd like to eventually revisit the decisions that led to the presence of those parameters.
BasicTypeFactory is an implementation of TypeFactory that just builds the types defined in cc_type.h in the obvious way.
While the TypeFactory mechanism makes it easy to annotate the leaf types, like FunctionType and PointerType, it doesn't offer a way to effectively annotate Type itself, because that class is in the middle of the inheritance hierarchy.
So, Type is actually split into two classes, BaseType and Type. Type inherits from BaseType, and by default, adds almost nothing to BaseType. However, clients of Elsa are allowed to replace Type's definition with one that includes additional annotations.
To replace Type's definition, make a header file that contains the
new class definition (make a copy of the existing Type definition as a
starting point). Then arrange for the preprocessor symbol
TYPE_CLASS_FILE
to be set to the name (in quotes) of
the file containing the new definition.
The key requirement of class Type is that it must not allow the
name "BaseType" to leak out into its interface. I do not want
to have to deal with more than one name for "generic type" in
analyses like the type checker. Some care is required to achieve
this, as I don't want to make the inheritance private (since that
would necessitate repeating all of BaseType's member declarations).
This requirement is the reason that Type, and not BaseType, has
the declaration of anyCtorSatisfies()
. Were it in BaseType,
then TypePred
would expose the BaseType name.