Good and Bad Code Snippets. This research came out of my work on Prospector, a system for generating code for Java APIs in response to programmer queries. (See paper and web demo. See also Liviu Tancau's JavaSketch.) Prospector works well for many APIs, but one problem is that it does not do a good job with code that uses the String type. The reason is that String is nonspecific: there are really many different kinds of strings, so composing API methods that use them is not usually valid. Consider these code snippets that use a file dialog UI class:
FileDialog dialog = ...; FileReader reader = new FileReader(dialog.getFile());
Font font = ...; FileReader reader = new FileReader(font.getName());
The "good" code snippet passes the filename selected by the user as the filename argument of the FileReader constructor: this is a common code snippet a programmer might want to write. The "bad" code snippet passes a logical font name as the filename argument: this code is almost certainly not what the programmer wants.
Empirically, there is a huge number of "bad" code snippets that use String. Thus, if Prospector is set up to use String in its synthesis, it generates so may "bad" code snippets that the good ones are crowded out of the results. The alternative (used in the current implementation) is not to use String, which means Prospector cannot synthesize useful code snippets that use String.
Classifying Code Snippets. Thus, to maximize Prospector's applicability, we need a way of classifying code snippets using String as "good" or "bad". We are doing this using algorithms that, given an API using String and sample code for that API, automatically discover a type hierarchy on the strings in the API. Code snippets are then considered "good" only if they type check according to the discovered type hierarchy. Because of Prospector's type-based design, we can build the types directly into Prospector's API query database, and Prospector will automatically generate only the "good" code snippets.
Human-Readable Results. A side benefit of this approach is that the type hierarchy discovered by the algorithm is human-readable, so it provides documentation of how strings are used, and could even be used to guide refactorting the API to use types directly, thus allowing the compiler to check the types.
Our approach is based on mining source code. The idea is to collect a sample corpus of source code for the API, and then try to discover a type hierarchy that is consistent with both the corpus and our expectations (e.g., we would not expect a library designer to create an API with 900 incompatible kinds of strings).
The following sections will explain more detail about our approach alongside example output from our prototype implementation. The API being studied is from Prospector's query engine, which uses several kinds of strings. We use this API because we understand it well, so it is easy for us to evaluate the performance of the algorithm, but please note that complex string APIs are not unique to Prospector: Eclipse and another (internal) IBM project both use strings in complex APIs. (Interestingly, the designers of Firefox actually created a string type hierarchy--this was easier for them because they used C++.)
String 2-jungloids. Our goal is to classify code snippets like the ones above as "good" or "bad". We call these code snippets string 2-jungloids. Our algorithm starts by collecting the set of string 2-jungloids from the sample corpus. (A 2-jungloid is simply a code snippet with 2 method calls, where the result of one call is passed as an argument to the other; i.e., a composition of 2 methods. A string 2-jungloid is a 2-jungloid where the value passed between the methods is a string.
For this example, we are using a subset of the Prospector search API that has 60 2-string jungloids, 42 of which are "good" and 18 "bad". We illustrate this with a Venn diagram:

Figure 1. Good and bad jungloids in our example API.
Collecting string 2-jungloids using Wala. The first step in our algorithm is to extract the string 2-jungloids that occur in the corpus. We have built code on top of Wala's program analysis that finds all the assignments where string values flow from one variable to another. The results are collected in an assignment graph with nodes that represent program variables and directed edges that represent value flow via assignment or parameter passing. Here is an assignment graph output from our prototype:
We can try to classify string 2-jungloids using the assignment graph alone: if there is a path from one method to another, we say that 2-jungloid is "good", otherwise it is bad. The "good" jungloids classified in this way are exactly those found in the corpus. In this particular case, 26 of the 2-jungloids were found in the corpus. As a Venn diagram:
Simplifying the assignment graph. Note that the assignment graph is too large to be easily readable. Also, it has many nodes for local variables, separate caller and callee parameter nodes, and other nodes that are not directly important for our results.
We can make the graph simpler without losing information making use of data-flow facts. For example, if variable A always flows to variable B, and B is always assigned to A, then A and B must have the same values, and thus the same "types" as well in our type discovery problem. Again on top of Wala, we have defined program analysis properties that allow us to unify program variables as being of the same "type". Here is a simplified assignment graph:
Creating a type hierarchy. At this point, our algorithm has created a simplified representation of the 2-jungloids that appear in the corpus, but we have not actually created a type hierarchy. Transforming the simplified assignment graph to a type hierarchy is the final step. We are still working on this step of the algorithm. So far, we are using a transformation based on the same data-flow facts we used in the previous step, but interpreting them in terms of their implications of value subsets, which imply value subtypes. We also allow methods to return values of multiple types, which are then accompanied by 'downcasts' when they are used. Here is the type hierarchy our prototype produces from the assignment graph:
As you can see, this is a much simpler view than the original assignment graph. Also, it is a more accurate classifier: it correctly classifies 90% of the 2-jungloids in our set of 60:
Future work will focus on improving the accuracy of 2-jungloid classification and on scaling the algorithm up to larger API subsets.
The code for the prototype is available here. At this time, the code is unsupported and might explode or something if you tried to use it your own APIs. Read at your own risk.