

  • WordSeer: Exploring Language Use in Slave Narratives

    Humanities source texts are being digitized at an ever-increasing rate. Because these new collections are so large, scholars who want to study them in depth need computational assistance. To help, we built WordSeer, a text analysis tool that combines visualizations with search over the grammatical structure of text.

    We focused on exploring patterns of language use in a collection of American slave narratives, but the technique applies to any text collection. Our user studies with humanities scholars show that, compared with a standard search box, WordSeer makes it easier for them to translate their questions into queries and to find answers. Here is a blog post with more detail, and some slides.
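    To illustrate what searching on grammatical structure (rather than on keywords) can look like, here is a minimal Python sketch. It is not WordSeer's implementation. It assumes each sentence has already been parsed into (governor, relation, dependent) triples, with relation labels in the style of Stanford dependencies (`amod`, `prep_for`). The names `Dependency`, `search_relation`, and `SAMPLE_SENTENCES`, and the sample parses themselves, are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional


class Dependency(NamedTuple):
    """One grammatical relation in a parsed sentence: governor -relation-> dependent."""
    governor: str
    relation: str
    dependent: str


def search_relation(
    sentences: Iterable[List[Dependency]],
    relation: str,
    governor: Optional[str] = None,
    dependent: Optional[str] = None,
) -> Counter:
    """Count the words that fill the open slot of a grammatical query.

    Fix `governor` to ask "what does X govern?" (e.g. which adjectives modify
    'freedom'); fix `dependent` to ask "what governs X?".
    """
    hits: Counter = Counter()
    for sentence in sentences:
        for dep in sentence:
            if dep.relation != relation:
                continue
            if governor is not None and dep.governor != governor:
                continue
            if dependent is not None and dep.dependent != dependent:
                continue
            # Count the side of the relation that the query left open.
            open_slot = dep.dependent if governor is not None else dep.governor
            hits[open_slot] += 1
    return hits


# Hypothetical, hand-written parses standing in for a real parser's output.
SAMPLE_SENTENCES = [
    [Dependency("longed", "nsubj", "I"),
     Dependency("longed", "prep_for", "freedom"),
     Dependency("freedom", "amod", "sweet")],
    [Dependency("freedom", "amod", "dear"),
     Dependency("freedom", "amod", "sweet")],
]

if __name__ == "__main__":
    # Which adjectives modify "freedom"?
    print(search_relation(SAMPLE_SENTENCES, "amod", governor="freedom"))
    # Which words take "freedom" as the object of "for"?
    print(search_relation(SAMPLE_SENTENCES, "prep_for", dependent="freedom"))
```

    A query such as "which adjectives describe freedom?" is awkward to express with keywords alone. Over parsed text, however, it reduces to a filter on one relation type.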

  • Investigating the New York Times Linked Open Data Set

    The New York Times linked open data set indexes the people, places, organizations, and topics that have appeared in Times articles since 1981, together with the articles in which they appeared. An API lets you query the data set by person, place, organization, or topic, and also provides keyword and co-occurrence information.

    Because the data set is so thoroughly annotated, it supports many interesting questions that would be hard to ask of other data sets. Those data sets first require solving intermediate problems such as named entity extraction, coreference labeling, and temporal ordering. We are applying graph-based, network-based, and natural language processing techniques to this data set to learn relationships and trends.

    Here is a visual explorer I made; it is still very much in alpha.

    Image: http://data.nytimes.com
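    The co-occurrence information in the Times data lends itself to graph-based analysis. The minimal Python sketch below shows one such approach: it builds a weighted co-occurrence network and asks which entities appear most often alongside a given one. The sketch is hypothetical and does not call the Times API. Instead, it assumes each article's annotations have already been retrieved as a set of entity labels. The function names and the placeholder labels in `SAMPLE_ARTICLES` are illustrative, not real data.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def cooccurrence_graph(articles: Iterable[Set[str]]) -> Counter:
    """Build a weighted, undirected co-occurrence graph.

    Each input set holds the entity labels tagged on one article; the weight
    of edge (a, b) is the number of articles tagged with both a and b.
    """
    edges: Counter = Counter()
    for entities in articles:
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges


def strongest_neighbors(edges: Counter, entity: str, k: int = 3) -> List[Tuple[str, int]]:
    """Return up to k entities that co-occur most often with `entity`."""
    neighbors: Counter = Counter()
    for (a, b), weight in edges.items():
        if a == entity:
            neighbors[b] += weight
        elif b == entity:
            neighbors[a] += weight
    return neighbors.most_common(k)


# Hypothetical annotations for three articles (placeholder labels, not real data).
SAMPLE_ARTICLES = [
    {"Person A", "Place B", "Topic C"},
    {"Person A", "Topic C"},
    {"Place B", "Organization D"},
]

if __name__ == "__main__":
    graph = cooccurrence_graph(SAMPLE_ARTICLES)
    print(strongest_neighbors(graph, "Person A"))
```

    The edge weights in this sketch are raw article counts. A fuller analysis would probably normalize them, for example by each entity's overall frequency, so that very common entities do not dominate every neighborhood.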