Sequential Document Visualization
Guy Lebanon
Purdue University
Abstract
Documents and other categorical-valued time series are often
characterized by the frequencies of short-range sequential patterns such
as n-grams. This representation converts sequential data of varying
lengths to high-dimensional histogram vectors, which are easily modeled
by standard statistical models. Unfortunately, the histogram
representation ignores most of the medium- and long-range sequential
dependencies, making it unsuitable for visualizing sequential data. We
present a novel framework for sequential visualization of documents
based on the idea of local statistical modeling. The framework embeds
categorical time series as smooth curves in the multinomial simplex
that summarize the progression of sequential trends. We discuss several
visualization techniques based on the above framework and demonstrate
their usefulness for document visualization.
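
As a rough illustration of the local modeling idea, the sketch below computes a kernel-weighted word histogram at several positions along a document, so that each position yields one point in the multinomial simplex and the sequence of points traces a curve. This is a minimal sketch under stated assumptions, not necessarily the construction used in the paper: the Gaussian kernel, the rescaling of token positions to [0, 1], and the names local_histograms, bandwidth, and smoothing are all hypothetical choices made for this example.

```python
import math


def local_histograms(tokens, vocab, positions, bandwidth=0.1, smoothing=0.01):
    """Return one smoothed word distribution per requested position in [0, 1].

    Assumption: a Gaussian kernel over rescaled token positions; the
    abstract does not specify the kernel or the smoothing scheme.
    """
    index = {word: i for i, word in enumerate(vocab)}
    n = len(tokens)
    curve = []
    for mu in positions:
        # Start from a small constant so every word has nonzero probability.
        counts = [smoothing] * len(vocab)
        for j, tok in enumerate(tokens):
            t = (j + 0.5) / n  # token location rescaled to [0, 1]
            weight = math.exp(-0.5 * ((t - mu) / bandwidth) ** 2)
            counts[index[tok]] += weight
        total = sum(counts)
        # Normalizing places the point in the multinomial simplex.
        curve.append([c / total for c in counts])
    return curve


# Toy document: the first half uses only "a", the second half only "b".
tokens = "a a a a b b b b".split()
vocab = ["a", "b"]
curve = local_histograms(tokens, vocab, positions=[0.0, 0.5, 1.0])
```

In this toy document a global histogram would assign equal weight to both words, whereas the local histograms move from mostly "a" at the start to mostly "b" at the end. A smaller bandwidth tracks local changes more closely, and a very large one approaches the ordinary global histogram.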