CS 298-2
Theory Seminar

Eric Kuo
UC Berkeley

Dissertation Talk: Combinatorics of Viterbi Sequences

Monday, May 16, 2005
4pm-5pm
306 Soda Hall

Statistical models have important applications in computational biology.
The problem of parametric inference is to identify the explanations of
maximal likelihood simultaneously for all parametric settings of a
statistical model. One statistical model of interest is the Markov chain.

The Viterbi path(s) of length n of a Markov chain is the sequence(s) of
n+1 states that have the highest probability of occurring in the Markov
chain. We divide the space of all Markov chains into Viterbi regions in
which two Markov chains are in the same region if they share the same
Viterbi paths. These regions correspond to the vertices of a polytope
in which we represent each sequence by an ordered tuple where each
coordinate corresponds to the frequency of a transition in the sequence.

As the length n increases to infinity, the number of vertices remains
bounded. This results from the ability to decompose each Viterbi
sequence into a prefix, suffix, and periodic interior. Viterbi
sequences for two- and three-state Markov chains are characterized.

A toric Markov chain is a Markov chain with a non-stochastic transition
matrix. When the chains have 2 or 3 states, there are some sequences
that are Viterbi for a toric Markov chain, but not for any Markov chain.
However, when the chain has 4 or more states, the sets of Viterbi
sequences are identical for both Markov chains and toric Markov chains.

Other results have been obtained for various graphical structures such
as cycles, toroidal arrays, and "approximate alignments", a structure
that arises from computing optimal alignments of sequence pairs.