CS 298-2
Theory Seminar
Eran Halperin
International Computer Science Institute
The recent release of the Haplotype Mapping project (Nature, Oct. 26,
2005; see also, e.g., NY Times, Oct. 27) and the rapid reduction in
genotyping costs open new directions and opportunities in the study of
complex genetic diseases such as cancer or Alzheimer's disease. The
datasets collected for these studies consist of DNA sequences that
contain some noise and ambiguous information.
In this talk I will discuss some of the algorithmic issues involved in
disambiguating these DNA sequences, and the current and potential impact
of these algorithms on genetics and medicine. In particular, I will
briefly discuss some of the problems in the field, such as genotype
phasing, tag SNP selection (a form of feature selection), and population
stratification (a clustering problem). I will also discuss in more
detail one specific application, HAPLOFREQ, which uses mathematical
programming formulations to estimate haplotype frequencies under a
random mating assumption (Hardy-Weinberg Equilibrium).
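To make the estimation problem concrete, here is a minimal sketch of what "estimating haplotype frequencies under Hardy-Weinberg Equilibrium" means. It is NOT HAPLOFREQ, which the abstract says uses mathematical programming formulations. Instead it swaps in the classic expectation-maximization (EM) approach to the same likelihood. It assumes a hypothetical encoding in which a genotype is a string over {0, 1, 2} ('0'/'1' = homozygous site, '2' = heterozygous site) and a haplotype is a string over {0, 1}. All function names are illustrative. Under HWE, a genotype's probability is the sum, over the unordered haplotype pairs that explain it, of p(h1)p(h2), doubled when h1 differs from h2. The ambiguity in which pair is correct is exactly the phasing problem.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs (h1, h2) that explain a genotype.

    Hypothetical encoding: '0'/'1' = homozygous site, '2' = heterozygous.
    """
    het = [i for i, g in enumerate(genotype) if g == '2']
    pairs = set()
    for bits in product('01', repeat=len(het)):
        h1, h2 = list(genotype), list(genotype)
        for i, b in zip(het, bits):
            h1[i] = b                       # one allele on the first copy
            h2[i] = '1' if b == '0' else '0'  # the other on the second
        pairs.add(tuple(sorted((''.join(h1), ''.join(h2)))))
    return sorted(pairs)


def genotype_prob(genotype, freq):
    """HWE probability of an unphased genotype given haplotype frequencies."""
    total = 0.0
    for h1, h2 in compatible_pairs(genotype):
        p = freq.get(h1, 0.0) * freq.get(h2, 0.0)
        total += p if h1 == h2 else 2 * p   # ordered pairs (h1,h2),(h2,h1)
    return total


def em_haplotype_freqs(genotypes, iters=100):
    """EM estimate of haplotype frequencies (a stand-in, not HAPLOFREQ)."""
    haps = sorted({h for g in genotypes
                   for pair in compatible_pairs(g) for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    n = len(genotypes)
    for _ in range(iters):
        counts = defaultdict(float)
        for g in genotypes:
            pairs = compatible_pairs(g)
            # E-step: posterior weight of each explaining pair under HWE
            weights = [freq[a] * freq[b] * (1 if a == b else 2)
                       for a, b in pairs]
            z = sum(weights)
            if z == 0:
                continue
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / z
                counts[b] += w / z
        # M-step: expected haplotype counts over 2n chromosomes
        freq = {h: counts[h] / (2 * n) for h in haps}
    return freq
```

For example, with the genotypes `["00", "11", "22"]` the doubly heterozygous individual can be phased as 00/11 or as 01/10. Because the homozygous individuals already support 00 and 11, EM shifts essentially all mass onto those two haplotypes. Note that `compatible_pairs` enumerates 2^(k-1) pairs for k heterozygous sites, so this brute-force sketch is only practical for short windows.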
I will explain all the biological terminology in the talk, but if you
are curious about some of the keywords, here is a one-page explanation
that I found very useful: http://www.hapmap.org/whatishapmap.html
HAPLOFREQ is joint work with Elad Hazan from Princeton University.