Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. (English) Zbl 0864.62076

Summary: A wealth of data concerning life’s basic molecules, proteins and nucleic acids, has emerged from the biotechnology revolution. The human genome project has accelerated the growth of these data. Multiple observations of homologous protein or nucleic acid sequences from different organisms are often available. But because mutations and sequence errors misalign these data, multiple sequence alignment has become an essential and valuable tool for understanding structures and functions of these molecules. A recently developed Gibbs sampling algorithm has been applied with substantial advantage in this setting. We develop a full Bayesian foundation for this algorithm and present extensions that permit relaxation of two important restrictions. We also present a rank test for the assessment of the significance of multiple sequence alignment. As an example, we study the set of dinucleotide binding proteins and predict binding segments for dozens of its members.

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
92C40 Biochemistry, molecular biology
92D20 Protein sequences, DNA sequences
62F15 Bayesian inference