Analysis of information content for biological sequences

J. Zhang

Research output: Contribution to journal › Article › Academic › peer-review



Decomposing a biological sequence into modular domains is a basic prerequisite for identifying functional units in biological molecules. Commonly used segmentation procedures have two steps. First, a set of sequences homologous to the target sequence is collected and aligned. Then, this multiple alignment is parsed into blocks, and the functionally important blocks are identified with a semi-automatic method that combines manual analysis and expert knowledge. In this paper, we present a novel exploratory approach to parsing and analyzing such multiple alignments. It is based on an analysis-of-variance (ANOVA)-type decomposition of the sequence information content. Unlike the traditional change-point method, this approach accounts not only for compositional biases but also for overdispersion effects among the blocks. The new approach was tested on families of ribosomal proteins and shows promising performance. We show that it provides a better way of assessing important residues in these proteins, which makes it possible to find subsets of residues that are critical to these proteins.
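The following is a minimal sketch of the general idea behind an ANOVA-style split of information content. It is not the paper's method, which this record does not describe in detail. It assumes a fixed segmentation of alignment columns into blocks and uses the standard entropy identity H(X) = H(X | B) + I(X; B). Here X is the residue at a column and B is the block label. The within-block term H(X | B) is the weighted mean of block entropies. The between-block term I(X; B) measures compositional differences between blocks. The sketch does not model the overdispersion effects mentioned above. The function names `entropy` and `decompose_information`, the `boundaries` parameter, and the choice to ignore gap characters (`-`) are all hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a residue count table (Counter)."""
    total = sum(counts.values())
    # (c/total) * log2(total/c) is the non-negative form of -p*log2(p)
    return sum((c / total) * math.log2(total / c) for c in counts.values() if c > 0)


def decompose_information(alignment, boundaries):
    """Split total residue entropy into within-block and between-block parts.

    alignment: list of equal-length aligned sequences (strings).
    boundaries: sorted column indices where a new block starts; this is a
    hypothetical, caller-supplied segmentation (e.g. from a change-point search).
    Gap characters '-' are ignored (an assumption of this sketch).
    """
    n_cols = len(alignment[0])
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [n_cols]

    pooled = Counter()
    block_counts = []
    for start, end in zip(starts, ends):
        # Residue counts over all sequences and all columns of this block
        counts = Counter(
            seq[j] for seq in alignment for j in range(start, end) if seq[j] != "-"
        )
        block_counts.append(counts)
        pooled.update(counts)

    n_total = sum(pooled.values())
    h_total = entropy(pooled)
    # Within-block term H(X | B): block entropies weighted by block size
    h_within = sum(
        sum(c.values()) / n_total * entropy(c) for c in block_counts if c
    )
    # Between-block term I(X; B): information carried by compositional bias
    h_between = h_total - h_within
    return h_total, h_within, h_between


if __name__ == "__main__":
    toy = ["AAAACCCC", "AAAACCCC"]
    print(decompose_information(toy, [4]))  # (1.0, 0.0, 1.0)
```

In this toy alignment, each block is compositionally pure, so all of the 1 bit of pooled entropy falls in the between-block term. With no boundaries, the between-block term is zero by construction.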
Original language: English
Pages (from-to): 487-503
Journal: Journal of Computational Biology
Issue number: 3
Publication status: Published - 2002

