DNA sequence modeling based on context trees

C.J. Kusters, T. Ignatenko

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Genomic sequences contain instructions for protein and cell production. Therefore, understanding and identifying biologically and functionally meaningful patterns in DNA sequences is of paramount importance. Modeling DNA sequences can, in turn, help to better understand and identify such patterns and the dependencies between them. It is well known that genomic data contains regions with distinct functionality and thus also distinct statistical properties. In this work we focus on modeling such individual regions. We apply the concept of context trees to model these DNA regions. Based on the Minimum Description Length principle, we use the estimated compression rate of a genomic region, given such a model, as a similarity measure. We show that the constructed model can be used to distinguish specific genes within DNA sequences.
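
The idea of using an estimated compression rate as a similarity measure can be illustrated with a minimal sketch. It is not the authors' method: it assumes a fixed context depth (a full tree of depth `depth`) instead of a learned context tree, Krichevsky-Trofimov-style add-1/2 smoothing, and sequences over the alphabet A, C, G, T. The function names `train_context_counts` and `compression_rate` are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: sequences contain only these four symbols


def train_context_counts(seq, depth):
    """Count next-symbol occurrences for every context of length `depth`."""
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))
    for i in range(depth, len(seq)):
        context = seq[i - depth:i]
        counts[context][seq[i]] += 1
    return counts


def compression_rate(seq, counts, depth):
    """Estimated code length of `seq` in bits per symbol under a frozen model.

    Each symbol costs -log2 p(symbol | context), where p uses add-1/2
    smoothing so that unseen contexts fall back to the uniform 1/4.
    """
    n_symbols = len(seq) - depth
    if n_symbols <= 0:
        raise ValueError("sequence must be longer than the context depth")
    total_bits = 0.0
    for i in range(depth, len(seq)):
        ctx_counts = counts.get(seq[i - depth:i], {})
        symbol_count = ctx_counts.get(seq[i], 0)
        context_total = sum(ctx_counts.values())
        p = (symbol_count + 0.5) / (context_total + 0.5 * len(ALPHABET))
        total_bits -= math.log2(p)
    return total_bits / n_symbols


# Toy data, not real genomic sequences.
model = train_context_counts("ACGT" * 50, depth=2)
rate_similar = compression_rate("ACGT" * 10, model, depth=2)
rate_different = compression_rate("AAAA" * 10, model, depth=2)
```

In this sketch a lower rate means the candidate region is statistically closer to the region the model was trained on, so `rate_similar` comes out below `rate_different`.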
Original language: English
Title: Proceedings of the 36th WIC Symposium on Information Theory in the Benelux and the 5th Joint WIC/IEEE Symposium on Information Theory and Signal Processing in the Benelux, May 6-7, 2015, Brussels, Belgium
Editors: J. Roland, F. Horlin
Place of publication: Brussels
ISBN (print): 978-2-8052-0277-3
Status: Published - 2015
