From typical sequences to typical genotypes

Omri Tal, Tat Dat Tran, J. Portegies

Research output: Contribution to journal, Article, Academic, peer-review

Abstract

We demonstrate an application of a core notion of information theory, typical sequences and their related properties, to the analysis of population genetic data. Based on the asymptotic equipartition property (AEP) for nonstationary discrete-time sources producing independent symbols, we introduce the concepts of typical genotypes, population entropy rate, and population cross entropy rate. We analyze three perspectives on typical genotypes: a set perspective on the interplay of typical sets of genotypes from two populations, a geometric perspective on their structure in high-dimensional space, and a statistical learning perspective on the prospects of constructing typical-set-based classifiers. In particular, we show that such classifiers have a surprising resilience to noise originating from small population samples, and highlight the potential for further links between inference and communication.
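
As a rough illustration of the idea summarized in the abstract, the sketch below tests whether a genotype is typical for a population and uses that test as a toy classifier. It is not taken from the article: it assumes haploid genotypes coded as 0/1 at independent biallelic loci, and the function names (`entropy_rate`, `neg_log_prob_rate`, `is_typical`, `classify`) and the tolerance `eps` are hypothetical. The article's own definitions and classifiers may differ.

```python
import math


def entropy_rate(freqs):
    """Mean per-locus binary entropy in bits (assumed independent loci)."""
    def h(p):
        if p in (0.0, 1.0):
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return sum(h(p) for p in freqs) / len(freqs)


def neg_log_prob_rate(genotype, freqs):
    """-(1/n) * log2 P(genotype) under the population's allele frequencies."""
    total = 0.0
    for allele, p in zip(genotype, freqs):
        q = p if allele == 1 else 1 - p
        if q == 0:
            return math.inf  # genotype is impossible in this population
        total -= math.log2(q)
    return total / len(freqs)


def is_typical(genotype, freqs, eps):
    """True if the genotype lies in the eps-typical set of the population."""
    gap = abs(neg_log_prob_rate(genotype, freqs) - entropy_rate(freqs))
    return gap < eps


def classify(genotype, freqs_a, freqs_b):
    """Toy rule: pick the population for which the genotype is 'more typical'.

    Ties go to population A. This is an illustrative choice, not the
    article's classifier.
    """
    gap_a = abs(neg_log_prob_rate(genotype, freqs_a) - entropy_rate(freqs_a))
    gap_b = abs(neg_log_prob_rate(genotype, freqs_b) - entropy_rate(freqs_b))
    return "A" if gap_a <= gap_b else "B"
```

For example, with allele frequency 0.9 at each of ten loci, a genotype carrying nine 1s and one 0 is typical for small `eps`, while the single most probable genotype (all 1s) is not, which is the usual counterintuitive feature of typical sets.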
Original language: English
Pages (from-to): 159-183
Number of pages: 25
Journal: Journal of Theoretical Biology
Volume: 419
Issue number: April 2017
Publication status: Published - 21 Apr 2017

Keywords

  • Classification
  • Population cross entropy rate
  • Population entropy rate
  • Typical genotypes
  • Typical sequences
  • Genetics, Population
  • Gene Frequency
  • Genotype
  • Algorithms
  • Information Theory
  • Computer Simulation
  • Models, Genetic
  • Polymorphism, Single Nucleotide
