From typical sequences to typical genotypes
Abstract
We demonstrate an application of a core notion of information theory, typical sequences and their related properties, to the analysis of population genetic data. Based on the asymptotic equipartition property (AEP) for nonstationary discrete-time sources producing independent symbols, we introduce the concepts of typical genotypes, population entropy rate, and population cross entropy rate. We analyze three perspectives on typical genotypes: a set perspective on the interplay of typical sets of genotypes from two populations, a geometric perspective on their structure in high-dimensional space, and a statistical learning perspective on the prospects of constructing typical-set-based classifiers. In particular, we show that such classifiers have a surprising resilience to noise originating from small population samples, and we highlight the potential for further links between inference and communication.
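The notions named in the abstract can be illustrated with a small sketch. This is not the paper's code or its exact definitions; it assumes biallelic SNP loci that are mutually independent, diploid genotypes coded as 0, 1, or 2 copies of one allele, Hardy-Weinberg genotype probabilities, allele frequencies strictly between 0 and 1, and base-2 logarithms. All function names and the tolerance `eps` are hypothetical choices made for illustration.

```python
import math
import random

def genotype_probs(p):
    """Hardy-Weinberg probabilities of carrying 0, 1, or 2 copies of an
    allele with frequency p (assumption: diploid, biallelic locus)."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]

def entropy_rate(freqs):
    """Mean per-locus genotype entropy in bits over independent loci."""
    total = 0.0
    for p in freqs:
        total -= sum(pr * math.log2(pr) for pr in genotype_probs(p) if pr > 0)
    return total / len(freqs)

def cross_entropy_rate(freqs_p, freqs_q):
    """Mean per-locus cross entropy in bits of population P relative to Q.
    Assumes every frequency in freqs_q lies strictly between 0 and 1."""
    total = 0.0
    for p, q in zip(freqs_p, freqs_q):
        for pr_p, pr_q in zip(genotype_probs(p), genotype_probs(q)):
            if pr_p > 0:
                total -= pr_p * math.log2(pr_q)
    return total / len(freqs_p)

def log_prob_rate(genotype, freqs):
    """-(1/n) log2 of the probability of a multilocus genotype."""
    total = 0.0
    for g, p in zip(genotype, freqs):
        total -= math.log2(genotype_probs(p)[g])
    return total / len(freqs)

def is_typical(genotype, freqs, eps):
    """AEP-style test: the genotype's log-probability rate lies within
    eps of the population entropy rate."""
    return abs(log_prob_rate(genotype, freqs) - entropy_rate(freqs)) < eps

def sample_genotype(freqs, rng):
    """Draw one multilocus genotype, each locus independently."""
    return [rng.choices([0, 1, 2], weights=genotype_probs(p))[0] for p in freqs]

# Two simulated populations with unrelated allele frequencies (made-up data).
rng = random.Random(0)
n = 2000
freqs_a = [rng.uniform(0.05, 0.95) for _ in range(n)]
freqs_b = [rng.uniform(0.05, 0.95) for _ in range(n)]
x = sample_genotype(freqs_a, rng)

print("entropy rate A:", entropy_rate(freqs_a))
print("cross entropy rate A vs B:", cross_entropy_rate(freqs_a, freqs_b))
print("typical for A:", is_typical(x, freqs_a, eps=0.1))
print("typical for B:", is_typical(x, freqs_b, eps=0.1))
```

With many loci, a genotype drawn from population A has a log-probability rate under A close to A's entropy rate, while its rate under B is close to the cross entropy rate of A relative to B. Under these assumptions, that gap is what a naive typical-set-based classifier could exploit.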
| Original language | English |
|---|---|
| Pages (from-to) | 159-183 |
| Number of pages | 25 |
| Journal | Journal of Theoretical Biology |
| Volume | 419 |
| Publication status | Published - 21 Apr 2017 |
Keywords
- Classification
- Population cross entropy rate
- Population entropy rate
- Typical genotypes
- Typical sequences
- Genetics, Population
- Gene Frequency
- Genotype
- Algorithms
- Information Theory
- Computer Simulation
- Models, Genetic
- Polymorphism, Single Nucleotide