The inference of haplotype pairs directly from unphased genotype data is a key step in the analysis of genetic variation in relation to disease and pharmacogenetically relevant traits. Most popular methods, such as PHASE and PL, require either the coalescence assumption or the assumption of linkage between the single-nucleotide polymorphisms (SNPs). We have developed novel approaches that are independent of these assumptions. First, we introduce a new optimization criterion, the 'haplotype likelihood', in combination with a block-wise evolutionary Monte Carlo algorithm. Based on this criterion, we develop two kinds of estimators: the maximum haplotype-likelihood (MHL) estimator and its empirical Bayesian (EB) version. Using both real and simulated data sets, we demonstrate that the proposed estimators allow substantial improvements over both the expectation-maximization (EM) algorithm and Clark's procedure in terms of capacity (scalability) and error rate. As a result, hundreds or more ambiguous loci and potentially very large samples can be processed. Moreover, the proposed EB estimator can significantly reduce the error rate when SNPs are unlinked or only weakly linked.
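To make concrete why the number of ambiguous loci limits scalability, the following minimal Python sketch enumerates every haplotype pair compatible with a single unphased genotype. It is an illustration only and is not the haplotype-likelihood method described above. The function name `compatible_haplotype_pairs` and the 0/1/2 genotype encoding (copies of the alternate allele per SNP) are assumptions made for this example.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with one unphased genotype.

    Assumed encoding (hypothetical, for illustration): each entry is the
    number of copies of the alternate allele at a SNP, i.e. 0, 1, or 2.
    Entries equal to 1 are heterozygous and therefore phase-ambiguous.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed on both haplotypes: 0 -> allele 0, 2 -> allele 1.
    base = [g // 2 for g in genotype]

    pairs = set()
    for alleles in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, alleles):
            h1[site] = allele        # one haplotype carries this allele
            h2[site] = 1 - allele    # the other carries the complement
        # frozenset makes the pair unordered, so (h1, h2) and (h2, h1) coincide.
        pairs.add(frozenset((tuple(h1), tuple(h2))))
    return pairs


if __name__ == "__main__":
    for pair in compatible_haplotype_pairs((1, 1, 0, 2)):
        print(sorted(pair))
```

A genotype with k heterozygous sites (k ≥ 1) has 2^(k-1) compatible pairs, so exhaustive enumeration of this kind quickly becomes infeasible as the number of ambiguous loci grows.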
Publication status: Published - 2005