|
Abstract:
High-throughput genotyping technologies for single nucleotide polymorphisms (SNPs) have stimulated much interest in studying genome-wide linkage disequilibrium (LD) patterns. Conventional LD measures, such as D' and r², are two-point measurements, and their relationship with physical distance is very noisy. We propose a new LD measure, Δ, defined as the correlation coefficient of the shared haplotype lengths around two loci, thereby borrowing information from multiple loci. We develop a U-statistic-based estimator of Δ, which takes into account the dependence structure of the observed data, and compare it to an estimator based on the usual empirical correlation coefficient. Furthermore, we propose methods for inferring LD decay rates and recombination hotspots based on Δ. Results from coalescent simulation studies and from analysis of HapMap SNP data demonstrate that the proposed estimators of Δ are superior to the two most popular conventional LD measures in three respects: a closer relationship with physical distance and recombination rate, smaller variability, and stronger robustness to marker allele frequencies.

Haplotype block models, supported by both human genetic theory and empirical studies, have recently received much attention as a way to describe LD patterns in genomic regions. However, the models proposed to date tend to be too simple for the complex LD patterns seen in human genomic data. In addition, the identified block boundaries are usually very sensitive to the operational definition of a haplotype block and to marker density and allele frequency. We propose a hierarchical haplotype block model, which accommodates nested block structures, and a recursive partitioning algorithm, DHPBlocker, to identify the corresponding hierarchical block structure. The proposed algorithm is applied to two sets of human SNP data, and our results reconcile the seemingly inconsistent results from existing methods. This suggests that the hierarchical block model is a more precise and flexible model, capable of handling the complexity of LD patterns throughout the human genome. Moreover, analyses performed on different subsets of markers indicate that DHPBlocker generates consistent global block structures that are robust to marker allele frequency and density.
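
To make the idea behind Δ concrete, the following is a minimal sketch of the simpler of the two estimators mentioned above, the one based on the usual empirical correlation coefficient. It rests on assumptions that the abstract does not specify: haplotypes are equal-length strings of alleles, the shared haplotype length around a locus is counted in contiguous identical markers (rather than physical distance), and the correlation is taken over all pairs of haplotypes. The names `shared_length`, `pearson`, and `delta_empirical` are hypothetical, and the U-statistic-based estimator is not reproduced here.

```python
from itertools import combinations
import math


def shared_length(h1, h2, locus):
    """Count contiguous markers around `locus` (inclusive) where h1 and h2 agree.

    Assumption: length is measured in markers; returns 0 if the two
    haplotypes already differ at `locus` itself.
    """
    if h1[locus] != h2[locus]:
        return 0
    left = locus
    while left > 0 and h1[left - 1] == h2[left - 1]:
        left -= 1
    right = locus
    while right < len(h1) - 1 and h1[right + 1] == h2[right + 1]:
        right += 1
    return right - left + 1


def pearson(xs, ys):
    """Plain empirical (Pearson) correlation; NaN if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def delta_empirical(haplotypes, locus_a, locus_b):
    """Correlate shared lengths around two loci across all haplotype pairs.

    Note: pairs that share a haplotype are not independent, which this
    naive estimator ignores.
    """
    pairs = list(combinations(haplotypes, 2))
    len_a = [shared_length(h1, h2, locus_a) for h1, h2 in pairs]
    len_b = [shared_length(h1, h2, locus_b) for h1, h2 in pairs]
    return pearson(len_a, len_b)


if __name__ == "__main__":
    # Toy data, purely illustrative: four haplotypes typed at six markers.
    haps = ["000000", "000011", "110011", "111100"]
    print(delta_empirical(haps, 1, 4))
```

Because every shared length is computed from a stretch of neighbouring markers, each term in the correlation already draws on multiple loci, which is the sense in which Δ borrows information beyond the two loci being compared.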
|