The genomic era has provided an opportunity to address a fundamental question in genetics: which genetic loci are responsible for complex diseases and physiological differences we observe among individuals? Genome-wide association studies (GWA or GWAS), which identify significant correlations between genetic markers and phenotypes, have become the first step in answering this question. This is an exciting time for GWA, both because of the many reasonable candidate loci that are being identified and because of the opportunity to discover even more loci by applying computational techniques that make use of richer statistical models.

The statistical challenge in GWA is to find the few true cases, among thousands to millions of genetic markers, that associate with height, disease, gene expression, etc., when constrained by study sample sizes that are not particularly large. Members of our group develop scalable computational methods for tackling this problem, and we also study the performance of the many techniques (our own and others') that are being proposed for this purpose. Recent projects include development of algorithms for multiple-locus GWA analysis that consider all markers in a GWA study simultaneously, data mining techniques for incorporating factors that can improve study power, and methods that incorporate pedigree or breeding information, including classic linkage analysis and mixed model approaches. Our recent collaborative GWA analysis efforts have resulted in the discovery of candidate loci affecting gene expression in humans, diseases in dogs, and basic physiological traits in yeast and Drosophila (with more to come; see our recent publications).
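
To make the single-marker setting concrete, here is a minimal sketch of a single-marker GWA scan on simulated data. It is an illustration only, not our group's software and not V-Bay. The function names (`simulate_gwa`, `single_marker_pvalue`, `gwa_scan`), the allele frequency, the effect size, and the use of a score test with a Bonferroni threshold are all assumptions chosen for this example.

```python
import math
import random


def simulate_gwa(n_individuals, n_markers, causal, effect, maf=0.3, seed=0):
    """Simulate genotypes (0/1/2 allele counts) and a quantitative phenotype.

    All settings here are hypothetical: markers are independent (no linkage
    disequilibrium) and every causal marker has the same additive effect.
    """
    rng = random.Random(seed)
    genotypes = [
        [(rng.random() < maf) + (rng.random() < maf) for _ in range(n_individuals)]
        for _ in range(n_markers)
    ]
    phenotype = [
        sum(effect * genotypes[j][i] for j in causal) + rng.gauss(0.0, 1.0)
        for i in range(n_individuals)
    ]
    return genotypes, phenotype


def single_marker_pvalue(genotype, phenotype):
    """Score test of association between one marker and the phenotype.

    The statistic n * r^2 is approximately chi-square with 1 degree of
    freedom under the null, so the p-value is erfc(sqrt(statistic / 2)).
    """
    n = len(phenotype)
    mean_g = sum(genotype) / n
    mean_y = sum(phenotype) / n
    s_gg = sum((g - mean_g) ** 2 for g in genotype)
    s_yy = sum((y - mean_y) ** 2 for y in phenotype)
    if s_gg == 0 or s_yy == 0:
        return 1.0  # monomorphic marker or constant phenotype: no test possible
    s_gy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotype, phenotype))
    statistic = n * s_gy * s_gy / (s_gg * s_yy)
    return math.erfc(math.sqrt(statistic / 2.0))


def gwa_scan(genotypes, phenotype, alpha=0.05):
    """Test each marker separately and apply a Bonferroni threshold."""
    pvalues = [single_marker_pvalue(g, phenotype) for g in genotypes]
    threshold = alpha / len(pvalues)
    hits = [j for j, p in enumerate(pvalues) if p < threshold]
    return pvalues, hits


if __name__ == "__main__":
    genotypes, phenotype = simulate_gwa(500, 200, causal=[10], effect=1.0)
    pvalues, hits = gwa_scan(genotypes, phenotype)
    print("markers passing the Bonferroni threshold:", hits)
```

The Bonferroni threshold shrinks in proportion to the number of markers tested, so with a fixed sample size, loci with small effects become harder to detect as the marker count grows. This is one motivation for the richer multiple-locus models discussed above.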


Quantile-Quantile plot of the results of a single marker analysis of simulated GWA data including over a million markers. Each blue point indicates the log10 P-value associated with a single marker. The loci with phenotype associations are indicated by black squares. The loci identified with our simultaneous-marker, multiple-locus analysis technique “V-Bay” are indicated in red. V-Bay is able to detect true associations that are undetectable with a single marker analysis. The inset plot shows one of the hits from V-Bay that does not lie exactly on the marker in tightest linkage disequilibrium with the associated locus but is six SNPs away. From Logsdon and Mezey, submitted.
Volcano plot indicating the effects of smoking and genetic ancestry on genome-wide gene expression in the small airway epithelium in the human lung (top) and a Manhattan plot of the results of a GWA single marker analysis of one of the gene expression traits (bottom). This study was performed in collaboration with Dr. Ronald G. Crystal's group at Weill Medical College (Gao et al. submitted).