Network discovery is one of the most challenging problems in systems biology. All biological processes and outcomes, such as metabolism, development, and disease, are a function of interacting components whose relationships can be represented as a network (or pathway). Genomic data carry information about these networks, and a broad spectrum of approaches has been proposed for inferring network connections.

Computational approaches for network representation and analysis vary widely depending on the goals of the researcher. Our goal is to develop methods for discovering previously unknown network structure from population genomic data, within a framework that makes clear predictions that can be validated experimentally. To this end, we represent networks with probabilistic graphical models, which encode the conditional relationships among measured variables. A correctly inferred graphical model makes specific predictions about the experimental consequences of altering a network component. One appeal of these models for network discovery is their linear form, which provides an optimal balance between interpretability and parsimony: the representation is not so rich that conflicting network structures cannot be resolved. Our current work in this area includes Bayesian network algorithms that efficiently analyze tens of thousands of genes, a convex optimization algorithm for directed network discovery that leverages genetic perturbations, and a variational Bayes algorithm for sparse undirected network recovery that maintains a very low false positive rate when analyzing thousands of genes from a limited number of samples.
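To make the point about experimental predictions concrete, here is a minimal sketch assuming a linear Gaussian structural equation model, x = Bx + e, which is one common linear form for a directed graphical model. The four-gene matrix `B` and the helper `do_intervention` are hypothetical illustrations, not the group's software. Clamping one gene to a fixed value (a hard intervention) cuts the edges into it, and the predicted mean of every gene then follows from solving (I - B_cut) x = source.

```python
import numpy as np

# Hypothetical 4-gene network. B[i, j] is the direct linear effect of gene j on gene i.
B = np.zeros((4, 4))
B[1, 0] = 0.8   # gene 0 -> gene 1
B[2, 1] = -0.5  # gene 1 -> gene 2
B[3, 0] = 0.3   # gene 0 -> gene 3


def do_intervention(B, node, value):
    """Predicted mean of every gene when `node` is clamped to `value`.

    Assumes a linear SEM x = B x + e with zero-mean noise and an acyclic B.
    Clamping a node removes the edges into it (its row of B is zeroed), so
    x = B_cut x + value * e_node, i.e. (I - B_cut) x = value * e_node.
    """
    n = B.shape[0]
    B_cut = B.copy()
    B_cut[node, :] = 0.0
    source = np.zeros(n)
    source[node] = value
    return np.linalg.solve(np.eye(n) - B_cut, source)


up = do_intervention(B, node=0, value=1.0)   # perturb the upstream gene
mid = do_intervention(B, node=1, value=1.0)  # perturb a downstream gene
print(up)
print(mid)
```

Clamping gene 0 predicts mean shifts of 0.8, -0.4 and 0.3 in genes 1 to 3, whereas clamping gene 1 leaves the upstream gene 0 untouched and moves only gene 2. This asymmetry is what allows an experiment to distinguish a correctly oriented network from one that merely reproduces the observed correlations.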

A network, including directed relationships, discovered among gene expression products measured in a Saccharomyces cerevisiae cross between a wild strain and a laboratory strain (data described in Brem and Kruglyak 2005). This network was reconstructed with a convex, adaptive lasso algorithm that leverages the effects of cis-eQTL (the cis-eQTL themselves are not shown). From Logsdon and Mezey 2010.
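How cis-eQTL can orient edges may be sketched with a simplified, swapped-in approach: a two-stage, instrumental-variable style adaptive lasso, which is not the convex algorithm of Logsdon and Mezey 2010. The sketch assumes each gene j has one cis-eQTL genotype (column j of `X`) that directly affects only gene j, so that y = By + x + e. Stage 1 projects each gene's expression onto all genotypes; stage 2 regresses each gene on the projected expression of the other genes, plus its own cis-eQTL, using an adaptive lasso whose weights come from an unpenalized fit. Because each projected gene carries variation from its own cis-eQTL, that variation can only reach other genes through outgoing edges, which is what makes the estimates directional. The function names, the penalty `lam`, and the simulated data are hypothetical.

```python
import numpy as np

# Hypothetical simplification: two-stage (instrumental-variable style) adaptive
# lasso. This is NOT the published Logsdon & Mezey (2010) algorithm.


def weighted_lasso(Z, y, weights, lam, n_iter=1000, tol=1e-10):
    """Minimise (1/2n)||y - Z b||^2 + lam * sum_j weights[j] * |b_j| by coordinate descent.

    Z and y must be centred. A weight of 0 leaves that coefficient unpenalised.
    """
    n, m = Z.shape
    b = np.zeros(m)
    col_sq = (Z ** 2).sum(axis=0) / n
    r = y.copy()  # residual y - Z b
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(m):
            old = b[j]
            rho = Z[:, j] @ r / n + col_sq[j] * old
            new = np.sign(rho) * max(abs(rho) - lam * weights[j], 0.0) / col_sq[j]
            if new != old:
                r -= Z[:, j] * (new - old)
                b[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return b


def two_stage_adaptive_lasso(Y, X, lam=0.01, gamma=1.0):
    """Return B_hat with B_hat[i, k] = estimated direct effect of gene k on gene i.

    Y: (n, p) expression; X: (n, p) genotypes, column j = cis-eQTL of gene j.
    """
    n, p = Y.shape
    Y = Y - Y.mean(axis=0)
    X = X - X.mean(axis=0)
    # Stage 1: project every gene onto the span of all cis-eQTL genotypes.
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ coef
    B_hat = np.zeros((p, p))
    for i in range(p):
        others = [k for k in range(p) if k != i]
        # Stage 2 design: projected expression of other genes + gene i's own cis-eQTL.
        Z = np.column_stack([Y_hat[:, others], X[:, i]])
        init, *_ = np.linalg.lstsq(Z, Y[:, i], rcond=None)  # unpenalised fit
        w = 1.0 / (np.abs(init) ** gamma + 1e-8)            # adaptive weights
        w[-1] = 0.0                                          # never penalise the cis-eQTL
        b = weighted_lasso(Z, Y[:, i], w, lam)
        B_hat[i, others] = b[:-1]
    return B_hat


# Simulated cross: 4 genes, one cis-eQTL each, true edges 0 -> 1 and 1 -> 2.
rng = np.random.default_rng(0)
n, p = 2000, 4
B_true = np.zeros((p, p))
B_true[1, 0] = 0.8
B_true[2, 1] = 0.6
X = rng.binomial(2, 0.5, size=(n, p)).astype(float)
E = rng.normal(size=(n, p))
# y = B y + x + e  =>  y = (I - B)^{-1} (x + e), written for row-vector samples.
Y = (X + E) @ np.linalg.inv(np.eye(p) - B_true).T
B_hat = two_stage_adaptive_lasso(Y, X)
print(np.round(B_hat, 2))
```

On this simulated cross the two true edges should dominate `B_hat`, while the large adaptive weights on weak initial estimates push the remaining entries to or near zero. In practice `lam` would be tuned, for example by cross-validation, rather than fixed.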

A network among genotypes (blue), expressed genes (red), and downstream obesity-related phenotypes (green), reconstructed by applying our undirected variational Bayes algorithm to data collected for the F2 progeny of a cross between mouse strains C57BL/6J (B6) and C3H/HeJ (C3H) on an apolipoprotein E null background (data described in Ghazalpour et al. 2006 and Wang et al. 2006). This network is highly enriched for genes that have been experimentally shown to have direct relationships with obesity-related phenotypes, and it also includes several novel genes connected to these phenotypes (Logsdon et al. submitted).