Large-scale reverse engineering of regulatory circuitry

(Positions available: This project is currently recruiting PhD and BSc students)

Gene expression regulation is the process that controls which genes are turned on (expressed) or off in response to specific environmental cues. A regulatory interaction involves a regulatory element, such as a protein or an RNA, that interferes with the gene expression machinery (the molecular machinery responsible for assembling a gene product from the information encoded in the DNA) to turn the expression of a specific gene on or off.

Gene expression regulation was first studied by Jacob and Monod in a seminal paper published in 1961, and they won the 1965 Nobel Prize in Physiology or Medicine “for their discoveries concerning genetic control of enzyme and virus synthesis.” Since then, we have learned that processing environmental cues in complex environments requires a convoluted network of regulatory interactions whose wiring has been shaped by evolution over millions of years. Despite all the progress in the study of gene regulation, the principles governing the structure and evolution of regulatory networks are not yet fully understood.

Abasy Atlas (a database actively developed by our lab) contains the most comprehensive collection of bacterial regulatory networks of sufficient quality for system-level analyses. A lesson learned from constructing Abasy Atlas was the poor diversity of bacteria for which a high-quality, experimentally supported regulatory network reconstruction is possible: only nine species, of which 71% have a low genomic coverage (<20%). We define the genomic coverage of a regulatory network reconstruction as the fraction of the genome that the network contains. It is the best available measure of regulatory network completeness, as no estimates of the total number of regulatory interactions exist.
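The genomic coverage defined above can be illustrated with a minimal sketch. It assumes that coverage is counted in genes (genes appearing in at least one interaction, divided by all annotated genes); the function name `genomic_coverage`, the toy gene identifiers, and the edge-list representation are hypothetical and are not taken from Abasy Atlas.

```python
def genomic_coverage(interactions, genome_genes):
    """Return the fraction of genome genes present in the network.

    interactions: iterable of (regulator, target) gene identifiers.
    genome_genes: iterable of all annotated gene identifiers.
    Assumption: a gene counts as covered if it appears in any
    interaction, whether as regulator or as target.
    """
    genome = set(genome_genes)
    if not genome:
        raise ValueError("genome_genes must not be empty")
    network_genes = set()
    for regulator, target in interactions:
        network_genes.update((regulator, target))
    # Ignore network genes that are not annotated in this genome.
    return len(network_genes & genome) / len(genome)


# Toy example with made-up identifiers: 3 of 5 genes are in the network.
toy_genome = ["geneA", "geneB", "geneC", "geneD", "geneE"]
toy_network = [("geneA", "geneB"), ("geneA", "geneC")]
coverage = genomic_coverage(toy_network, toy_genome)
print(f"Genomic coverage: {coverage:.0%}")
```

Under this reading, a reconstruction with coverage below 0.2 would fall into the low-coverage group mentioned above.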

The poor diversity of bacteria in Abasy Atlas is a limiting factor for the development of large-scale comparative systems biology and for the study of the organizational landscape of regulatory networks. To increase the utility of Abasy Atlas, it is necessary to expand its bacterial diversity, to increase its completeness, and to map the systems composing these new networks. To accomplish this, we will use a mix of strategies including: 1) Updating the regulatory networks of bacteria already included in Abasy Atlas. 2) Curation of new regulatory datasets from the literature. 3) Extraction of regulatory interaction predictions from specialized databases. 4) Large-scale de novo inference of regulatory interactions using gene expression and DNA-sequence data.

Inspired by the results of the DREAM challenges, we aim to apply a “wisdom of the crowd” philosophy. To this end, we will review the literature to evaluate the predictive power and usability of reported methods for predicting regulatory interactions from gene expression data, enabling us to identify the top methods. By integrating the predictions from the top methods with predictions from phylogenetic footprinting and other DNA-sequence-based methods, we will obtain a high-confidence set of predictions.
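One simple way to combine predictions from several methods is average-rank (Borda-style) aggregation. The sketch below illustrates that general idea only; it is not the integration scheme of this project, and the method names, scores, gene identifiers, and the rule for edges a method did not predict are all assumptions made for the example.

```python
from collections import defaultdict


def aggregate_by_average_rank(predictions):
    """Combine scored edge lists from several methods by mean rank.

    predictions: dict mapping a method name to a dict that maps an
    edge (regulator, target) to a confidence score (higher is better).
    Returns a list of (edge, mean_rank), best consensus edge first.
    """
    all_edges = set()
    for scores in predictions.values():
        all_edges.update(scores)

    rank_sums = defaultdict(float)
    for scores in predictions.values():
        # Rank 1 is the most confident edge; ties break by edge name.
        ranked = sorted(scores, key=lambda e: (-scores[e], e))
        ranks = {edge: i + 1 for i, edge in enumerate(ranked)}
        # Assumption: an edge a method did not predict gets the rank
        # just below that method's last predicted edge.
        worst = len(ranked) + 1
        for edge in all_edges:
            rank_sums[edge] += ranks.get(edge, worst)

    n_methods = len(predictions)
    mean_ranks = {e: total / n_methods for e, total in rank_sums.items()}
    return sorted(mean_ranks.items(), key=lambda item: (item[1], item[0]))


# Toy predictions from three hypothetical inference methods.
toy_predictions = {
    "method_a": {("tfA", "g1"): 0.9, ("tfA", "g2"): 0.4, ("tfB", "g3"): 0.1},
    "method_b": {("tfA", "g1"): 0.8, ("tfB", "g3"): 0.7},
    "method_c": {("tfA", "g2"): 0.6, ("tfA", "g1"): 0.5},
}
consensus = aggregate_by_average_rank(toy_predictions)
for edge, mean_rank in consensus:
    print(edge, round(mean_rank, 2))
```

Ranks are used instead of raw scores because scores from different methods are generally not on a comparable scale. A high-confidence set could then be taken as the edges above a chosen mean-rank cutoff.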

The large-scale inference of regulatory networks will allow us to propose, for the very first time, reconstructions for organisms that currently have none. Once the inference method reaches a mature stage, we will build a web service that makes our crowd-based regulatory network inference method available to the international scientific community, to facilitate the discovery of new networks.

  • Zorro-Aranda, A., Escorcia-Rodríguez, J.M., González-Kise, J.K., Freyre-González, J.A.* Curation, inference, and assessment of a globally reconstructed gene regulatory network for Streptomyces coelicolor. Scientific Reports 12(1):2840 (2022). doi:10.1038/s41598-022-06658-x