The first atlas of virus-human protein interactions

Viruses are obligate intracellular pathogens that rely on the molecular machinery of the host to complete their life cycle. To do so, viruses interact with host proteins, rewiring the molecular network of the host for their own benefit. Understanding how viruses interact with the host protein network is therefore key to understanding the molecular principles behind viral infection.

Despite the relevance of these interactions, our knowledge of them is exceedingly sparse, and very few viruses have been studied extensively in experiments, partly because such interactions are difficult to study in an experimental setting.

With this in mind, I have tuned PrePPI, a well-known algorithm developed at Barry Honig’s lab, to predict protein-protein interactions (PPIs). PHIPSTER (Pathogen Host Interactome Prediction using STructurE similaRity) evaluates potential direct PPIs between a viral and a host protein by considering domain-domain contacts and peptide-domain contacts.

We applied our predictive algorithm to a viral dataset comprising 1,001 human-infecting viruses (12,237 viral proteins) against the human proteome (~20,000 human proteins), generating ~282,000 high-confidence viral-host PPI predictions.
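To put these numbers in perspective, here is a back-of-the-envelope calculation of the search space. It is only a sketch and assumes that every viral protein is scored against every human protein (an all-against-all comparison, which is my reading of the setup above). The variable names are made up for illustration, and the human proteome size and prediction count are the approximate figures quoted above.

```python
# Figures quoted in the text; the last two are approximate ("~").
viral_proteins = 12_237
human_proteins = 20_000
high_confidence_predictions = 282_000

# Assumption: all-against-all scoring of viral vs. human proteins.
candidate_pairs = viral_proteins * human_proteins
fraction = high_confidence_predictions / candidate_pairs

print(f"{candidate_pairs:,} candidate pairs")              # 244,740,000 candidate pairs
print(f"{fraction:.3%} predicted with high confidence")    # 0.115% predicted with high confidence
```

Under that assumption, roughly one candidate pair in a thousand ends up as a high-confidence prediction.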

Overall, our high-confidence predictions perform well and overlap with the results of high-throughput experimental methods. Our approach can have multiple applications, such as understanding the role of each viral protein, revealing the host biological pathways underlying human infection, and discovering functional relationships between viruses.

Our predicted Zika virus (ZIKV) protein interactome involves human proteins whose biological roles are closely related to the phenotype of the Zika virus infection observed in the latest outbreak in the Americas. Moreover, we identified a particular druggable human protein acting as a rheostat of viral infection that might lead to a new strategy to treat Zika virus infection.

We also identified particular viral-host PPIs that are able to classify the known oncogenic potential of human papillomaviruses (HPVs). These viruses are the leading cause of cervical cancer in women, but not all HPVs have the same ability to induce cancer. Our predictions discern between high-risk HPVs and low-risk HPVs, enabling the classification of other HPVs whose oncogenic potential is not yet known and hopefully shedding some additional light (since HPVs have already been widely studied) on the molecular processes underpinning cervical cancer.
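To give a flavour of how a set of interactions can separate two groups of viruses, here is a toy sketch. It is not the method used in this work: the rule (partners shared by every virus of one group and absent from the other group), the function names, the virus labels and the protein labels are all hypothetical placeholders.

```python
# Hypothetical toy data: HPV type -> set of predicted human interaction partners.
# Labels are placeholders, not predictions from this work.
high_risk = {
    "HPV_A": {"P1", "P2", "P3"},
    "HPV_B": {"P1", "P2", "P4"},
}
low_risk = {
    "HPV_C": {"P3", "P5"},
    "HPV_D": {"P4", "P5"},
}


def signature(positive, negative):
    """Partners shared by every positive virus and absent from every negative one."""
    shared = set.intersection(*positive.values())
    seen_in_negative = set.union(*negative.values())
    return shared - seen_in_negative


def classify(partners, high_sig, low_sig):
    """Label a virus by the signature it overlaps most; ties stay undetermined."""
    high_hits = len(partners & high_sig)
    low_hits = len(partners & low_sig)
    if high_hits == low_hits:
        return "undetermined"
    return "high risk" if high_hits > low_hits else "low risk"


high_sig = signature(high_risk, low_risk)
low_sig = signature(low_risk, high_risk)

# An unclassified (hypothetical) HPV type with its predicted partners.
print(classify({"P1", "P2", "P6"}, high_sig, low_sig))
```

With real data the signatures would rarely be this clean, so a statistical classifier would be a more realistic choice; the point here is only that interaction sets can carry the class signal.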

Now that all viruses are described by their (predicted) set of viral-host PPIs and the host biological functions important during viral infection, there are some cool things we can do, such as identifying functional relationships between viruses independently of their evolutionary origin. Throughout evolution viruses diversify, bypassing the host’s antiviral mechanisms, and explore new infection routes that can give them an adaptive advantage over the host (the so-called arms race). It is therefore possible that unrelated viruses have converged on certain routes, which could be interesting targets for drug therapies that treat multiple infectious diseases at once.
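One simple way to look for such functional relationships is to compare the sets of human proteins that two viruses are predicted to target. The sketch below uses the Jaccard index for this; it is an illustrative choice on my part rather than the measure used in this work, and the virus and protein labels are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: virus -> set of human proteins it is predicted to target.
targets = {
    "virus_A": {"H1", "H2", "H3", "H4"},
    "virus_B": {"H2", "H3", "H4", "H5"},
    "virus_C": {"H6", "H7"},
}


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_pairs(targets):
    """Score every pair of viruses and return them from most to least similar."""
    scores = [
        (jaccard(targets[u], targets[v]), u, v)
        for u, v in combinations(sorted(targets), 2)
    ]
    return sorted(scores, reverse=True)


for score, u, v in rank_pairs(targets):
    print(f"{u} vs {v}: {score:.2f}")
```

Pairs of evolutionarily unrelated viruses that score highly would be candidates for convergence on the same host routes.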

It is important to remark that this work reports predictions, and further experimental work should follow up to explore the hypotheses generated here. Theoretical and experimental work go beautifully together when theory generates hypotheses that are tested experimentally and these results are used to further refine the theoretical model in a feedback loop.

Considering that “All models are wrong, but some are useful” (George Box), I would like to believe that some of our predictions are correct and contribute to understanding viral pathogenesis.

Pyruvate Carboxylase, towards a consensus view on the conformational landscape during its catalytic cycle

One of the projects I was leading in my previous lab (at CICbioGUNE) was recently published in Structure. Pyruvate Carboxylase (PC) is a multifunctional tetrameric enzyme that is involved in several biosynthetic pathways. This enzyme has two catalytic domains, and one of its domains, known as the BCCP domain, is known to travel between both catalytic sites, thus linking the corresponding chemical reactions. The malfunctioning of this important enzyme is related to diseases that predominantly manifest with lactic acidemia and neurological dysfunction.

When I started working on this protein back in 2008 there were two crystal structures of this enzyme, both with a similar quaternary structure yet with a fundamental difference that was not well understood: the different arrangement of a domain that is key for the function of the protein, known as the PC tetramerization (PT) domain or allosteric domain. This domain showed a symmetrical arrangement in S. aureus while it showed an asymmetrical arrangement in R. etli. The lack of consensus on this functionally important domain led to the assumption that the difference was due to speciation. However, I found this difficult to believe considering i) that PC carries out exactly the same molecular function in both species; ii) how important the PT domain is for the function of this enzyme; and iii) how similar the overall quaternary arrangement of the tetramer is in both S. aureus and R. etli. On top of this, recent work on a truncated PC from R. etli unexpectedly showed a symmetrical arrangement of the PT domains, opening the door to new answers to this unresolved question.

In this published work, we tackled precisely this question and showed that the conformational difference in fact corresponds to different conformational states of the enzyme rather than to speciation. For this, we performed cryo-EM on working PC enzymes from S. aureus. Given that each copy of PC would be at a different stage of the catalytic cycle, the challenge here was to sort the single-particle images into different conformational states. Luckily for us, existing computational methods are capable of separating conformations within conformationally heterogeneous datasets. Our results showed that PC in S. aureus mapped not only to the known symmetrical conformation but also to the asymmetrical conformation. Furthermore, we were able to assign each conformational state to a particular enzymatic reaction (PC carries out two enzymatic reactions, each taking place in a different catalytic site) and suggest the conformational transitions happening throughout the catalytic cycle.

Our work has brought together different structures and sheds light on how this protein, and similar ones, carry out their function. I also think that this work is a good example of how important combining different techniques can be to achieve one goal. Here, we combined X-ray crystallography, cryo-EM and molecular dynamics into one integrative modeling pipeline in order to get a bigger picture of our biological question.

Plant evolution: A promiscuous intermediate underlies the evolution of the transcription factor LEAFY

Gene duplication is a general and widely accepted mechanism for the acquisition of new gene functions. Some genes, however, are under selective pressure to remain single-copy genes, and their evolution cannot be explained by the gene duplication paradigm. Sayou et al. tackle this question using the LEAFY (LFY) gene of plants as the evolutionary model {1}, carrying out a wide range of experimental and computational analyses. The authors propose a molecular evolutionary mechanism where an ortholog – intermediate in the evolutionary history of plants – innovates new DNA binding specificities while preserving the original specificity and, later in evolution, diverges to retain the binding specificity for only one particular DNA binding motif.

The LFY gene encodes a transcription factor (TF) that is essential for reproduction and cell division and mostly exists as a single-copy gene in land plants. Previous experiments suggested that DNA binding specificity might have changed during land plant evolution {1}. Sayou et al. expand the evolutionary history of this gene {1} by orthology detection in algal species. Phylogenetic analysis is combined with SELEX experiments and reveals three different DNA binding motifs that correspond to the described evolutionary transitions. Representative LFY orthologs are found to preferentially bind one particular DNA binding motif, thus demonstrating that LFY specificity has changed during plant evolution. X-ray crystallography, protein modeling, mutagenesis and DNA binding assays reveal that LFY specificity can be defined using only three residues. Two of these residues are found to determine the half-sequence identity, while the remaining residue influences the dimerization mode and the requirement for a spacer between half-sites.

The discovery of an evolutionary intermediate ortholog, capable of binding all three types of DNA binding motifs, suggests a molecular evolutionary mechanism that explains how LFY has evolved to capture different specificities without the need for gene duplication. This is very much in line with the innovation-amplification-divergence (IAD) model, in the sense that innovation of new gene functions precedes gene duplication events {2}.


1. The floral regulator LEAFY evolves by substitutions in the DNA binding domain. Maizel A, Busch MA, Tanahashi T, Perkovic J, Kato M, Hasebe M, Weigel D. Science 2005 Apr 8; 308(5719):260-3

2. Evolution of new gene functions: simulation and analysis of the amplification model. Pettersson ME, Sun S, Andersson DI, Berg OG. Genetica 2009 Apr; 135(3):309-24