The protein universe in the science week of Pamplona

Last Wednesday I went to Pamplona to take part in the science week events organized by Universidad de Navarra. This is the university where I studied biochemistry, so it was a great pleasure to be back, this time as a speaker.

I was to give a talk to the general public on the protein sequence-to-structure-to-function paradigm. Over 150 people turned up, most of them teenagers, but I did recognize some familiar faces from my student years there.

My talk started at the beginning of everything: the moment when all the matter in the universe was compressed into a single point the size of an atom. That point led to the so-called Big Bang, the huge explosion that gave rise to most of the hydrogen nuclei (protons) and helium nuclei in our universe. Gravity pulled these atoms together into stars, which work as nuclear reactors, fusing nuclei into heavier nuclei. Our solar system formed from the remains of a stellar explosion known as a supernova, as gravity merged dust into planets and other celestial bodies. One planet, our planet, fell within the habitable zone, where conditions would permit life to arise. Life as we know it is based on the chemistry of carbon. Carbon atoms are the backbone of all organic molecules because each can form up to four stable bonds with other carbon or non-carbon atoms, giving rise to a wide variety of organic compounds.

Proteins are built from basic units called amino acids. The amino acid sequence dictates how the protein will fold, and it is the folded structure that ultimately determines protein function. All amino acids share a common core structure but differ in their side chain. The side chain gives each amino acid particular physicochemical properties, which play a key role in protein folding. Folding occurs in stages: first, secondary structure elements such as alpha helices and beta sheets form; then these elements pack together through intramolecular interactions. One of the forces driving protein folding is the hydrophobic effect, whereby hydrophobic amino acids cluster inside the molecule, away from water. But is the structure sufficient to dictate the molecular function of a protein? In most cases, no. The great majority of proteins have an intrinsic flexibility that is essential for carrying out their function.

Proteins carry out most of the jobs a cell needs to do. Their functions can be classified as enzymatic, immunological, transport, signal transduction, structural and motility. Proteins work in an orchestrated manner, and if a single protein fails to carry out its function a whole biological pathway can be affected.
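As a rough illustration of the hydrophobic effect mentioned above, here is a minimal Python sketch that scores how hydrophobic each stretch of a sequence is, using the Kyte-Doolittle hydropathy scale. This was not part of the talk: the scale, the window size and the toy sequence are my own assumptions, and a high score only hints that a stretch may prefer to be buried away from water.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=5):
    """Return the mean hydropathy of every sliding window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Toy sequence (not a real protein): a hydrophobic core flanked by charged residues.
    toy_sequence = "KKDEELLIVVALLKKDE"
    for start, value in enumerate(hydropathy_profile(toy_sequence)):
        print(start, round(value, 2))
```

Windows covering the central leucine, isoleucine and valine residues score positive, while the charged ends score negative.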

One of the causes that can lead to disease is precisely the malfunction of a protein. Such a malfunction can be caused by a mutation in the corresponding gene. Mutations may affect amino acids that are essential for the function of a protein or for its folding. For example, cystic fibrosis occurs because a mutation in the chloride channel gene prevents the channel from transporting chloride ions across the cell membrane. This upsets the water balance and results in a thick mucus coating. The lungs are among the organs most affected by this malfunctioning protein, with an increased risk of infections and difficulty breathing. p53, also known as the guardian of the genome, is a tumor suppressor that prevents damaged DNA from being passed on to the next generation of cells. More than 50% of human cancers carry mutations in p53 that alter its function.

To finish off my talk, I discussed the importance of science in our world, not only as a source of progress and economic wealth but also as a powerful tool to educate younger generations and build a better world. Science is one of the basic pillars of our society, and governments should consequently do a better job of protecting and promoting it regardless of the circumstances.

Overall, this was a great experience, not only because I helped make science more popular among youngsters (the best audience I could ever have had) but also because I returned to my home university. Walking through the corridors brought back so many memories that I almost felt as if I were still studying there. I have made the presentation freely available to everyone, just follow the link.

Systems biology is one step closer

In this outstanding work, Karr and colleagues describe the first whole-cell computational model of an organism and lay the groundwork for future developments in systems biology.

One of the key challenges in modern biology is to model an organism as a whole. Doing this will enable us to predict an organism’s responses to certain conditions (e.g. exposure to drugs, gene modifications, or environmental conditions). Such a holistic approach will therefore have a tremendous impact on both medicine and biotechnology. Modeling an organism requires a deep understanding not only of all the pathways involved but also of the relationships between them. For many years, work towards this goal was hampered by the limited knowledge that could be extracted from experimental methods and by the lack of a suitable computational approach. However, recent advances in high-throughput methods and in both software and hardware have enabled us, for the very first time, to tackle this question with greater confidence.

The authors simulated the cell cycle of 128 Mycoplasma genitalium cells (each containing 525 genes) in a typical culture environment. To build the model, they drew on over 900 publications and more than 1900 experimental observations. The total functionality of the cell was divided into 28 sub-models, each representing a particular biological process. The sub-models were assumed to be independent of one another on short timescales, while over longer timescales each one depends on variables determined by the other sub-models.
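The idea of sub-models that are independent within a short time step but coupled through shared variables can be sketched in a few lines of Python. This is only a toy illustration and not the authors' implementation: the three sub-models, the rate constants and the state variable names are all made up for the example.

```python
from types import MappingProxyType


def transcription(state, dt):
    """Hypothetical sub-model: make mRNA at a fixed rate while nucleotides last."""
    made = min(2.0 * dt, state["ntp"])
    return {"mrna": made, "ntp": -made}


def translation(state, dt):
    """Hypothetical sub-model: make protein in proportion to the mRNA present."""
    made = min(0.5 * state["mrna"] * dt, state["amino_acids"])
    return {"protein": made, "amino_acids": -made}


def degradation(state, dt):
    """Hypothetical sub-model: degrade a fixed fraction of the mRNA."""
    return {"mrna": -0.1 * state["mrna"] * dt}


SUB_MODELS = [transcription, translation, degradation]


def step(state, dt=1.0):
    """Advance all sub-models by one time step.

    Every sub-model reads the same read-only snapshot of the state, so within
    a step the sub-models are independent of one another. Their changes are
    merged afterwards, which is how they influence each other on longer
    timescales.
    """
    snapshot = MappingProxyType(dict(state))
    new_state = dict(state)
    for sub_model in SUB_MODELS:
        for name, delta in sub_model(snapshot, dt).items():
            new_state[name] += delta
    return new_state


def simulate(state, n_steps, dt=1.0):
    """Run the toy cell for n_steps time steps and return the final state."""
    for _ in range(n_steps):
        state = step(state, dt)
    return state


if __name__ == "__main__":
    start = {"ntp": 10.0, "amino_acids": 5.0, "mrna": 0.0, "protein": 0.0}
    print(simulate(start, n_steps=3))
```

Because translation only sees the mRNA that existed at the start of a step, no protein appears until the second step, which shows the coupling between sub-models happening across steps rather than within them.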

The results agree well with experimental data that were not used to train the model, providing new insights into the molecular understanding of this organism and demonstrating that it is possible to classify cellular phenotypes by their underlying molecular interactions.

Recognizing protein-ligand binding sites

In this paper, Roy and Zhang describe COFACTOR, which predicts protein-ligand binding sites. COFACTOR is a comparative and hierarchical approach that uses structure modeling and a global-and-local similarity search. This method outperformed all other methods in the CASP9 competition, thus highlighting the importance of their approach.

The authors first carry out structure modeling based on the I-TASSER algorithm. Following this, a global similarity search is performed with TM-align to identify template proteins with bound ligands. During the local similarity search, sequence and geometrical information is considered in a step-wise manner: first, evolutionary information is taken into account using PSI-BLAST (position-specific iterative basic local alignment search tool) and the Jensen-Shannon divergence score; second, the candidate binding-site motif is structurally aligned to the template motif using Needleman-Wunsch dynamic programming. Finally, the ligand conformation is refined using a quick Metropolis Monte Carlo simulation.
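For readers who have not met Needleman-Wunsch before, here is a minimal Python sketch of the dynamic programming behind it. It aligns two plain strings with simple match, mismatch and gap scores that I picked for illustration. COFACTOR applies the same idea to binding-site motifs with its own scoring, which is not reproduced here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + substitution:
                aligned_a.append(a[i - 1])
                aligned_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(total)
    print(top)
    print(bottom)
```

When several alignments share the best score, the traceback returns just one of them, preferring a match or mismatch over a gap.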

Evaluation of this method showed that COFACTOR accurately identifies 65%-69% of ligand-binding pockets and interacting residues, with a Matthews correlation coefficient (MCC) of 0.55-0.58. Furthermore, it was shown to perform better than all other methods in the CASP9 competition. The authors argue that its success lies in the combination of both local and global structural alignment.
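To give a feel for what an MCC value means, here is a small Python sketch that computes it for a set of predicted binding residues. The residue numbers and protein length in the example are invented, and the function names are mine. Only the MCC formula itself is standard.

```python
import math


def confusion_counts(predicted, actual, n_residues):
    """Count true/false positives and negatives over all residues of a protein."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)   # predicted and truly binding
    fp = len(predicted - actual)   # predicted but not binding
    fn = len(actual - predicted)   # binding but missed
    tn = n_residues - tp - fp - fn
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient: 1 is perfect, 0 is no better than random."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the table is empty
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical 100-residue protein with four true binding residues.
    counts = confusion_counts(
        predicted={11, 12, 13, 14}, actual={10, 11, 12, 13}, n_residues=100
    )
    print(counts, round(mcc(*counts), 3))
```

Because binding residues are usually a small fraction of a protein, MCC is more informative than plain accuracy: predicting that no residue binds would be right for most residues yet scores an MCC of 0.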