Bioinformatics Article Reviews: October 2010

Tuesday, October 19, 2010

An Integrated View of Molecular Coevolution in Protein–Protein Interactions

Hi Guys,

Recently, an article giving an overview of protein coevolution theory was published in Molecular Biology and Evolution by David L. Robertson and Simon C. Lovell from the University of Manchester. It contains useful references to current work on this topic, such as Hakes (2007), Suel (2003), Socolich (2005), Halabi (2009), Wang and Pollock (2005) and Fares (2006).

This article discusses the action of evolutionary pressure on the regions of interacting proteins that contribute to binding. Basically, it describes how mutation of one residue in a binding site may lead to mutation of another residue involved in the same binding site.

The following figure shows how lowering fitness at one interdependent site leads to an increase in fitness at the other site.

They also give the definition of coevolution from the 1950s: "reciprocal evolutionary change in interacting species". The use of covariation (correlation) is well established in RNA structure prediction, but applying coevolution methods to protein structure prediction is still at the development stage. Even so, it has been shown that coevolving residues are present in many interacting and non-interacting protein domains. The article gives an example of coevolving residues taken from Yeang and Haussler (2007), who detect coevolving residues with a single-parameter model of amino acid pair substitution and then extend the analysis to the whole Pfam database.

An example of sites that demonstrate intermolecular coevolution. (a) Cyanobacterial and (b) human superoxide dismutase. The residues highlighted are at structurally equivalent positions and exhibit strong covariation (Yeang and Haussler 2007). In the proteins shown, the Phe and the Asn/Gln residues have exchanged positions. (c and d) Sequence profile for the equivalent positions. In each case, the cyanobacterial sequence corresponding to panel (a) is at the top and the human sequence corresponding to panel (b) is at the bottom. For other sequences, the Pfam family names are used.
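To make the idea of covariation between two alignment positions a bit more concrete, here is a minimal sketch that scores pairs of columns with mutual information. This is not the model used by Yeang and Haussler (2007); mutual information is just one simple, commonly used covariation score. The toy alignment and all function names are made up for illustration, and the sketch ignores phylogeny, gaps and small-sample corrections.

```python
from collections import Counter
from math import log2


def column(alignment, index):
    """Return one alignment column as a list of residues."""
    return [sequence[index] for sequence in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    total = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    score = 0.0
    for (res_a, res_b), n_ab in count_ab.items():
        p_ab = n_ab / total
        p_a = count_a[res_a] / total
        p_b = count_b[res_b] / total
        score += p_ab * log2(p_ab / (p_a * p_b))
    return score


# Hypothetical toy alignment: positions 0 and 2 swap Phe (F) with Asn/Gln (N/Q),
# while position 1 is fully conserved.
toy_alignment = [
    "FAN",
    "FAN",
    "NAF",
    "NAF",
    "FAQ",
    "QAF",
]

mi_swapping = mutual_information(column(toy_alignment, 0), column(toy_alignment, 2))
mi_conserved = mutual_information(column(toy_alignment, 0), column(toy_alignment, 1))

print(f"MI between positions 0 and 2: {mi_swapping:.3f} bits")
print(f"MI between positions 0 and 1: {mi_conserved:.3f} bits")
```

The two positions that exchange residues get a high score, while the pair involving the fully conserved column scores zero, because a column that never changes carries no information about its partner.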

It is known that only a few residues in each protein domain may show coevolution. In the next table, they list a few of the existing methods for detecting coevolving residues and compare the residues that each method predicts.

There are several mechanisms that contribute to the degree of correlation of amino acid replacements, either within one protein chain or between chains. Waddell et al. (2007) are explicit: "correlated evolution is what is detected, whereas coevolution is the hypothesized cause."

There are, however, a set of causes that may be hypothesized:

(i) Site-specific coevolution between interacting proteins has been detected in a range of systems (Moyle et al. 1994; Atchley et al. 2000; Mintseris and Weng 2005; Travers and Fares 2007; Yeang and Haussler 2007; Madaoui and Guerois 2008). It is relatively strong on a "per-residue" basis, indicated by its identification from the analysis of a handful of residues. The signal is most easily detected in the "rim" regions surrounding the interaction interface (Travers and Fares 2007; Yeang and Haussler 2007; Kann et al. 2009) rather than the core of the interface itself (Hakes et al. 2007). This is probably because the interface itself can be somewhat conserved.

(ii) Correlations of evolutionary rates between interacting proteins when measured over the entire protein length (Williams and Hurst 2000; Fraser et al. 2002). The evidence suggests that these rate correlations are unrelated to coevolution; rather they are due to external factors. This suggestion solves the puzzle of evolutionary correlations between spatially distant sites within protein structures (Hakes et al. 2007) and between proteins that do not directly interact (Juan et al. 2008a). It also explains the relative strength of the observed correlated rates. For obligate complexes (i.e., those that are constitutively bound to their interacting partners), the rate correlation between proteins distant in the complex is as strong as for those directly interacting (Hakes et al. 2007). By contrast, for proteins with a more tenuous functional link, the correlation is much weaker (Juan et al. 2008a).

It is clear that site-specific molecular coevolution not only exists but is also necessary to maintain biological function. The authors call for improved methods for predicting coevolving residues, and they also ask that these methods take into account that the sequences considered for detecting coevolution might have different origins.

Sunday, October 17, 2010

An Integrated View of Protein Evolution (review in Nature, 2006)

Hi Guys,

This article discusses a very important issue in protein evolution, which is well stated in the abstract: "Protein evolution is not determined exclusively by selection on protein structure and function, but is also affected by the genome position of the encoding genes, their expression patterns, their position in biological networks and possibly their robustness to mistranslation". So before we go deep into this paper, which basically talks about the evolutionary rates of different sites, we should take a look at some basics.

Most of you probably know the definition of a transition matrix; if not, it is a matrix that contains the probabilities of each type of amino acid substitution for a given period of evolution. The last part of that definition is very important: we can't use a transition matrix unless it matches the evolutionary distances in the data used in the multiple sequence alignment. One can use PAM1, PAM120, PAM250, etc., depending on the data. If you don't know which one to use, PAM120 is often considered a reasonable general-purpose choice.
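To show why the "period of evolution" matters, here is a minimal sketch of how a PAM-style matrix for a longer distance can be obtained by raising the one-step matrix to a power (PAM120 as the 120th power of PAM1, and so on). The 3-letter alphabet and the numbers in `toy_pam1` are invented for illustration; a real PAM1 matrix covers 20 amino acids and is estimated from closely related sequences.

```python
def mat_mul(left, right):
    """Multiply two square matrices given as lists of rows."""
    size = len(left)
    return [
        [sum(left[i][k] * right[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def mat_pow(matrix, exponent):
    """Raise a square matrix to a non-negative integer power."""
    size = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(exponent):
        result = mat_mul(result, matrix)
    return result


# Hypothetical one-step matrix over a toy 3-letter alphabet: each row gives the
# probabilities that a residue stays the same or is replaced by another one.
toy_pam1 = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]

toy_pam120 = mat_pow(toy_pam1, 120)
toy_pam250 = mat_pow(toy_pam1, 250)

for name, matrix in (("PAM1", toy_pam1), ("PAM120", toy_pam120), ("PAM250", toy_pam250)):
    print(f"toy {name}: probability that a residue is unchanged = {matrix[0][0]:.3f}")
```

The probability of seeing the same residue drops as the distance grows, which is why a matrix built for closely related sequences is a poor fit for an alignment of distant ones.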

A good point to note is that proteins encoded by genes under high recombination rates should evolve quickly. Well, it makes sense ;). In terms of applicability, mutations at the most conserved sites of disease-associated genes are those most likely to be involved in pathology, but no one knows whether disease-related genes as a class evolve slower or faster than the rest of the genome. You may remember the molecular clock hypothesis, which says that protein evolution proceeds at an approximately constant rate over time; current research has shown that this does not hold in general. In reality, evolutionary rates of the proteome vary considerably across species.

I have included a few figures from the article; they give precise information about how the rate of protein evolution relates to gene dispensability and gene expression level.

The first figure was obtained by Wall et al. using sequences from four yeast species of the Saccharomyces genus. The rate of protein evolution is weakly associated with the severity of the fitness effect of gene deletions in yeast.

In the above figure, we can see that gene expression level correlates strongly and negatively with the rate of evolution in yeast. These values were again calculated by Wall et al. on the same data set as for gene dispensability.
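To illustrate what a strong negative correlation between expression level and evolutionary rate looks like, here is a minimal sketch using the Spearman rank correlation. I am not claiming this is the exact statistic used by Wall et al.; rank correlation is simply one common choice for this kind of data. The numbers below are invented toy values, not data from the paper.

```python
def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical toy values for six genes (not taken from the article).
toy_expression_level = [1, 5, 20, 80, 300, 1200]
toy_evolution_rate = [0.30, 0.22, 0.25, 0.10, 0.08, 0.03]

rho = spearman(toy_expression_level, toy_evolution_rate)
print(f"Spearman correlation: {rho:.3f}")
```

A value close to -1 means that genes with higher expression almost always have lower evolutionary rates, which is the pattern described for yeast.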

The last figure summarizes the factors that affect the rate of evolution. I am going to describe each relationship step by step:

a) Transcription causes increased spontaneous mutation rates in Saccharomyces cerevisiae and E. coli, probably by exposing the non-transcribed ssDNA to mutagenic chemicals.

b) Recombinational repair of double-stranded breaks in S. cerevisiae increases the frequency of nearby point mutations.

c) Genes that are close to recombination hotspots in S. cerevisiae are expressed at higher levels during vegetative growth than most other genes.

d) Essential genes are clustered in regions of low recombination in S. cerevisiae and Caenorhabditis elegans.

e) Proteins that are more dispensable tend to be expressed at lower levels than less dispensable ones.

f) More protein–protein interactions have been reported for highly expressed proteins than for low-abundance proteins in S. cerevisiae. However, this correlation is not supported by all interaction-detection methods, and might reflect a detection bias towards high-abundance proteins.

g) It has been reported that essential genes have more protein–protein interactions than non-essential genes. However, this correlation might be an artefact of biases in certain interaction data sets.

In conclusion, this article states that genomic data reveal a range of influences on protein evolution, but there are still plenty of gaps to fill. Future studies can incorporate further information on duplication and functional divergence of genes, protein domain shuffling, and horizontal gene transfer across species. Such studies can improve our understanding of protein evolution, and this knowledge can then be used to validate protein interaction data by comparing evolutionary rates, or to identify potential drug targets in microbes, under the assumption that they are slowly evolving.

This post might contain some minor mistakes; if you want more details, you can find them in the article by Martin J. Lercher.

Thanks.

Best Regards,
Vikas Gupta