Sunday, October 17, 2010

An integrated view of protein evolution (review in Nature Reviews Genetics, 2006)

Hi Guys,

This article discusses a very important issue in protein evolution, and it is well stated in the abstract: "Protein evolution is not determined exclusively by selection on protein structure and function, but is also affected by the genome position of the encoding genes, their expression patterns, their position in biological networks and possibly their robustness to mistranslation". Before we go deep into this paper, which basically talks about the evolutionary rates of different sites, we should take a look at some basics.

I know that most of you understand the definition of a transition matrix. If not, it is a matrix that contains the probabilities of each type of amino-acid substitution for a given period of evolution. The last part of that definition ("for a given period of evolution") is very important: we can't use just any transition matrix; it has to match the evolutionary distances between the sequences in your multiple sequence alignment. One can use PAM1, PAM120, PAM250 etc. depending on the data, where a higher number corresponds to a larger evolutionary distance. If you don't know which one to use, PAM120 is often suggested as a reasonable general-purpose choice.
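To make the "given period of evolution" part concrete: in the PAM family, the matrix for n units of evolutionary distance is obtained by multiplying the PAM1 probability matrix by itself n times. Here is a minimal sketch of that idea in Python. The 3-letter alphabet and the numbers in TOY_PAM1 are made up for illustration (the real PAM1 matrix is 20 x 20 and estimated from alignments), and mat_mul and mat_pow are just helper names I chose.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, n):
    """Raise a square matrix to the n-th power by repeated squaring."""
    size = len(m)
    # Start from the identity matrix (zero evolutionary distance).
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    base = [row[:] for row in m]
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result


# Hypothetical one-step substitution probabilities for a 3-letter alphabet.
# Each row sums to 1; the diagonal is the probability of no change.
TOY_PAM1 = [
    [0.990, 0.007, 0.003],
    [0.007, 0.990, 0.003],
    [0.003, 0.003, 0.994],
]

toy_pam120 = mat_pow(TOY_PAM1, 120)
toy_pam250 = mat_pow(TOY_PAM1, 250)

for name, m in (("toy PAM1", TOY_PAM1), ("toy PAM120", toy_pam120), ("toy PAM250", toy_pam250)):
    print(name, [round(p, 3) for p in m[0]])
```

The probability of a residue staying unchanged (the diagonal) shrinks as the distance grows, which is why a matrix built for closely related sequences is a poor fit for distant ones. Note that the matrices used for scoring alignments are log-odds versions of these probabilities; that step is left out here.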

A good point to notice is that proteins encoded by genes in regions of high recombination are expected to evolve quickly. Well, it makes sense ;). In terms of applicability, mutations at the most conserved sites of disease-associated genes are the ones most likely to be involved in pathology, but no one knows whether disease-related genes as a class evolve slower or faster than the rest of the genome. You may remember the molecular clock hypothesis, which says that protein evolution proceeds at an approximately constant rate over time; current research contradicts it. In reality, the evolutionary rates of the proteome vary considerably across species.

I have included a few figures from the article; they show how the rate of protein evolution relates to gene dispensability and to gene expression level.


The first figure was obtained by Wall et al. using sequences from four yeast species of the Saccharomyces genus. The rate of protein evolution is weakly associated with the severity of the fitness effect of gene deletions in yeast.


In the above figure, we can see that gene expression level correlates strongly and negatively with the rate of evolution in yeast. These values were again calculated by Wall et al. on the same data set as for gene dispensability.
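If you want to check this kind of relationship on your own data, a rank correlation is one simple way to do it, since it does not assume the relationship is linear. Below is a small sketch of Spearman's rank correlation in plain Python. I am not claiming this is the statistic Wall et al. used, and the expression and rate values are invented numbers, not data from the article.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values for six genes (illustration only).
expression_level = [1.2, 3.5, 8.0, 15.0, 40.0, 120.0]
evolution_rate = [0.30, 0.22, 0.25, 0.10, 0.08, 0.03]

rho = spearman(expression_level, evolution_rate)
print("Spearman rho:", round(rho, 3))
```

A value of rho close to -1 means that highly expressed genes tend to be the slowly evolving ones, which is the pattern described for yeast.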



The last figure shows how the factors that affect evolutionary rate are related to each other. I am going to describe each relationship step by step:
a) Transcription causes increased spontaneous mutation rates in Saccharomyces cerevisiae and E. coli, probably by exposing the non-transcribed ssDNA to mutagenic chemicals.
b) Recombinational repair of double-strand breaks in S. cerevisiae increases the frequency of nearby point mutations.
c) Genes that are close to recombination hotspots in S. cerevisiae are expressed at higher levels during vegetative growth than most other genes.
d) Essential genes are clustered in regions of low recombination in S. cerevisiae and Caenorhabditis elegans.
e) Proteins that are more dispensable tend to be expressed at lower levels than less dispensable ones.
f) More protein–protein interactions have been reported for highly expressed proteins than for low-abundance proteins in S. cerevisiae. However, this correlation is not supported by all interaction-detection methods, and might reflect a detection bias towards high-abundance proteins.
g) It has been reported that essential genes have more protein–protein interactions than non-essential genes. However, this correlation might be an artefact of biases in certain interaction data sets.

In conclusion, the article states that a range of genomic factors influence protein evolution, but there are still plenty of gaps to fill. Future studies can incorporate further information on duplication and functional divergence of genes, protein domain shuffling, and horizontal gene transfer across species. Such studies can improve our understanding of protein evolution, and this knowledge can then be used to validate protein interaction data by comparing evolutionary rates, or to identify potential drug targets in microbes, under the assumption that they are slowly evolving.

This post might have some minor mistakes. If you want more details, you can find them in the original article by Martin J. Lercher.

Thanks.

Best Regards,
Vikas Gupta
