Bioinformatics Article Reviews: How to map billions of short reads onto genomes (Nature Biotechnology)

Hi Guys,

I am taking a course in Next Generation Sequencing, so I thought I would summarize some of the articles we have been following. This particular article was published in May 2009 by Cole Trapnell & Steven L Salzberg. It gives a brief overview of the tools available for dealing with the huge number of short reads produced by deep sequencing technologies. The paper also gives a concise, well-explained account of the concepts behind tools such as Maq and Bowtie. I will try to explain more as we progress through this blog.

Let's talk a bit about the so-called 'read mapping problem'. A Next Generation Sequencing run produces millions of reads as output, and the challenge is then to map these reads against a known or predicted genome. Because there is a huge number of short reads and a target has to be found for each one, software such as BLAST might take a very long time. Such problems led to the development of tools that search for short reads efficiently in both time and space. A few such tools are listed in the article.
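To make the problem concrete, here is a minimal sketch of the most naive mapper I could think of: slide each read along the genome and count mismatches at every position. This is only my own illustration, not how BLAST or any tool from the article works, and the function name `naive_map` and the toy sequences are made up. The work grows roughly with genome length times read length for every single read, which is why this approach is hopeless for millions of reads against a large genome.

```python
def naive_map(read, genome, max_mismatches=0):
    """Return every start position where `read` aligns to `genome`
    with at most `max_mismatches` substitutions (no gaps)."""
    positions = []
    for start in range(len(genome) - len(read) + 1):
        mismatches = 0
        for offset, base in enumerate(read):
            if genome[start + offset] != base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences, give up on this position
        else:
            # The inner loop finished without breaking: the read fits here.
            positions.append(start)
    return positions


# Toy data, invented for illustration only.
genome = "ACGTACGTTAGC"
for read in ["ACGT", "TTAG", "GGGG"]:
    print(read, naive_map(read, genome))
```

Every read triggers a full scan of the genome, so nothing is reused between reads. The indexing ideas in the article are all about avoiding that repeated scan.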

We used Bowtie in our exercise because it produces alignments in less time than competing software, although that speed comes without a guarantee of the best results. As far as we have learned, though, its results are pretty extensive.

As explained in the article, both Bowtie and Maq build an index so that reads can be aligned in less time, at the cost of more or less memory depending on the method. Maq uses the spaced seeds technique: the reference genome is divided into small seeds, and these seeds are stored in an index. In a similar way, each read is divided into small seeds; we first compare these shorter pieces and, when they match, we go on to compare the rest of the read sequence. In simple words, chop up both the reference genome sequence and the read sequences, then look for the smaller pieces. For example, if a read is cut into four seeds and aligns with at most two mismatches, at least two of its seeds must match exactly, so if no small piece matches, the bigger one will not match either. The flow chart in the article's figure illustrates this.

Bowtie follows the Burrows-Wheeler concept, a transformation that was originally used to compress large files into a smaller size. Bowtie matches a read against the transformed genome one character at a time, and each additional matched character narrows down the list of positions where the read could align. This is very time-efficient. On its own this search finds only perfect matches; as the article describes, Bowtie copes with mismatches by backtracking and trying a substitution, but its main drawback is that it does not allow gaps (insertions or deletions).
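To get a feeling for the two indexing ideas, I wrote a small Python sketch. Please read it as a toy under my own assumptions, not as the actual Maq or Bowtie code: the seed part uses plain contiguous seeds in a dictionary (Maq's real spaced-seed templates are more elaborate), and the Burrows-Wheeler part does exact matching only, with the index built by naively sorting suffixes, which would never scale to a real genome. All function names and sequences are invented for the example.

```python
from collections import defaultdict


def build_seed_index(genome, seed_len):
    """Map every substring of length seed_len to its start positions."""
    index = defaultdict(list)
    for pos in range(len(genome) - seed_len + 1):
        index[genome[pos:pos + seed_len]].append(pos)
    return index


def seed_and_extend(read, genome, index, seed_len, max_mismatches):
    """Look up non-overlapping seeds of the read, then check the full read
    only at the positions suggested by an exact seed hit."""
    hits = set()
    for offset in range(0, len(read) - seed_len + 1, seed_len):
        for seed_pos in index.get(read[offset:offset + seed_len], []):
            start = seed_pos - offset
            if start < 0 or start + len(read) > len(genome):
                continue
            window = genome[start:start + len(read)]
            if sum(a != b for a, b in zip(read, window)) <= max_mismatches:
                hits.add(start)
    return sorted(hits)


def build_bwt_index(genome):
    """Build the suffix array, BWT and count tables (naive construction)."""
    text = genome + "$"  # "$" sorts before A, C, G, T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    first, total = {}, 0  # first[c]: how many characters are smaller than c
    for c in sorted(set(text)):
        first[c] = total
        total += text.count(c)
    occ = {c: [0] for c in first}  # occ[c][i]: occurrences of c in bwt[:i]
    for ch in bwt:
        for c in occ:
            occ[c].append(occ[c][-1] + (ch == c))
    return suffix_array, first, occ


def backward_search(read, suffix_array, first, occ):
    """Exact matching: walk the read from its last character to its first,
    shrinking the range of suffixes that still match."""
    lo, hi = 0, len(suffix_array)
    for c in reversed(read):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[i] for i in range(lo, hi))


# Toy data, invented for illustration only.
genome = "ACGTACGTTAGCCGATAGGC"
seed_index = build_seed_index(genome, seed_len=4)
bwt_index = build_bwt_index(genome)

print(seed_and_extend("TTAGCCTA", genome, seed_index, 4, 1))  # one mismatch
print(backward_search("TTAGCCGA", *bwt_index))                # exact copy
print(backward_search("TTAGCCTA", *bwt_index))                # one mismatch
```

With two seeds per read, the seed lookup is guaranteed to find any placement with at most one mismatch, because one mismatch can spoil only one of the two seeds. The backward search, on the other hand, comes back empty for the mismatched read, which is exactly why a real aligner needs something extra (such as backtracking) on top of the plain Burrows-Wheeler search.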

Many challenges and questions remain for developers of read mapping software. As all the sequencing machine vendors are trying to produce longer reads, will the short-read mapping programs scale well as the reads get longer? Maq, Bowtie and several other short-read packages support reads longer than 100 bp, but at some point, software designed for longer reads, such as BLAST, may be a better fit for downstream analysis. Furthermore, when mapping reads from an organism that has diverged significantly from its reference genome, how should a program’s parameters be adjusted, and can that adjustment happen automatically? How useful is mapping quality in downstream analysis, and should it be computed while aligning reads, as Maq does, or later? The answers to each of these questions will depend on the type of assay and the scale of the analysis, and as long as the technology continues to change, the programs will have to change rapidly to keep up.

Sunday, November 7, 2010
