Reference-free transcriptome assembly in non-model organisms from next generation sequencing data
Philippe Gayral, Institut des Sciences de l'Evolution, Montpellier, http://162.38.181.25/PopPhyl/
- Abstract
Next-generation sequencing technologies give the opportunity for genomic
study of non-model organisms sampled in the wild. The transcriptome is a
convenient and popular target for such purposes. Assembling gene coding
sequences out of short transcriptome reads, however, is a complex task,
owing to gene duplications, genetic polymorphism, alternative splicing, and
transcription noise. Typical assembly programs return thousands of
predicted contigs, with unclear connection to the species' true gene content.
This is especially problematic in taxa lacking a fully-sequenced, closely
related genome. Here we use two animal species for which a reference genome
is available to assess the potential for proper transcriptome assembly in
the absence of a reference. The transcriptomes of Ciona intestinalis
(Urochordata) and Lepus granatensis (Mammalia) are assembled from
newly-generated 454 and Illumina sequence reads. A new procedure is
introduced to annotate each predicted contig as full length, partial,
chimera, allele, paralogue, DNA or alien, based on the number of, and
overlap between, BLAST hits to appropriate reference transcriptomes and genomes.
A similar strategy is applied to computer-generated human data.
Analyses show that (i) optimal assemblies are obtained when 454 and
Illumina data are combined, (ii) existing assembly programs differ in
their ability to correctly split paralogues and group alleles, (iii)
typical de novo assemblies include a majority of irrelevant cDNA
predictions, and (iv) assemblies can be appropriately cleaned by filtering
contigs based on coverage and length. We conclude that robust,
reference-free assembly of thousands of genes from transcriptomic
next-generation sequence data is possible, which opens promising
perspectives for transcriptome-based evolutionary genomics in animals.
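
As an illustration of point (iv), the following minimal sketch filters contigs on length and coverage. It is not the procedure used in the study: the Contig record, the filter_contigs function, and the threshold values (500 bases, 5x mean depth) are hypothetical and chosen only to show the shape of such a filter.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical record for one assembled contig."""
    name: str
    length: int       # contig length in bases
    coverage: float   # mean read depth across the contig


def filter_contigs(contigs, min_length=500, min_coverage=5.0):
    """Keep contigs that meet both thresholds.

    The default thresholds are illustrative assumptions, not values
    taken from the abstract.
    """
    return [
        c for c in contigs
        if c.length >= min_length and c.coverage >= min_coverage
    ]


# Toy input: one contig passes, one is too short, one is too shallow.
contigs = [
    Contig("c1", 1200, 12.5),
    Contig("c2", 180, 40.0),
    Contig("c3", 2500, 1.2),
]

kept = filter_contigs(contigs)
print([c.name for c in kept])
```

In practice the thresholds would presumably be tuned per data set, for example by checking which cut-offs retain the contigs annotated as full length while discarding most of the spurious ones.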
Date: Feb 2011