Differential expression with RNA-seq: length and depth do matter.
Ana Conesa, Gene Expression Lab, Centro de Investigaciones Principe Felipe, Valencia, Spain aconesa@cipf.es
- Abstract
In the last few years, RNA-seq technology has revealed amazing new
data on isoform and allelic expression, novel splice junctions, 3' UTR
regions, antisense regulation and intragenic expression. RNA-seq is
also increasingly being applied to the quantification of gene
expression, as the number of reads mapped to a given gene or
transcript is an estimate of the level of expression of that
feature. How to process RNA-seq data to refine gene models or
accurately portray the dynamics of gene expression is today an area
of active bioinformatics research. Although at the dawn of RNA-seq
applications it was claimed that this technology would yield
unbiased, ready-to-analyze expression data, the reality has turned out
to be quite different, and several shortcomings of the technology
have been reported, such as the transcript length dependence of
expression levels and the influence of read distribution on the
assessment of differential expression. An underlying factor in RNA-seq
analysis is the number of reads generated in a given experiment. The
deeper the sequencing, the more transcripts are identified, the higher
the measured expression levels, and the greater the power to
identify differential expression. This leads to the question of how
many reads should be generated in an RNA-seq experiment to obtain
robust results.
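
The relationship between sequencing depth and the number of detected
genes can be illustrated with a small simulation. The sketch below is
not the method presented in the talk; it is a toy model under stated
assumptions: gene abundances are drawn from a log-normal distribution,
reads are sampled in proportion to abundance times transcript length,
and a gene counts as "detected" when it collects at least `min_reads`
reads. All function names, parameters and thresholds are hypothetical.

```python
import random
from collections import Counter


def simulate_library(n_genes=2000, n_reads=200_000, seed=0):
    """Simulate a read library as a list of gene indices (one per read).

    Assumption: the chance that a read comes from a gene is proportional
    to abundance * length, which mimics the length dependence of counts.
    """
    rng = random.Random(seed)
    lengths = [rng.randint(300, 10_000) for _ in range(n_genes)]
    abundances = [rng.lognormvariate(0.0, 2.0) for _ in range(n_genes)]
    weights = [a * l for a, l in zip(abundances, lengths)]
    reads = rng.choices(range(n_genes), weights=weights, k=n_reads)
    return lengths, reads


def detected_genes(reads, depth, min_reads=5):
    """Count genes with at least min_reads reads among the first `depth` reads.

    Reads are independent draws, so a prefix of the list behaves like a
    random subsample of the full library at a lower sequencing depth.
    """
    counts = Counter(reads[:depth])
    return sum(1 for c in counts.values() if c >= min_reads)


if __name__ == "__main__":
    lengths, reads = simulate_library()
    for depth in (1_000, 10_000, 100_000, 200_000):
        print(depth, detected_genes(reads, depth))
```

Because each shallower library is a prefix of the deeper one, the
number of detected genes can only stay the same or grow as depth
increases, which mirrors the trend described above. Real data would
replace the simulated reads with subsamples of the mapped reads.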
In this presentation we discuss the effect that sequencing depth has
on the detection of expressed genes, the distribution of reads across
transcript types and the number of differential expression calls. We
show how sequencing depth influences the transcript length dependency
of RNA-seq. We present a novel approach for the analysis of count data
that does not rely on parametric assumptions and is robust to these
biases.
Date: Feb 2011