Short presentation

For more information about the project, see the detailed presentation.

Plasmodium falciparum is the pathogenic agent responsible for malaria, which causes close to 3 million human deaths worldwide each year. Its genome, published in 2002, remains poorly understood. Among its ~5000 genes, ~3000 have a totally unknown biological function and are called orphan genes. Several reasons explain this situation, the first being a highly atypical genome composition, with more than 80% A and T, whereas the mean in other genomes is about 50%. This feature renders ineffective the usual methods of functional annotation, which rely on sequence comparison (alignment) and predict the function of a gene from that of a related gene (homologue) in related organisms. Faced with this obstacle, numerous laboratories around the world have engaged in massive data acquisition at both the genomic level (by sequencing closely related organisms and diverse strains) and the post-genomic level (notably with DNA chips, which measure gene expression levels in various conditions). These data are heterogeneous, of variable quality, and carry information of very different kinds. They are publicly available in PlasmoDB, which gathers most of the genomic and post-genomic information on Plasmodium falciparum.
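The compositional bias described above can be made concrete with a small computation. The following is a minimal Python sketch, not part of the project's software: the function name `at_fraction` and the two short sequences are hypothetical and made up for illustration, not real genomic data. It computes the proportion of A and T among the unambiguous bases of a sequence.

```python
def at_fraction(sequence: str) -> float:
    """Return the fraction of A and T among the unambiguous bases (A, C, G, T).

    Any other symbol (for example N) is ignored. Hypothetical helper,
    written for illustration only.
    """
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return at / total


# Made-up toy sequences, not taken from any genome.
at_rich_example = "ATTATAATTTAAATGTATAT"
balanced_example = "ATGCGCATTAGCCGTAGCTA"

if __name__ == "__main__":
    for name, seq in [("AT-rich", at_rich_example), ("balanced", balanced_example)]:
        print(f"{name}: A+T fraction = {at_fraction(seq):.2f}")
```

Applied to a whole genome, the same count would give the kind of figure quoted above. With such a skewed composition, two unrelated sequences share many A and T positions by chance alone, which is one way to see why standard alignment scores become hard to interpret.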

The PlasmoExplore project aims to predict the function of orphan genes by exploiting this constantly and rapidly growing mass of data. This characterisation should help identify new therapeutic targets for new treatments. The proposed approach will combine:

  1. the ontologies of the Gene Ontology (GO) consortium, which define the function of genes;
  2. efficient alignment methods able to account for the compositional bias of the malarial genome, so as to exploit genomic data and establish new homologies, at the scale of genes but also of chromosomes or entire genomes;
  3. supervised statistical learning methods, which will be used to exploit post-genomic data and build predictors associated with each GO class;
  4. classifier combination methods to synthesize the information extracted from each data source;
  5. suitable visualisation and interaction methods, allowing end users to explore the predictions at multiple scales.

Within its very first months, the project will set up a freely accessible web database that provides the international scientific community with the best available predictions.

Author: Laurent Bréhélin <b r e h e l i n at l i r m m . f r>

Date: 2009 oct 22
