Detailed presentation

Context and motivation

According to the latest joint WHO-UNICEF report, malaria (or paludism) affects about 500 million people worldwide every year and kills around 3 million, mainly children in Africa. This infectious disease is caused by a unicellular organism, Plasmodium falciparum, which is transmitted by Anopheles mosquitoes. The fight against this infectious agent faces numerous problems, the three main ones being: (1) the emergence of resistance to the few existing treatments (e.g. chloroquine and artemisinin); (2) the limited number of new therapeutic targets, due to poor knowledge of the Plasmodium falciparum genome; (3) the failure of all vaccine trials so far.

The PlasmoExplore project aims to contribute to the deciphering of the Plasmodium falciparum genome and to address point (2) by shedding light on the function of orphan genes (genes of unknown function), thus providing a new pool of potential therapeutic targets. This task, hard for any organism, is particularly difficult for Plasmodium falciparum. It is a complex organism that includes a plant-like component stemming from an ancient endosymbiosis with a red alga. Its parasitic cycle is also complex: the organism lives successively in the Anopheles mosquito, then in the human liver and finally in human red blood cells, and it undergoes several transformations in each of these environments. Its genome (published in 2002) is highly atypical, with an A+T content of about 80%, whereas the average in most species is around 50%. Its proteins are atypical too: their amino-acid composition is strongly biased as a consequence of the abundance of A and T, and they are about 20% longer than the homologous proteins known in other organisms. As a result of all these difficulties, only 40% of Plasmodium falciparum genes have a putative function, assigned through the identification of a homologous gene in a related organism and not yet confirmed experimentally. The remaining 60% of genes have a completely unknown biological function. They number around 3000 according to PlasmoDB, the database that compiles and updates a major part of the available genomic and post-genomic data on Plasmodium falciparum.

The purpose of PlasmoExplore is to predict the function of these 3000 unknown genes. To do so, we will rely on: (1) the genomic data of Plasmodium falciparum and closely related species, many of which are currently being sequenced and should be available soon; (2) post-genomic data, especially transcriptome data from DNA chips, which measure gene expression levels under varied conditions or at different time points of the organism's life cycle; interactome and proteome data will be used as well. These data are heterogeneous; they are of variable quality (transcriptomic data are usually noisy, whereas sequence data are much less so); and, above all, they carry different information about gene function. Moreover, these data are constantly evolving and growing as a result of the major international programs in this area.

Expected scientific and technical outcome

The general method we propose to exploit these data combines: (1) the ontologies of the Gene Ontology (GO) consortium, which describe gene function from three viewpoints: the biological process, the molecular function and the cellular component (cellular location); (2) alignment methods and algorithms for genes, but also for chromosomes and entire genomes, in order to exploit genomic data and establish new homologies (this includes the development of score matrices dedicated to Plasmodium falciparum that take into account the compositional bias of its nucleotides or amino acids); (3) statistical learning methods, which will be used to exploit post-genomic data and to build a predictor for each GO class; (4) classifier combination methods to synthesize the information extracted from each data source; (5) visualization methods to allow multi-scale exploration of the predictions. Finally, the approach will be flexible enough to incorporate new incoming data easily.
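To illustrate why point (2) calls for dedicated score matrices, the sketch below shows one standard way in which a substitution score depends on background composition: the log-odds form s(a,b) = log2(q_ab / (p_a p_b)). It is only a toy model under stated assumptions, not the matrices developed in the project: the function name `log_odds_matrix`, the target model (a fixed match probability `p_match`) and the background frequencies (80% A+T) are all hypothetical choices made for illustration.

```python
import math


def log_odds_matrix(background, p_match=0.7):
    """Toy nucleotide log-odds scores (in bits) for a given background composition.

    Assumed target model (hypothetical, for illustration only): an aligned pair
    is identical with probability p_match, the shared letter being drawn from
    the background; otherwise the two letters are drawn independently from the
    background, conditioned on being different.
    """
    letters = sorted(background)
    # Probability that two independent background draws differ.
    p_diff = 1.0 - sum(background[a] ** 2 for a in letters)
    scores = {}
    for a in letters:
        for b in letters:
            if a == b:
                q = p_match * background[a]
            else:
                q = (1.0 - p_match) * background[a] * background[b] / p_diff
            # Log-odds: target pair frequency versus chance pairing.
            scores[(a, b)] = math.log2(q / (background[a] * background[b]))
    return scores


# Illustrative compositions: an AT-rich genome versus a uniform one.
at_rich = {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1}
uniform = {"A": 0.25, "T": 0.25, "G": 0.25, "C": 0.25}

biased = log_odds_matrix(at_rich)
flat = log_odds_matrix(uniform)
```

With the AT-rich background, a G/G match scores higher than an A/A match, because two aligned A's are much more likely to occur by chance; with the uniform background all matches score the same. A score matrix tuned for typical genomes would therefore tend to overrate A/T-rich matches in Plasmodium falciparum.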

Each of these points requires methodological research whose scope goes far beyond the applications presented here. For example, supervised learning methods are a natural approach for assigning objects to a set of classes that partition the description space, but they do not handle non-exclusive, hierarchically organized classes, or do so only imperfectly. An important part of supervised classification (based on the Bayes rule) must be rethought for this new context. Clearly, such approaches will find applications in other domains where ontologies are available. In the same way, aligning sequences that differ both in character composition and in length requires reconsidering the fast algorithms (BLAST type) and those based on dynamic programming (Smith-Waterman type). The combination of classifiers in this context (involving ontologies in particular) also constitutes a promising field of investigation, with multiple applications. Finally, the visualisation of complex predictions within the framework of ontologies, with the aim of highlighting the processes and data sources responsible for the reported results, is of broad interest.
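The difficulty raised by non-exclusive, hierarchical classes can be made concrete with a small sketch. It combines per-class probabilities coming from several data sources and then makes them consistent with the hierarchy: a gene annotated with a class is implicitly annotated with all its ancestors, so an ancestor should never score lower than one of its descendants. Everything here is an assumption made for illustration, not the method of the project: the weighted average as combination rule, the function names, the weights, the probabilities and the tiny three-level hierarchy.

```python
def combine_sources(per_source, weights):
    """Weighted average of per-class probabilities from several data sources.

    Assumption: a class missing from a source counts as probability 0.0.
    """
    total = sum(weights[s] for s in per_source)
    classes = set()
    for probs in per_source.values():
        classes.update(probs)
    return {
        c: sum(weights[s] * per_source[s].get(c, 0.0) for s in per_source) / total
        for c in classes
    }


def ancestors(node, parents):
    """All ancestors of a node in a DAG given as {child: [parents]}."""
    seen = set()
    stack = list(parents.get(node, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def enforce_hierarchy(scores, parents):
    """Raise each ancestor's score to at least that of its descendants."""
    fixed = dict(scores)
    for node, value in scores.items():
        for anc in ancestors(node, parents):
            fixed[anc] = max(fixed.get(anc, 0.0), value)
    return fixed


# Hypothetical hierarchy and predictions for a single gene.
parents = {
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "transporter activity": ["molecular function"],
}
per_source = {
    "sequence": {"kinase activity": 0.9, "catalytic activity": 0.5,
                 "molecular function": 0.9},
    "transcriptome": {"kinase activity": 0.6, "catalytic activity": 0.4,
                      "transporter activity": 0.3},
}
weights = {"sequence": 2.0, "transcriptome": 1.0}

combined = combine_sources(per_source, weights)
consistent = enforce_hierarchy(combined, parents)
```

In this example the independently trained predictors give "catalytic activity" a lower combined score than its child "kinase activity", which is incoherent; the post-processing step lifts the parent to the child's score. Such a correction after the fact is only a workaround, which is why classification and combination methods designed for ontologies from the start are of interest.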

Apart from this methodological research, the project will develop a free-access database available on the internet. This database will give access to the best available predictions for each unknown gene of Plasmodium falciparum. It will be linked to PlasmoDB and will support various queries, e.g. extracting the genes predicted with high confidence to belong to a given set of functions. Feedback from the international community on these predictions will be invaluable for refining both the methodology and the results.
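As an indication of the kind of query mentioned above, here is a minimal sketch using an in-memory SQLite database. The table layout (`predictions` with columns `gene_id`, `go_class`, `confidence`), the gene identifiers and the confidence values are hypothetical; the schema of the actual database is not specified here.

```python
import sqlite3


def high_confidence_genes(conn, go_classes, min_confidence):
    """Return (gene_id, go_class, confidence) rows for the given classes
    whose confidence is at least min_confidence, best predictions first."""
    placeholders = ",".join("?" for _ in go_classes)
    query = (
        "SELECT gene_id, go_class, confidence FROM predictions "
        f"WHERE go_class IN ({placeholders}) AND confidence >= ? "
        "ORDER BY confidence DESC, gene_id"
    )
    return conn.execute(query, (*go_classes, min_confidence)).fetchall()


# Hypothetical content, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions (gene_id TEXT, go_class TEXT, confidence REAL)"
)
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?, ?)",
    [
        ("gene_A", "kinase activity", 0.95),
        ("gene_B", "kinase activity", 0.40),
        ("gene_C", "transporter activity", 0.85),
        ("gene_D", "protein binding", 0.99),
    ],
)

hits = high_confidence_genes(
    conn, ["kinase activity", "transporter activity"], 0.8
)
```

Here `hits` contains gene_A and gene_C only: gene_B falls below the confidence threshold and gene_D belongs to a function outside the requested set.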

The biologist partners of the project aim to discover new therapeutic targets and to design new treatments. The most relevant predictions made within the project will be tested experimentally in the wet laboratory, which will in turn provide feedback on these predictions. These experimental approaches are generally expensive and will not be funded by the present project, but will be the subject of subsequent applications to ANR or other funding institutions. Nevertheless, they will give strong foundations to the PlasmoExplore project and will contribute to its visibility.

Expected industrial and economic spin-offs

One of the goals of the PlasmoExplore project is to advance the inventory of potential targets for therapeutic treatments. The discovery of a new anti-malarial drug would be a major event, with highly significant industrial spin-offs. It would be presumptuous to claim that we will reach this goal, but as far as computer science is concerned, we aim to play an important part in this quest that matters so much.

Author: Laurent Bréhélin <b r e h e l i n at l i r m m . f r>

Date: 2009 oct 22
