Nicolas Terrapon's home page
TERRAPON Nicolas
PhD in Bioinformatics, defended in December 2010 at LIRMM, Montpellier
Post-doctoral researcher at IEB, Münster (Germany)
Phone: +49 (0)176.649.917.01
E-mail: n.terrapon@wwu.de

Since the beginning of my post-doctoral contract, I have been involved in the sequencing project of the first termite genome. Our group was in charge of the comparative analysis against the 10 insect genomes currently available, through phylogeny computation and the annotation of protein domains (InterPro), GO terms (Gene Ontology) and KEGG orthologs. I am also participating in projects initiated by other members of the team, developing tools that help study proteins through their domain architecture.

My PhD work concerned the development of learning and statistical methods to improve protein domain detection, particularly for organisms whose protein amino-acid composition is biased by an uneven A+T content in their DNA. This is notably the case of Plasmodium falciparum, the lethal agent of human malaria. The worldwide endemic disease caused by this intracellular parasite is a major global health issue. To fight this organism, it is necessary to understand the function of its proteins.

Protein domains, the structural and functional subunits of proteins, play a key role in function prediction in bioinformatics. Numerous protein domain databases are now available online (cf. InterPro). These databases offer probabilistic models, mainly Hidden Markov Models (HMMs), built from alignments of manually curated families of homologous domains. Given a new protein sequence, these models make it possible to establish the domain composition of the protein. However, this approach has an obvious limit: the compositional bias of protein sequences in some organisms. We thus propose to overcome this limit with statistical and learning methods.

On the one hand, my work involved improving domain detection by using the co-occurrence properties of protein domains. Our approach allowed us to detect numerous new Pfam domains in Plasmodium falciparum and provided Gene Ontology (GO) annotations with low error rates.
It also proved able to detect new domains even in well-characterized organisms. The article is published in Bioinformatics, and a dedicated website containing the whole set of our results has been set up. We then applied the approach to ten other human pathogens and designed the EuPathDomains database to allow querying of the results. An article presenting the EuPathDomains database has been published in the journal Infection, Genetics and Evolution.

On the other hand, my research consisted in developing methods to correct probabilistic models that are ill-suited to the compositional bias. I put forward different corrections based on numerical, evolutionary, statistical and taxonomic techniques. These corrections and the results obtained are described in detail in my PhD thesis, and we have just submitted an article to BMC Bioinformatics.
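
To make the co-occurrence idea mentioned above more concrete, here is a minimal Python sketch. It is only an illustration and not the published method: the thresholds, the domain names and the `known_pairs` set are hypothetical, and the published approach relies on a statistical treatment that is not reproduced here. The sketch accepts a weak domain hit only when it appears on the same protein as another candidate domain with which it is known to co-occur.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    domain: str   # domain family identifier (hypothetical names in the example)
    score: float  # HMM match score, higher means a better match


def certify_by_cooccurrence(hits, known_pairs, strict=25.0, permissive=10.0):
    """Return the domains accepted for one protein, in input order.

    A hit is accepted if its score reaches the strict threshold, or if it
    reaches the permissive threshold and another candidate on the same
    protein forms a known co-occurring pair with it.
    Both thresholds are illustrative assumptions.
    """
    # Candidates are all hits that pass the permissive threshold.
    candidates = [h for h in hits if h.score >= permissive]
    accepted = []
    for hit in candidates:
        if hit.score >= strict:
            accepted.append(hit.domain)
            continue
        # Weak hit: look for a supporting partner among the other candidates.
        partners = (other.domain for other in candidates if other is not hit)
        if any(frozenset((hit.domain, p)) in known_pairs for p in partners):
            accepted.append(hit.domain)
    return accepted


# Hypothetical example: DomB is weak but supported by its known partner DomA.
example_hits = [Hit("DomA", 40.0), Hit("DomB", 12.0), Hit("DomC", 11.0), Hit("DomD", 3.0)]
example_pairs = {frozenset(("DomA", "DomB"))}
print(certify_by_cooccurrence(example_hits, example_pairs))
```

In this toy example, DomC is rejected because none of the other candidates is a known partner, and DomD is rejected because it does not even reach the permissive threshold.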