Positions after PhD


Research director


Jan 2016 – Present Montpellier, France

Responsibilities include:

  • Head of Institute of Computational Biology (2015-2019)
  • Head of French Molecular Bioinformatics network (2010-2016)
  • Head of computer science department (2007-2010)



Oct 1999 – Jan 2008 Montpellier, France
Bioinformatics and algorithmics research.

Postdoctoral fellow

German Cancer Research Center (Deutsches Krebsforschungszentrum - DKFZ)

Sep 1996 – Oct 1999 Heidelberg, Germany
Algorithms for transcriptomics.


News about software, results, publications, or collaborations

Speech on RNA marks as biomarkers for cancer detection. Free event: join us!

Our combinatorial work will be published at the 50th ICALP conference

A new European research network on the role of RNA translation in cancer biology has been funded by the COST agency for five years, starting in October 2022. It is called TRANSLACORE (COST network number CA21154), for “Translational control in Cancer”. Its COST “Action” webpage is here. You will find details about this action and its “Memorandum of Understanding”. The action is coordinated by Dr Jean-Jacques Diaz from Lyon and Dr Fatima Gebauer from Barcelona, and is organized in five Working Groups.





EU-funded International Training Network on Computational Pan-Genomics


Algorithms for shortest superstring questions


Software for efficient metagenomics and applications to virus metagenomics


Information fuelled biophysical models for the control of gene expression

Recent Publications

Finding the correct position of new sequences within an established phylogenetic tree is an increasingly relevant problem in evolutionary bioinformatics and metagenomics. Recently, alignment-free approaches for this task have been proposed. One such approach is based on the concept of phylogenetically-informative k-mers, or phylo-k-mers for short. In practice, phylo-k-mers are inferred from a set of related reference sequences and are equipped with scores expressing the probability of their appearance in different locations within the input reference phylogeny. Computing phylo-k-mers, however, represents a computational bottleneck to their applicability in real-world problems such as the phylogenetic analysis of metabarcoding reads and the detection of novel recombinant viruses. Here we consider the problem of phylo-k-mer computation: how can we efficiently find all k-mers whose probability lies above a given threshold for a given tree node? We describe and analyze algorithms for this problem, relying on branch-and-bound and divide-and-conquer techniques. We exploit the redundancy of adjacent windows of the alignment to save on computation. Besides computational complexity analyses, we provide an empirical evaluation of the relative performance of their implementations on simulated and real-world data. The divide-and-conquer algorithms are found to surpass the branch-and-bound approach, especially when many phylo-k-mers are found.
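To make the threshold question above concrete, here is a minimal Python sketch of a branch-and-bound enumeration. It is an illustration under stated assumptions, not the implementation evaluated in the paper: it assumes each of the k positions of a window is summarized by an independent probability distribution over A, C, G, T, and that the probability of a k-mer is the product of its per-position probabilities. The function name kmers_above_threshold and the input layout are hypothetical.

```python
from typing import Dict, List, Sequence

ALPHABET = "ACGT"


def kmers_above_threshold(
    columns: Sequence[Dict[str, float]], threshold: float
) -> Dict[str, float]:
    """Return every k-mer whose probability is >= threshold.

    columns[i][base] is the (assumed independent) probability of `base`
    at position i of the window; k is len(columns).
    """
    k = len(columns)

    # best_suffix[i] = highest probability achievable on positions i..k-1.
    best_suffix: List[float] = [1.0] * (k + 1)
    for i in range(k - 1, -1, -1):
        best_suffix[i] = best_suffix[i + 1] * max(columns[i].values())

    results: Dict[str, float] = {}

    def extend(prefix: str, prob: float) -> None:
        i = len(prefix)
        if i == k:
            results[prefix] = prob
            return
        for base in ALPHABET:
            p = prob * columns[i][base]
            # Bound: prune if even the best completion stays below threshold.
            if p * best_suffix[i + 1] < threshold:
                continue
            extend(prefix + base, p)

    extend("", 1.0)
    return results


if __name__ == "__main__":
    # Toy window of length 2 (hypothetical numbers).
    window = [
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.6, "G": 0.2, "T": 0.1},
    ]
    print(kmers_above_threshold(window, 0.1))
```

The bound uses the best achievable suffix probability, so a prefix is abandoned as soon as no completion can reach the threshold. The divide-and-conquer variants mentioned in the abstract are not covered by this sketch.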

Seeking probabilistic motifs in a sequence is a common task to annotate putative transcription factor binding sites or other RNA/DNA binding sites. Useful motif representations include position weight matrices (PWMs), dinucleotide PWMs (di-PWMs), and hidden Markov models (HMMs). Dinucleotide PWMs not only combine the simplicity of PWMs (a matrix form and a cumulative scoring function) but also incorporate dependency between adjacent positions in the motif (unlike PWMs, which disregard any dependency). For instance, to represent binding sites, the HOCOMOCO database provides di-PWM motifs derived from experimental data. Currently, two programs, SPRy-SARUS and MOODS, can search for occurrences of di-PWMs in sequences. We propose a Python package called dipwmsearch, which provides an original and efficient algorithm for this task (it first enumerates matching words for the di-PWM, and then searches these all at once in the sequence, even if the latter contains IUPAC codes). The user benefits from an easy installation via PyPI or conda, comprehensive documentation, and executable scripts that facilitate the use of di-PWMs. dipwmsearch is available at https://pypi.org/project/dipwmsearch/ and https://gite.lirmm.fr/rivals/dipwmsearch/ under the CeCILL license.
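The cumulative di-PWM score described above can be sketched in a few lines of Python. This is a naive sliding-window scan written for illustration only: it does not use the dipwmsearch API and is not the enumeration-based algorithm of the package. The data layout (one dictionary of 16 dinucleotide scores per pair of adjacent positions) and the names score_word, scan, and toy_dipwm are assumptions made for this example.

```python
from itertools import product
from typing import Dict, List, Tuple

# A di-PWM for a motif of length m is modelled here as m-1 columns,
# each mapping a dinucleotide ("AA", "AC", ...) to a score.
DiPWM = List[Dict[str, float]]


def score_word(dipwm: DiPWM, word: str) -> float:
    """Sum the scores of the overlapping dinucleotides of `word`."""
    if len(word) != len(dipwm) + 1:
        raise ValueError("word length must be number of columns + 1")
    return sum(column[word[i:i + 2]] for i, column in enumerate(dipwm))


def scan(dipwm: DiPWM, sequence: str, threshold: float) -> List[Tuple[int, str, float]]:
    """Report (start, window, score) for every window scoring >= threshold."""
    m = len(dipwm) + 1
    hits = []
    for start in range(len(sequence) - m + 1):
        window = sequence[start:start + m]
        score = score_word(dipwm, window)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


def toy_dipwm(favoured: List[str]) -> DiPWM:
    """Hypothetical matrix: 2.0 for the favoured dinucleotide, -1.0 otherwise."""
    dinucleotides = ["".join(pair) for pair in product("ACGT", repeat=2)]
    return [
        {d: (2.0 if d == fav else -1.0) for d in dinucleotides}
        for fav in favoured
    ]


if __name__ == "__main__":
    matrix = toy_dipwm(["AC", "CG"])  # motif of length 3 favouring "ACG"
    print(scan(matrix, "TTACGAC", threshold=3.0))
```

Because each column scores a pair of adjacent letters, the score of a word depends on neighbouring positions jointly, which is the dependency a plain PWM cannot express. This sketch handles only the letters A, C, G, T; IUPAC codes are out of its scope.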

Background: Protozoan parasites are known to attach a specific and diverse group of proteins to their plasma membrane via a GPI anchor. In malaria parasites, GPI-anchored proteins (GPI-APs) have been shown to play an important role in host-pathogen interactions and a key function in host cell invasion and immune evasion. Because of their immunogenic properties, some of these proteins have been considered as malaria vaccine candidates. However, identification of all possible GPI-APs encoded by these parasites remains challenging due to their sequence diversity and limitations of the tools used for their characterization. Methods: The FT-GPI software was developed to detect GPI-APs based on the presence of a hydrophobic helix at both ends of the premature peptide. FT-GPI was implemented in C++ and applied to study the GPI-proteome of 46 isolates of the order Haemosporida. Using the GPI-proteome of Plasmodium falciparum strain 3D7 and Plasmodium vivax strain Sal-1, a heuristic method was defined to select the most sensitive and specific FT-GPI software parameters. Results: FT-GPI enabled revision of the GPI-proteome of P. falciparum and P. vivax, including the identification of novel GPI-APs. Orthology- and synteny-based analyses showed that 19 of the 37 GPI-APs found in the order Haemosporida are conserved among Plasmodium species. Our analyses suggest that gene duplication and deletion events may have contributed significantly to the evolution of the GPI-proteome, and that its composition correlates with speciation. Conclusion: FT-GPI-based prediction is a useful tool for mining GPI-APs and gaining further insights into their evolution and sequence diversity. This resource may also help identify new protein candidates for the development of vaccines for malaria and other parasitic diseases. Keywords: FT-GPI; GPI-anchored protein; GPI-proteome; P. vivax; Plasmodium falciparum.

One of the main challenges in cancer management relates to the discovery of reliable biomarkers, which could guide decision-making and predict treatment outcome. In particular, the rise and democratization of high-throughput molecular profiling technologies bolstered the discovery of “biomarker signatures” that could maximize the prediction performance. Such an approach was largely employed from diverse OMICs data (i.e., genomics, transcriptomics, proteomics, metabolomics) but not from epitranscriptomics, which encompasses more than 100 biochemical modifications driving the post-transcriptional fate of RNA: stability, splicing, storage, and translation. We and others have studied chemical marks in isolation and associated them with cancer evolution, adaptation, as well as the response to conventional therapy. In this study, we have designed a unique pipeline combining multiplex analysis of the epitranscriptomic landscape by high-performance liquid chromatography coupled to tandem mass spectrometry with statistical multivariate analysis and machine learning approaches in order to identify biomarker signatures that could guide precision medicine and improve disease diagnosis. We applied this approach to analyze a cohort of adult diffuse glioma patients and demonstrate the existence of an “epitranscriptomics-based signature” that permits glioma grades to be discriminated and predicted with unmet accuracy. This study demonstrates that epitranscriptomics (co)evolves along cancer progression and opens new prospects in the field of omics molecular profiling and personalized medicine.

Mechanisms of drug tolerance remain poorly understood and have been linked to genomic but also to non-genomic processes. 5-fluorouracil (5-FU), the most widely used chemotherapy in oncology, is associated with resistance. While prescribed as an inhibitor of DNA replication, 5-FU alters all RNA pathways. Here, we show that 5-FU treatment leads to the production of fluorinated ribosomes exhibiting altered translational activities. 5-FU is incorporated into ribosomal RNAs of mature ribosomes in cancer cell lines, colorectal xenografts, and human tumors. Fluorinated ribosomes appear to be functional, yet they display a selective translational activity towards mRNAs depending on the nature of their 5′-untranslated region. As a result, we find that sustained translation of IGF-1R mRNA, which encodes one of the most potent cell survival effectors, promotes the survival of 5-FU-treated colorectal cancer cells. Altogether, our results demonstrate that “man-made” fluorinated ribosomes favor the drug-tolerant cellular phenotype by promoting translation of survival genes.

Popular Topics

68W32 Aho-Corasik algebraic technique algorithm algorithms alignment alignment score ALPACA alphabet size Anchor-based strategy ancient DNA Approximability approximate match approximate pattern matching approximate repeats approximation Approximation algorithm approximation algorithms APX assembly autocorrelation award bacteria Bacterial genomes Basic Period binary alphabet Binary Vector binding binding site bioinformatics Biological Physics (physics.bio-ph) biophysics BLAST bounds Burrows-Wheeler cancer cDNA character Characterisation chromatin chromosome circular permutation cluster analysis clustering clustering algorithms coding coiled coil Collinear fragment chaining common word Comparative genomics complexity compressed data structures compression compression algorithms compression gain computer science Concat-Cycles concensus string conformation Connectivity cross-over cyclic cover Cyclic string cyclic strings Cytoplasmic Male Sterility Data compression data structure Data structures Data Structures and Algorithms (cs.DS) database DCJ de Bruijn graph discrete line DNA dominance order double cut and join duplication dynamic programming edit distance encoding enumeration epitranscriptome equality EST Eulerian tour evaluation evolution Exact Match exponential F.2.2 filtration FOS: Computer and information sciences FOS: Physical sciences Gapped seed genetics genome genome rearrangement genome sequencing genomics Golomb ruler graph greedy greedy algorithm Greedy conjecture Haemophilus influenzae Hamiltonian path heuristic algorithms Hi-C homologous sequences human hybrid zone Hypergraph incomplete lineage sorting indexing information content information theory input string INS insulin integer sequence internal duplication intragenic recombination irreducible factor kinship Kolmogorov complexity lattice LCS Levenshtein distance linear superstring linear time linear time algorithm Longest common subsequence machine learning malaria mapping tool matroid maximal 
chain Maximum coverage Maximum independent set Maximum stable set medecine memory metagenome metagenomics microorganisms microsatellite evolution Minimum assignment minisatellite minisatellite locus minisatellites MIS monkey test motif motif size mouse mRNA MS-Align multiple alignment multiple read mutation MVR N-gram NGS NP-complete NP-hard On-line algorithms optimal coding oryza line overlap overlap graph Pairwise alignment parameterized complexity path pattern Pattern recognition pattern search perfect detection periods Permutation phylogenetic profile Polynomial Time Approximation Scheme proportional length protein domain proteome PWM python radish genome random-access memory random text Read mapping rearrangement Recognition recombinant regular expression regularity detection regulation relative compression repeats reverse complementary sequence RLE RNA search algorithm seed segment tree seminar sequence sequence alignment sequence classification sequence comparison Sequence graph short tandem repeats Shortest cyclic cover of strings shortest DNA cyclic cover problem Shortest Superstring Problem similarity similarity metrics single cell soft software spaced-seed Statistical Mechanics (cond-mat.stat-mech) string String matching stringology Stringology Text Algorithms Indexing Data Structures De Bruijn Graph Assembly Space Complexity Dynamic Update strings student sturgeon phylogeny subset system suffix array suffix tree superstring sweep line tandem duplication tandem repeat tandem repeat alignment tandem repeats team text text compression Text indexing Tiling time complexity tool training transcription factor transcriptome transcriptomics translation tree tree alignment validation score virus VNTR W[1]-hard web resource web server Whole genome alignment word word enumeration word RAM model Yakuts zebra fish


Connect with me

  • (33) 04 67 41 86 64
  • LIRMM - UMR 5506 & University Montpellier (CC 05016) 860 rue de St Priest - 34095 Montpellier cedex 5 FRANCE