Approximate Common Intervals in Multiple Genome Comparison


We consider the problem of inferring approximate common intervals of multiple genomes. Genomes are modelled as sequences of homologous gene family identifiers, and approximate common intervals represent conserved regions that may show rearrangements, repetitions, or insertions/deletions. This problem has been studied before, but existing approaches are not incremental and are somewhat limited to special cases. We adopt a simple, classical graph-based approach, where the vertices of the graph represent the exact common intervals of the sequences (i.e., regions containing the same gene set), and where edges link vertices that differ by fewer than δ elements (δ being a parameter). With this model, approximate gene clusters are maximal cliques of the graph, so computing them can exploit known and well-designed algorithms. As a proof of concept, we applied the method to several datasets of bacterial genomes and compared two maximal clique algorithms, a static one and a dynamic one. Besides being quite flexible, this approach opens the way to a combinatorial characterization of genomic rearrangements in terms of graph substructures.
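The graph model described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the gene sets of the exact common intervals are taken as given input, "differ by fewer than δ elements" is read as the size of the symmetric difference of two gene sets, and maximal cliques are enumerated with the standard Bron-Kerbosch algorithm with pivoting, which may or may not correspond to the static or dynamic algorithm compared in the paper. The names `build_graph` and `maximal_cliques` are hypothetical.

```python
from itertools import combinations


def build_graph(gene_sets, delta):
    """Build an adjacency map over the indices of gene_sets.

    Assumption: two vertices are linked when the symmetric difference
    of their gene sets has fewer than delta elements.
    """
    adj = {i: set() for i in range(len(gene_sets))}
    for i, j in combinations(range(len(gene_sets)), 2):
        if len(gene_sets[i] ^ gene_sets[j]) < delta:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch and pivoting."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidate vertices, x: already processed
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pick the pivot with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Toy input: each frozenset stands for the gene set of one exact
# common interval (illustrative data, not taken from the paper).
gene_sets = [
    frozenset("abc"),
    frozenset("abcd"),
    frozenset("abd"),
    frozenset("xyz"),
]
graph = build_graph(gene_sets, delta=2)
clusters = maximal_cliques(graph)
for clique in sorted(clusters, key=sorted):
    print(sorted(clique))
```

In this toy example each maximal clique groups exact common intervals whose gene sets are pairwise within the δ threshold, which is how the model reads off approximate gene clusters. Note that vertices 0 and 2 both neighbour vertex 1 but not each other, so they fall into two overlapping cliques rather than one.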

2011 IEEE International Conference on Bioinformatics and Biomedicine
bioinformatics, genomics, heuristic algorithms, clustering algorithms, approximation algorithms, microorganisms