Location of repetitive regions in sequences by optimizing a compression method

Olivier Delgrange, Max Dauchet, Eric Rivals

December 1998

Abstract

Suppose that a biologist wishes to study some local property $P$ of genetic sequences. If he can design (with a computer scientist) an algorithm $C$ which efficiently compresses parts of the sequence which satisfy $P$, then our algorithm TurboOptLift locates very quickly where property $P$ occurs by chance on a sequence, and where it occurs as a result of a significant process. The time complexity of TurboOptLift is $O(n \log n)$. We illustrate its use on the practical problem of locating approximate tandem repeats in DNA sequences.
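The idea of using compression gain to locate significant regions can be illustrated with a deliberately simplified model. The sketch below is not TurboOptLift and does not reproduce its $O(n \log n)$ procedure. It assumes, purely for illustration, that the gain of compressing a segment with $C$ is the sum of per-position gains (in bits) and that every compressed segment pays a fixed overhead. The function name `best_compressed_segments` and the parameters `gain` and `overhead` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def best_compressed_segments(
    gain: List[float], overhead: float
) -> Tuple[float, List[Tuple[int, int]]]:
    """Choose disjoint segments maximizing sum(gain inside) - overhead per segment.

    Simplifying assumption (not from the paper): gains are additive per position.
    Returns (total net gain, list of (start, end) pairs with end exclusive).
    """
    n = len(gain)
    neg_inf = float("-inf")
    # out[i]: best net gain on the first i positions, position i-1 left uncompressed
    # inn[i]: best net gain on the first i positions, position i-1 inside a segment
    out = [0.0] * (n + 1)
    inn = [neg_inf] * (n + 1)
    out_from_in = [False] * (n + 1)  # out[i] was reached by closing a segment
    opened = [False] * (n + 1)       # the segment covering i-1 starts at i-1

    for i in range(1, n + 1):
        g = gain[i - 1]
        if inn[i - 1] > out[i - 1]:
            out[i] = inn[i - 1]
            out_from_in[i] = True
        else:
            out[i] = out[i - 1]
        extend = inn[i - 1] + g          # continue the current segment
        start = out[i - 1] + g - overhead  # open a new segment here
        if start >= extend:
            inn[i] = start
            opened[i] = True
        else:
            inn[i] = extend

    # Trace back the chosen segments from the better final state.
    in_segment = inn[n] > out[n]
    total = inn[n] if in_segment else out[n]
    segments: List[Tuple[int, int]] = []
    end = None
    i = n
    while i > 0:
        if in_segment:
            if end is None:
                end = i
            if opened[i]:
                segments.append((i - 1, end))
                end = None
                in_segment = False
        elif out_from_in[i]:
            in_segment = True
        i -= 1
    segments.reverse()
    return total, segments


if __name__ == "__main__":
    # Made-up gains: positive where a hypothetical compressor C saves bits.
    example = [-1, 2, 3, -1, -5, 4, 4, -2]
    print(best_compressed_segments(example, overhead=3))
```

In this toy model, segments whose summed gain does not exceed the overhead are left uncompressed, which mirrors the abstract's distinction between regions where $P$ occurs by chance and regions where it reflects a significant process.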

Type

Journal article

Publication

Biocomputing ’99