Location of repetitive regions in sequences by optimizing a compression method

Abstract

Suppose that a biologist wishes to study some local property $P$ of genetic sequences. If he can design, together with a computer scientist, an algorithm $C$ that efficiently compresses the parts of a sequence satisfying $P$, then our algorithm TurboOptLift very quickly locates the places in a sequence where $P$ occurs only by chance and those where it results from a significant process. The time complexity of TurboOptLift is $O(n \log n)$. We illustrate its use on the practical problem of locating approximate tandem repeats in DNA sequences.
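The abstract does not describe how TurboOptLift works internally. As a rough illustration of the general idea only (regions satisfying $P$ are the ones that $C$ compresses well), the sketch below makes several hypothetical assumptions: that $C$ reports, for each nucleotide, how many bits it spends encoding it; that the reference is a plain code of 2 bits per nucleotide for the alphabet {A, C, G, T}; and that we want the single contiguous segment with the largest total saving. It finds that segment with Kadane's maximum-subarray scan in $O(n)$. This is a swapped-in textbook technique, not the TurboOptLift algorithm. It ignores the cost of encoding segment boundaries and does not test significance.

```python
from typing import Sequence, Tuple

# Hypothetical baseline: a uniform code spends 2 bits on each of A, C, G, T.
BASELINE_BITS = 2.0


def best_gain_segment(costs: Sequence[float]) -> Tuple[int, int, float]:
    """Return (start, end, gain) for the half-open segment [start, end)
    that maximizes sum(BASELINE_BITS - costs[i]).

    `costs[i]` is an assumed per-nucleotide code length (in bits) produced
    by a hypothetical compressor C. If no segment saves any bits, the
    result is (0, 0, 0.0).
    """
    best = (0, 0, 0.0)
    run_start, run_gain = 0, 0.0
    for i, cost in enumerate(costs):
        # A running segment with non-positive gain can only hurt what
        # follows, so start a fresh segment at position i.
        if run_gain <= 0.0:
            run_start, run_gain = i, 0.0
        run_gain += BASELINE_BITS - cost
        if run_gain > best[2]:
            best = (run_start, i + 1, run_gain)
    return best
```

For example, with per-nucleotide costs `[2, 2, 0.5, 0.5, 0.5, 2, 2]`, the three cheaply encoded positions form the best segment, and the call returns `(2, 5, 4.5)`: 1.5 bits saved at each of positions 2, 3 and 4.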

Publication
Biocomputing ’99