Location of Repetitive Regions in Sequences By Optimizing A Compression Method

Abstract

Suppose that a biologist wishes to study some local property P of genetic sequences. If he can design (with a computer scientist) an algorithm C which efficiently compresses parts of the sequence which satisfy P , then our algorithm TurboOptLift locates very quickly where property P occurs by chance on a sequence, and where it occurs as a result of a signi cant process. Under some conditions, the time complexity of TurboOptLift is O (n log n). We illustrate its use on the practical problem of locating approximate tandem repeats in DNA sequences.

Publication
Proc. of the 4th Pacific Symposium on Biocomputing