compression

Linking BWT and XBW via Aho-Corasick Automaton: Applications to Run-Length Encoding

The boom in genomic sequencing makes compression of sets of sequences inescapable. This underlies the need for multi-string indexing data structures that help compress the data. The most prominent example of such data structures is the …

STAR: an algorithm to Search for Tandem Approximate Repeats

Motivation: Tandem repeats consist of approximate, adjacent repetitions of a DNA motif. Such repeats account for large portions of eukaryotic genomes and have also been found in other kingdoms of life. Owing to their polymorphism, tandem repeats have …

Optimal representation in average using Kolmogorov complexity

It is known from algorithmic complexity theory [2-5, 8, 14] that a word is incompressible on average. For words of the form $x^m$, it is natural to believe that providing $x$ and $m$ is an optimal representation on average. On the contrary, for words …

A First Step Towards Chromosome Analysis by Compression Algorithms

In this paper, we use Kolmogorov complexity and compression algorithms to study DOS-DNA (DOS: defined ordered sequence). This approach gives quantitative and qualitative explanations of the regularities of apparently regular regions. We present the …