Efficient Construction of Hierarchical Overlap Graphs


The hierarchical overlap graph (HOG for short) is an overlap encoding graph that efficiently represents overlaps from a given set $P$ of $n$ strings. A previously known algorithm constructs the HOG in $O(|| P || + n^2)$ time and $O(|| P || +n times min (n,max |s|:s ın P))$ space, where $|| P ||$ is the sum of lengths of the $n$ strings in $P$. We present a new algorithm of $O(|| P || łog n)$ time and $O(|| P || )$ space to compute the HOG, which exploits the segment tree data structure. We also propose an alternative algorithm using $O(|| P || fracłog nłog łog n)$ time and $O(|| P ||)$ space in the word RAM model of computation.

String Processing and Information Retrieval
assembly genome overlap graph complexity genome sequencing segment tree word RAM model data structure