Fast and Accurate Genome-Scale Identification of DNA-Binding Sites

Abstract

Discovering DNA-binding sites in genome sequences is crucial for understanding genomic regulation. Computational tools that locate binding sites using Position Weight Matrices (PWMs) of known motifs are often applied only to restricted genomic regions because of their long run times. The ever-increasing number of complete genome sequences calls for a new generation of algorithms capable of processing large amounts of data. Here we present MOTIF, a new algorithm that searches whole genome sequences for transcription factor binding sites in a few seconds. We provide a web service that lets users search with their own matrix or with multiple JASPAR matrices. Beyond its efficiency, the service properly handles undetermined positions within the genome sequence and lists, for each position, the matching word and its score. MOTIF is available through a web interface at http://www.atgc-montpellier.fr/motif.
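
To make the task concrete, the following is a minimal sketch of a naive sliding-window PWM scan that reports the position, the matching word, and its score for each hit. It is not the MOTIF algorithm, whose method is not detailed here; it only illustrates the kind of output described above. The matrix `TOY_PWM`, the function `scan`, the threshold value, and the policy of skipping any window that contains an undetermined base are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy score matrix of width 4 (not a JASPAR matrix).
# One dict per motif column, mapping base -> score.
TOY_PWM: List[Dict[str, float]] = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
]


def scan(
    sequence: str, pwm: List[Dict[str, float]], threshold: float
) -> Iterator[Tuple[int, str, float]]:
    """Yield (0-based position, word, score) for windows scoring >= threshold."""
    width = len(pwm)
    seq = sequence.upper()
    for pos in range(len(seq) - width + 1):
        word = seq[pos:pos + width]
        # Assumption: a window containing any non-ACGT symbol (e.g. N)
        # is skipped rather than scored.
        if any(base not in "ACGT" for base in word):
            continue
        # The score of a word is the sum of its per-column scores.
        score = sum(column[base] for column, base in zip(pwm, word))
        if score >= threshold:
            yield pos, word, score


# Example: one undetermined base (N) in the middle of the sequence.
hits = list(scan("TTACGTNACGTACGA", TOY_PWM, threshold=2.0))
for pos, word, score in hits:
    print(pos, word, score)
```

This scan costs time proportional to the sequence length times the matrix width for every matrix, which is the kind of run time that limits such tools to restricted regions on large genomes.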

Publication
2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Keywords
transcription factor binding site, motif, pattern, PWM, word enumeration, indexing, algorithm, web resource