dipwmsearch: a python package for searching di-PWM motifs

Abstract

Motivation: Seeking probabilistic motifs in a sequence is a common task when annotating putative transcription factor binding sites (TFBS). Useful motif representations include Position Weight Matrices (PWMs), dinucleotidic PWMs (di-PWMs), and Hidden Markov Models (HMMs). Dinucleotidic PWMs retain the simplicity of PWMs (a matrix form and a cumulative scoring function) while also incorporating the dependency between adjacent positions in the motif, which PWMs disregard. For instance, to represent binding sites, the HOCOMOCO database provides di-PWM motifs derived from experimental data. Currently, two programs, SPRy-SARUS and MOODS, can search for di-PWMs in sequences.

Results: We propose a Python package, dipwmsearch, which provides an original and efficient algorithm for this task: it first enumerates the words that match the di-PWM, and then searches for all of them at once in the sequence, even if the sequence contains IUPAC codes. The user benefits from an easy installation via PyPI or conda, a documented Python interface, and reusable example scripts that ease the use of di-PWMs.

Availability and Implementation: dipwmsearch is available at https://pypi.org/project/dipwmsearch/ and https://gite.lirmm.fr/rivals/dipwmsearch/ under the CeCILL license.

Competing Interest Statement: The authors have declared no competing interest.
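As an illustration of the cumulative di-PWM scoring mentioned in the abstract, here is a minimal sketch in plain Python. It does not use the dipwmsearch API and does not implement the package's enumeration-based algorithm. It is a naive sliding-window scan, and it assumes a di-PWM can be stored as a list of columns, each mapping a dinucleotide to a score. The names `score_word`, `scan`, and `toy_column` and the toy matrix are hypothetical. Windows containing IUPAC ambiguity codes are simply skipped here, unlike in the package.

```python
from itertools import product

ALPHABET = "ACGT"


def score_word(dipwm, word):
    """Sum the dinucleotide scores of `word`; column i scores word[i:i+2]."""
    if len(word) != len(dipwm) + 1:
        raise ValueError("word length must be len(dipwm) + 1")
    return sum(column[word[i:i + 2]] for i, column in enumerate(dipwm))


def scan(dipwm, sequence, threshold):
    """Yield (position, word, score) for each window scoring >= threshold."""
    m = len(dipwm) + 1  # a motif of length m has m - 1 dinucleotide columns
    for start in range(len(sequence) - m + 1):
        word = sequence[start:start + m]
        if any(c not in ALPHABET for c in word):
            continue  # simplification: ambiguous (IUPAC) windows are skipped
        s = score_word(dipwm, word)
        if s >= threshold:
            yield start, word, s


def toy_column(favoured):
    """Hypothetical column: 2.0 for one favoured dinucleotide, -1.0 otherwise."""
    return {a + b: (2.0 if a + b == favoured else -1.0)
            for a, b in product(ALPHABET, repeat=2)}


# Toy di-PWM for a motif of length 3 that favours the word "ACG".
toy_dipwm = [toy_column("AC"), toy_column("CG")]
hits = list(scan(toy_dipwm, "TTACGTACGA", threshold=4.0))
print(hits)
```

Because each column scores a pair of adjacent letters, the score of a word depends on which nucleotides are neighbours, which is the dependency a plain PWM cannot express.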

Publication
bioRxiv
Marie Mille
Master's student in bioinformatics; internship on algorithms for probabilistic motif search.

Trained in cellular biology some years ago, Marie worked as a school teacher in France and abroad, then set up a sole-proprietor business and a commercial website. She has now resumed her studies, pursuing a Master's degree to move into bioinformatics.