Finding binding site motifs in long DNA or RNA sequences is a common bioinformatics task. We designed a new algorithm that handles dinucleotide Position Weight Matrices (di-PWMs for short) as the motif representation, using an enumeration-based strategy adapted to di-PWMs. The HOCOMOCO database, for instance, collects di-PWMs for human and mouse transcription factor binding sites.
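To make the notion of a di-PWM concrete, here is a minimal sketch of how a window of sequence can be scored against one. It is a naive sliding-window scan, not the enumeration-based algorithm described above, and it does not use the dipwmsearch API. The names `score_window`, `naive_search`, and `toy_dipwm` are hypothetical, and the toy matrix is invented for illustration. The sketch assumes an uppercase ACGT sequence and a di-PWM stored as one dictionary of dinucleotide weights per column, so that k columns describe a motif of k + 1 nucleotides.

```python
from itertools import product

# The 16 dinucleotides over the DNA alphabet (assumption: uppercase ACGT only).
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def score_window(dipwm, window):
    """Sum the weights of the overlapping dinucleotides found in `window`."""
    return sum(dipwm[i][window[i:i + 2]] for i in range(len(dipwm)))


def naive_search(dipwm, sequence, threshold):
    """Yield (start, score) for every window scoring at least `threshold`."""
    motif_length = len(dipwm) + 1  # k columns cover k + 1 nucleotides
    for start in range(len(sequence) - motif_length + 1):
        window = sequence[start:start + motif_length]
        score = score_window(dipwm, window)
        if score >= threshold:
            yield start, score


# Hypothetical toy di-PWM with 2 columns (motif length 3):
# column 0 rewards "AC", column 1 rewards "CG", everything else is penalized.
toy_dipwm = [
    {d: (1.0 if d == "AC" else -1.0) for d in DINUCLEOTIDES},
    {d: (1.0 if d == "CG" else -1.0) for d in DINUCLEOTIDES},
]

hits = list(naive_search(toy_dipwm, "TTACGACGT", threshold=2.0))
print(hits)  # [(2, 2.0), (5, 2.0)]
```

This scan scores every window of the sequence in full, which is the simplest correct baseline and a convenient reference when checking the output of a faster method on small inputs.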
We provide a new Python package, called dipwmsearch, which offers functions to search for a di-PWM in any DNA or RNA input sequence. It is easy to install via PyPI or conda, and it is documented online (see the links under Access). Try it out. Feedback is very welcome.
A preprint presenting the algorithm behind dipwmsearch is freely available on HAL and bioRxiv:
dipwmsearch: a python package for searching di-PWM motifs
bioRxiv 2022 doi:10.1101/2022.11.08.515647
Marie Mille, Julie Ripoll, Bastien Cazaux, and I are co-authors.
Access
- Python package: https://pypi.org/project/dipwmsearch/
- Documentation: https://rivals.lirmm.net/dipwmsearch/
- Conda package: https://anaconda.org/atgc-montpellier/dipwmsearch
- Source code: https://gite.lirmm.fr/rivals/dipwmsearch