Motif finding problem is a major challenge in biology with significant applications in the detection of transcription factor binding sites and transcriptional regulatory elements that are crucial in understanding gene expression and function, human disease, drug design, etc. Two type of motif finding problems have been investigated. Planted Motif Search Problem (PMSP) which is defined as finding motifs that appear in all sequences and a restricted version of it “Planted Motif Search Problem-Sample Driven” (PMSP-SD) where the motifs themselves are found in the input. The first version is NP-Complete and the second version can be trivially solved in polynomial time. In this paper, a new idea is used to speed up the PMS-SD algorithm. Although PMS-SD is a polynomial time algorithm and the new idea does not improve its asymptotic runtime, but since most of the motif search algorithms combine a sample driven approach with a pattern driven approach, the speed up of PMS-SD running time would result in speed up of PMS algorithm. To verify the performance of the modified algorithms which are called PMS-two step and PMS-SD-two step, these algorithms are tested on simulated data. The experimental results approve the improvements.
Published in | Computational Biology and Bioinformatics (Volume 4, Issue 2) |
DOI | 10.11648/j.cbb.20160402.11 |
Page(s) | 15-20 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2016. Published by Science Publishing Group |
Pattern and Motif Discovery, Planted (l,d)-Motif Search Problem, Closest Substring Problem
[1] | D. Laurent and B. Philipp, "Searching for regulatory elements in human noncoding sequences," Current Opinion in Structural Biology, vol. 7, pp. 399-406, 1997. |
[2] | D. S. a. D. I. Ratne, "Use of a Probabilistic Motif Search to Identify Histidine Phosphotransfer Domain-Containing Proteins," PLOS one, pp. 1-18, 2016. |
[3] | S. Rajasekaran, "Computational techniques for motif search," Frontiers in Bioscience, no. 14, pp. 5052-5065, 2009. |
[4] | M. Frances and A. Litman, "On Covering Problems of Codes," Theory of Computing Systems, vol. 30, no. 2, pp. 113-119, 1997. |
[5] | S. Rajasekaran, S. Balla and C. Huang, "EXACT ALGORITHMS FOR PLANTED MOTIF CHALLENGE PROBLEMS," APBC, pp. p249-259, 2005. |
[6] | J. Davila, S. Balla and S. Rajasekaran, "Space and Time Efficient Algorithms for Planted," Proc. Second Int Workshop Bioinformatics Research and Applications, May 2006. |
[7] | S. Rajasekaran and H. Dinh, "A Speedup Technique for (l, d) Motif Finding Algorithms," BMC Research Notes, vol. 4, no. 54, Mar. 2011. |
[8] | P. Kuksa and V. Pavlovic, "Efficient Motif Finding Algorithms for Large-Alphabet Inputs," BMC Bioinformatics, vol. 11, May 2010. |
[9] | H. Dinh, S. Rajasekaran and V. Kundeti, "PMS5: An Efficient Exact Algorithm for the (l, d) Motif Finding Problem," BMC Bioinformatics, vol. 12, Oct. 2011. |
[10] | S. Bandyopadhyay, S. Sahni and S. Rajasekaran, "PMS6: A Fast Algorithm for Motif Discovery," in Proc. IEEE Second Int’l Conf. Computational Advances in Bio and Medical Sciences (ICCABS ’12), Feb. 2012. |
[11] | M. N. a. S. Rajasekaran, "Efficient sequential and parallel algorithms for planted motif search," BMC Bioinformatics, pp. DOI: 10.1186/1471-2105-15-34, 2014. |
[12] | M. N. a. S. Rajasekaran, "qPMS9: An Efficient Algorithm for Quorum Planted Motif Search," Scientific Reports, p. doi: 10.1038/srep07813, 2015. |
[13] | Davila, J.; Balla, S.; Rajasekaran, S., "Fast and Practical Algorithms for Planted (l,d) Motif Search," IEEE/ACM Trans. on Computational Biology and Bioinformatics, vol. 4, no. 4, pp. 544-552, 2007. |
[14] | F. Chin and H. Leung, "Voting Algorithms for discovering long motifs," in Proceedings of the Third Asia-Pacific Bioinformatics Conference (APBC2005), Singapore, pages 261-271, 2005. |
[15] | N. Pisanti, A. Carvalho, L. Marsan and M. Sagot, "Risotto: Fast extraction of motifs with mismatches," in Proceedings of the 7th Latin American Theoretical Informatics Symposium, 2006. |
[16] | T. Baily and C. Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers," in Proceedings of Second International Conference on Intelligent Systems for Molecular Biology, 1994. |
[17] | J. Buhler and M. Tompa, "Finding motifs using random projections," Proceedings of Fifth Annual International Conference on Computational Molecular Biology (RECOMB), pp. 69-76, 2001. |
[18] | C. Lawrence, S. Altschul, M. Boguski, J. Liu, A. Neuwald and e. al, "Detecting subtle sequence signals: a gibbs sampling strategy for multiple alignment," Science, vol. 262, pp. 208-214, 1993. |
[19] | P. Pevzner and S. Sze, "Combinatorial approaches to finding subtle signals in dna sequences," Proceedings of Eighth International Conference on Intelligent Systems for Molecular Biology, pp. 269-278, 2000. |
[20] | E. Rocke and M. Tompa, "An algorithm for finding novel gapped motifs in dna sequences," Proceedings of Second International Conference on Computational Molecular Biology (RECOMB), pp. 228-233, 1998. |
[21] | U. Keich and P. Pevzner, "Finding motifs in the twilight zone," Bioinformatics, vol. 18, pp. 1374-1381, 2002. |
[22] | A. Price, S. Ramabhadran and P. Pevzner, "Finding subtle motifs by branching from sample strings," Bioinformatics, vol. 1, pp. 1-7, 2003. |
[23] | G. Hertz and G. Stormo, "Identifying dna and protein patterns with statistically significant alignments of multiple sequences," Bioinformatics, vol. 15, pp. 563-577, 1999. |
[24] | A. M. J. L. A. a. A. K. Joan Serr, "A Genetic Algorithm to Discover Flexible Motifs with Support," arXiv:1511.04986v2 [cs.LG], pp. 1-9, 2016. |
[25] | M. Blanchette, "Algorithms for phylogenetic Footprinting," Proc. Fifth Ann. Int'l Conf. Computational Molecular Biology (RECOMB01), Apr. 2001. |
[26] | J. Davila, S. Balla and S. Rajasekaran, "Pampa: An improved branch and bound algorithm for planted (l, d) motif search," Technical report, 2007. |
APA Style
Saeed Alirezanejad Gohardani, Mehri Bagherian, Hamid Reza Vaziri. (2016). A New Idea for Improving the Running Time of PMS Algorithm. Computational Biology and Bioinformatics, 4(2), 15-20. https://doi.org/10.11648/j.cbb.20160402.11
ACS Style
Saeed Alirezanejad Gohardani; Mehri Bagherian; Hamid Reza Vaziri. A New Idea for Improving the Running Time of PMS Algorithm. Comput. Biol. Bioinform. 2016, 4(2), 15-20. doi: 10.11648/j.cbb.20160402.11
AMA Style
Saeed Alirezanejad Gohardani, Mehri Bagherian, Hamid Reza Vaziri. A New Idea for Improving the Running Time of PMS Algorithm. Comput Biol Bioinform. 2016;4(2):15-20. doi: 10.11648/j.cbb.20160402.11
@article{10.11648/j.cbb.20160402.11, author = {Saeed Alirezanejad Gohardani and Mehri Bagherian and Hamid Reza Vaziri}, title = {A New Idea for Improving the Running Time of PMS Algorithm}, journal = {Computational Biology and Bioinformatics}, volume = {4}, number = {2}, pages = {15-20}, doi = {10.11648/j.cbb.20160402.11}, url = {https://doi.org/10.11648/j.cbb.20160402.11}, eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.cbb.20160402.11}, abstract = {Motif finding problem is a major challenge in biology with significant applications in the detection of transcription factor binding sites and transcriptional regulatory elements that are crucial in understanding gene expression and function, human disease, drug design, etc. Two type of motif finding problems have been investigated. Planted Motif Search Problem (PMSP) which is defined as finding motifs that appear in all sequences and a restricted version of it “Planted Motif Search Problem-Sample Driven” (PMSP-SD) where the motifs themselves are found in the input. The first version is NP-Complete and the second version can be trivially solved in polynomial time. In this paper, a new idea is used to speed up the PMS-SD algorithm. Although PMS-SD is a polynomial time algorithm and the new idea does not improve its asymptotic runtime, but since most of the motif search algorithms combine a sample driven approach with a pattern driven approach, the speed up of PMS-SD running time would result in speed up of PMS algorithm. To verify the performance of the modified algorithms which are called PMS-two step and PMS-SD-two step, these algorithms are tested on simulated data. The experimental results approve the improvements.}, year = {2016} }
TY - JOUR T1 - A New Idea for Improving the Running Time of PMS Algorithm AU - Saeed Alirezanejad Gohardani AU - Mehri Bagherian AU - Hamid Reza Vaziri Y1 - 2016/05/28 PY - 2016 N1 - https://doi.org/10.11648/j.cbb.20160402.11 DO - 10.11648/j.cbb.20160402.11 T2 - Computational Biology and Bioinformatics JF - Computational Biology and Bioinformatics JO - Computational Biology and Bioinformatics SP - 15 EP - 20 PB - Science Publishing Group SN - 2330-8281 UR - https://doi.org/10.11648/j.cbb.20160402.11 AB - Motif finding problem is a major challenge in biology with significant applications in the detection of transcription factor binding sites and transcriptional regulatory elements that are crucial in understanding gene expression and function, human disease, drug design, etc. Two type of motif finding problems have been investigated. Planted Motif Search Problem (PMSP) which is defined as finding motifs that appear in all sequences and a restricted version of it “Planted Motif Search Problem-Sample Driven” (PMSP-SD) where the motifs themselves are found in the input. The first version is NP-Complete and the second version can be trivially solved in polynomial time. In this paper, a new idea is used to speed up the PMS-SD algorithm. Although PMS-SD is a polynomial time algorithm and the new idea does not improve its asymptotic runtime, but since most of the motif search algorithms combine a sample driven approach with a pattern driven approach, the speed up of PMS-SD running time would result in speed up of PMS algorithm. To verify the performance of the modified algorithms which are called PMS-two step and PMS-SD-two step, these algorithms are tested on simulated data. The experimental results approve the improvements. VL - 4 IS - 2 ER -