| Peer-Reviewed

A New Idea for Improving the Running Time of PMS Algorithm

Received: 3 May 2016     Accepted: 16 May 2016     Published: 28 May 2016
Views:       Downloads:
Abstract

Motif finding problem is a major challenge in biology with significant applications in the detection of transcription factor binding sites and transcriptional regulatory elements that are crucial in understanding gene expression and function, human disease, drug design, etc. Two type of motif finding problems have been investigated. Planted Motif Search Problem (PMSP) which is defined as finding motifs that appear in all sequences and a restricted version of it “Planted Motif Search Problem-Sample Driven” (PMSP-SD) where the motifs themselves are found in the input. The first version is NP-Complete and the second version can be trivially solved in polynomial time. In this paper, a new idea is used to speed up the PMS-SD algorithm. Although PMS-SD is a polynomial time algorithm and the new idea does not improve its asymptotic runtime, but since most of the motif search algorithms combine a sample driven approach with a pattern driven approach, the speed up of PMS-SD running time would result in speed up of PMS algorithm. To verify the performance of the modified algorithms which are called PMS-two step and PMS-SD-two step, these algorithms are tested on simulated data. The experimental results approve the improvements.

Published in Computational Biology and Bioinformatics (Volume 4, Issue 2)
DOI 10.11648/j.cbb.20160402.11
Page(s) 15-20
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2016. Published by Science Publishing Group

Keywords

Pattern and Motif Discovery, Planted (l,d)-Motif Search Problem, Closest Substring Problem

References
[1] D. Laurent and B. Philipp, "Searching for regulatory elements in human noncoding sequences," Current Opinion in Structural Biology, vol. 7, pp. 399-406, 1997.
[2] D. S. a. D. I. Ratne, "Use of a Probabilistic Motif Search to Identify Histidine Phosphotransfer Domain-Containing Proteins," PLOS one, pp. 1-18, 2016.
[3] S. Rajasekaran, "Computational techniques for motif search," Frontiers in Bioscience, no. 14, pp. 5052-5065, 2009.
[4] M. Frances and A. Litman, "On Covering Problems of Codes," Theory of Computing Systems, vol. 30, no. 2, pp. 113-119, 1997.
[5] S. Rajasekaran, S. Balla and C. Huang, "EXACT ALGORITHMS FOR PLANTED MOTIF CHALLENGE PROBLEMS," APBC, pp. p249-259, 2005.
[6] J. Davila, S. Balla and S. Rajasekaran, "Space and Time Efficient Algorithms for Planted," Proc. Second Int Workshop Bioinformatics Research and Applications, May 2006.
[7] S. Rajasekaran and H. Dinh, "A Speedup Technique for (l, d) Motif Finding Algorithms," BMC Research Notes, vol. 4, no. 54, Mar. 2011.
[8] P. Kuksa and V. Pavlovic, "Efficient Motif Finding Algorithms for Large-Alphabet Inputs," BMC Bioinformatics, vol. 11, May 2010.
[9] H. Dinh, S. Rajasekaran and V. Kundeti, "PMS5: An Efficient Exact Algorithm for the (l, d) Motif Finding Problem," BMC Bioinformatics, vol. 12, Oct. 2011.
[10] S. Bandyopadhyay, S. Sahni and S. Rajasekaran, "PMS6: A Fast Algorithm for Motif Discovery," in Proc. IEEE Second Int’l Conf. Computational Advances in Bio and Medical Sciences (ICCABS ’12), Feb. 2012.
[11] M. N. a. S. Rajasekaran, "Efficient sequential and parallel algorithms for planted motif search," BMC Bioinformatics, pp. DOI: 10.1186/1471-2105-15-34, 2014.
[12] M. N. a. S. Rajasekaran, "qPMS9: An Efficient Algorithm for Quorum Planted Motif Search," Scientific Reports, p. doi: 10.1038/srep07813, 2015.
[13] Davila, J.; Balla, S.; Rajasekaran, S., "Fast and Practical Algorithms for Planted (l,d) Motif Search," IEEE/ACM Trans. on Computational Biology and Bioinformatics, vol. 4, no. 4, pp. 544-552, 2007.
[14] F. Chin and H. Leung, "Voting Algorithms for discovering long motifs," in Proceedings of the Third Asia-Pacific Bioinformatics Conference (APBC2005), Singapore, pages 261-271, 2005.
[15] N. Pisanti, A. Carvalho, L. Marsan and M. Sagot, "Risotto: Fast extraction of motifs with mismatches," in Proceedings of the 7th Latin American Theoretical Informatics Symposium, 2006.
[16] T. Baily and C. Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers," in Proceedings of Second International Conference on Intelligent Systems for Molecular Biology, 1994.
[17] J. Buhler and M. Tompa, "Finding motifs using random projections," Proceedings of Fifth Annual International Conference on Computational Molecular Biology (RECOMB), pp. 69-76, 2001.
[18] C. Lawrence, S. Altschul, M. Boguski, J. Liu, A. Neuwald and e. al, "Detecting subtle sequence signals: a gibbs sampling strategy for multiple alignment," Science, vol. 262, pp. 208-214, 1993.
[19] P. Pevzner and S. Sze, "Combinatorial approaches to finding subtle signals in dna sequences," Proceedings of Eighth International Conference on Intelligent Systems for Molecular Biology, pp. 269-278, 2000.
[20] E. Rocke and M. Tompa, "An algorithm for finding novel gapped motifs in dna sequences," Proceedings of Second International Conference on Computational Molecular Biology (RECOMB), pp. 228-233, 1998.
[21] U. Keich and P. Pevzner, "Finding motifs in the twilight zone," Bioinformatics, vol. 18, pp. 1374-1381, 2002.
[22] A. Price, S. Ramabhadran and P. Pevzner, "Finding subtle motifs by branching from sample strings," Bioinformatics, vol. 1, pp. 1-7, 2003.
[23] G. Hertz and G. Stormo, "Identifying dna and protein patterns with statistically significant alignments of multiple sequences," Bioinformatics, vol. 15, pp. 563-577, 1999.
[24] A. M. J. L. A. a. A. K. Joan Serr, "A Genetic Algorithm to Discover Flexible Motifs with Support," arXiv:1511.04986v2 [cs.LG], pp. 1-9, 2016.
[25] M. Blanchette, "Algorithms for phylogenetic Footprinting," Proc. Fifth Ann. Int'l Conf. Computational Molecular Biology (RECOMB01), Apr. 2001.
[26] J. Davila, S. Balla and S. Rajasekaran, "Pampa: An improved branch and bound algorithm for planted (l, d) motif search," Technical report, 2007.
Cite This Article
  • APA Style

    Saeed Alirezanejad Gohardani, Mehri Bagherian, Hamid Reza Vaziri. (2016). A New Idea for Improving the Running Time of PMS Algorithm. Computational Biology and Bioinformatics, 4(2), 15-20. https://doi.org/10.11648/j.cbb.20160402.11

    Copy | Download

    ACS Style

    Saeed Alirezanejad Gohardani; Mehri Bagherian; Hamid Reza Vaziri. A New Idea for Improving the Running Time of PMS Algorithm. Comput. Biol. Bioinform. 2016, 4(2), 15-20. doi: 10.11648/j.cbb.20160402.11

    Copy | Download

    AMA Style

    Saeed Alirezanejad Gohardani, Mehri Bagherian, Hamid Reza Vaziri. A New Idea for Improving the Running Time of PMS Algorithm. Comput Biol Bioinform. 2016;4(2):15-20. doi: 10.11648/j.cbb.20160402.11

    Copy | Download

  • @article{10.11648/j.cbb.20160402.11,
      author = {Saeed Alirezanejad Gohardani and Mehri Bagherian and Hamid Reza Vaziri},
      title = {A New Idea for Improving the Running Time of PMS Algorithm},
      journal = {Computational Biology and Bioinformatics},
      volume = {4},
      number = {2},
      pages = {15-20},
      doi = {10.11648/j.cbb.20160402.11},
      url = {https://doi.org/10.11648/j.cbb.20160402.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.cbb.20160402.11},
      abstract = {Motif finding problem is a major challenge in biology with significant applications in the detection of transcription factor binding sites and transcriptional regulatory elements that are crucial in understanding gene expression and function, human disease, drug design, etc. Two type of motif finding problems have been investigated. Planted Motif Search Problem (PMSP) which is defined as finding motifs that appear in all sequences and a restricted version of it “Planted Motif Search Problem-Sample Driven” (PMSP-SD) where the motifs themselves are found in the input. The first version is NP-Complete and the second version can be trivially solved in polynomial time. In this paper, a new idea is used to speed up the PMS-SD algorithm. Although PMS-SD is a polynomial time algorithm and the new idea does not improve its asymptotic runtime, but since most of the motif search algorithms combine a sample driven approach with a pattern driven approach, the speed up of PMS-SD running time would result in speed up of PMS algorithm. To verify the performance of the modified algorithms which are called PMS-two step and PMS-SD-two step, these algorithms are tested on simulated data. The experimental results approve the improvements.},
     year = {2016}
    }
    

    Copy | Download

  • TY  - JOUR
    T1  - A New Idea for Improving the Running Time of PMS Algorithm
    AU  - Saeed Alirezanejad Gohardani
    AU  - Mehri Bagherian
    AU  - Hamid Reza Vaziri
    Y1  - 2016/05/28
    PY  - 2016
    N1  - https://doi.org/10.11648/j.cbb.20160402.11
    DO  - 10.11648/j.cbb.20160402.11
    T2  - Computational Biology and Bioinformatics
    JF  - Computational Biology and Bioinformatics
    JO  - Computational Biology and Bioinformatics
    SP  - 15
    EP  - 20
    PB  - Science Publishing Group
    SN  - 2330-8281
    UR  - https://doi.org/10.11648/j.cbb.20160402.11
    AB  - Motif finding problem is a major challenge in biology with significant applications in the detection of transcription factor binding sites and transcriptional regulatory elements that are crucial in understanding gene expression and function, human disease, drug design, etc. Two type of motif finding problems have been investigated. Planted Motif Search Problem (PMSP) which is defined as finding motifs that appear in all sequences and a restricted version of it “Planted Motif Search Problem-Sample Driven” (PMSP-SD) where the motifs themselves are found in the input. The first version is NP-Complete and the second version can be trivially solved in polynomial time. In this paper, a new idea is used to speed up the PMS-SD algorithm. Although PMS-SD is a polynomial time algorithm and the new idea does not improve its asymptotic runtime, but since most of the motif search algorithms combine a sample driven approach with a pattern driven approach, the speed up of PMS-SD running time would result in speed up of PMS algorithm. To verify the performance of the modified algorithms which are called PMS-two step and PMS-SD-two step, these algorithms are tested on simulated data. The experimental results approve the improvements.
    VL  - 4
    IS  - 2
    ER  - 

    Copy | Download

Author Information
  • Department of Applied Mathematics, Faculty of Mathematical Science, University of Guilan, Rasht, Iran

  • Department of Applied Mathematics, Faculty of Mathematical Science, University of Guilan, Rasht, Iran

  • Department of Biology, Faculty of Science, University of Guilan, Rasht, Iran

  • Sections