| Peer-Reviewed

A Comparative Analysis of Motif Discovery Algorithms

Received: 25 September 2015     Accepted: 26 October 2015     Published: 19 November 2015
Views:       Downloads:
Abstract

One of the major challenges in bioinformatics is the development of efficient computational algorithms for biological sequence motif discovery. In the post-genomic era, the ability to predict the behavior, the function, or the structure of biological entities or motifs such as genes and proteins, as well as interactions among them, play a fundamental role in the discovery of information to help explain biological mechanisms. This necessitated the development of computational methods for identifying these entities. Consequently, a large number of motif finding algorithms have been implemented and applied to various organisms over the past decade. This paper presents a comparative analysis of the latest developments in motif finding algorithms and proposed an algorithm for motif discovery based on a combinatorial approach of pattern driven and statistical based approach. The proposed algorithm, Suffix Tree Gene Enrichment Motif Searching (STGEMS) as reported in [30], proved effective in identifying motifs from organisms with peculiarity in their genomic structure such as the AT-rich sequence of the malaria parasite, P. falciparum. The empirical time analysis of seven motif discovery algorithms was evaluated using four sets of genes from the intraerythrocytic development cycle of P. falciparum. The result shows that algorithms based on a combinatorial approach are more desirable.

Published in Computational Biology and Bioinformatics (Volume 4, Issue 1)
DOI 10.11648/j.cbb.20160401.11
Page(s) 1-9
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2015. Published by Science Publishing Group

Keywords

Motifs, Suffix Tree, Time Complexity, P. falciparum

References
[1] Adebiyi, E. F. (2011) CODE MALARIA: Eradication developments for the decade. Covenant University 1st Inaugural lecture mini-book. Covenant University Press.
[2] Apostolico, A., Parida, L. and Rombo, S.E. (2001). Efficient detection of unusual words J. Comput. Bio., 7(1/2), pp 71-94
[3] Apostolico, A., Parida, L. and Rombo, S.E. (2008). Motif patterns in 2D, Theoretical Computer Science 390(1), pp 40–55
[4] Bailey, T. L. and Elkan, C. (2000). MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Research, 34, Web Server issue W369–W373.
[5] Bischo, E. and Vaquero, C.(2010) In silico and biological survey of transcription-associated proteins implicated in the transcriptional machinery during the erthrocyctic development of Plasmodium falciparum, BMC Genomics, 11:34.
[6] Carvalho, A.M., Freitas, A.T., Oliveira, A.L. and Sagot, M.F. (2004). A parallel algorithm for the extraction of structured motifs. 19th ACM Symposium on Applied Computing pp. 147–153.
[7] Carvalho, A., Freitas, A., Oliveira, A. and Sagot, M. (2005) Efficient Extraction of Structured Motifs Using Box-links. String Processing and Information Retrieval Conference 2004. pp. 267–278.
[8] Cawley S., Wirth, A., Speed, T.(2001). PHAT: A gene finding program for Plasmodium falciparum. Mol Biochem Parasitol. 118, pp 167–174.
[9] Cazaux, B. and Rivals, E. (2014). Reverse Engineering of Compact Suffix Trees and Links: a Novel Algorithm. J. Discrete Algorithms, dx.doi.org/10.1016/j.jda.
[10] Deb, K., Pratap, A., Agarwal, S., Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2), pp182–97.
[11] Deitsch, K. et al. (2007) Mechanisms of gene regulation in Plasmodium Am J. Trop. Med. Hyg. 77(2), pp201-8
[12] Dyer, M.D, Murali, T.M. and Sobral, B.W. (2007) Computational prediction of host-pathogen protein interactions BMC Bioinformatics; Vol. 23; 159-166.
[13] Eden, E., Lipson, D., Yogev, S. and Yakhini, Z. (2007) Discovering motifs in ranked lists of DNA sequences. PLoS Comput Biol 3(3): 239-243
[14] Eskin, E. and Pevzner, P. (2001) Finding composite regulatory patterns in DNA sequences. BMC Bioinformatics, 18 Suppl 1:S354, 63-70.
[15] Fan, Y., Wu, W., Liu, R. and Yang, W. (2013). An Iterative Algorithm for Motif Discovery. Procedia Computer Science 24, 25 – 29.
[16] Hon L.S. and Jain A.N.: A deterministic motif finding algorithm with application to the human genome. Bioinformatics 2006, 22:1047-1054
[17] Hu J, Li B, Kihara D: Limitations and potentials of current motif discovery algorithms. Nucleic Acids Res 2005, 33:4899-4913.
[18] Huo, H., Zhao, Z., Stojkovic, V. and Liu, L. (2010). Optimizing Genetic Algorithm for Motif Discovery. Mathematical and Computer Modelling, 52, 2011–2020.
[19] Jiang, D., Pei, J., Zhang, A. (2003) DHC: A Density-based Hierachical Clustering Method for Time-series Gene Expression Data. In Proceeding of BIBE2003: 3PRD P IEE International Symposium on Bioinformatics and Bioengineering, 10-12.
[20] Kaya, M. (2009) MOGAMOD: Multi-Objective Genetic Algorithm for Motif Discovery, Expert Systems with Applications, 36 (2): 1039-1947.
[21] Keich, U. and Pevzner, P. A. (2002). Finding motifs in the twilight zone BMC Bioinformatics, 18(10):1374-1381.
[22] Kellis, M., Patterson, N., Endrizzi, M., Birren, B., and Lander, E.S. 2004. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423, 241–254.
[23] Kilpatrick, A.M., Ward, B. and Aitken, S. (2014). Stochastic EM-based TFBS motif discovery with MITSU. Vol.30, pg. 310–318, doi:10.1093/bioinformatics/btu286.
[24] Knowles, J.D. and Corne, D.W. (2000). Approximating the nondominated front using the Pareto archived evolution strategy. Evol Comput; 8(2):149–72.
[25] Konak, A., David, W., Coitb, A., Smith, E. (2006). TMulti-objective optimization using genetic algorithms: A tutorial. T Reliability Engineering and System Safety 91 pp 992–1007.
[26] Leibovich, L. and Yakhini, Z. (2012). Efficient Motif Search in Ranked Lists and Applications to Variable Gap Motifs. Nucleic Acids Research, 1–16, doi:10.1093/nar/gks206.
[27] Leibovich, L., Paz, I., Yakhini, Z. and Mandel-Gutfreund, Y. (2013). DRIMust: A Web Server for Discovering Rank Imbalanced Motifs Using Suffix Trees. Nucleic Acids Research, Vol. 41, Web Server issue, doi:10.1093/nar/gkt407.
[28] Leung HCM, Chin FYL (2006): Finding motifs from all sequences with and without binding sites. Bioinformatics, 22:2217-2223.
[29] Liu, F.M., Tsai, J.J., Chen, R.M., Chen, S.N. and Shih, S.H. (2004). TFMGA: finding motifs by genetic algorithm. T TFourth IEEE Symposium on Bioinformatics and Bioengineering T, 459.
[30] Makolo, A., Ezekiel, A. and Osofisan, A. (2011). Mining Structured Motifs with Gene Enrichment Motif Searching on Suffix tree. Journal of Computer Science and its Applications, 18(1), pp 71-78.
[31] Makolo, A., Ezekiel, A. and Osofisan, A. (2012). Comparative Analysis of Similarity Check Mechanism for Motif Extraction. IEEE African Journal of Computing & ICT.
[32] Marsan, L. and Sagot, M. (2000). Algorithm for extraction of structured motif using a suffix tree with an application to promoter and regulatory site consensus identification. Journal of Computational Biology, 7(3) pp 45-362.
[33] Mendes, N.D., Casimiro, A.C., Santos, P.M., Correia, I.S, Oliveira, A.L., Freitas, A.T. (2006). MUSA: A Parameter Free Algorithm For the Identification of Biologically Significant Motifs. Bioinformatics (Oxford Journals) 22(24), pp 2996-3002.
[34] Modan K. D. and Ho-Kwok, D. (2007). A survey of DNA motif finding algorithms. Proceedings of the Fourth Annual MCBIOS Conference. Computational Frontiers in Biomedicine.
[35] Ortet, P. and Bastien, O. (2010). Where Does the Alignment Score Distribution Shape Come from?. Evolutionary Bioinformatics 6: 159-187
[36] Pavesi, G., Thomas, M. and Martin V. (2001). An algorithm for finding sequence of unknown length. Bioinformatics(Oxford Journals)17(2), pp S207-S214.
[37] Pevzner P, Sze S. (2000): Combinatorial approaches to finding subtle signals in DNA sequences. In Proceedings of the Eighth International Conference on Intelligent Systems on Molecular Biology, San Diego, CA, 269-278.
[38] Pisanti, N., Carvalho, A., Marsan, L. and Sagot, M.F. (2006). RISOTTO: Fast extraction of motifs with mismatches. In seventh latin American theoretical information symposium.
[39] Pizzi, C., Rastas, P. and Ukkonen, E. (2011). Motif Discovery with Compact Approaches -Design and Applications, IEEE/ACM Trans. Comput. Biology Bioinform. 8(1), pp 69–79.
[40] Prakash A, Blanchette M, Sinha S, Tompa M (2004) Motif discovery in heterogeneous sequence data. Proceedings of the Ninth Pacific Symposium on Biocomputing 2004, 348-359.
[41] Pont N. et al.(2010). Nucleosome occupancy at transcription start sites in the human malaria parasite A hard-wired evolution of virulence? Infect Genet Evol, 10:1016.
[42] Quang, D. and Xie, X. (2014). EXTREME: An Online EM Algorithm for Motif Discovery. BIOINFORMATICS, Vol. 30, no. 12, pages 1667–1673, doi:10.1093/bioinformatics/btu093
[43] Redhead, E., Bailey, T.L. (2007) Discriminative motif discovery in DNA and protein sequences using the DEME algorithm. BMC Bioinformatics. 8, 385.
[44] Reid, J. and Wernisch, L. (2011). STEME: efficient EM to find motifs in large data sets Bioinformatics (Oxford Journals). 10(2), pp S93-S107.
[45] Sagot. Marie-France (1998). Spelling approximate repeated or common Motifs, PLoS Computational Biology.
[46] Siddharthan R, Siggia ED, van Nimwegen E: PhyloGibbs: A Gibbs sampling motif finder that incorporates phylogeny. PLoS Comput Biol 2005, 1:534-556
[47] Sinha, S., Blanchette, M. and Tompa, M. (2004) PhyME: A probabilistic algorithm for finding motifs in sets of orthologous sequences. BMC Bioinformatics, 5,170.
[48] Sun, C., Huo, H., Yu, Q., Guo, H. and Sun, Z. (2015). An Affinity Propagation-Based DNA Motif Discovery Algorithm. BioMed Research International, 853461, dx.doi.org/10.1155/2015/853461
[49] Tompa, M., Li, N., Bailey, T., Church, G., De Moor, B., Eskin, E., Favorov, A., Frith, M., Fu, Y., Kent, W., Makeev, V., Mironov, A., Noble, W., Pavesi, G., Pesole, G., Regnier, M., Simonis, N., Sinha, S., Thijs, G., Van Helden, J., Vandenbogaert, M., Weng, Z., Workman, C., Ye, C. and Zhu, Z. (2005). T Assessing computational tools for the discovery of transcription factor binding sites. TTNat Biotechnol T, T23:T137-144.
[50] Wei, Z. and Jensen, S.T. (2006) GAME: detecting cis-regulatory elements using a genetic algorithm. Bioinformatics, 22, 1577–1584.
[51] Wu, H., Wong, P.W.H., Caddick, M.X. and Sibthorp, C. (2013). Finding DNA Regulatory Motifs with Position-dependent Models. Journal of Medical and Bioengineering Vol. 2, No. 2.
[52] Young, J., Johnson, J., Benner, C., Yan, F., Chen, K., Roch, K., Zhou, Y. and Winzeler, E. (2008). In silico discovery of transcription regulatory elements in Plasmodium falciparum. BMC Genomics, 9, 70.
[53] Yuda, M., Iwanaga, S., Shigenubu, S., Mair, G., Janse, C., Waters, A., Kato, T. and Kaneko, I. (2009) Identification of a transcription factor in the mosquito-invasive stage of malaria parasite. Molecular Microbilogy, 71, 1402-1414
[54] Zhang. Y. and Zaki, M. (2006). EXMOTIF: Efficient structured motif extraction. Algorithms for Molecular Biology, 1, 21.
Cite This Article
  • APA Style

    Angela Makolo. (2015). A Comparative Analysis of Motif Discovery Algorithms. Computational Biology and Bioinformatics, 4(1), 1-9. https://doi.org/10.11648/j.cbb.20160401.11

    Copy | Download

    ACS Style

    Angela Makolo. A Comparative Analysis of Motif Discovery Algorithms. Comput. Biol. Bioinform. 2015, 4(1), 1-9. doi: 10.11648/j.cbb.20160401.11

    Copy | Download

    AMA Style

    Angela Makolo. A Comparative Analysis of Motif Discovery Algorithms. Comput Biol Bioinform. 2015;4(1):1-9. doi: 10.11648/j.cbb.20160401.11

    Copy | Download

  • @article{10.11648/j.cbb.20160401.11,
      author = {Angela Makolo},
      title = {A Comparative Analysis of Motif Discovery Algorithms},
      journal = {Computational Biology and Bioinformatics},
      volume = {4},
      number = {1},
      pages = {1-9},
      doi = {10.11648/j.cbb.20160401.11},
      url = {https://doi.org/10.11648/j.cbb.20160401.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.cbb.20160401.11},
      abstract = {One of the major challenges in bioinformatics is the development of efficient computational algorithms for biological sequence motif discovery. In the post-genomic era, the ability to predict the behavior, the function, or the structure of biological entities or motifs such as genes and proteins, as well as interactions among them, play a fundamental role in the discovery of information to help explain biological mechanisms. This necessitated the development of computational methods for identifying these entities. Consequently, a large number of motif finding algorithms have been implemented and applied to various organisms over the past decade. This paper presents a comparative analysis of the latest developments in motif finding algorithms and proposed an algorithm for motif discovery based on a combinatorial approach of pattern driven and statistical based approach. The proposed algorithm, Suffix Tree Gene Enrichment Motif Searching (STGEMS) as reported in [30], proved effective in identifying motifs from organisms with peculiarity in their genomic structure such as the AT-rich sequence of the malaria parasite, P. falciparum. The empirical time analysis of seven motif discovery algorithms was evaluated using four sets of genes from the intraerythrocytic development cycle of P. falciparum. The result shows that algorithms based on a combinatorial approach are more desirable.},
     year = {2015}
    }
    

    Copy | Download

  • TY  - JOUR
    T1  - A Comparative Analysis of Motif Discovery Algorithms
    AU  - Angela Makolo
    Y1  - 2015/11/19
    PY  - 2015
    N1  - https://doi.org/10.11648/j.cbb.20160401.11
    DO  - 10.11648/j.cbb.20160401.11
    T2  - Computational Biology and Bioinformatics
    JF  - Computational Biology and Bioinformatics
    JO  - Computational Biology and Bioinformatics
    SP  - 1
    EP  - 9
    PB  - Science Publishing Group
    SN  - 2330-8281
    UR  - https://doi.org/10.11648/j.cbb.20160401.11
    AB  - One of the major challenges in bioinformatics is the development of efficient computational algorithms for biological sequence motif discovery. In the post-genomic era, the ability to predict the behavior, the function, or the structure of biological entities or motifs such as genes and proteins, as well as interactions among them, play a fundamental role in the discovery of information to help explain biological mechanisms. This necessitated the development of computational methods for identifying these entities. Consequently, a large number of motif finding algorithms have been implemented and applied to various organisms over the past decade. This paper presents a comparative analysis of the latest developments in motif finding algorithms and proposed an algorithm for motif discovery based on a combinatorial approach of pattern driven and statistical based approach. The proposed algorithm, Suffix Tree Gene Enrichment Motif Searching (STGEMS) as reported in [30], proved effective in identifying motifs from organisms with peculiarity in their genomic structure such as the AT-rich sequence of the malaria parasite, P. falciparum. The empirical time analysis of seven motif discovery algorithms was evaluated using four sets of genes from the intraerythrocytic development cycle of P. falciparum. The result shows that algorithms based on a combinatorial approach are more desirable.
    VL  - 4
    IS  - 1
    ER  - 

    Copy | Download

Author Information
  • Department of Computer Science, University of Ibadan, Ibadan, Nigeria

  • Sections