Promoters are important cis-acting elements in genomes and play key roles in gene regulation. Each gene is regulated by a specific type of promoter, so identifying the promoter type that regulates a gene is crucial for exploring the gene's function. Although several computational methods for promoter prediction have been proposed, their performance is not satisfactory. The convolutional neural network (CNN) is a powerful deep learning model that has been applied in bioinformatics in recent years. To improve the performance of promoter prediction, in this study we collected six types of Escherichia coli K-12 promoter DNA sequences from the RegulonDB database and constructed a CNN model to predict promoters using the Keras platform. The CNN model is composed of two convolutional layers, three dropout layers, four batch normalization layers and one hidden layer. To evaluate the performance of the CNN model, 10-fold cross-validation was performed and receiver operating characteristic (ROC) curves were plotted. The results show that the prediction accuracies for the sigma 24, sigma 28, sigma 32, sigma 38, sigma 54 and sigma 70 promoters are 94%, 97%, 95%, 95%, 97% and 83%, respectively. The CNN model achieves the highest accuracy reported for promoter prediction to date. In conclusion, the CNN outperforms earlier models in promoter prediction, and it is a promising model for both DNA and protein sequence analysis.
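The data preparation and evaluation steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state how sequences were encoded or how folds were drawn, so the one-hot encoding, the A/C/G/T channel order, and the helper names `one_hot` and `k_fold_indices` are assumptions made for illustration only, not the authors' implementation.

```python
import random

# Assumed encoding: one channel per base, in the order A, C, G, T.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows.

    Bases outside A/C/G/T (for example N) become an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices are shuffled once, dealt into k folds, and each fold
    serves as the test set exactly once.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test
```

In such a setup, each encoded sequence (a length-by-4 matrix) would be the input to the first convolutional layer, and the per-fold test accuracies would be averaged to give the reported accuracy for each promoter type.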
Published in | Computational Biology and Bioinformatics (Volume 6, Issue 2) |
DOI | 10.11648/j.cbb.20180602.11 |
Page(s) | 31-35 |
Creative Commons | This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright | Copyright © The Author(s), 2018. Published by Science Publishing Group |
Keywords | Convolutional Neural Network (CNN), Escherichia coli, Promoter, Prediction |
APA Style
Lu Wang, Ping Wan. (2018). Prediction of Escherichia Coli K-12 Promoters Using Convolutional Neural Network. Computational Biology and Bioinformatics, 6(2), 31-35. https://doi.org/10.11648/j.cbb.20180602.11
ACS Style
Lu Wang; Ping Wan. Prediction of Escherichia Coli K-12 Promoters Using Convolutional Neural Network. Comput. Biol. Bioinform. 2018, 6(2), 31-35. doi: 10.11648/j.cbb.20180602.11
@article{10.11648/j.cbb.20180602.11,
  author = {Lu Wang and Ping Wan},
  title = {Prediction of Escherichia Coli K-12 Promoters Using Convolutional Neural Network},
  journal = {Computational Biology and Bioinformatics},
  volume = {6},
  number = {2},
  pages = {31-35},
  doi = {10.11648/j.cbb.20180602.11},
  url = {https://doi.org/10.11648/j.cbb.20180602.11},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.cbb.20180602.11},
  year = {2018}
}
TY  - JOUR
T1  - Prediction of Escherichia Coli K-12 Promoters Using Convolutional Neural Network
AU  - Lu Wang
AU  - Ping Wan
Y1  - 2018/11/30
PY  - 2018
N1  - https://doi.org/10.11648/j.cbb.20180602.11
DO  - 10.11648/j.cbb.20180602.11
T2  - Computational Biology and Bioinformatics
JF  - Computational Biology and Bioinformatics
JO  - Computational Biology and Bioinformatics
SP  - 31
EP  - 35
PB  - Science Publishing Group
SN  - 2330-8281
UR  - https://doi.org/10.11648/j.cbb.20180602.11
VL  - 6
IS  - 2
ER  -