Peer-Reviewed

An Entropy Based Objective Bayesian Prior Distribution

Received: 20 July 2021    Accepted: 6 August 2021    Published: 23 August 2021
Abstract

Bayesian statistical analysis requires that a prior probability distribution be assumed. This prior describes the likelihood that a given probability distribution generated the sample data. When no information is available about how data samples are drawn, a statistician must use what is called an "objective prior distribution" for the analysis. Common objective prior distributions include the Jeffreys prior, the Haldane prior, and the reference prior. The choice of an objective prior has a strong effect on statistical inference, so it must be chosen with care. In this paper, a novel entropy-based objective prior distribution is proposed. It is proven to be uniquely defined given a few postulates, which are based on well-accepted properties of probability distributions. This novel objective prior distribution is shown to be the exponential of the information entropy of a probability distribution (e^S), which suggests a strong connection to information theory. This result confirms the maximum entropy principle, which paves the way for a more robust mathematical foundation for thermodynamics. It also suggests a possible connection between quantum mechanics and information theory. The novel objective prior distribution is used to derive a new regularization technique that is shown to improve the accuracy of modern artificial intelligence models on a few real-world data sets in most test runs. In a small number of trials, the new regularization technique over-regularized a neural network and led to poorer results, showing that, while often quite effective, the technique must be used with care. It is anticipated that this novel objective prior will become an integral part of new algorithms that focus on finding an appropriate model to describe a data set.
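
As a concrete illustration of the quantity at the center of the abstract, the short Python sketch below computes the Shannon entropy S of a discrete distribution, the proposed objective prior weight e^S, and a simple entropy-based penalty added to a classifier's negative log-likelihood. The loss form, the coefficient lam, and all function names here are illustrative assumptions for exposition, not the paper's actual formulation.

import numpy as np

def shannon_entropy(p, eps=1e-12):
    """Shannon entropy S(p) = -sum_i p_i * log(p_i) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

def entropy_prior_weight(p):
    """Unnormalized objective prior weight exp(S(p)) assigned to a distribution p."""
    return float(np.exp(shannon_entropy(p)))

def entropy_regularized_nll(probs, labels, lam=0.01):
    """Negative log-likelihood minus an entropy bonus (illustrative regularizer only).

    probs  : (n, k) array of predicted class probabilities
    labels : (n,) array of integer class labels
    lam    : assumed regularization strength, tuned per data set
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    nll = -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
    entropy_bonus = np.mean([shannon_entropy(row) for row in probs])
    return nll - lam * entropy_bonus

# The uniform distribution maximizes S and therefore gets the largest prior weight.
print(entropy_prior_weight([0.25, 0.25, 0.25, 0.25]))  # exp(ln 4) = 4.0
print(entropy_prior_weight([0.70, 0.10, 0.10, 0.10]))  # roughly 2.56

Because the uniform distribution maximizes S, it receives the largest prior weight e^S, which is consistent with the maximum entropy principle discussed in the abstract.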

Published in American Journal of Theoretical and Applied Statistics (Volume 10, Issue 4)
DOI 10.11648/j.ajtas.20211004.12
Page(s) 184-193
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Statistics, Data Science, Artificial Intelligence, Information Theory

Cite This Article
  • APA Style

    Jamie Watson. (2021). An Entropy Based Objective Bayesian Prior Distribution. American Journal of Theoretical and Applied Statistics, 10(4), 184-193. https://doi.org/10.11648/j.ajtas.20211004.12

  • ACS Style

    Jamie Watson. An Entropy Based Objective Bayesian Prior Distribution. Am. J. Theor. Appl. Stat. 2021, 10(4), 184-193. doi: 10.11648/j.ajtas.20211004.12

  • AMA Style

    Jamie Watson. An Entropy Based Objective Bayesian Prior Distribution. Am J Theor Appl Stat. 2021;10(4):184-193. doi: 10.11648/j.ajtas.20211004.12

  • BibTeX

    @article{10.11648/j.ajtas.20211004.12,
      author = {Jamie Watson},
      title = {An Entropy Based Objective Bayesian Prior Distribution},
      journal = {American Journal of Theoretical and Applied Statistics},
      volume = {10},
      number = {4},
      pages = {184-193},
      doi = {10.11648/j.ajtas.20211004.12},
      url = {https://doi.org/10.11648/j.ajtas.20211004.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20211004.12},
     year = {2021}
    }
    

  • RIS

    TY  - JOUR
    T1  - An Entropy Based Objective Bayesian Prior Distribution
    AU  - Jamie Watson
    Y1  - 2021/08/23
    PY  - 2021
    N1  - https://doi.org/10.11648/j.ajtas.20211004.12
    DO  - 10.11648/j.ajtas.20211004.12
    T2  - American Journal of Theoretical and Applied Statistics
    JF  - American Journal of Theoretical and Applied Statistics
    JO  - American Journal of Theoretical and Applied Statistics
    SP  - 184
    EP  - 193
    PB  - Science Publishing Group
    SN  - 2326-9006
    UR  - https://doi.org/10.11648/j.ajtas.20211004.12
    VL  - 10
    IS  - 4
    ER  - 

Author Information
  • Jamie Watson, AiZiA, Santa Barbara, United States of America
