Research Article | Peer-Reviewed

Assessing the Quality of Ordinary Least Squares in General Lp Spaces

Received: 20 September 2024     Accepted: 18 October 2024     Published: 18 November 2024
Abstract

In the context of regression analysis, we propose an estimation method that can produce estimators closer to the true parameters than standard estimators when the residuals are non-normally distributed and when outliers are present. We achieve this improvement by minimizing the norm of the errors in general Lp spaces rather than in the usual L2 space, which corresponds to Ordinary Least Squares (OLS). The generalized model proposed here, the Ordinary Least Powers (OLP) model, can adjust its sensitivity to outliers through its parameter p, the exponent applied to the absolute value of the residuals. For residuals of large magnitude in particular, such as those stemming from outliers or heavy-tailed distributions, different values of p place different relative weights on the corresponding observations. We fitted OLS and OLP models to simulated data under several error distributions that generate outlying observations and compared the mean squared errors of the estimates relative to the true parameters. We found that OLP models with smaller values of p produce estimators closer to the true parameters when the error term follows an exponential or Cauchy distribution, and that larger values of p produce estimators closer to the true parameters when the error term is uniformly distributed.
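
The idea of minimizing the sum of |residual|^p can be illustrated with a minimal sketch. It is not the authors' implementation, which is not shown on this page. It assumes a simple linear model y = b0 + b1*x and uses iteratively reweighted least squares (IRLS), a standard technique swapped in here for illustration. The function names `weighted_line_fit` and `olp_fit`, the iteration count, and the floor `eps` are all hypothetical choices. This plain IRLS scheme is only intended for 1 <= p <= 2; larger p would need a damped update or a general-purpose optimizer.

```python
def weighted_line_fit(x, y, w):
    """Closed-form weighted least squares for y = b0 + b1 * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar)
              for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def olp_fit(x, y, p, iters=200, eps=1e-8):
    """Approximately minimize sum(|y - b0 - b1*x| ** p) by IRLS.

    Illustrative only; assumes 1 <= p <= 2. Since
    |r|^p = |r|^(p-2) * r^2, each step solves a weighted
    least-squares problem with weights |r|^(p-2) taken from
    the previous step's residuals.
    """
    # Start from the OLS solution (all weights equal to one).
    b0, b1 = weighted_line_fit(x, y, [1.0] * len(x))
    for _ in range(iters):
        # Floor tiny residuals at eps so the weights stay finite.
        w = [max(abs(yi - b0 - b1 * xi), eps) ** (p - 2)
             for xi, yi in zip(x, y)]
        b0, b1 = weighted_line_fit(x, y, w)
    return b0, b1


# Toy data (made up for this sketch): true line y = 2 + 3x,
# with one gross outlier at the largest x value.
x = [float(i) for i in range(10)]
y = [2.0 + 3.0 * xi for xi in x]
y[-1] += 100.0

for p in (2.0, 1.5, 1.2):
    b0, b1 = olp_fit(x, y, p)
    print(f"p = {p}: intercept = {b0:.3f}, slope = {b1:.3f}")
```

With p = 2 every weight equals one, so the routine reduces to OLS. With p below 2, large residuals receive smaller weights, which is the mechanism by which smaller exponents reduce the influence of outliers.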

Published in American Journal of Theoretical and Applied Statistics (Volume 13, Issue 6)
DOI 10.11648/j.ajtas.20241306.12
Page(s) 193-202
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Regression Analysis, Least Squares, Robust Regression, Outliers, Simulation

Cite This Article
  • APA Style

    Hoffman, K., & Montesinos-Yufa, H. M. (2024). Assessing the Quality of Ordinary Least Squares in General Lp Spaces. American Journal of Theoretical and Applied Statistics, 13(6), 193-202. https://doi.org/10.11648/j.ajtas.20241306.12

    ACS Style

    Hoffman, K.; Montesinos-Yufa, H. M. Assessing the Quality of Ordinary Least Squares in General Lp Spaces. Am. J. Theor. Appl. Stat. 2024, 13(6), 193-202. doi: 10.11648/j.ajtas.20241306.12

    AMA Style

    Hoffman K, Montesinos-Yufa HM. Assessing the Quality of Ordinary Least Squares in General Lp Spaces. Am J Theor Appl Stat. 2024;13(6):193-202. doi: 10.11648/j.ajtas.20241306.12

  • @article{10.11648/j.ajtas.20241306.12,
      author = {Kevin Hoffman and Hugo Moises Montesinos-Yufa},
      title = {Assessing the Quality of Ordinary Least Squares in General Lp Spaces},
      journal = {American Journal of Theoretical and Applied Statistics},
      volume = {13},
      number = {6},
      pages = {193-202},
      doi = {10.11648/j.ajtas.20241306.12},
      url = {https://doi.org/10.11648/j.ajtas.20241306.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20241306.12},
     year = {2024}
    }
    

  • TY  - JOUR
    T1  - Assessing the Quality of Ordinary Least Squares in General Lp Spaces
    AU  - Kevin Hoffman
    AU  - Hugo Moises Montesinos-Yufa
    Y1  - 2024/11/18
    PY  - 2024
    N1  - https://doi.org/10.11648/j.ajtas.20241306.12
    DO  - 10.11648/j.ajtas.20241306.12
    T2  - American Journal of Theoretical and Applied Statistics
    JF  - American Journal of Theoretical and Applied Statistics
    JO  - American Journal of Theoretical and Applied Statistics
    SP  - 193
    EP  - 202
    PB  - Science Publishing Group
    SN  - 2326-9006
    UR  - https://doi.org/10.11648/j.ajtas.20241306.12
    VL  - 13
    IS  - 6
    ER  - 

Author Information
  • Kevin Hoffman: Department of Mathematics, Computer Science, and Statistics, Ursinus College, Collegeville, USA

  • Hugo Moises Montesinos-Yufa: Department of Mathematics, Computer Science, and Statistics, Ursinus College, Collegeville, USA
