Ridge regression is an important statistical method for modeling vehicle crash frequency when the crash data contain collinear predictors. Multicollinearity refers to the condition in which two or more predictors are highly correlated with one another. It makes the estimated coefficients very sensitive to small changes in the data or in the model specification and reduces their precision, which weakens the statistical power of the regression model. Common methods for addressing multicollinearity include variable selection and ridge regression. Variable selection simply drops highly correlated predictors from the model, but this is not always possible, especially when a variable that contributes to the collinearity is a main predictor of interest. Ridge regression, by contrast, retains all explanatory variables of interest, even highly collinear ones, and indicates which coefficients are most sensitive to multicollinearity. It works by adding a small degree of bias to the regression estimates, which reduces their standard errors and yields more reliable estimates. This paper uses five years of vehicle crash data, from 2011 to 2015, on Interstate 90 (I-90) in the state of Minnesota, USA. The data exhibit multicollinearity among some of the independent variables. Results show that ridge regression is an effective tool for addressing this multicollinearity and produces more accurate regression estimates than multiple linear regression.
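The mechanism described above can be illustrated with a small simulation. The sketch below is not the paper's analysis and does not use the I-90 crash data: it generates hypothetical data in which two predictors are nearly copies of each other, diagnoses the collinearity with variance inflation factors (VIF), and then refits ordinary least squares and ridge regression on many simulated samples to compare how much the coefficient estimates move. It assumes one common convention (standardized predictors, centered response, closed-form estimator (Z'Z + kI)^(-1) Z'y), and the ridge parameter k = 10 is an arbitrary illustrative value rather than one selected by a formal procedure.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def vif(X):
    """Variance inflation factors, computed as the diagonal of the
    inverse of the predictors' correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def ridge(X, y, k):
    """Ridge coefficients on standardized predictors and centered response:
    beta = (Z'Z + k I)^(-1) Z'y.  Setting k = 0 reproduces OLS."""
    Z = standardize(X)
    yc = y - y.mean()
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + k * np.eye(p), Z.T @ yc)

def simulate(rng, n=100):
    """Hypothetical data: x2 is nearly a copy of x1, x3 is independent."""
    x1 = rng.normal(size=n)
    x2 = x1 + 0.05 * rng.normal(size=n)
    x3 = rng.normal(size=n)
    X = np.column_stack([x1, x2, x3])
    y = 2.0 + 1.0 * x1 + 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)
    return X, y

rng = np.random.default_rng(0)
X, y = simulate(rng)
vifs = vif(X)
print("VIF:", np.round(vifs, 1))

# Refit on many simulated samples to see how much each estimate varies.
k = 10.0  # hypothetical ridge parameter; in practice chosen by e.g. GCV
ols_fits, ridge_fits = [], []
for _ in range(500):
    Xs, ys = simulate(rng)
    ols_fits.append(ridge(Xs, ys, 0.0))
    ridge_fits.append(ridge(Xs, ys, k))
ols_sd = np.std(ols_fits, axis=0)
ridge_sd = np.std(ridge_fits, axis=0)
print("OLS   sd of coefficients:", np.round(ols_sd, 3))
print("Ridge sd of coefficients:", np.round(ridge_sd, 3))
```

In runs of this kind, the VIFs of the two near-duplicate predictors are far above the usual rule-of-thumb threshold of 10, and the ridge estimates vary much less across samples than the OLS estimates. That reduced variance is bought with some bias, because ridge shrinks the coefficients toward zero and splits the shared effect between the collinear predictors.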
Cite this paper
Abdulhafedh, A. (2022) Modeling Vehicle Crash Frequency When Multicollinearity Exists in Vehicle Crash Data: Ridge Regression versus Ordinary Least Squares Linear Regression. Open Access Library Journal, 9, e8873. https://doi.org/10.4236/oalib.1108873