Unveiling the Predictive Capabilities of Machine Learning in Air Quality Data Analysis: A Comparative Evaluation of Different Regression Models

DOI: 10.4236/ojap.2023.124009, PP. 142-159

Keywords: Regression Analysis, Air Quality Index, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Logistic Regression, K-Nearest Neighbors, Machine Learning, Big Data Analysis


Abstract:

Air quality is a critical concern for public health and environmental regulation. The Air Quality Index (AQI), an index widely used by the US Environmental Protection Agency (EPA), is a key metric for reporting site-specific air pollution levels. Accurately predicting air quality, as measured by the AQI, is essential for effective air pollution management. In this study, we aim to identify the most reliable model among linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, and K-nearest neighbors (KNN). We fitted all four models using a machine learning approach and compared their performance using confusion matrices and error percentages. The prediction error rates were 22%, 23%, 20%, and 27% for the LDA, QDA, logistic regression, and KNN models, respectively. Logistic regression, with the lowest error rate (20%), outperformed the other three models in predicting AQI. Understanding how these models perform can help address an existing gap in air quality research and contribute to the integration of regression techniques in AQI studies, ultimately benefiting stakeholders such as environmental regulators, healthcare professionals, urban planners, and researchers.
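As a rough illustration of the comparison criterion described in the abstract, the following minimal sketch builds a confusion matrix from true and predicted labels, derives the error rate from it, and then picks the model with the lowest reported error rate. The two class names, the toy labels, and the helper functions `confusion_matrix` and `error_rate` are hypothetical and chosen only for illustration; the paper's actual data, class definitions, and code are not shown in this abstract. Only the four error rates in `reported` come from the abstract.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Return nested counts: matrix[true_label][predicted_label]."""
    matrix = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def error_rate(matrix):
    """Share of observations that fall off the diagonal of the matrix."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[label][label] for label in matrix)
    return (total - correct) / total


# Hypothetical toy labels, invented for illustration only.
labels = ["good", "not good"]
y_true = ["good", "good", "good", "not good", "not good"]
y_pred = ["good", "good", "not good", "not good", "good"]

cm = confusion_matrix(y_true, y_pred, labels)
toy_error = error_rate(cm)  # 2 of the 5 toy observations are misclassified

# Error rates as reported in the abstract (fractions of misclassified cases).
reported = {"LDA": 0.22, "QDA": 0.23, "logistic regression": 0.20, "KNN": 0.27}
best_model = min(reported, key=reported.get)

print(cm)
print(f"toy error rate: {toy_error:.0%}")
print(f"lowest reported error rate: {best_model}")
```

Selecting on the raw error rate treats every misclassification as equally costly; whether that matches the paper's evaluation in full cannot be confirmed from the abstract alone.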

