Air quality is a critical concern for public health
and environmental regulation. The Air Quality Index (AQI), an index widely
adopted by the US Environmental Protection Agency (EPA), serves as a crucial metric
for reporting site-specific air pollution levels. Accurately predicting air
quality, as measured by the AQI, is essential for effective air pollution
management. In this study, we aim to identify the most reliable predictive
model among linear discriminant analysis (LDA), quadratic discriminant analysis
(QDA), logistic regression, and K-nearest neighbors (KNN). We fitted all four
models using a machine learning approach and compared their performance using
confusion matrices and error percentages. The prediction error rates were 22%,
23%, 20%, and 27% for the LDA, QDA, logistic regression, and KNN models,
respectively. With the lowest error rate, the logistic regression model
outperformed the other three models in predicting AQI. Understanding these
models' performance can help address an existing gap in air quality research
and contribute to the integration of regression techniques in AQI studies,
ultimately benefiting stakeholders like environmental regulators, healthcare
professionals, urban planners, and researchers.
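
As a rough illustration of how an error percentage can be read off a confusion matrix, the following minimal Python sketch computes the misclassification rate (off-diagonal counts divided by total counts) and picks the model with the lowest rate. The function name `error_rate`, the model labels, and all counts are hypothetical and chosen only for illustration; they are not the study's data or its actual implementation.

```python
from typing import Dict, List, Sequence


def error_rate(confusion: Sequence[Sequence[int]]) -> float:
    """Return the misclassification rate of a square confusion matrix.

    Rows are true classes and columns are predicted classes, so the
    diagonal holds the correctly classified counts.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no observations")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


# Hypothetical 2x2 confusion matrices; the counts are invented for
# illustration and do NOT come from the study.
example_matrices: Dict[str, List[List[int]]] = {
    "model_a": [[40, 10], [10, 40]],
    "model_b": [[45, 5], [10, 40]],
}

# Error rate per model, then the model with the smallest error rate.
rates = {name: error_rate(m) for name, m in example_matrices.items()}
best = min(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: {rate:.0%} error")
print("lowest error:", best)
```

The same calculation extends to more than two AQI categories, since only the diagonal and the total count are used.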