Spatial Heterogeneity Modeling Using Machine Learning Based on a Hybrid of Random Forest and Convolutional Neural Network (CNN)

DOI: 10.4236/jdaip.2024.123018, PP. 319-347

Keywords: Spatial Heterogeneity, Spatial Data, Feature Selection, Standardization, Machine Learning Models, Hybrid Models

Abstract:

Spatial heterogeneity is the variation in characteristics or features across different locations in space. Spatial data, also known as geo-spatial data or geographic information, is information that is explicitly or indirectly tied to a particular geographic region or location. Focusing on spatial heterogeneity, we present a hybrid machine learning model that combines two competitive algorithms: the Random Forest (RF) Regressor and a Convolutional Neural Network (CNN). The model is tuned with cross-validation, which is used both for hyper-parameter adjustment and for performance evaluation, to support robustness and generalization. Our approach uses global Moran’s I to examine global spatial autocorrelation and local Moran’s I to assess local spatial autocorrelation in the residuals. To validate the approach, we applied the hybrid model to a real-world house price dataset and compared its performance with that of the individual machine learning models. The hybrid achieved an R-squared of 0.90, outperforming RF (0.84) and CNN (0.74). The Root Mean Squared Error (RMSE) also indicated lower errors for the hybrid, with a deviation of 53.65% from the RF model and 63.24% from the CNN model. Additionally, the global Moran’s I index was observed to be 0.10. By taking into account the geographical information (longitude and latitude) present in the dataset, the study contributes to a detailed understanding of spatial variation in the data, and it shows that the hybrid was able to predict house prices correctly both in clusters and in dispersed areas.
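
As a hedged illustration of the residual check described in the abstract, the sketch below computes global Moran’s I, I = (n / S0) · (zᵀWz) / (zᵀz), where z holds the mean-centred values, W is a spatial weights matrix and S0 is the sum of all weights. The abstract does not state which weights matrix or software the authors used, so the row-standardised k-nearest-neighbour weights, the value k = 4, the synthetic 10 × 10 grid and the function names `knn_weights` and `global_morans_i` are all assumptions made for illustration only.

```python
import numpy as np


def knn_weights(coords, k=4):
    """Row-standardised k-nearest-neighbour weights (an assumed choice of W)."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    # Pairwise Euclidean distances between all locations.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a location is never its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return w / w.sum(axis=1, keepdims=True)  # each row sums to 1


def global_morans_i(values, w):
    """Global Moran's I of `values` under the spatial weights matrix `w`."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    return (z.size / w.sum()) * (z @ w @ z) / (z @ z)


# Synthetic 10 x 10 grid standing in for longitude/latitude pairs.
xs, ys = np.meshgrid(np.arange(10), np.arange(10))
coords = np.column_stack([xs.ravel(), ys.ravel()])
w = knn_weights(coords, k=4)

# Three hypothetical "residual" surfaces to show how the index behaves.
gradient = (xs + ys).ravel().astype(float)          # smooth trend: clustered
checker = ((xs + ys) % 2 * 2 - 1).ravel()           # alternating signs: dispersed
noise = np.random.default_rng(0).normal(size=100)   # no spatial structure

for name, vals in [("gradient", gradient), ("checker", checker), ("noise", noise)]:
    print(f"{name:>8}: I = {global_morans_i(vals, w):+.3f}")
```

Values near +1 indicate that similar residuals cluster together, values near -1 indicate dispersion, and values close to zero suggest little remaining spatial structure. In practice the input would be the model residuals at the observed coordinates rather than these synthetic surfaces.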

