




Spatial Heterogeneity Modeling Using Machine Learning Based on a Hybrid of Random Forest and Convolutional Neural Network (CNN)

DOI: 10.4236/jdaip.2024.123018, PP. 319-347

Keywords: Spatial Heterogeneity, Spatial Data, Feature Selection, Standardization, Machine Learning Models, Hybrid Models



Spatial heterogeneity refers to variation in characteristics or features across locations in space. Spatial data, also known as geospatial data or geographic information, is information that is explicitly or implicitly tied to a particular geographic region or location. Focusing on spatial heterogeneity, we present a hybrid machine learning model that combines two competitive algorithms: the Random Forest (RF) regressor and the Convolutional Neural Network (CNN). The model is tuned with cross-validation, used both for hyperparameter adjustment and for performance evaluation, to support robustness and generalization. Our approach applies global Moran’s I to examine global spatial autocorrelation and local Moran’s I to assess local spatial autocorrelation in the residuals. To validate the approach, we applied the hybrid model to a real-world dataset and compared its performance with that of traditional machine learning models. The hybrid model achieved an R-squared of 0.90, outperforming RF (0.84) and CNN (0.74). This study contributes to a detailed understanding of spatial variation in data by taking into account the geographical information (longitude and latitude) present in the dataset. Assessed with the Root Mean Squared Error (RMSE), the hybrid model also yielded lower errors, with a deviation of 53.65% from the RF model and 63.24% from the CNN model. Additionally, the global Moran’s I index was 0.10. The study shows that the hybrid model was able to predict house prices correctly both in clustered and in dispersed areas.
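As an illustration of the residual check mentioned in the abstract, global Moran’s I can be computed from a vector of values (for example, model residuals) and their coordinates. The following is only a minimal sketch, not the authors' implementation: the function name `global_morans_i`, the binary k-nearest-neighbour weight scheme, the choice `k=4`, and the use of plain Euclidean distance on coordinate pairs are all assumptions, since the abstract does not state how the spatial weights were built.

```python
import numpy as np


def global_morans_i(values, coords, k=4):
    """Global Moran's I with binary k-nearest-neighbour weights.

    The weight scheme is an assumption made for illustration; the
    abstract does not specify one.
    """
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = values.shape[0]

    # Pairwise Euclidean distances between locations.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour

    # Binary weights: w[i, j] = 1 if j is one of the k nearest neighbours of i.
    neighbours = np.argsort(dist, axis=1)[:, :k]
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0

    # I = (n / sum(w)) * (z' W z) / (z' z), with z the mean-centred values.
    z = values - values.mean()
    return (n / w.sum()) * (z @ w @ z) / (z @ z)


if __name__ == "__main__":
    # Hypothetical data: a 10 x 10 grid with a smooth east-west gradient.
    xs, ys = np.meshgrid(np.arange(10), np.arange(10))
    grid = np.column_stack([xs.ravel(), ys.ravel()])
    print(global_morans_i(xs.ravel(), grid))
```

Values near zero suggest little remaining spatial structure, while values approaching 1 or -1 suggest clustering or dispersion respectively. For real longitude and latitude pairs, a great-circle distance or a projected coordinate system would usually be more appropriate than the planar distance used here.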



