Remote Sensing, Vol. 15, Pages 4153: Comparison of Machine Learning Models to Predict Lake Area in an Arid Area
Remote Sensing doi: 10.3390/rs15174153
Authors: Di Wang Zailin Huo Ping Miao Xiaoqiang Tian
Machine learning (ML)-based models are popular for complex physical system simulation and prediction. Lake is the important indicator in arid and semi-arid areas, and to achieve the proper management of the water resources in a lake basin, it is crucial to estimate and predict the lake dynamics, based on hydro-meteorological variations and anthropogenic disturbances. This task is particularly challenging in arid and semi-arid regions, where water scarcity poses a significant threat to human life. In this study, a typical arid area of China was selected as the study area, and the performances of eight widely used ML models (i.e., Bayesian Ridge (BR), K-Nearest Neighbor (KNN), Gradient Boosting Decision Tree (GBDT), Extra Trees (ET), Random Forest (RF), Adaptive Boosting (AB), Bootstrap aggregating (Bagging), eXtreme Gradient Boosting (XGB)) were evaluated in predicting lake area. Monthly lake area was determined by meteorological (precipitation, air temperature, Standardised Precipitation Evapotranspiration Index (SPEI)) and anthropogenic factors (ETc, NDVI, LUCC). Lake area determined by Landsat satellite image classification for 2000–2020 was analysed side-by-side with the Standardised Precipitation Evapotranspiration Index (SPEI) on 9 and 12-month time scales. With the evaluation of six input variables and eight ML algorithms, it was found that the RF models performed best when using the SPEI-9 index, with R2 = 0.88, RMSE = 1.37, LCCC = 0.95, and PRD = 1331.4 for the test samples. Furthermore, the performance of the ML model constructed with the 9-month time scale SPEI (SPEI-9) as an input variable (MLSPEI-9) depended on seasonal variations, with the average relative errors of up to 0.62 in spring and a minimum of 0.12 in summer. Overall, this study provides valuable insights into the effectiveness of different ML models for predicting lake area by demonstrating that the right inputs can lead to a remarkable increase in performance of up to 13.89%. These findings have important implications for future research on lake area prediction in arid zones and demonstrate the power of ML models in advancing scientific understanding of complex natural systems.