1.
Introduction
Soil moisture (SM) is an important variable in water and energy fluxes occurring at the interface between land surface and atmosphere [1,2,3,4]. SM influences the surface energy, and it is a key feature in the partitioning of the net radiation into sensible heat and latent heat [1,5], the partitioning of rainfall into runoff, percolation and evapotranspiration [1,6], the plant photosynthesis and respiration [7,8,9,10], plant transpiration [10,11] and plant growth and physiological state [12,13,14]. SM is widely used in several disciplines, including: earth system dynamics, water resource management, agriculture, forestry and soil science [15,16]. In agriculture, SM is used for studies on crop production and for determining irrigation requirements [7,17,18,19]. In the event of soil moisture stress, productivity is seriously affected, and the plant has to consume more energy to take water from the soil [7,20].
SM models allow for the reduction of regular measurements of soil moisture, which is expensive and time consuming. In addition, soil moisture is only measured in special cases [21,22]. Physically-based models are based on the water transport in the soil-plant-root system, and they use the mass balance of soil water in the crop root zone, with terms for effective precipitation (Pef), evapotranspiration (E), runoff (R) and percolation (G). These terms aim at providing realistic estimates of these physical processes. Extensive field data (including soil characteristics) are required for estimating model parameters, so its applicability is limited [23]. The empirical models emerged later, using the balance of the water mass with nonlinear functions related to Pef, E, R and G, aimed at obtaining an accurate estimation of the SM; however, a realistic estimation of physical processes is not intended [22,24]. They use available meteorological data, requiring few or no soil characteristics, their tuning is simple and they can achieve high performance with high r2 values [22,25].
In [22], an empirical model is proposed for estimating soil moisture in a Himalayan watershed. It comprises the classical soil water balance. However, simplified empirical relations are used for infiltration, evapotranspiration and percolation: i) the infiltration term is a function of only rainfall, and it comprises a power law relation; ii) the percolation term is a function of only soil moisture and infiltration; iii) the evapotranspiration term is a function of only air temperature, soil surface temperature and wind speed. Therefore, the input variables of the model are rainfall, wind speed, air temperature and soil surface temperature. The model is calibrated with measurements from three sites at the lesser Himalaya for three soil depths. The performance of the model decreases as the depth of the soil increases.
In [26], a set of simplified models is proposed for estimating the soil moisture, and it is calibrated to data from Taranaki, New Zealand. Each model comprises a first-order differential equation, with empirical terms depending on rainfall, soil temperature, soil moisture and day length. A loss term is included, which depends on soil moisture level, soil, the temperature of the soil and the length of the day. The models were calibrated with data from ten locations in the Taranaki region in New Zealand. Despite their simplicity, the models can be used for sites with different soil characteristics and soil moisture dynamics. The negative soil moisture term proved significant in sites with soil moisture levels above field capacity, whereas the term involving temperature and daylight proved significant in sites with soil moisture levels below field capacity.
In [21], an adaptive neurofuzzy inference system (ANFIS) is used for estimating soil moisture at the Istanbul Bolge station in Turkey, considering the classical ANFIS model and its coupling with three bioinspired metaheuristic optimization methods: ANFIS coupled with the whale optimization algorithm (ANFIS-WOA), ANFIS coupled with the krill herd algorithm (ANFIS-KHA) and ANFIS coupled with the firefly algorithm (ANFIS-FA). The input variables are the daily air temperature, relative humidity, wind speed, sunshine hours and soil temperature. The models were calibrated with data from Turkey, 2008–2009. The SM estimation with hybrid models (ANFIS-WOA, ANFIS-KHA and ANIFS-FA) achieved a significantly lower estimation error compared to the classical ANFIS model. Furthermore, the performance of the model for different soil moisture intervals was: 15–25% (first interval), 25–35% (middle interval) and ≥35% (last interval). Hybrid models had better performance for the last interval and worst performance for the middle interval. The ANFIS-WOA model had the best performance, considering each of the three SM intervals and considering the entire SM range.
In [7], a simple model is used for estimating soil moisture and calculating irrigation requirements, in the Sikkim state of India. The model comprises a discretized soil water balance model, with an effective rainfall term, an evapotranspiration term and a constant term; a multiplicative coefficient is incorporated into the rainfall and evapotranspiration terms. The effective rainfall term is a nonlinear piecewise function of rainfall. The input variables of the model are rainfall, daily maximum and minimum temperatures, relative humidity, wind speed, precipitation and sunshine hours. The model was calibrated with data from three meteorological stations located in different parts of the Sikkim state, from the 2015–2016 period, and then validated using data from the 2016–2017 period. The determination coefficients were 0.76, 0.66 and 0.63 for the stations. Additionally, irrigation requirements for cabbage, mustard green leaves, potato, broccoli and cauliflower were calculated for the three locations.
In this work, several empirical models are used for estimating soil moisture in three locations of Colombia, considering limited weather data. The input variables are the daily maximum, minimum and average air temperatures, and precipitation. The first model (HDG0) is based on water balance and the Hargreaves model is used for the evapotranspiration term, while the runoff and percolation components are functions of precipitation and soil moisture. The second model (VJRAM) is based on water balance, where the evapotranspiration term uses the Hargreaves model and the runoff and percolation terms are functions of precipitation. The third model (K) is a simple compartment model, involving terms related to gain from rainfall and soil moisture loss, considering daylight and temperature effects. In addition, two models are proposed, formulated by combining the first model with modifications in the precipitation, runoff, percolation and evapotranspiration terms, using functions recently proposed in the current literature. Each model is calibrated using data from each location, and the fitting quality of the models is compared for each location. Compared to previous related studies on SM modeling using empirical models, the main contributions of this paper are the following:
i) Based on the model in [27], a new model is proposed with terms from the recent literature.
ii) The effect of model parameters on the fitting quality is assessed and the parameters with higher effect are determined.
iii) The proposed models and the empirical models found in the recent literature are compared in terms of the combination of fitting accuracy and number of parameters, through the Akaike Information Criterion (AIC).
2.
Soil moisture models
Precipitation turns into runoff, percolation and evapotranspiration, so that the soil moisture dynamics depends on these components [25,28,29]. Furthermore, a part of precipitated water evaporates into the atmosphere before it reaches the top soil layer [1], and the other part is retained by plant cover [30,31,32]. The part of precipitation that can be used by the plant root is known as effective precipitation [1].
Soil moisture (SM) is influenced by topography, soil properties (soil texture, drainage capability and soil density), vegetation and meteorological conditions (e.g. precipitation, temperature, wind speed) [1,33,34,35]. SM exhibits spatial variation due to its dependence on local soil characteristics [25]. Soil density influences root growth, movement of air and water through the soil and infiltration rates [7]. Additionally, the vegetation type and density affect the soil water content and soil hydrological processes [12]. Precipitation and temperature are considered as the meteorological variables with major effect on SM [22,33], and there is a strong correlation of soil moisture with temperature and precipitation [36]. SM decreases with air temperature, wind speed and solar radiation through evapotranspiration [22].
The water balance model considers the partition of precipitation into effective precipitation (infiltration), runoff, percolation and evapotranspiration, so that the soil moisture dynamics depend on these terms [28]. Evapotranspiration is commonly estimated with the Penman-Monteith (PM) model, which comprises air temperature, air relative humidity, wind speed and solar radiation as input variables. However, evapotranspiration can also be estimated with the Hargreaves (HG) model, mainly in the case where measurements of air relative humidity, wind speed and solar radiation are missing [24,37,38,39]. It can provide a satisfactory representation of evapotranspiration, achieving high values of the determination coefficient [39], and it has proven effective in estimating rainfall with effectiveness indexes similar to those of the PM model (see [24]).
2.1. Water balance and evapotranspiration models
The water balance is [3,24,40]:
The term D is the product of soil depth and porosity, W is the soil moisture, Pef is the effective rainfall, E is the evapotranspiration, R is the streamflow divergence (runoff) and G is the loss of groundwater through deep percolation. The D coefficient can be combined with the coefficients of Pef,k, Ek, Rk and Gk, so that:
Evapotranspiration can be expressed as in Eq (2) [24,25]:
where f(W) is a function of W and ET0 is the reference evapotranspiration [mm/day]. ET0 is usually calculated through the Penman-Monteith equation derived by the United Nations Food and Agriculture Organization (FAO), but the Hargreaves equation can be used for the case that solar radiation, air relative humidity and wind speed are not available [24,37,38,41,42]:
where Tavg (℃) is the daily averaged air temperature (℃); Tmax and Tmin are the daily maximum and daily minimum air temperature (℃); Ra is the extraterrestrial radiation. The Ra equations are provided in [42]. The Hargreaves model in Eq (3) can be combined with a constant term and a multiplicative coefficient (see [42]):
where kE1 and kE2 are constant coefficients. All the coefficients, not only kE1 and kE2, can be fitted to experimental data (see [39,41]). Some examples of the term f(W) in Eq (2) are [24,27]:
where Wmax, ke, kcb and kp are positive constants.
2.2. VJRAM model
Effective rainfall, runoff and percolation for unshaded coffee are represented as [43]:
However, the measurements given in [43] and [32] indicate that Pef, R and G exhibit a dead zone for low rainfall values, so that the above relations can be improved as:
where kRPRkG1 and PG are constant. In [44], the evapotranspiration is expressed as E=ET0(W/kws), where kws is a constant, and several equations are suggested for ET0. Using the above expressions, the VJRAM model is given by Eq (1) with Pef, E, R, and G terms defined as:
where ET0 is given by Eq (4). The parameters to be estimated are: kp, kR1, kR2, kG1, kG2, kE1 and kE2.
2.3. K models
The K8 model is the model (8) in [26]:
The term W is the soil moisture, P is the precipitation (mm), Tair is the air temperature in degrees Celsius, DL is the day length (hours), whereas α1, α1b, α1c, α1d, α2 and α2b are constant parameters to be estimated. In order to overcome the lack of soil temperature data, the relationship Ts=α1c+α1dTair where Ts is the soil temperature, taken from [45]. The −α1bW(Ts/20)(DL/12) term represents the soil moisture loss due to transpiration and evaporation, whereas the −α1W term represents runoff and percolation. The day length (DL) equation is given in [46]. The model requires daily precipitation and daily average air temperature as input data. The parameters to be estimated are: α1, α1b, α1c, α1d, α2, and α2b.
The K7 model is a combination of model (7) from [26] with the α2P term:
The parameters to be estimated are: α1, α1b, α1c, α1d, α2 and α2b.
2.4. HDG0 model
The HDG0 model is given by Eq (1) with Pef, E, R and G terms [27]:
where Wmax is the capacity of soil to hold water; m,Wmax,α, and μ are constants. Substituting the Pef, E and R+G expressions into Eq (1), gives:
In [27], the Thornthwaite formulas are used for ET0, whereas we use Eq (4). The parameters to be estimated are: kp, Wmax, m, α, kE1 and kE2.
2.5. HDG4 model
The HDG4 model is obtained by combining the HDG0 model with modification of the terms given in Table 1.
Runoff (R) and percolation (G) terms vanish for precipitation lower than some threshold [44,48], whereas the effective precipitation data given in [32] indicate that Pef vanishes for rainfall below some threshold. Then, we consider the dead zone function of precipitation:
where Plinf is a positive constant. The soil moisture time series for Balboa (Figure A3) indicates that for SM lower than a threshold, the rate of decrease in SM is overly slow. To account for this, we consider a dead zone function of SM:
where Wlinf is a positive constant. The modifications presented in Table 2 are proposed to give a wider capability of describing the nonlinear nature of Pef and R+G, based on the terms stated in Table 1.
In the evapotranspiration term, different values of the proportional coefficients are used for P≤Plinf and P>Plinf, where Plinf is a positive constant:
In summary, the HDG4 model is given by Eq (1) with Pef, E, R and G terms:
where,
The parameters to be estimated are: kp, Wmax, mw1, α1, kE11, kE21, α0, α3, α2, mw2, mp1, mp2, kE12 and kE22.
2.6. HDG1 model
The HDG1 model is a simplification of the HDG4 model, comprising the following modifications of terms in Eq (9):
i) the evapotranspiration function Eq (4) is used instead of the piecewise function ETT;
ii) only two additive terms are used for R+G;
iii) a Monod type function with dead zone is used for Pef, wherein the Monod function is used to represent the non-linear increase of Pef with P.
Then, the HDG1 model is given by Eq (1) with Pef, E, R and G terms:
The term Wlinf is a positive constant that can be defined from data, or it can be set as zero if it is uncertain. The parameters to be estimated are: kp1, kp2, Pp, Wmax, m, α, n, kE1 and kE2. The resulting mass balance model is:
2.7. HDG3 model
The HDG3 model is a simplification of the HDG4 model, obtained from the HDG4 model with a sole value of the proportional coefficients in the evapotranspiration term. Then, the HDG3 model is given by Eq (1) with Pef, E and R+G terms:
where,
The parameters to be estimated are: kp, Wmax, mw1, α1, kE1, kE2, α0, α3, α2, mw2 and mp1.
3.
Study sites, and model discretization, calibration and evaluation
3.1. Study sites and meteorological data
Three weather stations were selected for this study: Naranjal, Calarcá (CALARCA AUTOM) and Balboa (BALBOA AUTOM). The experimental Naranjal station (04°58'N, 75°39'W) is located in Chinchiná (Caldas, Colombia) at an elevation of 1381 m, with an average temperature of 20.9℃ and Castillo variety coffee crop [49]. The distance between plants and rows is 1.0 by 2.0 m and the soil bulk density is 0.64 g/cm3. The records of daily precipitation, daily maximum air temperature, daily minimum air temperature, daily average air temperature and daily average soil moisture were extracted from [49] through image digitization using WebPlotDigitizer software (https://automeris.io/WebPlotDigitizer/index.html), spanning from July 10, 2015 to December 1, 2015 (Figure A1). The soil moisture measurements correspond to 10 cm in depth.
CALARCA AUTOM (4.528°N, 75.596W) is located at Calarcá (Quindío, Colombia) at an elevation of 2255 m. The climate is cold and humid, and the land cover type is cropland and pastures [50]. The records of daily precipitation, daily maximum air temperature, daily minimum air temperature, daily average air temperature and daily average soil moisture were collected from the IDEAM database (http://www.ideam.gov.co/) and span from April 21, 2021 to October 27, 2021, including a period with no measurement from May 13, 2021 to July 26, 2021 (Figure A2). The short dataset spans from September 1, 2021 to October 27, 2021. The soil moisture measurements correspond to 10 cm in depth.
BALBOA AUTOM (2.033°N, 77.222W) is located at Balboa (Cauca, Colombia) at an elevation of 1700 m. The climate is temperate and humid, and the land cover type is cropland and pastures [50]. The records of daily precipitation, daily maximum air temperature, daily minimum air temperature, daily average air temperature and daily average soil moisture were collected from the IDEAM database (http://www.ideam.gov.co/) and span from January 01, 2019 to December 31, 2019 (Figure A3). The soil moisture measurements correspond to 10 cm in depth.
3.2. Model discretization
The discretized form of Eq (1) is [51]:
where Δt is the time increment, whereas Pef,k,Ek,Rk, and Gk are the values of Pef,E,R, and G at time k, given by: Eq (7) for model K7; Eq (6) for model K8; Eq (5) for model VJRAM; Eq (8) for model HDG0; Eq (10) for model HDG1; Eq (11) for model HDG3; and Eq (9) for model HDG4.
3.3. Model calibration and performance evaluation
The models, using the discretized model Eq (12) instead of the continuous time model Eq (1) are: K7 (7); K8 (6); VJRAM [Eqs (12) and (5)]; HDG0 [Eqs (12) and (8)]; HDG1 [Eqs (12) and (10)]; HDG3 [Eqs (12) and (11)]; and HDG4 [Eqs (12) and (9)]. These models are calibrated using records of daily precipitation, daily maximum air temperature, daily minimum air temperature, daily average air temperature and daily average soil moisture. An initial estimate of each parameter is chosen, and the parameter estimates are obtained by minimization of the sum of the squared deviations between the observed and simulated data, using MATLAB (The MathWorks Inc., Natick, MA, USA) with the fmin function. The fitting quality is assessed via the simulation error metrics and the Akaike information criteria. The simulation error metrics compare the observed and simulated soil moisture values over the time range (see [25,52,53]):
● Mean absolute error:
● Root-mean-square error:
● Mean bias error:
● Classical Nash-Sutcliffe efficiency:
● Nash-Sutcliffe efficiency based on absolute error:
The term Xobs,j represents the observed values, Xmodel,j represents the simulated values, ¯Xo represents the mean of observed values, N represents the number of observations and the summation Σ holds over the range of the time series. The Nash-Sutcliffe (NS) coefficient ranges between −∞ and 1.0; an NS value of 1.0 corresponds to perfect fit, 0.0 corresponds to a model that is not better than using the average value and a negative value corresponds to a model that has less efficacy than the mean of the observations [25,54]. The range NS>0.7 is considered as satisfactory fitting by some authors, whereas NS≥0.8 is considered as high fitting quality [23,53], but there is no universally accepted criterion [25,28]. Low values of MAE, RMSE and MBE correspond to high fitting quality [53].
In addition, the Akaike information criterion allows comparison of models considering both the prediction quality and the number of model parameters. The basic Akaike information criterion is [55,56,57]:
where SSE is the sum of squared errors, N is the number of observations and Np is the number of parameters of a given model. The Akaike information criteria for the case that the number of observations is small in comparison to the number of parameters Np+1, that is, Np>N/40, is [55,56,57]:
The best model is the one with the lowest AIC value [56,57]. In addition, the effect of model parameters on the weighted sum of squared simulation errors (E) allows identification of which parameters require an accurate estimation [53], E being given by:
4.
Model fitting results
The predicted and observed soil moisture data for the Naranjal station, period ΩN123=[194−333] are shown in Figure 1. The values of performance criteria are given in Table 3. The sensitivity analysis is shown in Figure 5.
The boxplot for soil moisture models for Naranjal station is given in Figure 2, and the Taylor diagram is given in Figure 3.
The order of the models, from the lowest to the highest AIC, the lowest to the highest RMSE and the highest to the lowest NS0, is:
− AIC: VJRAM, HDG3, HDG4, HDG1, HDG0, K7, K8.
− RMSE: VJRAM, HDG4, HDG3, HDG1, HDG0, K7, K8.
− NS0: VJRAM, HDG4, HDG3, HDG1, HDG0, K7, K8.
Therefore, the order of models is different for the AIC, RMSE and NS0 indices, but VJRAM model is the best according to AIC, RMSE and NS0, whereas models HDG3 and HDG4 achieve either the second or third place. The second lowest AIC corresponds to HDG3 and the second highest NS0 (0.889) corresponds to the HDG4 model. The NS0 values obtained with the K7 and K8 models (in the range 0.816–0.817) are lower than those of the HDG models (range 0.822–0.890). The boxplot (see Figure 2) indicates that: i) HDG3, HDG4 and VJRAM models have a similar interquartile range (IQR), lower than that of HDG0 and HDG1; ii) VJRAM model has the lowest range and the lowest median; iii) HDG0 model has the highest IQR, the highest range and the highest outliers. The Taylor diagram (see Figure 3) indicates that: i) the correlation coefficient (R) is higher than 0.9 for the models, so that the modelling data are similar to measurement data; ii) VJRAM and HDG4 models have higher correlation coefficient, whereas HDG0 has the lowest value. Finally, the VJRAM model Eq (5) is chosen as the highest quality model, and the main reason is that it has the lowest AIC and the highest NS0. The better performance of VJRAM model confirms the effectiveness of the tailored structure of percolation term G for improved soil moisture modelling. The predicted and observed soil moisture values for the Naranjal station, using the VJRAM model are shown in Figure 4.
The sensitivity analysis for the VJRAM model (Naranjal station) is shown in Figure 5.
The effects of model parameters on model fit for the Naranjal station are: i) higher effect: kG1, kG2, kP and kE2; ii) lower effect: kR and kE1. Therefore, accurate estimation of kG1, kG2, kP and kE2 is important.
Collecting additional meteorological records for Naranjal station is not straightforward, and there is not a public database with this information. Therefore, the 2015 period ΩN123=[194−333] is split into calibration period ΩN13=[194−264]⋃[310−333] and validation period ΩN2=[265−309]. Then, model VJRAM is calibrated over ΩN13 and the obtained parameter estimates are used for model validation over period ΩN2. The error metrics are given in Table 3; the simulations of soil moisture over time for calibration and validation periods are given in Figure 6; the plots of measurements versus simulation (calibration and validation) are given in Figure 7.
The predicted and observed soil moisture data for the Calarcá station over period ΩK3=[244−300] are shown in Figure 8; the values of performance criteria are given in Table 4.
The boxplot for soil moisture models for Calarcá station is given in Figure 9, and the Taylor diagram is given in Figure 10.
The NS0 values obtained with K7, K8 and VJRAM models are in the range 0.21–0.37, whereas the NS0 values for HDG models are in the range 0.653–0.712. The order of the models, from the lowest to the highest AIC, the lowest to the highest RMSE and the highest to the lowest NS0 is:
-AIC: HDG0, HDG3, HDG1, HDG4, VJRAM, K8, K7.
-RMSE: HDG3 and HDG4 (same value), HDG1, HDG0, VJRAM, K8, K7.
-NS0: HDG4, HDG3, HDG1, HDG0, VJRAM, K8, K7.
Therefore, HDG3 model achieves either the first or second place in the RMSE and NS0 indices, whereas HDG4 model achieves the highest NS0 but it is in the fourth place in terms of AIC. The boxplot (see Figure 9) indicates that: i) all the models have a near-zero median; ii) HDG4 and HDG3 models have a similar IQR and range; iii) HDG4 model has the lowest range, whereas HDG4 and HDG3 have the lowest IQR; iv) VJRAM model has the highest range, the highest IQR and the highest outliers. In summary, HDG4 and HDG3 are the best two models in terms of range and IQR, whereas VJRAM is the worst model. The Taylor diagram (see Figure 10) indicates that: i) for the VJRAM model, the correlation coefficient (R) is close to 0.6 (see Figure 10), while other models (HDG0, HDG1, HDG3, HDG4) have an R higher than 0.8; ii) The HDG3 and HDG4 models have a similar R, and the HDG4 model has the highest R. Finally, the HDG3 model is chosen as the highest quality model, and the main reason is that the lowest AIC corresponds to HDG0, but its NS0 (0.6532) is lower than 0.7, whereas HDG3 has the second lowest AIC and exhibits NS0>0.7. Then, the HDG3 model is fitted using the period ΩK123=[111−132]⋃[208−235]⋃[244−300]. The resulting predicted and observed soil moisture values are shown in Figure 11, and the performance is:
MAE: 0.0069431; RMSE: 0.010809; MBE: 0.00036298; NSabs: 0.68547; and NS0: 0.84014.
It achieves a high simulation capability for low (0.27–0.30) and high (0.36–0.38) soil moisture ranges and for the drying time periods after rain events.
The sensitivity analysis for the HDG0 model (Calarcá station) is shown in Figure 12.
The effect of model parameters on model fit for the Calarcá station is: i) higher effect: m, kp and Wmax; ii) lower effect: α, kE2 and kE1. Therefore, an accurate estimation of m, kp and Wmax is important.
Model validation for Calarcá station over 2020 or 2022 is hampered by the quality of meteorological data: corresponding daily measurements of precipitation, average temperature, maximum temperature, minimum temperature and soil moisture over time periods of at least 50 days, in 2020 or 2022 are not available from the IDEAM database. Therefore, the year 2021 is split into calibration period ΩK13=[111−132]⋃[244−300] and validation period ΩK2=[208−235]. Then, model HDG3 is calibrated over ΩK13, and the obtained parameter estimates are used for model validation over period ΩK2. The error metrics are given in Table 4; the calibration and validation of soil moisture over time are given in Figure 13; the plots of measurements versus simulation (calibration and validation) are given in Figure 14.
For Balboa station, soil moisture models are calibrated over ΩB1 = [1−318] ⋃ [321−365], 2019, and the obtained parameter estimates are used for model validation over period ΩB2 = [01−129], 2020. The simulations of soil moisture for period ΩB1 are shown in Figure 15. The values of the performance criteria are given in Table 5.
The boxplot for soil moisture models for Balboa station is given in Figure 16, and the Taylor diagram is given in Figure 17.
The order of the models, from the lowest to the highest AIC, the lowest to the highest RMSE and the highest to the lowest NS0 is:
-AIC: HDG0, HDG1, K7, K8, VJRAM, HDG4, HDG3.
-RMSE: HDG0, HDG1, HDG4, HDG3, K7, K8, VJRAM.
-NS0: HDG0, HDG1, K7, K8, HDG4, HDG3, VJRAM.
Therefore, the HDG0 model is the best one according to AIC, RMSE and NS0, whereas the HDG1 model is the second best. The NS0 values obtained with the K7 and K8 models (in the range 0.948–0.951) are comparable to those of the HDG models (range 0.921–0.983). The NS0 value for the VJRAM model (0.902) is the lowest. The boxplot (Figure 16) indicates that: i) HDG0, HDG3 and HDG4 models have a near zero median; ii) HDG3 and HDG4 models have the lowest range and the lowest IQR (see Figure 16), but their disadvantage is the higher amount of outliers compared to other models; iii) VJRAM model has the highest range, the highest IQR and the highest median; iv) HDG0 model has a lower amount of outliers compared to HDG3 and HDG4 models, although its IQR and range are higher. The Taylor diagram (see Figure 17) indicates that: i) all the correlation coefficients (R) are higher than 0.7, the R of VJRAM is close to 0.73 and the R of other models is close to 0.95; ii) the R of HDG0, HDG1, HDG3 and HDG4 is similar, and it is higher than that of VJRAM. Finally, the HDG0 model is chosen as the highest quality model, and the main reason is that it has the lowest AIC (-4627.4) and the highest NS0 (0.983). The predicted and observed soil moisture values for Balboa station, using the HDG0 model are shown in Figure 18.
The sensitivity analysis for the HDG0 model (Balboa station) is shown in Figure 19.
The effect of model parameters on model fit for Balboa station is: i) higher effect: m, kp and Wmax; ii) lower effect: α, kE2 and kE1. Therefore, an accurate estimation of m, kp and Wmax is important.
Model HDG0 is validated for Balboa station, period ΩB2 = [01 − 129], 2020, using estimated parameters obtained from calibration. The error metrics are given in Table 5; the simulation of soil moisture over time for validation period is given in Figure 20; the plots of measurements versus simulation (calibration and validation) are given in Figure 21.
The parameter values of the models with best performance are given in Table 7.
There is not a single model that exhibits neither the highest nor the lowest capacity for representing the soil moisture over the three stations (see Table 6). This is related to the different soil conditions. The lowest AIC values (-927.4, -496.97 and -4627.4) correspond to the VJRAM, HDG0 and HDG0 models for the Naranjal, Calarcá and Balboa stations, respectively. The highest AIC values (-857.74, -450.05 and -2457.1) correspond to the K8, K7 and HDG3 models, for the Naranjal, Calarcá and Balboa stations, respectively. In general terms, the compartment-based models (K models) exhibited worse performance compared with the HDG models: the K models exhibited neither the best NS0 nor the best AIC value, for any of the three stations. In addition, the K7 and K8 models exhibited lower NS0 and higher AIC compared to the HDG models. Although the HDG0 is a standard SM model, it was the best model only for the Balboa station, if both AIC and NS0 are considered. For instance, the highest NS0 value for the Calarcá station was obtained by the HDG4 model but the lowest AIC was obtained by the HDG0 model. This implies that HDG4 achieves higher simulation accuracy at the cost of a significantly higher number of model parameters, for the Calarcá station. For the Naranjal station, the previous knowledge on percolation representation resulted in higher capacity for modeling SM, since the VJRAM model achieved the lowest AIC and highest NS0, by using logarithmic type nonlinear functions.
The Nash-Sutcliffe coefficient obtained was higher than 0.8 for the three stations, considering the large dataset for the Calarcá station. NS0 values for each model depend on the weather station considered. The highest NS0 values (0.895, 0.712 and 0.983) correspond to the VJRAM, HDG4 and HDG0 models, for the Naranjal, Calarcá and Balboa stations, respectively. These values can be considered as satisfactory, considering the complex effects of the soil that support the variation of soil moisture and the lack of measurements of relative humidity, solar radiation and wind speed. The lowest NS0 values correspond to the K8, K7 and VJRAM models, for the Naranjal, Calarcá and Balboa stations, respectively. The NS0 range obtained at the Balboa station (0.902 to 0.983) was higher than the NS0 value for the Calarcá station (0.840) and the NS0 range for the Naranjal station (0.816 to 0.895). The HDG group of models exhibited high NS0 values for all cases. The NS0 values for the K7 and K8 models are similar, when comparing them for the same station. For the Naranjal station, VJRAM was chosen as the best model, but the HDG4 model also exhibited a high performance, as its NS0 (0.889) is comparable to VJRAM (0.895). The use of a combination of polynomial terms of precipitation and soil moisture resulted in better NS0 of the HDG3 model compared to that of HDG0 for the Naranjal and Calarcá stations. However, the NS0 of the HDG3 model was lower compared to HDG0 for the Balboa station. The main differences between the HDG3 and HDG0 models is that HDG3 includes α3¯Pmp1¯W+α0+α2¯Wmw2 and uses ¯P and ¯W instead of P and W, in the R+G term.
The simulated versus observed soil moisture values are given in Figure 10. The gray straight line is the identity line, and the black straight line is the linear regression.
The slope is similar to 1 for the three stations, which implies model consistency. The R2 value is higher than 0.8 for the three stations, implying that a large proportion of the variation in observed values is explained by the regression model. The simulation residual versus simulated soil moisture values are shown in Figure 11.
For the Naranjal station, there are no clear patterns, and the distribution is approximately symmetrical. For the Calarcá station, there are no clear patterns, but the residuals are shorter for the simulated SM range 0.295–0.310. For the Balboa station, the residuals are shorter and there is a nonlinear pattern for simulated SM lower than 0.42. This means that the model can be further improved.
5.
Discussion
The mass balance models VJRAM, HDG0, HDG3 and HDG4 are simple, as they only require basic meteorological data and no soil information. Despite the model simplicity, the obtained NS0 and NSabs values are quite high. The results confirm that alternative simple evapotranspiration equation (4) can be used for different soil conditions instead of the classical FAO56 equation, despite the high complexity of the process of water loss extraction from soil. This relaxes the requirement of data, including: i) meteorological data: air relative humidity, wind speed and solar radiation; ii) soil data, for instance water content at field capacity, soil texture and soil hydrologic conductivity. In evapotranspiration equation (4), the coefficients kE1 and kE2 are considered as the only parameters to be estimated, thus reducing the total number of estimated parameters. In addition, the percolation and runoff terms are represented as a lumped term called R+G, which is simple with no requirement of soil data. This is in contrast to the complexity of the curve number method for runoff. Due to these features, the application of the soil moisture dynamic models VHRAM, HDG0, HDG1, HDG3 and HDG4 to different soil types is facilitated. However, application of these models at specific sites requires calibration using daily measurements of soil moisture, precipitation, maximum air temperature, minimum air temperature and average air temperature.
Since the AIC value depends on the effect of both SSE and the number of parameters, a higher NS0 value does not guarantee a lower AIC for a given model, whereas a high number of parameters may reduce the AIC, and the order of models is different for the AIC and NS0 indices. For instance, the highest NS0 value for the Calarcá station was obtained by the HDG4 model, but the lowest AIC was obtained by the HDG0 model. In addition, a high number of parameters does not guarantee a higher NS0 for a given model (see Table 6). For instance, the HDG4 model has the largest number of parameters and exhibits the highest NS0 value for Calarcá station (see Table 4), but not for Naranjal nor Balboa stations (see Tables 3 and 5). Recall that the number of parameters is: 14 (HDG4), 11 (HDG3), 9 (HDG1), 7 (VJRAM), 6 (HDG0), 6 (K7) and 6 (K8).
In general, models VJRAM and HDG0 achieved the lowest AIC values, considering the three weather stations. The higher performance of the VJRAM model for the Naranjal site indicates that the logarithmic structure of percolation term G is suitable for Naranjal station but not for Calarcá or Balboa stations. The higher performance of the HDG0 model for the Calarcá and Balboa stations in terms of AIC indicates its modelling capacity with a low number of parameters. The low performance of the K7 and K8 models confirms that the evapotranspiration term E has an important effect on the soil moisture model, and a proper expression must be used, whereas the Hargreaves model Eq (4) is suitable. Model HDG4 achieved a lower performance in terms of AIC, which is related to both its large number of parameters and its limited modelling capability in terms of NS0.
Comparison of performance in different studies is not straightforward, because there are numerous factors influencing the performance of soil moisture modelling, including measurement uncertainty and model structure. Then, only model structure and modeling procedure is compared in what follows. The soil moisture modeling using models VJRAM, HDG0, HDG3 and HDG4 is simpler than that of [25]: the definition of the category of the soil moisture resistance curve, the fitting of the infiltration equation and the determination of the field capacity are not required. This leads to fewer measurements, and a simpler fitting procedure, whereas soil data are not required. Comparing soil moisture modelling in the present study with that of [22], a remarkable similarity is that the definition of the category of the soil moisture resistance curve, the preliminary fitting of infiltration equation and the determination of the field capacity are not required. In [22], the evapotranspiration term is a combination of exponential and logarithmic functions, and it requires measurement of wind speed, whereas the evapotranspiration term of models in the present study is based on the Hargreaves equation and requires measurement of Tavg, Tmax and Tmin. In [22], and also in our study with models HDG0, HDG4, HDG1 and HDG3, the percolation term depends on soil moisture and precipitation.
6.
Conclusions
In this study, several empirical models are evaluated to estimate soil moisture for three locations in Colombia. The daily precipitation, and average, maximum and minimum air temperatures are the input variables. The first group of models are water balance types of models, where the evapotranspiration term is based on the Hargreaves model and is SM limited, whereas the runoff and percolation terms are functions of precipitation and soil moisture. The second group of models are simple compartment-based models. In addition, three models are proposed and formulated by combining the first model group with modifications in the precipitation, runoff, percolation and evapotranspiration terms, using functions recently proposed in current literature. The models are calibrated using field data from each location. The main contributions over closely related studies are: i) a new model is proposed, based on a water balance model with terms from recent literature, and combinations of these terms are proposed; ii) the effect of model parameters on the fitting quality is assessed, and the parameters with higher effect are determined; iii) the proposed models and the empirical models reported in the recent literature are compared in terms of the combination of fitting accuracy and number of parameters, through the Akaike information criterion (AIC). The proposed modifications improved the simulation of soil moisture at the three locations.
The lowest AIC values correspond to the VJRAM, HDG0 and HDG0 models for the Naranjal, Calarcá and Balboa stations, respectively. In addition, the highest AIC values correspond to the K8, K7 and HDG3 models for the Naranjal, Calarcá and Balboa stations, respectively. Therefore, there is no single model that exhibits neither the highest nor the lowest capacity for representing the soil moisture in the three stations. However, the K7 and K8 models exhibited a lower modeling capacity in terms of AIC, compared to the HDG group and VJRAM, for the three stations. For the Naranjal station, the previous knowledge on percolation representation resulted in a higher capacity for modeling SM.
The Nash-Sutcliffe coefficient obtained was higher than 0.8 for the three stations. These NS0 values can be considered as satisfactory, considering the complex soil effects supporting the variation of soil moisture and the lack of measurements of relative humidity, solar radiation and wind speed. The highest NS0 values correspond to the VJRAM, HDG4 and HDG0 models, for the Naranjal, Calarcá and Balboa stations, respectively, whereas the lowest NS0 values correspond to the K8, K7 and VJRAM models, for the Naranjal, Calarcá and Balboa stations, respectively. The HDG group of models exhibited a high NS0 value for all cases. The use of a combination of polynomial terms of precipitation and soil moisture resulted in better NS0 for the HDG3 model compared to HDG0, for the Naranjal and Calarcá stations.
Use of AI tools declaration
The authors declare that they have not used artificial intelligence (AI) tools in the creation of this article.
Acknowledgments
The work of Alejandro Rincón was supported by the Universidad Católica de Manizales. The work of Fredy E. Hoyos and John E. Candelo-Becerra was supported by the Universidad Nacional de Colombia, Sede Medellín. This work was also supported by the Universidad Nacional de Colombia, Sede Medellín, under the project HERMES: 55173.
The records of meteorological data for Calarcá and Balboa stations were collected from IDEAM database (http://www.ideam.gov.co/).
Conflict of interest
The authors declare there is no conflict of interest.
Appendix A