1. Introduction
1.1. Overview of horticulture
China has become one of the world's leading producers of agricultural products, and agriculture is one of its most important industries. Because agriculture is an atypical industry, its expansion is determined by a multitude of economic and environmental factors. In Andhra Pradesh, for example, which has a farming-based economy, agriculture provides around 29% of GDP, compared with 17% nationwide. Periodic guidance to farmers on improved agricultural techniques, or on advances in understanding the factors that influence crop development, might help the state's agricultural sector [1]. Malnutrition is a progressively worsening global concern. Despite major improvements in crop production, the demand for food to support the world's increasing population has grown over the last several decades. The shortage of available soil and irrigation resources, the effects of climate change and the push toward organic farming methods are expected to tighten the constraints on current food production processes [2].
1.2. IoT-enabled resource management
In farming environments, optimal resource management is indicated by better irrigation monitoring, chemical control, water supply assessment and fertilizer control. Past studies have demonstrated that improved use of resources results in significant cost savings. Internet of Things (IoT) connectivity via Long Range Wide Area Network (LoRaWAN) and related systems may improve resource efficiency and optimization in agricultural fields and greenhouse cultivation, reduce production costs by removing unnecessary human activity and lower prices for buyers [3]. The importance of such cost reductions is underlined by declining crop yields caused by climate change and warming temperatures, deforestation and the shortage of agricultural land due to growing cities. According to Environmental Protection, climate change may increase chemical expenditure by US farmers by $11 billion. The global costs are immeasurable because of growing concerns about environmental toxicity, negative effects on human health and long-term efficacy [4].
1.3. Sustainable food security
Agricultural resources are critical to global food security and social prosperity. These assets include farmland, water availability, fertile soil, genetic diversity in animal and plant life, modern farming methods and skilled labor. Food security, economic stability and environmental preservation depend on the effective management and sustainable use of these resources. As the world's population rises, protecting and maximizing agricultural resources is more important than ever, promoting resilient and productive systems that can meet the world's growing food demand while preserving its ecosystems [5]. Global warming, extreme weather and natural disasters may all severely affect harvests and livestock productivity, leading to food insecurity and economic instability. Overuse of land and water supplies may exacerbate these problems, resulting in soil degradation, loss of biodiversity and water shortages [6]. Furthermore, heavy dependence on chemical inputs such as pesticides and fertilizers may degrade the environment and pose dangers to consumers and the agricultural workforce.
1.4. Environmental challenges
Agricultural resources encompass a diverse variety of natural materials required for producing food, fiber and fuel. The natural assets that enable food production across the globe range from fertile soils to abundant water sources for cultivation. Biodiversity, which includes animals, microorganisms and plants, is critical to sustaining ecological balance and adaptation. Additionally, contemporary agricultural techniques, innovative technology and sustainable management practices are needed to use these resources successfully, ensuring food security and financial security for people all over the globe. Humans must use these assets responsibly to secure the long-term viability of farming and to feed the world [7].
1.5. Objective of the study
The proposed method estimates crop yield using an enhanced Gravitational Search Optimized Gated Recurrent Unit (EGSO-GRU). Crop production statistics show the level of agricultural output in a given region. The dataset is first collected and then pre-processed using a normalization procedure.
1.6. Contribution of the paper
● The dataset was pre-processed using a normalization procedure.
● Features were extracted using Enhanced Independent Component Analysis (EICA).
● An enhanced Gravitational Search Optimized Gated Recurrent Unit (EGSO-GRU) was developed to estimate crop yield.
The remainder of this paper is organized as follows: Section 2 provides the literature survey. Section 3 describes the method. Section 4 presents the experimental findings and discussion, and Section 5 presents the conclusion and suggestions for further research.
2. Related works
Research in any area should include a literature survey, also called a literature review or systematic review. It entails a thorough review and analysis of the existing body of research, which is summarized in Table 1.
2.1. Problem statement
The literature above emphasizes the importance of crop yield estimation in agricultural planning and management, and the critical role of modern technologies and data-driven methods in forecasting crop yields. Despite the growing significance of precise projections for optimal resource use and responsible practice, a significant research gap remains. One critical area is the integration and enhancement of various data sources. Although the importance of factors such as climate patterns, soil quality, crop condition and historical data is recognized, more research is needed to determine how these elements can be linked and analyzed efficiently in real time. Because these influences are dynamic and interlinked, research may concentrate on creating more sophisticated methods, including machine learning (ML) methods, that can exploit all of them.
Additionally, the temporal aspect of yield estimation requires further investigation. Present methods may be insufficient to handle rapid changes in natural conditions caused by climatic change or unexpected catastrophes. Developing adaptable models that react quickly to changing circumstances and unanticipated disruptions may considerably improve the quality and reliability of yield predictions. There is also a research gap in understanding the socioeconomic implications of precise crop output estimates. While the benefits for resource allocation and policymaking are acknowledged, further research is needed to determine how accurate yield projections might improve farmers' earnings, local economies and general food security. In conclusion, the research gap is defined by the need for additional methods to integrate multiple data sources, construct adaptive models and investigate the broader socioeconomic consequences of accurate crop output predictions.
3. Proposed method
Crop yield prediction is a critical component of contemporary agriculture, enabling farmers and policymakers to make informed decisions. Crop yields may be predicted accurately using technologies such as remote sensing, satellite imagery and data analytics. These techniques analyze a variety of variables, including weather patterns, soil health, historical data and crop growth phases. In addition to increasing the effectiveness of agricultural practices, the data-driven approach helps in resource management, supporting food security and sustainability for a growing global population. Crop yield estimation also aids researchers in decisions about crop management and resource allocation. Various techniques exist for estimating agricultural output, and technological improvements have produced more precise and effective systems. Current research on applying artificial intelligence (AI) in adaptive greenhouses aims to improve harvest rates, water and nitrogen use efficiency, disease and pest control and sustainable agriculture. Among the most studied topics were automation devices for applying chemicals, irrigating and harvesting produce, along with bio-inspired techniques for streamlining greenhouse procedures, managing energy, planning equipment routes, managing unmanned aerial vehicles (UAVs), resolving scheduling problems and evaluating images for signs of disease and pests [31]. Because neural networks approximate brain function, bio-inspired artificial neural network (ANN) algorithms fared better, whereas genetic algorithms (GA) were preferred for agricultural planning, pesticide application, equipment route optimization and other simulations.
AFSOA, firefly, ACOA, ABCOA and Cuckoo methods have improved land-use planning, yield optimization, agricultural resource utilization (water, soil, crop chemicals and nutrients) and supply chain processes (post-harvest handling of fruits and vegetables) [32].
When analyzing the agricultural supply chain, it is crucial to assess the effects of service factors and the adoption of buy-online-pick-up-in-store (BOPIS) methods within an omnichannel framework. Service factors include the speed, dependability and overall quality of the services offered throughout the supply chain. This assessment covers the effectiveness of the delivery process, accuracy in fulfilling crop demand and the quality of customer service. Additionally, BOPIS initiatives allow farmers and organizations to buy crops via the Internet and pick goods up at approved sites. These techniques depend on an omnichannel platform that provides a consistent experience across internet portals, physical shops and mobile apps. Our objective is to understand how these components affect the agricultural supply chain in order to optimize operations, improve customer satisfaction and streamline agricultural product distribution.
Agriculture is vulnerable to weather, pests and market variations. Owing to the ANN's pattern recognition, the inventory model can adapt to these uncertainties. It can dynamically adjust crop and harvesting schedules depending on real-time weather and market needs, improving agricultural resilience. The influence of inflation on agricultural input costs and price dynamics is also important. The integrated ANN, trained to account for inflation, helps the inventory model allocate resources, set prices and choose crops. This adaptability helps farmers and agricultural enterprises manage inflation and make sound decisions to stay profitable. Figure 1 shows the proposed approach for estimating agricultural production, which blends traditional and modern techniques:
a) Dataset
A dataset is the starting point for understanding and improving agricultural productivity in crop yield analysis. This extensive dataset includes various factors, such as seasonal climate, soil properties, crop varieties and associated yields. A sizable generic crop dataset with agricultural parameters feeds the model, and a second dataset was used as the feature dataset. The datasets were gathered from kaggle.com. The crop dataset takes up 7841 KB of space. Its prediction factors include temperature, rainfall, pH level, relative humidity and geographic location. The crops include wheat, rice, maize, millet, peas, pigeon peas, sugarcane and green gram. Every prediction parameter for a specific crop has a range of available values. For instance, if the crop is wheat, any value within the range of wheat values available in the dataset can be used for the prediction parameters. The same holds for all of the crops included in the dataset.
b) Pre-processing
In crop yield analysis, the pre-processing stage is essential for drawing sound conclusions from raw agricultural data. This first stage includes data collection and cleaning, which entails gathering and standardizing data on weather patterns, soil properties and historical yield records. Because the input dataset contains missing values, these are removed first using a pre-processing procedure. The proposed approach then employs normalization and variable mean techniques.
There are several approaches to normalizing a dataset. The most effective normalization in this case is min-max. The technique linearly maps values from the range [σmin, σmax] to the range [A, B]. Equation (1) provides the mathematical formula for it.
In Eq (1), σmax represents the maximum value, σmin represents the minimum value and σ∗ represents the normalized value. All attributes in the data are given the same weight after normalization. A stands for the lowest value and B for the highest value of the target range.
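The pre-processing described above can be illustrated with a minimal sketch. It assumes the standard min-max form σ∗ = (σ − σmin)/(σmax − σmin) · (B − A) + A, applied column by column after rows with missing values are dropped. The function names and the sample rows are hypothetical and are not taken from the dataset used in the paper.

```python
import numpy as np

def drop_missing_rows(values):
    """Remove every row that contains at least one missing (NaN) entry."""
    x = np.asarray(values, dtype=float)
    return x[~np.isnan(x).any(axis=1)]

def min_max_normalize(values, a=0.0, b=1.0):
    """Rescale each column linearly from [column min, column max] to [a, b]."""
    x = np.asarray(values, dtype=float)
    col_min = x.min(axis=0)
    col_max = x.max(axis=0)
    span = col_max - col_min
    # Guard against constant columns, where max == min would divide by zero.
    safe_span = np.where(span == 0, 1.0, span)
    return (x - col_min) / safe_span * (b - a) + a

# Hypothetical rows: temperature (deg C), rainfall (mm), pH.
raw = np.array([
    [20.0, 100.0, 6.0],
    [np.nan, 150.0, 6.5],   # incomplete record, removed before scaling
    [30.0, 200.0, 7.0],
    [25.0, 300.0, 6.5],
])
clean = drop_missing_rows(raw)
scaled = min_max_normalize(clean, a=0.0, b=1.0)
print(scaled)
```

Scaling each column separately is what gives all attributes the same weight, since every feature then spans the same interval [A, B].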
c) Feature extraction using Enhanced Independent Component Analysis (EICA)
Enhanced independent component analysis (EICA) is a statistical and computational method that separates a multivariate dataset into independent non-Gaussian components. Crop yield forecasting considers several variables, including weather, soil composition, farming techniques and the use of fertilizers and pesticides. Conventional methods for predicting crop yields, such as regression models, machine learning algorithms and crop growth models, use these variables to calculate a crop's prospective yield. Image denoising removes fine detail and reduces the spectral differences between pixels in the same class. This smoothing also makes it harder to distinguish between pixels belonging to different classes, which affects classification performance. ICA is a good solution to this problem since it can emphasize spectral differences between pixels of different classes while extracting the most important characteristics.
The data matrix S, which has N observations and M variables, encodes the basic equation driving ICA-based sparse representation. In image processing, N represents the total number of pixels in an image and M is the number of characteristics of each pixel.
P contains the principal components, which are computed after the mean is subtracted from the original sample vectors. P is constructed so that its first components retain most of the variance. The covariance matrix C defines W.
In this scenario, A is a diagonal matrix holding the eigenvalues of C in decreasing order, while E contains the corresponding eigenvectors of C. It is assumed that $G_n - \mu$ is the nth column of the M by N matrix B.
Substituting the data into $\mu = \frac{1}{M}(G_1 + \dots + G_M)$ yields the mean vector. One may find the covariance matrix C of size M×M using the following formula.
The scaling factor W for the eigenvector matrix E is inferred from Eq (3) such that the variance of each independent component Pm is 1. The first L significant components are kept, which reduces the dimensionality of the input data S. The trained model of the proposed technique is denoted ICA(S; L), where L is the number of components retained from the data matrix S. This section has explained the basic theory and equations of ICA.
d) Crop yield estimation using the Enhanced Gravitational Search Optimized Gated Recurrent Unit (EGSO-GRU)
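The centering, covariance, eigendecomposition and scaling steps described above correspond to the whitening stage that commonly precedes ICA. The sketch below covers only that stage, not the full EICA procedure, whose details are not reproduced in this text. The function name `whiten` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def whiten(S, L):
    """Project an N x M data matrix onto its first L unit-variance components."""
    S = np.asarray(S, dtype=float)
    mu = S.mean(axis=0)                    # mean vector over the N observations
    B = S - mu                             # centered data
    C = np.cov(B, rowvar=False)            # M x M covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:L]  # indices of the L largest eigenvalues
    D = eigvals[order]
    E = eigvecs[:, order]
    W = E / np.sqrt(D)                     # scale columns so each component has variance 1
    return B @ W

rng = np.random.default_rng(0)
# Hypothetical data: 500 observations of 4 linearly mixed non-Gaussian variables.
mixing = rng.normal(size=(4, 4))
S = rng.laplace(size=(500, 4)) @ mixing
P = whiten(S, L=3)
print(np.round(np.cov(P, rowvar=False), 3))
```

After whitening, the retained components are uncorrelated with unit variance, so a subsequent ICA rotation only has to search for statistical independence.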
3.1. Enhanced Gravitational Search Optimization (EGSO)
In contrast to other well-known population-based optimization methods, the search agents of the Gravitational Search Algorithm (GSA) are a group of masses that interact according to Newton's laws of gravity and motion. In GSA, each agent is treated as an object whose mass reflects its performance. All objects attract one another through gravity, which produces an overall movement of all objects toward the objects with heavier masses.
Consider a system of N agents (masses), in which the location of the ith agent is defined in each dimension of the search space. Newton's theory of gravity defines the force exerted by the jth mass on the ith mass as follows:
The gravitational constant at time t is G(t), and Nj and Ni are the masses of the agents. The mass Nj is computed by a fitness comparison, $N_j(t) = \frac{fit_j(t) - worst(t)}{best(t) - worst(t)}$, where best(t) represents the best fitness among all agents and worst(t) represents the worst fitness among all agents. For the ith agent, the total force is the randomly weighted sum of the forces applied by the other agents:
The ith agent's acceleration is determined using the law of motion as follows:
The following equations may then be used to explain the search method for this notion.
A represents the location of the ith agent in the dth dimension in Eqs (6) and (7), whereas B denotes velocity and C denotes acceleration.
It must be noted that the performance of GSA is influenced by the gravitational constant, which is specified as a function of time t. H0 is the starting value, s is a constant, t is the current iteration and max_t represents the maximum number of iterations.
The parameters that determine GSA's performance are the population size N, the starting gravitational constant H0, max_t and the constant b. It has been demonstrated that GSA is highly effective in solving a variety of nonlinear functions. As an effective optimization method, GSA can handle a wide range of optimization problems. We use GSA to determine the parameters of a difficult nonlinear system.
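As a rough illustration of the search loop described above, the following is a minimal sketch of a basic GSA for minimization. It assumes the commonly used exponential decay G(t) = g0 · exp(−alpha · t / max_t) for the gravitational constant, and it lets every agent attract every other agent, which is a simplification. The names `g0` and `alpha` stand in for the starting value and the decay constant mentioned in the text. The test function and all parameter values are hypothetical.

```python
import numpy as np

def gsa_minimize(objective, dim, n_agents=20, max_t=200, g0=100.0, alpha=20.0,
                 bounds=(-5.0, 5.0), seed=0):
    """Basic gravitational search; returns the best position and fitness seen."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    x = rng.uniform(low, high, size=(n_agents, dim))  # agent positions
    v = np.zeros((n_agents, dim))                     # agent velocities
    best_x, best_f = None, np.inf
    eps = 1e-12
    for t in range(max_t):
        fit = np.array([objective(p) for p in x])
        i_best = int(fit.argmin())
        if fit[i_best] < best_f:
            best_f, best_x = float(fit[i_best]), x[i_best].copy()
        best, worst = fit.min(), fit.max()
        # Mass from fitness: the best agent gets 1, the worst gets 0.
        m = np.ones(n_agents) if best == worst else (fit - worst) / (best - worst)
        mass = m / m.sum()
        g = g0 * np.exp(-alpha * t / max_t)           # decaying gravitational constant
        diff = x[None, :, :] - x[:, None, :]          # diff[i, j] = x_j - x_i
        dist = np.linalg.norm(diff, axis=2)
        # Acceleration of agent i: randomly weighted pull of every agent j.
        coef = rng.random((n_agents, n_agents)) * g * mass[None, :] / (dist + eps)
        acc = (coef[:, :, None] * diff).sum(axis=1)
        v = rng.random((n_agents, dim)) * v + acc     # velocity update
        x = np.clip(x + v, low, high)                 # position update within bounds
    return best_x, best_f

def sphere(p):
    """Hypothetical test objective with its minimum of 0 at the origin."""
    return float(np.sum(p ** 2))

best_x, best_f = gsa_minimize(sphere, dim=2)
print(best_x, best_f)
```

Working with accelerations directly avoids dividing the force by an agent's own mass, which would be zero for the worst agent under this mass definition.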
3.2. Improvements in GSA
GSA draws inspiration from the law of universal gravitation rather than from the social behavior of animals, which many other algorithms replicate. GSA is memory-less and considers only the agents' present positions when updating, whereas in particle swarm optimization the path of an agent is determined by the search experience that the agents have accumulated. Here, pbest is the ith particle's best previous position, gbest is the best position found so far among all particles, qj1 and qj2 are two random variables with values in [0, 1], d1 and d2 are positive constants, and w is the inertia weight.
The "Improved Gravitational Search Algorithm (IGSA)" uses a new movement strategy that searches according to the law of gravity while also drawing on memory and social information. The IGSA velocity updating equation is defined as follows:
where d1 and d2 are constants in the [0, 1] range and qj1, qj2 and qj3 are random variables. The balance between the "law of gravity" and "memory and social information" can be adjusted by varying the values of d1 and d2. It is evident from Eq (14) that IGSA is a hybrid, generalized version of GSA. When d1 and d2 equal zero, IGSA is equivalent to GSA.
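The velocity update described above can be sketched as follows. Since Eq (14) is not reproduced in this text, the sketch assumes the usual hybrid form, in which the GSA update (random inertia plus gravitational acceleration) is extended with a memory term toward pbest and a social term toward gbest. The function name and argument layout are hypothetical.

```python
import numpy as np

def igsa_velocity(v, x, acc, pbest, gbest, d1, d2, rng):
    """Hybrid velocity update: GSA terms plus memory (pbest) and social (gbest) terms."""
    n, dim = x.shape
    q1 = rng.random((n, dim))  # random weight on the previous velocity
    q2 = rng.random((n, dim))  # random weight on the memory term
    q3 = rng.random((n, dim))  # random weight on the social term
    return q1 * v + acc + d1 * q2 * (pbest - x) + d2 * q3 * (gbest - x)

# Hypothetical state for 4 agents in 3 dimensions.
state_rng = np.random.default_rng(42)
x = state_rng.uniform(-1.0, 1.0, size=(4, 3))
v = state_rng.uniform(-0.1, 0.1, size=(4, 3))
acc = state_rng.uniform(-0.05, 0.05, size=(4, 3))
pbest = x + 0.1          # pretend each agent's best position is nearby
gbest = pbest[0]         # pretend agent 0 holds the global best

v_igsa = igsa_velocity(v, x, acc, pbest, gbest, d1=0.5, d2=0.5,
                       rng=np.random.default_rng(1))
v_gsa_like = igsa_velocity(v, x, acc, pbest, gbest, d1=0.0, d2=0.0,
                           rng=np.random.default_rng(1))
print(v_igsa - v_gsa_like)   # contribution of the memory and social terms
```

Setting both d1 and d2 to zero removes the memory and social terms, so the update reduces to the plain GSA rule.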
3.3. Gated Recurrent Unit (GRU)
Estimating crop yields is a crucial undertaking in contemporary agriculture since it enables farmers to allocate resources and manage crops in an informed manner. To increase the accuracy of this estimation, a method based on the Enhanced Gravitational Search Optimized Gated Recurrent Unit (EGSO-GRU) has been devised. The GRU is a recurrent neural network well suited to sequential data analysis, and the method combines it with Enhanced Gravitational Search Optimization, a metaheuristic algorithm inspired by nature. By combining the advantages of both approaches, this strategy aims to give more accurate and trustworthy crop yield estimates, supporting increased agricultural productivity and environmentally friendly farming methods.
The gated recurrent unit (GRU) is an alternative to the LSTM neural network. When the sequence is long, it can learn to update or discard the hidden state, which increases the influence of early key data on subsequent training. The GRU uses an update gate to replace the input and forget gates of the LSTM, and a reset gate to replace the output gate. The input at the current time step and the hidden state of the preceding time step provide the information for both the reset and update gates. Figure 2 displays its flow chart:
For a given time step d, the input is Yd and the hidden state of the preceding time step is Zd−1. With σ denoting the sigmoid activation function, the reset gate and the update gate are defined as follows:
P is the bias parameter, and U is the weight matrix that needs to be learned. The candidate hidden state is then obtained by integrating the reset gate into the traditional hidden state updating process at the time step:
˜Z denotes the candidate hidden state, while Z denotes the hidden state. The candidate hidden state depends only on the input and the preceding hidden state Zd−1. The symbol ⊙ denotes the Hadamard (element-wise) product. The tanh nonlinear activation function is used so that the value of the candidate hidden state lies in (−1, 1).
The new hidden state Zd is then determined by using the update gate to combine the old hidden state Zd−1 with the candidate hidden state. The purpose of the update gate is to control how much of each is carried into Zd. Hence, the hidden state update is defined as follows:
The update gate in GRU aids in capturing the sequence's long-term dependencies, while the reset gate aids in capturing its short-term dependencies, as shown in Algorithm 1.
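The gate computations described above can be sketched as a single GRU step. The sketch assumes the standard GRU formulation: sigmoid reset and update gates, a tanh candidate state computed from the reset-scaled previous state, and a convex combination controlled by the update gate. The weight shapes, initialization and variable names are illustrative assumptions, not the trained EGSO-GRU model.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_gru(n_in, n_hidden, rng):
    """Create small random weights and zero biases for the three GRU blocks."""
    def block():
        return (rng.normal(scale=0.1, size=(n_in, n_hidden)),      # input weights
                rng.normal(scale=0.1, size=(n_hidden, n_hidden)),  # recurrent weights
                np.zeros(n_hidden))                                # bias
    return {"reset": block(), "update": block(), "cand": block()}

def gru_step(y, z_prev, params):
    """One GRU time step: returns the new hidden state."""
    u_r, w_r, p_r = params["reset"]
    u_k, w_k, p_k = params["update"]
    u_c, w_c, p_c = params["cand"]
    r = sigmoid(y @ u_r + z_prev @ w_r + p_r)   # reset gate
    k = sigmoid(y @ u_k + z_prev @ w_k + p_k)   # update gate
    # Candidate state: the reset gate scales the previous state element-wise.
    z_cand = np.tanh(y @ u_c + (r * z_prev) @ w_c + p_c)
    # The update gate blends the old state with the candidate state.
    return k * z_prev + (1.0 - k) * z_cand

rng = np.random.default_rng(0)
params = init_gru(n_in=5, n_hidden=8, rng=rng)
sequence = rng.normal(size=(12, 5))   # hypothetical 12-step sequence of 5 features
z = np.zeros(8)
for y in sequence:
    z = gru_step(y, z, params)
print(z)
```

When the update gate is close to 1 the old state is carried forward almost unchanged, which is how information from early time steps can persist across a long sequence.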
4. Result and discussion
In this paper, Python 3.11 was used to implement the enhanced Gravitational Search Optimized Gated Recurrent Unit (EGSO-GRU) on a laptop with 32 GB of RAM, an Intel(R) processor and Windows 10. The proposed EGSO-GRU method was evaluated by comparing its results with those of Support Vector Machine (SVM), Model Agnostic Meta-Learning (MAML), Convolution Neural Network-Recurrent Neural Network (CNN-RNN) and XGBoost. The methods are compared in terms of accuracy (%), mean square error (MSE), mean absolute error (MAE), specificity and root mean square error (RMSE). Feature engineering is a method of selecting the most advantageous features based on predictions and model outcomes. Figure 3 illustrates the model performance obtained after 150 epochs of model execution.
Mean Absolute Error (MAE): MAE is the average absolute difference between the predicted and actual values, and it measures the error between paired observations representing the same phenomenon.
Figure 4 and Table 2 show the mean absolute error of the proposed method and of the existing methods. CNN-RNN obtained 0.269, XGBoost 3.764, MAML 0.426 and SVM 0.314, whereas the proposed method achieved 0.199. This shows that the proposed approach has a lower error than the existing ones.
Mean Square Error (MSE) is calculated as the sum of the squared differences between a variable's observed and predicted values, divided by the variable's total number of values.
Figure 5 and Table 3 show the mean square error of the proposed method and of the existing methods. XGBoost attained 24.360, CNN-RNN 0.415, CNN-LSTM 0.215 and SVM 0.228, whereas the proposed method achieved 0.071. This demonstrates that the proposed method has a lower error than the existing methods and therefore performs better.
Root Mean Square Error (RMSE) is a measure of the predictor's quality. The RMSE, which represents the standard deviation of the residuals, is the square root of the MSE.
Figure 6 and Table 4 show the root mean square error of the proposed method and of the existing methods. CNN-RNN obtained 0.299, XGBoost 4.936, CNN-LSTM 0.299 and SVM 0.389, whereas the proposed method achieved 0.210. This shows that the proposed approach has a lower error than the existing ones.
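The three error measures defined above can be computed as in the following minimal sketch. The function name and the sample values are hypothetical and are unrelated to the results reported in the tables.

```python
import numpy as np

def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for paired actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = actual - predicted
    mae = float(np.mean(np.abs(residual)))   # mean absolute error
    mse = float(np.mean(residual ** 2))      # mean square error
    rmse = float(np.sqrt(mse))               # root mean square error
    return mae, mse, rmse

# Hypothetical yields (actual vs. predicted); residuals are 1, 0, -2, -1.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]
mae, mse, rmse = regression_errors(actual, predicted)
print(mae, mse, rmse)   # MAE = 1.0, MSE = 1.5, RMSE = sqrt(1.5)
```

Because the RMSE is the square root of the MSE, the two always rank methods in the same order, while the MAE is less sensitive to a few large errors.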
Accuracy (%): Accuracy is the ratio of correct predictions to the total number of predictions.
The accuracy metrics for the proposed and existing methods are shown in Figure 7 and Table 5. Accuracy reflects how close a prediction is to the true value. The accuracy of the proposed method is 95.89%, whereas the accuracy rates of the XGBoost, CNN-RNN, SVM and MAML classifiers were 83, 80, 89.5 and 82%, respectively. This indicates that the proposed approach outperforms the other methods in terms of accuracy.
Specificity is a metric used in statistics and data analysis to assess how well a binary classification test performs. It is the percentage of true negatives out of all actual negatives.
The specificity metrics for the existing and proposed methods are shown in Figure 8 and Table 6. The specificity of the proposed method is 92.4%. The specificity values produced by the XGBoost, CNN-RNN, SVM and MAML classifiers are 74, 70, 71 and 69%, respectively. This indicates that the specificity of the proposed approach is higher than that of the existing methods.
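Both classification metrics above follow directly from confusion-matrix counts, as the short sketch below illustrates. The counts are hypothetical and are not taken from the experiments.

```python
def accuracy_and_specificity(tp, tn, fp, fn):
    """Compute accuracy and specificity from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total     # correct predictions over all predictions
    specificity = tn / (tn + fp)     # true negatives over all actual negatives
    return accuracy, specificity

# Hypothetical counts: 40 true positives, 45 true negatives,
# 5 false positives and 10 false negatives.
acc, spec = accuracy_and_specificity(tp=40, tn=45, fp=5, fn=10)
print(f"accuracy = {acc:.2%}, specificity = {spec:.2%}")
```

Specificity ignores the positive class entirely, so it complements accuracy when the number of negative cases dominates.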
4.1. Discussion
The existing methods are XGBoost, CNN-RNN, MAML and SVM. XGBoost needs a significant amount of training data. Acquiring large-scale, high-quality datasets in agriculture can be difficult because of crop-specific variables, soil variation and weather unpredictability. CNN-RNN models can be difficult to interpret due to their complexity, which makes it challenging to understand the model's internal workings and extract useful insights for agricultural decision-making. MAML assumes that the tasks faced during meta-training and meta-testing are similar. Tasks in agriculture can differ significantly depending on variables such as crop variety, weather, soil condition and farming methods, so it can be difficult to adapt MAML to such task variation. The kernel function and its parameters strongly affect SVM performance. It can be difficult to choose the right kernel and tuning parameters, and the best options may differ depending on the type of agricultural application. Self-organizing systems can adjust to changing circumstances. This flexibility could be useful in agriculture to deal with changing conditions, including pest outbreaks, soil variation and climatic changes.
4.2. Issue and challenges
Various issues arise in the investigation and assessment of agricultural resources using data analytics. As agricultural systems produce complex and diverse data, data quality and availability present considerable challenges. Furthermore, the complexity of biological systems, changes in the weather and market dynamics all introduce uncertainty into predictive modeling. It may be difficult to strike a balance between the approaches' scalability and the necessity for detail in analysis. Furthermore, it may be difficult to gather and share data in rural regions due to the digital divide. For the management of agricultural resources in a sustainable manner, these problems must be resolved.
5. Conclusions
This paper presented the EGSO-GRU technique, a novel approach for estimating agricultural yield using improved gravitational search optimization. The proposed method achieved an MAE of 0.199, an MSE of 0.071, an RMSE of 0.210, an accuracy of 95.89% and a specificity of 92.4%. These results indicate better performance than the existing methods considered. One challenge is that data on resource availability may be outdated or incomplete, which makes it difficult to assess the current situation accurately. The evaluation is further complicated by the diverse range of soil conditions and climatic factors.
5.1. Future scope
The integration of advanced technology like artificial intelligence, precise farming and remote sensing holds the potential to significantly improve agricultural supplies in the future. The optimization of resource distribution, crop management and process of decision-making will be greatly aided by advanced machine learning algorithms, such as deep learning and ensemble approaches.
Use of AI tools declaration
The authors declare that they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
Shaanxi Social Science Foundation Project: Research on the theoretical logic and practical path of improving Shaanxi agricultural support and protection system in the context of rural revitalization. (NO: 2022XC02).
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.