The autoregressive integrated moving average (ARIMA) model is a popular method traditionally used for time series forecasting (Box et al., 2007; Clements, 2003). It combines autoregressive (AR), differencing (I) and moving average (MA) components to capture the underlying patterns in time series data and is particularly useful for analysing and predicting data with a clear temporal structure. Developed by George Box and Gwilym Jenkins in the 1970s, the Box-Jenkins ARIMA (p, d, q) model is a commonly used method for building univariate time series forecasting models (Box et al., 2015) and offers a mathematical framework for process prediction. The Box-Jenkins modelling technique consists of finding an appropriate ARIMA process, fitting it to the data at hand and then using the fitted model to make predictions. The AR, I and MA components are combined to represent the time series as a function of its historical values and past forecast errors. The general ARIMA (p, d, q) formula is shown in equation 1.

Where $Y_t$ is the dependent variable, $\varepsilon_t$ is independently and normally distributed with zero mean and constant variance for t = 1, 2, ..., n, and the coefficients $\phi_p$ and $\theta_q$ are also estimated (Revathi et al., 2023).
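For reference, the ARIMA (p, d, q) model is often written in the following form. This is a standard textbook statement rather than a reproduction of the original equation 1; the differenced series $W_t$, the constant $c$, the backshift operator $B$ (with $B Y_t = Y_{t-1}$) and the minus signs on the moving average terms are assumptions, since sign conventions differ between texts.

```latex
W_t = (1 - B)^d \, Y_t, \qquad
W_t = c + \sum_{i=1}^{p} \phi_i W_{t-i} + \varepsilon_t - \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
```

With d = 0 this reduces to an ARMA (p, q) model on the original series, and with p = q = 0 and d = 1 it reduces to a random walk.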
The ARIMA modelling procedure consists of several key steps for capturing and forecasting temporal patterns in a time series dataset. The key steps involved in the ARIMA model process flow are:
a. Data preparation
The time series data needed for the forecast are collected and examined for missing values and outliers.
b. Stationarity check
The ARIMA model is developed on the assumption that the statistical properties of the data, such as the mean, variance and autocorrelation, are constant over time. Statistical tests, viz. the Augmented Dickey-Fuller (ADF) test, the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test or the Phillips-Perron (PP) test, are conducted to check whether the data are stationary.
c. Differencing or integration
Differencing refers to the process of subtracting the previous observation from the current observation in a time series dataset to make it stationary. Equation 2 refers to the first-order differencing.

$$Y'_t = Y_t - Y_{t-1} \qquad (2)$$

The order of integration (d) is the number of differencing operations required to make a time series stationary.
d. Identification of model parameters
The appropriate ARIMA (p, d, q) model is identified by finding the autoregressive order (p), the order of differencing (d) and the moving average order (q) using tools such as autocorrelation function (ACF) and partial autocorrelation function (PACF) plots.
o Determination of order of differencing (d): The number of times differencing is applied gives the d in ARIMA (p, d, q).
o Determination of moving average component (q) using ACF: If the ACF cuts off after lag q and PACF tapers off, this indicates an MA(q) model.
o Determination of auto-regressive component (p) using PACF: If the PACF cuts off after lag p and ACF tapers off, this indicates an AR(p) model.
o Identify ARMA (p, q): If both ACF and PACF taper off, it indicates a mixed ARMA model.
e. Model selection and fitting
Use the chosen values of p, d and q and fit the ARIMA (p, d, q) model to the stationary data using statistical software.
f. Residual analysis
Residual analysis is a crucial step in ARIMA modelling; it examines the differences between the actual values and the values predicted by the model. The residuals of a well-fitted ARIMA model resemble white noise, indicating that they are random and uncorrelated with each other. The Ljung-Box test can be conducted to check for residual autocorrelation. The model needs refinement if autocorrelation is detected.
g. Forecasting
A well-fitted ARIMA model can be used for predicting future time series data.
h. Model validation
The ARIMA model can be validated to quantify the errors in forecasting, by finding the mean absolute percentage error (MAPE), root mean squared error (RMSE) and mean absolute error (MAE) as calculated in equations 3 to 5.
i. Iteration
If the residual analysis shows poor performance of the model, then the values of p, d or q have to be adjusted.
j. Finalise the model
The finalised model can be used for time series forecasting and further decision making.
The process flow chart of the ARIMA model is depicted in Fig 1.
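The residual check in step f can be illustrated with a minimal Python sketch of the Ljung-Box Q statistic, Q = n(n + 2) Σ r_k² / (n − k), summed over lags k = 1, ..., h. The function names and the residual values are illustrative assumptions and are not taken from the study; in practice a statistical package would normally report the statistic together with its p-value.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squared deviations
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic for autocorrelation up to max_lag."""
    n = len(residuals)
    r = sample_acf(residuals, max_lag)
    return n * (n + 2) * sum(r[k - 1] ** 2 / (n - k)
                             for k in range(1, max_lag + 1))


# Hypothetical residuals that alternate in sign (strongly autocorrelated).
residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
q_stat = ljung_box_q(residuals, max_lag=1)
```

Q is compared with a chi-square critical value; for the residuals of a fitted ARIMA (p, d, q) model the degrees of freedom are usually taken as h − p − q. A value of Q above the critical value points to remaining autocorrelation and hence to the need for refinement.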
This paper employs a time series dataset on cabbage production in India from 1961 to 2022, taken from FAOSTAT, as the ‘sample data’ throughout this study, to demonstrate the usage of ARIMA modelling. Initially, the time series data is differenced to stabilise the mean and eliminate trends. Fig 2 shows a plot of the actual time series data and a plot of the differenced data derived from the sample dataset. This visualisation shows a clear comparison between the original observations and the results of differencing.
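The differencing operation described above can be sketched in a few lines of Python. The function name and the short series are hypothetical and stand in for the sample data, which are not reproduced here.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times (the d in ARIMA)."""
    out = list(series)
    for _ in range(order):
        # Each pass subtracts the previous observation from the current one.
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


# Hypothetical series with an accelerating upward trend.
values = [10, 12, 15, 19, 24]
first_diff = difference(values)            # trend in the level is removed
second_diff = difference(values, order=2)  # remaining trend is removed
```

Each pass shortens the series by one observation, so a series differenced d times has n − d values left for model fitting.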
The autocorrelation and partial autocorrelation functions help to determine the appropriate autoregressive (AR) and moving average (MA) orders. Identifying the autoregressive (p) and moving average (q) orders for an ARIMA model from the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots involves analysing key patterns. Fig 3 depicts the ACF and PACF plots for the sample data. Consistent with the identification rules given above, p is read from the PACF plot: a significant spike at lag 1 followed by a sharp drop typically indicates p = 1. In contrast, q is determined from the ACF plot: a prominent spike at lag 1 followed by a sharp drop could indicate q = 1. Applying these guidelines and iteratively evaluating different combinations improves the choice of ARIMA model orders, laying the groundwork for effective forecasting. From the ACF and PACF plots, the potential values for p and q are p = 1 and q = 1.
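As a rough illustration of how the quantities behind such plots are obtained, the sketch below computes the sample ACF and derives the PACF from it with the Durbin-Levinson recursion. The function names are assumptions made for this example, and the ±1.96/√n band is the usual large-sample approximation for judging whether a spike is significant.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (r_0 is 1 by construction)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def sample_pacf(x, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = sample_acf(x, max_lag)
    pacf = [1.0]
    phi_prev = []  # AR coefficients of the order k-1 fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


def significance_bound(n):
    """Approximate 95% band for a white-noise series of length n."""
    return 1.96 / math.sqrt(n)
```

Lags whose ACF or PACF value lies outside the band are the spikes referred to in the identification rules; the last such lag in the PACF suggests p and the last such lag in the ACF suggests q.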
Model evaluation is a crucial step in assessing the accuracy and reliability of forecasting models (Revathi et al., 2023). Common metrics used in model evaluation include mean absolute percentage error (MAPE), root mean squared error (RMSE) and mean absolute error (MAE), which quantify the extent of forecast errors. MAPE expresses the average forecast error as a percentage of the observed values, while RMSE and MAE measure the magnitude of errors in the units of the data; lower values indicate better predictive accuracy. A variety of plots can be used to demonstrate the ARIMA model’s effectiveness. Time series plots compare actual observations to predicted values, allowing a visual assessment of model fit. Residual plots, which demonstrate the randomness of errors, aid in the validation of model assumptions.
Furthermore, autocorrelation and partial autocorrelation plots confirm the appropriateness of the chosen orders. Forecast vs. actual plots compare predicted and observed values, providing insights into the model’s predictive accuracy. These plots contribute to a comprehensive assessment of the ARIMA model’s ability to capture and forecast agricultural time series dynamics. Fig 4 gives different model evaluation plots for the ARIMA model constructed for the sample data.
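The three error measures named above can be computed as in the following sketch, which uses their standard definitions (assumed here to correspond to equations 3 to 5). The function names and sample values are invented for illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalises large errors more than MAE."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is 0."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical hold-out observations and forecasts.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]
scores = (mae(actual, predicted), rmse(actual, predicted),
          mape(actual, predicted))
```

Because MAPE divides by the observed value, it is unsuitable for series that contain zeros, in which case MAE or RMSE is the safer choice.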
Enhancements and advanced techniques
Advancements in ARIMA modelling within the agricultural sector have led to the exploration of various enhanced techniques and hybrid models, combining the strengths of ARIMA with other methodologies for better accuracy and robustness.
Hybrid models and integration
Incorporating machine learning methods, such as random forests or neural networks, into ARIMA has shown promise in dealing with nonlinear patterns and improving the predictive power of agricultural forecasting. These hybrid models can detect complicated associations in data that regular ARIMA may miss. ARCH/GARCH models are used when the assumption of homoscedastic error variance is violated, and the artificial neural network (ANN) approach can be used as an alternative to traditional models.
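A common way of building such hybrids, given here as an assumption rather than as the scheme of any particular study cited below, is an additive decomposition in which ARIMA models the linear part of the series and a nonlinear learner models the ARIMA residuals. The symbols $L_t$, $N_t$, $e_t$, $f$ and $m$ are introduced only for this illustration.

```latex
\begin{aligned}
y_t &= L_t + N_t, \\
e_t &= y_t - \hat{L}_t, \\
\hat{N}_t &= f(e_{t-1}, e_{t-2}, \ldots, e_{t-m}), \\
\hat{y}_t &= \hat{L}_t + \hat{N}_t
\end{aligned}
```

Here $\hat{L}_t$ is the ARIMA forecast, $f$ is the nonlinear model (for example an ANN or SVM) fitted to the last $m$ residuals and $\hat{y}_t$ is the combined forecast.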
Koutroumanidis et al. (2009) used ARIMA, ANN and a hybrid model to forecast future fuelwood prices in Greek state forest farms. The ARIMA-ANN hybrid model produced optimal forecasts, allowing for rational planning in the production and fuelwood markets.
Sujjaviriyasup et al. (2013) developed hybrid models for agricultural production planning using real data on Thailand’s orchid exports and pork products. The models combine different forecasting techniques to reduce time-series forecasting errors. The SVM and ARIMA hybrid model was chosen for its precision; experiments showed that it achieved significant reductions in MAE, RMSE and MAPE in both the orchid export and the pork product cases.
Mitra et al. (2017) applied a hybrid approach combining forecasts from linear and nonlinear time-series models, such as ARIMA-GARCH and ARIMA-ANN, for modelling and forecasting wholesale potato prices in the Agra market of India. A comparative assessment found that the ARIMA-ANN hybrid model outperformed the other combinations and the individual models for the data under consideration.
Alam et al. (2018) used ARIMA, ANN and an ARIMA-ANN hybrid approach to forecast time series data on rice yield in Aligarh district, Uttar Pradesh, from 1975 to 2013. The hybrid approach significantly reduced MAPE, indicating superior performance compared to ARIMA alone.
Purohit et al. (2021) used the ARIMA-ANN hybrid model to forecast sugarcane yield in India and the empirical results showed that it outperformed the ARIMA strategy.
Revathi et al. (2023) created ARIMA and polynomial models for Kerala’s coconut industry, focusing on sustainable agricultural planning, resource allocation and policy formulation. The ARIMA model is preferred for its precision and normality in residual statistics. The polynomial regression model provides the best fit for production and productivity, with lower MAPE and RMSE and higher R² values. A hybrid model, created using the best-fitting polynomial and ARIMA models, offers more accurate data representation due to high R² and low MAPE values.
Pandit et al. (2024) compared traditional ARIMA models with an ARIMA-GA hybrid approach for forecasting milk production in India. Their results showed that the ARIMA-GA model outperformed the conventional method in terms of accuracy. The study suggests further exploration of other evolutionary algorithms within the ARIMA framework for broader applicability.
ARIMA extensions
Extending ARIMA to account for seasonal variations in agricultural data has been pivotal. Seasonal ARIMA (SARIMA) models enable the capture of periodic fluctuations and seasonal trends, which are common in agricultural settings (e.g., seasonal crop cycles, weather patterns). The integration of seasonal components enhances the model’s ability to make more accurate long-term predictions. A SARIMA model was developed to forecast quarterly sugarcane yields in Kenya (Mwanga et al., 2017). The study identified SARIMA (2,1,2)(2,0,3) as the best model for predicting future sugarcane yields, based on data from 1973-2014. It predicted a fall in yields until 2020, then a steady rise. The study highlights the potential of seasonal ARIMA models in various sectors. Another study (Sabu and Kumar, 2020) predicted monthly arecanut prices in Kerala using time-series and machine learning models. SARIMA, Holt-Winters’ seasonal method and LSTM neural network models were used, with the LSTM neural network model being the most suitable for the data. Red lentil prices in Saskatchewan were modelled and forecasted using the SARIMA model (Divisekara et al., 2020). The model, which is based on 521 observations from 2010 to 2019, performs well in both in-sample and out-of-sample scenarios, assisting growers and end users in making optimal production decisions and controlling pricing risk.
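In the usual SARIMA (p,d,q)(P,D,Q) notation with seasonal period s, the seasonal differencing order D plays the same role across seasons that d plays between consecutive observations. A minimal sketch of one seasonal difference follows; the function name and the quarterly values are hypothetical.

```python
def seasonal_difference(series, period):
    """Subtract the observation one season earlier: y[t] - y[t - period]."""
    if period < 1 or period >= len(series):
        raise ValueError("period must be between 1 and len(series) - 1")
    return [series[t] - series[t - period]
            for t in range(period, len(series))]


# Hypothetical quarterly series: a fixed seasonal pattern plus a rise of
# 2 units per year.
quarterly = [10, 20, 30, 40, 12, 22, 32, 42, 14, 24, 34, 44]
deseasonalised = seasonal_difference(quarterly, period=4)
```

For this series every seasonal difference equals the yearly increase, so the repeating quarterly pattern disappears and only the year-on-year change remains.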
ARIMA with drift is an extension of the standard ARIMA model that incorporates a linear trend or drift component. It captures the systematic increase or decrease in time series values, allowing for a better understanding of long-term patterns. This model offers improved accuracy in capturing evolving patterns and is useful in fields like economics, finance and environmental science.
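The simplest case, ARIMA (0,1,0) with drift, can be written as Y_t = c + Y_{t-1} + ε_t, where the drift c is estimated by the mean of the first differences. The sketch below illustrates only this special case; the function name and data are assumptions made for the example.

```python
def drift_forecast(series, horizon):
    """Point forecasts from a random walk with drift, ARIMA (0,1,0) + c."""
    n = len(series)
    if n < 2:
        raise ValueError("at least two observations are needed")
    # The mean of the first differences telescopes to (last - first)/(n - 1).
    drift = (series[-1] - series[0]) / (n - 1)
    return [series[-1] + h * drift for h in range(1, horizon + 1)]


# Hypothetical series rising by 2 units per period on average.
history = [2, 4, 5, 8]
forecasts = drift_forecast(history, horizon=3)
```

The forecasts extend a straight line from the last observation with slope c, which is how the drift term captures a systematic increase or decrease.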
Some recent approaches in agricultural forecasting involve non-parametric methods like kernel-based models or Gaussian processes. These techniques can capture complex patterns without relying on specific assumptions about the underlying data distribution. Bayesian approaches to ARIMA modelling allow for more flexible parameter estimation and uncertainty quantification. They offer a probabilistic framework that accommodates varying degrees of prior information and enables better decision-making in uncertain agricultural environments.
Applications of ARIMA in agricultural engineering
The auto-regressive integrated moving average (ARIMA) models are extensively utilised for time series forecasting of different processes in agricultural engineering. The major applications of ARIMA models in agricultural engineering are as follows:
Prediction of agricultural yield
ARIMA models are used in predicting crop yields based on soil conditions, weather parameters and farming practices by analysing historical crop yield data. The prediction of agricultural yield is helpful for farmers, agricultural scientists and policymakers in making decisions regarding the management of crops and the allocation of resources.
It was found that the univariate time series approach using ARIMA was effective in the short-term prediction of yield.
Alam et al. (2018) used an ARIMA-ANN hybrid approach for the long-term forecast of rice yield for Aligarh. The short-term forecast of rice yield up to 2020 was generated using ARIMA (2,1,0), while the long-term forecast up to 2025 was obtained by incorporating the residuals acquired by 2013. The hybrid approach yielded a lower mean absolute percentage error (4.65%) compared to ARIMA alone (17.677%), indicating its superior performance.
Senthamarai kannan and Karuppasamy (2020) forecasted the paddy production of four south Indian states, viz. Andhra Pradesh, Karnataka, Kerala and Tamil Nadu, using the ARIMA model (Box and Jenkins, 1976). The paddy production trend for 10 years was predicted using ARIMA and the accuracy of the fitted model was checked using the BIC, RMSE, MAPE, MAE, MaxAPE and MaxAE. The available observed data showed good agreement with the forecasted values.
Madlul et al. (2020) predicted the crop yield of wheat in Iraq using the ARIMA model. Crop yield data from 1988 to 2018 were used to predict the yield between 2019 and 2028. ARIMA (1,0,1) was found to be the best-fit model for predicting the wheat yield. The results of the study suggested an increase in wheat production over the upcoming 10 years, with a higher production rate than during the study period.
Mahajan et al. (2020) applied the ARIMA model using historical data from 1950-51 to 2016-17 to forecast rice production in India. ARIMA (0,2,2) was found to be the best-fit model for this production. Their study demonstrated the effectiveness of the model based on diagnostic tests, projecting continued growth in rice output, with an estimate of 110.64 MT for 2020-21. The findings emphasize the importance of accurate forecasting for strategic planning in agriculture, exports and national food security.
Paswan et al. (2022) used an ARIMA-ANN model for the time series prediction of sugarcane production and successfully forecasted the sugarcane yield from 2020 to 2025 using ARIMA (1,1,0) according to the Box-Jenkins methodology.
Yasmin and Moniruzzaman (2024) predicted the area, production and yield of jute in Bangladesh for the period from 2023 to 2030. The historic time series data from 1970 to 2022 was utilised for the study. Best fit ARIMA models were selected based on the results of residual analysis, ACF and PACF plots and Bayesian Information Criterion (BIC). The study suggested ARIMA (2,0,3) for area, ARIMA (1,0,2) for production and ARIMA (1,0,3) for the jute yield as the best models. However, the production and yield are projected to decline from 84.58 lakh bales and 11.59 bales per hectare in 2022 to 74.34 lakh bales with a yield of 11.22 bales per hectare by 2030. These results would help the researchers and decision makers to suggest possible alternatives in increasing the production.
Hazarika and Phukon (2025) used the Box-Jenkins methodology of ARIMA model to identify a suitable time series model for forecasting sugarcane production in Assam. The ARIMA (1,2,1) model was found to be the most appropriate fit. In 2021-22, sugarcane was cultivated over 29,768 hectares in the state, yielding a total production of 11,60,025 tonnes. Based on the selected ARIMA model, the forecast suggests that sugarcane production may reach approximately 13,14,571 tonnes by 2030-31. Sugarcane holds significant economic importance in the state as a key cash crop contributing to agricultural income and rural livelihoods.
Weather forecasting
The forecasting of various weather parameters, viz. rainfall, temperature, sunshine hours, humidity, etc., is beneficial for determining agronomic events like plant growth and yield, pest attacks, fungal diseases, etc. It is also beneficial in taking necessary precautions against the probable occurrence of extreme climatic phenomena and associated natural disasters. Moreover, planning planting and harvesting dates, irrigation systems and pest management strategies in agriculture all depend on accurate weather forecasts. Various data mining techniques like artificial neural networks (ANN), machine learning and ARIMA modelling are utilized by experts for forecasting weather parameters (Abbot and Jennifer, 2017; Shoba and Shobha, 2014). ARIMA models are employed to forecast various weather variables such as temperature (Chen et al., 2018), precipitation (Bora and Hazarika, 2023), wind speed (Eymen and Köylü, 2019), evapotranspiration (Manjunatha et al., 2023) and relative humidity (Shad et al., 2022; Li et al., 2019).
Shivhare et al. (2019) developed an ARIMA-based weather forecasting tool by implementing the ARIMA algorithm in R. The study used daily meteorological data of 65 years (1951-2015) from the Indian Meteorological Department for training (1951 to 1975), monitoring (1975 to 1995) and validating (1995 to 2015). ARIMA (2,0,2) and ARIMA (2,1,3) models were suggested for the prediction of rainfall and temperature data respectively for the next fifteen years. The research revealed that the prediction was accurate, with a root mean square error of 0.0948 and 0.085 for rainfall and temperature data respectively. It was also suggested that the obtained data can be further applied in the management of solar power stations, agriculture, natural resources and tourism.
Han et al. (2010) described the drought situation of the Guanzhong plain by forecasting and simulating the time series of the vegetation temperature condition index (VTCI) drought index using ARIMA models. The study suggested that AR(1) models fitted the historical data better and found that the time series prediction was accurate.
Salmana and Kanigoro (2021) forecasted visibility using ARIMA models. Visibility, defined as the distance at which an object or light can be visually distinguished, is a prominent parameter in all phases of a flight operation: take-off, flight and landing. Visibility depends on weather parameters, viz. relative humidity (RH), temperature (T) and dew point (Td), as given in equation 6.

This analysis used a grid method to evaluate the ARIMA model over different values of the parameters p, d and q and suggested that the ARIMA model is best suited for the prediction of weather parameters, with the lowest mean squared error and coefficient of variation.
Dwivedi and Shrivastava (2022) conducted a study to determine the most suitable probability distributions for analysing the monthly and annual rainfall in Navsari. Their findings indicated that the Weibull distribution best represented rainfall patterns in June and September, while Gumbel and log-normal distributions were more appropriate for July and August. For annual rainfall, the Gumbel distribution provided the best fit. Additionally, trend analysis using the Mann-Kendall test revealed a significant decreasing trend in June rainfall, with increasing trends observed in July and August. The researchers identified Seasonal ARIMA (0,0,1) and (0,1,1) as the most effective models for forecasting monthly rainfall, based on their uncorrelated residuals and the lowest root mean square error among tested models. The study also calculated rainfall values at various probabilities and recurrence intervals, offering valuable insights for the design of water conservation and erosion control structures.
Chandran et al. (2023) predicted the average annual precipitation for 2010, 2015, 2020 and 2025 using the best ARIMA models for ten sub-basins of the Vaigai River in Tamil Nadu, based on annual rainfall data from 1976 to 2009. The predicted outcome and observed data up to 2020 compared favourably, demonstrating the suitability of the model.
Water management
Many researchers have analysed, modelled and forecasted the time series values of hydrological data using ARIMA models for determining the probable occurrence of extreme events and for adopting proper water management measures (Yurekli et al., 2005; Modarres, 2007).
Viccione et al. (2019) used ARIMA models for forecasting water tank levels and suggested that the ARIMA (2,0,2) model produced the best forecast results during both the calibration phase and the validation phase. The observed and projected water levels agreed closely, with low prediction error. The results might help water managers in time series forecasting of water levels, particularly when flow rate data are unavailable.
Du et al. (2020) predicted the daily water consumption using the ARIMA model and incorporated the Markov chain for error correction. It was found that longer forecast periods result in the accumulation of prediction errors. The incorporation of the Markov chain error correction method increased the ability of the monitoring points to anticipate future daily water consumption statistics with more accuracy.
Subha and Kowsigan (2023) forecasted water demand to assess customer water usage patterns, integrating IoT-based water distribution systems featuring smart water meters for future water demand projections and infrastructure planning. The study proposed that ARIMA delivered better results with minimal errors when compared to other forecasting algorithms.
Agaj et al. (2024) evaluated the ARIMA and Error Trend and Seasonality (ETS) models using the R software package to identify the most suitable forecasting model and extract critical insights for effective water resource management and flood risk mitigation. In this study, nine years of monthly time series data (2014-2021) on water levels from the Morava e Binçës river at the Vitia station were analyzed, with 2022 data reserved for validation. The performance of both models was assessed based on root mean square error (RMSE) and mean absolute error (MAE), confirming their reliability in forecasting. The predictive analysis successfully highlighted distinct periods of high and low water levels projected between 2022 and 2024, offering valuable information for managing flood-prone areas.
Pest and disease outbreak prediction
Predicting the possible occurrence and magnitude of pest and disease attack on plants aids in adopting proper precautionary and control measures, thereby helping sustainable production.
Chiu et al. (2019) constructed a model combining autoregressive integrated moving average (ARIMA) and ARIMA with exogenous variables (ARIMAX) for predicting the potential growth in whitefly (Trialeurodes vaporariorum) population in greenhouses. The study utilised a wireless imaging device that employs an automated insect counting algorithm to track the number of flies caught in sticky paper traps installed inside greenhouses. With input data including the rise in whitefly counts, temperature and humidity, ARIMAX was determined to be the most accurate model for predicting the fly count and thereby forecasting the amount of pesticide needed.
Varsha et al. (2023) investigated rice blast disease outbreaks in paddy. They forecasted outbreaks using the ARIMA-BiLSTM (Bidirectional Long Short-Term Memory) model for sustainable rice production and proposed that the hybrid technique can accurately predict blast disease outbreaks in paddy crops, with a mean squared error of 0.037 and a mean absolute error of 0.028.
Narava et al. (2022) developed a temporal ARIMA model that has made it simpler to forecast the dynamics of the adult population of Helicoverpa armigera, a pest of major Indian crops. The prediction of its population allowed for the timely implementation of control measures and the right scheduling of pesticide applications. The optimal model was determined to be ARIMA (1,0,1)(1,0,2), with lower BIC, RMSE, MAPE, MAE and MASE values and a higher R² value (0.53).
Wang and Li (2025) predicted pest and disease outbreaks in sugarcane from 33 years of meteorological and pest incidence data using a hybrid ARIMA-LSTM model and compared the results with stand-alone ARIMA and LSTM models. The hybrid ARIMA-LSTM model was proposed to improve prediction accuracy by effectively capturing both linear and nonlinear patterns, resulting in lower error metrics than individual models. The performance of the models was assessed using MSE, RMSE and MAE. The ARIMA-LSTM hybrid model demonstrated superior accuracy, achieving an MSE of 2.66, RMSE of 1.63 and MAE of 1.34, outperforming the individual ARIMA (MSE = 4.97, RMSE = 2.29, MAE = 1.79) and LSTM (MSE = 3.77, RMSE = 1.86, MAE = 1.45) models.
Livestock production forecasting
Accurate prediction of national animal feed resources is crucial for planners and policymakers (Suresh et al., 2012). Mgaya (2019) forecasted the consumption of livestock products such as eggs, milk, chicken and cow meat to estimate the potential of farmers to increase their use of animal feed. ARIMA models were employed to forecast the data, which were obtained from FAOSTAT. It was proposed that the increase in the consumption of all livestock products will likely induce an increasing demand for animal feed. Further research was therefore suggested to analyse aspects such as population growth and changes in consumer behaviour that may affect the use of cattle products.
Jaiswal and Bhattacharjee (2022) used the ARIMA model to predict the export of swine meat from India. Six ARIMA models were applied to time-series data from 2003 to 2019 on annual swine meat exports. The most accurate model was then used to forecast export values for the next ten years. Of the six models, ARIMA (1,0,1) suggested a positive trend in annual swine meat exports from India. The export of swine meat from India saw a 2.79% decline between 2018 and 2019, primarily due to a 12.03% drop in the pig population as per the previous census. However, exports are projected to follow an upward trend, with an estimated growth of 6.54 units from 2020 to 2029, a significant increase compared to just 1.45 units during 2010 to 2019.
Waiswa (2023) modelled and predicted the cattle milk and beef production using the ARIMA model and found that ARIMA (0,1,0) fits best for cattle milk production and ARIMA (1,1,0) was found better for beef production as these provided the least Normalized BIC and MAPE values among the selected models. Predictions indicate that milk production is projected to grow by an average of 1.63% annually over the next five years (2021-2025), whereas beef production increases by an average of 0.39% during the same period.
Soil health monitoring
Information on several soil parameters, such as soil moisture and soil temperature, is needed to assess and appropriately manage irrigation requirements so that the least amount of water is used for irrigation (Puerto, 2013). Time series prediction of soil behaviour aids the precise management of resources.
Khanna and Kaur (2022) forecasted the behaviour of the soil using ARIMA and LSTM and compared the predicted values of various parameters with those measured by numerous sensors. The results indicated that ARIMA modelling was superior because the data followed linear relationships in most cases and ARIMA obtained lower error rates.
Yan et al. (2019) made a short-term forecast of the horizontal displacement of slope soil using the ARIMA (3,1,1) model and predicted the trend of future displacements, which further helps to assess the stability of the slope.
Mohurle and Gedam (2023) investigated the behaviour of soil over time by considering the herbicides/pesticides, total dissolved salts (TDS), suspended matter, the number of emitters employed and the leaching factor. The effects of various parameters on the drip irrigation system performance and soil conditions were effectively forecasted using ARIMA.
Vaithiyanathan and Sudalaimuthu (2023) attempted to use the ARIMA model to forecast the time series data for the macro or micronutrients in soil for the years 2021–2032 using the input data of 2005-2020. ARIMA was found to be the appropriate choice for forecasting the condition of soil in non-stationary conditions.
Fayaz et al. (2022) utilised the time series forecasting of different soil and climatic parameters for predicting the possible occurrence of landslides in the study area using the ARIMA and IBM SPSS modelling tools. The forecasting of landslide events aided in comprehending their consequences and mitigating them by formulating effective countermeasures, thereby assisting hazard management.
Fu et al. (2023) predicted the soil moisture at different depths, relying on the water balance equation, using a seasonal ARIMA model fitted with the auto.arima function, which selects the p, d and q values automatically. This simplified the estimation of the p, d and q values and improved the model’s suitability to the data.
Wang et al. (2024) suggested a cluster-ARIMA model for predicting the soil respiration rates. The soil respiration rates are susceptible to variation under diurnal fluctuations and environmental factors, which may decrease the accuracy of prediction using ARIMA. The Cluster-ARIMA was proposed over ARIMA as it is effective and more accurate for the prediction of non-stationary time series data like soil respiration. The experimental findings revealed that the proposed Cluster-ARIMA model delivered an impressive prediction accuracy of 98.3% in predicting soil respiration.
Market analysis and price prediction
The Box-Jenkins model using the autoregressive integrated moving average model (ARIMA) can be used specifically to analyse the market volatility, primarily for agricultural commodities and its strength resides in its adaptability to any time series with any pattern of change.
Ansari and Ahmed (2001) carried out time series forecasting of the export price of tea and found the model viable for predicting the prices of agricultural products.
Chaudhry et al. (2017) fitted the ARIMA (1,0,1) model for predicting the price of south, north and all India tea auctions and estimated a notable fluctuation in the prices attributed to several factors, including extreme weather conditions, pest attacks, variations in labour charges, etc.
Jadhav et al. (2017) predicted and validated the prices of paddy, ragi and maize in Karnataka in 2016 using time series data for the years 2002 to 2016. MSE, MAPE and Theil’s U coefficient criteria were used for checking the accuracy of the results and the predicted prices of the selected crops were nearly identical to the actual values.
Purohit et al. (2021) used various hybrid methods for forecasting the retail and wholesale prices of tomato, potato and onion. The results of the study revealed that the monthly retail and wholesale prices of onion were best forecast by the Additive-ARIMA-ANN method, whereas the monthly retail price of tomato was best forecast by the proposed Multiplicative-ARIMA-SVM method.
Ramos et al. (2023) forecasted the monthly prices of certain agricultural products, including eggplant, tomato and whole chicken, using the ARIMA model and reported that the model delivered accurate estimates when used to forecast the prices of agricultural commodities. However, the study suggested that the results would have been better if the data series were stationary.
Ray et al. (2023) predicted the volatile monthly price indices of pulses, viz. gram, moong and urad, using an ARIMA-LSTM hybrid model based on a random forest lag selection criterion. Long Short-Term Memory (LSTM) deep learning techniques have been found satisfactory, especially for predicting market prices (Hu et al., 2020).
Ray et al. (2023) compared the price forecasting accuracy of ARIMA, ARIMA-Generalised Autoregressive Conditional Heteroscedasticity (ARIMA-GARCH), LSTM and ARIMA-LSTM models based on the lowest RMSE, MAPE and MASE values. The proposed hybrid Random Forest-based ARIMA-LSTM model demonstrated clear superiority over traditional statistical methods, achieving improvements ranging from 8-25% in RMSE, 2-28% in MAPE and 2-29% in MASE. This model was effectively utilized to forecast the highly volatile monthly price indices of the pulses.
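As an illustration of how such comparisons are typically expressed, the sketch below computes MASE against an in-sample one-step naive forecast and the percentage improvement of one model over another. The data and function names are hypothetical and are not taken from the cited study.

```python
def mase(train, actual, forecast):
    """Mean absolute scaled error.

    The forecast MAE is scaled by the MAE of the one-step naive forecast
    (previous value) on the training series; values below 1 beat the naive method.
    """
    naive_mae = sum(abs(b - a) for a, b in zip(train, train[1:])) / (len(train) - 1)
    forecast_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return forecast_mae / naive_mae


def improvement_percent(baseline_error, candidate_error):
    """Percentage reduction in an error measure relative to a baseline model."""
    return 100.0 * (baseline_error - candidate_error) / baseline_error


# Hypothetical price index: training history, held-out values and a forecast.
train = [10.0, 12.0, 11.0, 13.0, 12.0]
actual = [14.0, 13.0]
forecast = [13.0, 13.5]

mase_value = mase(train, actual, forecast)
gain = improvement_percent(baseline_error=2.0, candidate_error=1.5)
print(mase_value, gain)
```

Because MASE is scaled by the naive forecast error, it remains comparable across series with different units or volatility, which is useful when several commodities are evaluated together.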
In conclusion, ARIMA models offer agricultural engineers a strong tool for time series analysis and forecasting. These models help in improving decision-making, resource management and overall agricultural sustainability by utilising historical data.
Limitations of ARIMA
Although ARIMA is a powerful tool for time series modelling and forecasting, it does have some limitations. Some of them are listed below:
Data limitations
Agricultural time series data has several limitations, including data quality issues, missing values and irregularities. Inaccuracies, errors and inconsistencies can occur due to manual recording, measurement errors, or changes in data collection methods. Missing values can be caused by equipment failure, gaps in data collection, or changes in reporting procedures. Missing data must be handled carefully to ensure that time series analyses and models remain accurate. In addition, irregularities, such as sudden spikes or drops that do not follow regular seasonal patterns, can occur because of unforeseen events such as extreme weather, diseases, or policy changes (Hyndman and Athanasopoulos, 2018).
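A minimal Python sketch of two common preprocessing steps is given below: filling interior gaps by linear interpolation and flagging sudden spikes with a median-based score. Both choices, the 3.5 threshold and the sample readings are illustrative assumptions; other imputation or outlier rules may suit a given dataset better.

```python
import statistics


def interpolate_gaps(values):
    """Fill interior None entries by linear interpolation between known neighbours.

    Leading and trailing gaps are left as None because they have only one neighbour.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def flag_spikes(values, threshold=3.5):
    """Flag points whose modified z-score (based on the median absolute deviation)
    exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


# Hypothetical sensor readings with two missing observations.
readings = [10.0, None, None, 16.0, 15.0]
filled = interpolate_gaps(readings)
print(filled)

# Hypothetical weekly prices with one sudden spike.
prices = [20.0, 21.0, 19.0, 22.0, 20.0, 80.0, 21.0]
flags = flag_spikes(prices)
print(flags)
```

Flagged points should be inspected before removal, since a spike caused by a real event such as extreme weather carries information that a forecast may need.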
Model complexity and assumptions
ARIMA models are effective for capturing linear trends and seasonality, but may struggle with complex patterns in agricultural time series data. They may not accurately represent non-linear relationships, interactions between multiple variables or irregular trends, which limits their predictive capability. The stationarity assumption requires that the statistical properties of the time series remain constant over time, but external factors like climate change, technological advancements, or shifts in agricultural practices may cause non-stationary behaviour. ARIMA models are also sensitive to outliers, so extreme values in agricultural time series data can degrade performance. Identifying outliers and handling them appropriately is crucial for maintaining model accuracy (Géron, 2022).
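The role of the differencing (I) component in handling a non-stationary mean can be shown with a short Python sketch. The series are hypothetical; a linear trend is removed by one difference and a quadratic trend by two.

```python
def difference(series, d=1):
    """Apply first differencing d times; each pass shortens the series by one value."""
    result = list(series)
    for _ in range(d):
        result = [b - a for a, b in zip(result, result[1:])]
    return result


# Hypothetical series with a linear trend: the mean rises over time.
linear_trend = [5 + 2 * t for t in range(8)]
print(difference(linear_trend))         # constant after one difference

# Hypothetical series with a quadratic trend: needs d = 2.
quadratic_trend = [t * t for t in range(6)]
print(difference(quadratic_trend, d=2))
```

Differencing addresses trends in the mean only. Changes in variance or structural breaks caused by external factors are not removed by this step and may require transformations or a different model class.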
Limitations in the integration of advanced techniques
The integration of ARIMA models with advanced machine learning algorithms, such as neural networks, can be challenging due to issues of compatibility, interpretability and computational efficiency. Additionally, advanced models require extensive data preprocessing, feature engineering and hyperparameter tuning, which may demand substantial effort, especially for agricultural datasets with irregularities and missing values.
Addressing these challenges requires a combination of strong data preprocessing, careful model selection and a thorough understanding of the agricultural domain. In agricultural time series analysis, researchers and practitioners should be aware of the limitations inherent in both the data and the modelling techniques and strive for a balance between model complexity and practical utility.