Cereals have been the primary grain-based food for humans for thousands of years. Wheat, one of the world’s most important cereal crops, belongs to the Poaceae (grass) family
(Goesaert et al., 2005; Zhang et al., 2022). While consumption may vary across regions, wheat grain is the most important commodity for Europeans and the global population (
Shewry et al., 2009;
Williams, 2006).
Triticum durum and T. aestivum are the two most prominent wheat species; the latter is commonly referred to as soft wheat, while T. durum is known as durum wheat. Soft wheat is cultivated across all temperate regions (
Shewry et al., 2009). The seed consists of bran, endosperm and germ, each with a distinct chemical composition. The endosperm contains high amounts of proteins and carbohydrates, while the germ provides vitamins, trace minerals and triglycerides. Although bran is rich in dietary fiber, it is commonly separated from the grain prior to consumption (Javid
Iqbal et al., 2022; Khalid et al., 2023). Typically, grains are milled to produce flour, which serves as a primary source of nutrients in many diets. However, alternative consumption practices that retain or reincorporate bran are increasingly explored for their health benefits
(Giraldo et al., 2019). Wheat flour primarily contains starch and gluten-forming proteins. Wheat stands out among cereals for its suitability for bread and other dough products, which it owes to its gluten proteins. These proteins provide dough elasticity and gas retention (
Shewry et al., 2009;
Javid Iqbal et al., 2022).
Agriculture accounts for 58% of employment and 21% of the GDP. Arable land covers 24% of the country. Approximately 49% of agricultural land is used to cultivate feed crops rather than for food consumption (
Diku, 2011). Public concern over wheat flour safety is widespread (
Zhang et al., 2022).
Machine learning and NIR spectroscopy
Machine learning models, combined with big data technologies and high-performance computing, create new opportunities for data-intensive science in precision farming and sustainable agriculture. The main categories of machine learning (ML) are (1) supervised learning (SL), (2) unsupervised learning (UL) and (3) reinforcement learning (RL). SL algorithms use a training dataset of labeled data to infer a function that predicts outputs for new data. UL algorithms examine the data directly and learn patterns from it without requiring human supervision. ML is widely used to solve complex problems, often involving multiple factors. ML models have found applications in the multidisciplinary agri-technologies domain for crop management. Furthermore, they have significantly enhanced the application of near-infrared (NIR) spectroscopy across various fields, including agriculture, medical diagnostics, pharmaceutical products, environmental monitoring and the assessment of food quality and origin.
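The distinction between SL and UL described above can be illustrated with a minimal sketch. It assumes synthetic two-dimensional data rather than real agricultural measurements, and the helper names (fit_nearest_centroid, predict_nearest_centroid, kmeans) are hypothetical, chosen only for this illustration. The supervised model is given the labels; the unsupervised one is not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic, well-separated groups of samples in a 2-D feature space
# (illustrative data only, not real measurements).
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[4.0, 4.0], scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])
y = np.array([0] * 50 + [1] * 50)  # labels, used only by the supervised model


def fit_nearest_centroid(X_train, y_train):
    """Supervised: use the labels to compute one centroid per class."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_nearest_centroid(model, X_new):
    """Assign each new sample to the class with the closest centroid."""
    classes, centroids = model
    distances = np.linalg.norm(X_new[:, None, :] - centroids[None, :, :], axis=2)
    return classes[distances.argmin(axis=1)]


def kmeans(X_data, k=2, n_iter=20):
    """Unsupervised: group samples by similarity without seeing any labels."""
    # Deterministic start: k samples spread evenly through the data set.
    idx = np.linspace(0, len(X_data) - 1, k).astype(int)
    centroids = X_data[idx].copy()
    for _ in range(n_iter):
        distances = np.linalg.norm(X_data[:, None, :] - centroids[None, :, :], axis=2)
        assignment = distances.argmin(axis=1)
        for j in range(k):
            members = X_data[assignment == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return assignment, centroids


model = fit_nearest_centroid(X, y)
supervised_accuracy = np.mean(predict_nearest_centroid(model, X) == y)

assignment, _ = kmeans(X, k=2)
# Cluster ids are arbitrary, so agreement is checked up to a swap of the two ids.
cluster_agreement = max(np.mean(assignment == y), np.mean(assignment != y))

print(f"supervised accuracy: {supervised_accuracy:.2f}")
print(f"cluster/label agreement: {cluster_agreement:.2f}")
```

On data this cleanly separated both approaches recover the same grouping; the practical difference is that the supervised model can name the classes, whereas the clusters found without labels still have to be interpreted.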
Advanced NIR food spectral analysis techniques and traditional machine learning
Traditional ML methods are vital in near-infrared (NIR) spectral analysis, addressing multi-collinearity and enhancing generalization. NIR analysis involves pre-processing, feature selection and modeling. Commonly used ML algorithms include principal component analysis (PCA), partial least squares (PLS), extreme learning machines (ELM), support vector machines (SVM), support vector regression (SVR), decision trees (DT) and random forests (RF). These methods extract significant features, reduce redundancy and create predictive models for NIR applications. Recent advances focus on improving pre-processing, wavelength selection and feature extraction
(Sanjeevannavar et al., 2023). Beyond these methodological developments, near-infrared reflectance spectroscopy (NIRS) is increasingly applied to (1) detecting food contamination and chemical residues in agricultural products; (2) fostering sustainable agriculture by monitoring crops and soil nutrients; (3) optimizing harvest times for maximum yield; and (4) assessing product quality for high-value items. However, ML in NIRS is still at an early stage of development compared with other fields because the specialized data needed for analysis are difficult to acquire.
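To make the multi-collinearity point concrete, the following is a minimal sketch of principal component regression (PCA followed by least squares), one simple way of combining the feature extraction and modeling steps named above. It is not taken from the cited studies. The spectra are simulated, and the band positions, the scaling factors and the function names fit_pcr and predict_pcr are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
wavelengths = np.linspace(780, 2500, 200)  # nm, covering the NIR region


def gaussian_band(center, width):
    """Smooth, broad absorption band (idealized shape)."""
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)


# Simulated data: 200 highly correlated wavelength variables per sample,
# driven by only two underlying constituents plus noise.
n_samples = 80
analyte = rng.uniform(8.0, 16.0, n_samples)      # hypothetical reference values
interferent = rng.uniform(0.0, 5.0, n_samples)   # hypothetical second constituent
spectra = (
    analyte[:, None] * 0.02 * gaussian_band(2100, 60)
    + interferent[:, None] * 0.03 * gaussian_band(1940, 80)
    + rng.normal(0.0, 0.002, (n_samples, wavelengths.size))
)


def fit_pcr(X, y, n_components):
    """PCA on mean-centred spectra, then least squares on the PC scores."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal-component loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T          # shape (n_wavelengths, n_components)
    scores = Xc @ loadings                  # mutually uncorrelated predictors
    coef, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    return x_mean, y_mean, loadings, coef


def predict_pcr(model, X):
    x_mean, y_mean, loadings, coef = model
    return (X - x_mean) @ loadings @ coef + y_mean


# Calibrate on the first 60 samples, validate on the remaining 20.
model = fit_pcr(spectra[:60], analyte[:60], n_components=2)
predicted = predict_pcr(model, spectra[60:])

residual = analyte[60:] - predicted
r2_test = 1.0 - np.sum(residual**2) / np.sum((analyte[60:] - analyte[60:].mean()) ** 2)
print(f"R^2 on held-out samples: {r2_test:.3f}")
```

Regressing directly on all 200 wavelengths with 60 calibration samples would be ill-posed; compressing them into a few uncorrelated scores is what makes the fit stable. PLS follows the same idea but chooses its components using the reference values as well.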
Traditional methods for assessing wheat quality, such as wet chemistry, Kjeldahl protein analysis or sedimentation testing, are often labor-intensive, time-consuming and destructive, limiting their scalability in field or industrial settings. In contrast, near-infrared (NIR) spectroscopy offers a rapid, non-destructive alternative capable of capturing complex chemical and physical traits through spectral signatures. When paired with machine learning algorithms, NIR data can be transformed into predictive models that identify subtle patterns and correlations, even in noisy or overlapping spectral regions. This integration enhances analytical speed, reduces operational costs and enables real-time decision-making, making it particularly suitable for modern agri-food systems. Recent studies (
Du et al., 2022;
Wang et al., 2025) have demonstrated the effectiveness of NIR-ML pipelines in predicting wheat processing traits with high accuracy, further validating their applicability in both laboratory and field contexts.
Machine learning (ML) applications to agricultural raw data
Advances in NIRS devices, chemometric techniques and computer technology have significantly improved analytical methodologies (
Phuong et al., 2025). Machine learning (ML) models employ a scientific approach in which a machine is trained to learn without being explicitly programmed (
Samuel, 2000). It includes three main types: (1) supervised learning (SL), (2) unsupervised learning (UL) and (3) reinforcement learning (RL). In agricultural systems, applications of ML models can be classified into (a) crop management, (b) livestock management, (c) water management and (d) soil management. Within the broader agri-tech sector, machine learning (ML) models have been employed in crop management (61%), yield forecasting (20%) and disease detection (22%) (
Liakos et al., 2018). NIR is acknowledged as one of the most promising analytical techniques available today, valued for its non-invasive, rapid and non-destructive qualities, high throughput, simple sample preparation, chemical-free operation, portability and user-friendliness for non-specialists (
Abbaspour-Gilandeh et al., 2024). The potential for further advancement is substantial, driven by improvements in optics and recent progress in data science, artificial intelligence, machine learning and deep learning. This advancement enables simultaneous measurement of multiple constituents from a single spectrum.
Recent studies have shown that machine learning-assisted NIR spectroscopy enables rapid, non-destructive prediction of wheat processing traits, outperforming conventional wet chemistry in both efficiency and field applicability (
Wang et al., 2025). Compared to conventional analytical techniques, NIR spectroscopy offers faster, non-destructive assessment of grain quality traits, particularly when paired with machine learning models that enhance predictive accuracy and scalability (
Du et al., 2022).
NIR versatility in food and feed analysis
Near-infrared (NIR) spectroscopy is a versatile analytical technique that operates within the wavelength range of 780 nm to 2500 nm. It measures the absorption, emission, reflection and diffuse reflection of light, providing valuable insights into the molecular structure and composition of various substances
(Ozaki et al., 2017). NIR spectroscopy measures absorption bands resulting from overtones and combination excitations of molecular vibrations. These bands are typically smooth and broad, which necessitates high signal-to-noise ratios and stable instrumentation for accurate quantitative analysis
(Gao et al., 2021). NIR spectroscopy determines the nutritional content of agricultural products, including cereals, fruits, vegetables and animal feed, by quantifying protein, starch, oil and micronutrients
(Johnson et al., 2020). As a rapid, non-destructive technique, it monitors product quality, detects stored-grain insects and manages food logistics (
Abbaspour-Gilandeh et al., 2024). This method evaluates seed quality, including variety discrimination, germination rate, moisture content and vigor
(Qiu et al., 2005). It aids in food safety by enabling online analysis of proteins, polysaccharides and polyphenols, as well as the detection of adulteration
(Xu et al., 2025). NIR evaluates the quality of fruits and vegetables, including soluble solids, acidity, moisture, texture, ripeness and overall quality (
Sirisomboon, 2018). Calibration and accuracy problems stem from environmental variability, although robust calibration models can enhance reliability
(Xu et al., 2019). Technological advancements, including cloud computing, the internet of things (IoT) and machine learning, are expected to enhance real-time monitoring and predictive modeling, thereby expanding the agricultural applications of NIR spectroscopy
(Phuong et al., 2025). NIR offers advantages in crop quality assessment, including non-destructive analysis, minimal sample preparation, rapidity and high accuracy across various crop types, integrating with information and communication technology (
Sirisomboon, 2018). NIR enables the rapid evaluation of crop viability, moisture content and other indicators in real-time
(Qiu et al., 2005; Barbin et al., 2013;
Johnson et al., 2020). NIRS effectively determines the nutritional composition of cereal grains, including protein, carbohydrate and lipid content (
Kays, 2015). Recent studies have also explored the use of NIR spectroscopy for predicting mineral composition in fortified wheat flours, demonstrating its versatility beyond traditional macronutrient analysis (
Martínez-Martín et al., 2023).
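The 780 nm to 2500 nm range given at the start of this section is often reported in wavenumbers instead. A small sketch of the conversion follows; the function name is hypothetical, and the relation used is the standard one, wavenumber (cm^-1) = 10^7 / wavelength (nm), which follows from 1 cm = 10^7 nm.

```python
def wavelength_nm_to_wavenumber_cm1(wavelength_nm):
    """Convert a wavelength in nanometres to a wavenumber in cm^-1."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return 1.0e7 / wavelength_nm


# Limits of the NIR region as stated in the text.
nir_short_limit = wavelength_nm_to_wavenumber_cm1(780)    # about 12821 cm^-1
nir_long_limit = wavelength_nm_to_wavenumber_cm1(2500)    # 4000 cm^-1
print(f"{nir_short_limit:.0f} cm^-1 to {nir_long_limit:.0f} cm^-1")
```

Shorter wavelengths map to larger wavenumbers, so the two ends of the range swap order when the unit changes.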
Near-infrared spectroscopy applications in agriculture
Near-infrared (NIR) spectroscopy is a technique used to assess various physical characteristics and chemical compounds related to the quantity and quality of agricultural products (
Shewry et al., 2009). The accuracy of this methodology relies on the NIR model. Moreover, macroconstituents like starch and fat, along with essential micronutrients such as amino acids, dietary fiber and amylose, are commonly analyzed using NIR spectroscopy. It is also used in quality control, primarily by measuring flour moisture during milling or dough processing (
Chadalavada et al., 2022).
Additionally, NIRS aids in identifying food contaminants, e.g., mycotoxins and ergot bodies (
Delwiche, 2021). NIR offers several advantages, including cost savings and faster assessment than traditional methods. It operates without prior sample preparation or chemical agents, providing significant benefits for quality control and process monitoring at an industrial scale (
Shewry et al., 2009). NIR spectroscopy’s spectral range provides reliable, consistent insights into food quality (
Ozaki et al., 2017). The NIR region lies between the visible and the mid-infrared regions of the spectrum
(Schuster et al., 2023).
NIR analysis involves data acquisition, noise reduction, calibration and evaluation (
Cen and He, 2007). Analyzing data establishes the relationship between the unique features of the investigated sample and its transmittance or absorption values (
Zhang et al., 2022). The scope of NIR applications continues to expand, including mineral profiling in composite flours such as wheat-lentil blends (
Martínez-Martín et al., 2023). Recent studies in Indian agricultural contexts have explored the use of NIR spectroscopy for evaluating grain quality traits, highlighting its potential for rapid, non-destructive analysis in post-harvest systems
(Venkatesan et al., 2020).
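The four stages named above (data acquisition, noise reduction, calibration and evaluation) can be walked through in a short sketch. Everything in it is an assumption made for illustration: the spectra are simulated, a moving average stands in for whatever smoothing a given study uses, and the calibration is a single-wavelength linear fit rather than a multivariate model.

```python
import numpy as np

rng = np.random.default_rng(2)

# 1) Data acquisition (simulated): one absorption band whose height scales
#    linearly with a hypothetical reference value, plus measurement noise.
wavelengths = np.linspace(780, 2500, 300)  # nm
band = np.exp(-0.5 * ((wavelengths - 1940) / 50.0) ** 2)
reference = rng.uniform(10.0, 15.0, 60)    # hypothetical reference values
spectra = reference[:, None] * 0.05 * band + rng.normal(0.0, 0.02, (60, 300))


# 2) Noise reduction: moving-average smoothing along the wavelength axis.
def moving_average(spectrum, window=11):
    kernel = np.ones(window) / window
    return np.convolve(spectrum, kernel, mode="same")


smoothed = np.apply_along_axis(moving_average, 1, spectra)

# 3) Calibration: relate absorbance at the band maximum to the reference
#    values using the first 40 samples.
peak = int(np.argmax(band))
train, test = slice(0, 40), slice(40, 60)
slope, intercept = np.polyfit(smoothed[train, peak], reference[train], deg=1)

# 4) Evaluation: predict the 20 held-out samples and summarize the error.
predicted = slope * smoothed[test, peak] + intercept
residual = reference[test] - predicted
rmsep = np.sqrt(np.mean(residual**2))  # root mean square error of prediction
r2 = 1.0 - np.sum(residual**2) / np.sum((reference[test] - reference[test].mean()) ** 2)

print(f"RMSEP: {rmsep:.3f} (reference units), R^2: {r2:.3f}")
```

Evaluating on samples that were not used for calibration is the important design choice here; error statistics computed on the calibration set alone tend to be optimistic.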