Overall classification performance and confusion matrix of the alfalfa genetic materials and salt ion types
In the classifications made according to salt ion type, the model performance decreased slightly compared to the alfalfa genetic materials. However, high accuracy rates have generally been observed (Table 1).
The Tree Ensemble algorithm achieved the highest accuracy (99.60%) for the classification of alfalfa genetic materials, followed by Random Forest (99.50%), XGBoost (99.50%), GBT (99.00%) and DT (97.80%) (Table 1). Tree Ensemble again performed best for salt ion type classification, with an accuracy of 92.20%, followed by XGBoost (91.20%), DT (90.90%) and Random Forest (90.60%). KNN (85.60%) and MLP (83.10%) showed the lowest accuracies, consistent with
Pazoki and Pazoki (2011), who reported 86.48% accuracy for wheat cultivar classification (
e.g., NaCl vs. KCl), demonstrating robustness to stress variability under treatments. This study’s approach, which was based on physiological responses, outperformed
Ghamari (2012), who used morphological responses in chickpeas (79.00% accuracy). However,
Ajaz and Hussain (2015) achieved a higher accuracy of 95.20% for wheat classification using MLP, likely because of simpler class distinctions. Tree-based ensembles consistently performed well, emphasizing their suitability for complex biological data involving interacting stressors (salt, temperature and genetics). The dominance of Tree Ensemble also aligns with its hybrid architecture that combines bagging (Random Forest) and boosting (XGBoost) to mitigate overfitting while capturing nonlinear interactions (
Brownlee, 2016;
Chen and Guestrin, 2016). In addition, the results show that these methods can successfully learn complex structures in biological data and improve classification accuracy.
The models clearly captured class-specific patterns, reflecting their capacity to generalize over stress variability, even if salinity stress-induced responses overlap with physiological responses. Near-perfect genetic material classification likely originates from fixed genomic differences, whereas salt ion confusion (
e.g., NaCl vs. KCl) reflects shared ion-specific toxicity pathways. This emphasizes the strength and flexibility of the ML algorithms used in difficult biological classification problems. Along with overall accuracy, the model performance at the class level was assessed using a confusion matrix. The confusion matrices for alfalfa genetic material and salt ion type classifications are shown in Fig 2.
The average confusion matrix obtained from ten-fold cross-validation is presented in Fig 2. The intraclass accuracy was at least 98.90% for every alfalfa genetic material (CUF-101, 99.50%; Gen3, 98.90%; Gen9, 99.40%; Gen16, 99.70%), with interclass confusion <0.60% (Fig 2). Misclassifications occurred primarily between Gen3 and Gen9, reflecting their shared genetic background. CaCl2, KCl and NaCl, the salt ion types in this study, showed high but differing accuracies (89.30%, 91.90% and 90.70%, respectively). This indicates that the model distinguishes alfalfa genetic materials with higher success rates than salt ion types. Cross-prediction errors occurred between CaCl2 and KCl (5.50%) and between CaCl2 and NaCl (6.20%), indicating overlapping physiological responses in alfalfa genetic material. This is consistent with shared osmotic stress pathways across chloride salts (Baha, 2022), in which NaCl and KCl induced near-identical reductions in GP and MGD. CaCl2 and NaCl also caused statistically similar RL and DW responses in alfalfa (Gao et al., 2023). These findings point to physiological processes whereby moderate salinity levels from various salt ion types may induce similar osmotic stress or ion toxicity effects (Quan et al., 2021), producing similar patterns in RL, DW and FW.
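The fold-averaged confusion matrix and the per-class (intraclass) accuracies discussed above can be illustrated with a minimal sketch. The fold predictions below are hypothetical toy values, not data from this study, and the helper names (confusion_matrix, average_matrices, row_percentages) are assumptions introduced only for illustration; the study's actual software pipeline is not described here.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, predicted in zip(y_true, y_pred):
        matrix[index[truth]][index[predicted]] += 1
    return matrix


def average_matrices(matrices):
    """Element-wise mean of equally sized confusion matrices (one per fold)."""
    n = len(matrices)
    size = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / n for j in range(size)]
            for i in range(size)]


def row_percentages(matrix):
    """Express each row as percentages of that true class."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([100.0 * v / total if total else 0.0 for v in row])
    return result


# Hypothetical toy data: two folds instead of ten, six samples per fold.
labels = ["CaCl2", "KCl", "NaCl"]
truth = ["CaCl2", "CaCl2", "KCl", "KCl", "NaCl", "NaCl"]
fold_1 = confusion_matrix(
    truth, ["CaCl2", "KCl", "KCl", "KCl", "NaCl", "CaCl2"], labels)
fold_2 = confusion_matrix(
    truth, ["CaCl2", "CaCl2", "KCl", "KCl", "NaCl", "NaCl"], labels)

mean_matrix = average_matrices([fold_1, fold_2])
percent_matrix = row_percentages(mean_matrix)

# The diagonal holds the intraclass accuracy of each class.
for i, label in enumerate(labels):
    print(label, round(percent_matrix[i][i], 2))
```

In this layout, off-diagonal entries in a row give the cross-prediction errors for that true class, which is how values such as the CaCl2-KCl confusion would be read from Fig 2.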
The models demonstrated a high capacity for classifying alfalfa genetic materials and salt ion types, with correct classification rates of approximately 90.00% or higher (the lowest being 89.30% for CaCl2). These findings indicate that the developed models exhibit balanced and reliable performance in terms of both overall accuracy and class-level accuracy. For alfalfa genetic material classification, the Tree Ensemble, Random Forest and XGBoost algorithms achieved accuracy rates exceeding 99.00%, with error rates of 0.50% or less. These algorithms yielded balanced results at the class level, with correct classification rates for each alfalfa genetic material exceeding 98.00%. This suggests that alfalfa genetic material leaves a more distinct physiological signature than salt ion type and is therefore easier to classify. The higher accuracy values in alfalfa genetic material classification (Table 1) show that the models effectively learned patterns associated with genetic material-specific responses, including weight values, germination features and growth dynamics. These intergenotypic variations provide models with robust decision boundaries that facilitate high-performance classification. The accuracy rates decreased for salt ion type classification because different salt ion types can induce partially overlapping physiological responses in plants. Nevertheless, the Tree Ensemble, XGBoost and Random Forest algorithms achieved satisfactory classification, with accuracies of over 90% (Table 1). These results indicate that salinity-induced physiological responses have a learnable structure.
Class-wise evaluation of classification algorithms for alfalfa genetic materials and salt ion types
For each class, the performance of several ML algorithms was tested using alfalfa genetic materials and salt ion types. Tree-based ensemble algorithms, such as Random Forest, XGBoost and GBT, generally achieved high accuracy, showing that they handle complex data well (Table 2). The models accurately classified alfalfa genetic material, reflecting differences in genetic responses. They did not perform as well with salt ion types because different salt ions elicit similar stress reactions. Even so, the models were fairly accurate, suggesting that some responses to salt stress assist in the classification. Tables 2 and 3 provide a detailed look at how each algorithm distinguishes between the alfalfa genetic materials and between the salt ion types. Variations in genetic material are easier to classify because they show clear patterns, whereas the effects of different salt ions are difficult to separate. Overall, the models performed well and were balanced across classes. The algorithms were evaluated using overall accuracy and detailed per-class metrics, namely precision, recall (sensitivity), specificity and F1-score, as described by Koklu and Ozkan (2020) and Kautz et al. (2017).
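The class-wise metrics named above follow from one-vs-rest counts taken from a confusion matrix. The sketch below uses the standard definitions (recall and sensitivity are the same quantity); the matrix values are hypothetical and the function name class_metrics is an assumption for illustration, not the implementation used in the study.

```python
def class_metrics(matrix, labels):
    """Per-class precision, recall (sensitivity), specificity and F1-score.

    `matrix` has true classes in rows and predicted classes in columns.
    Each class is scored one-vs-rest.
    """
    total = sum(sum(row) for row in matrix)
    results = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]                          # correctly predicted as class k
        fn = sum(matrix[k]) - tp                   # class k predicted as another class
        fp = sum(row[k] for row in matrix) - tp    # other classes predicted as k
        tn = total - tp - fn - fp                  # everything else
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        results[label] = {
            "precision": precision,
            "recall": recall,
            "specificity": specificity,
            "f1": f1,
        }
    return results


# Hypothetical 3-class confusion matrix (10 samples per true class).
labels = ["CaCl2", "KCl", "NaCl"]
toy_matrix = [
    [8, 1, 1],
    [1, 9, 0],
    [2, 0, 8],
]

metrics = class_metrics(toy_matrix, labels)
for label in labels:
    print(label, {name: round(value, 3) for name, value in metrics[label].items()})
```

Reporting these values per class, rather than overall accuracy alone, is what exposes differences such as one salt ion type being harder to separate than the others.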
The highest performance in alfalfa genetic material classification was obtained with the Tree Ensemble, XGBoost and Random Forest algorithms (Table 2). In particular, the Tree Ensemble algorithm achieved recall, precision and F1-score values greater than 99.00% for all alfalfa genetic materials. The XGBoost and Random Forest algorithms also achieved high and balanced class-level performance, especially for Gen9 and CUF-101, with F1-score values above 0.995. GBT and DT yielded similar results, whereas MLP and KNN stood apart with lower class-level performance values, as in Pazoki and Pazoki (2011). When evaluated at the class level, most of the models reached precision, recall and F1-score values above 99.00% in alfalfa genetic material classification, whereas these values remained in the range of 89.00-92.00% in salt ion type classification.
Table 3 shows the class-level performance of the algorithms for the different salt ion types (CaCl2, KCl and NaCl). Tree Ensemble performed best here as well, achieving recall = 0.937 and F1-score = 0.929 for NaCl. In particular, KCl was classified with very high accuracy by both XGBoost and Tree Ensemble (recall > 0.93). The class-level performance was slightly lower for salt ion types than for alfalfa genetic material classification. This difference could be explained by the physiological responses to different salt ion types, which often overlap and are less distinct than the more consistent and marked variations observed between alfalfa genetic materials. Supporting the results of this study, Gao et al. (2023) and Baha (2022) reported near-identical germination responses under CaCl2-NaCl (RL and FW) and NaCl-KCl (GP and MGD), respectively. The higher misclassification between NaCl and KCl reflects shared Cl- toxicity pathways and osmotic effects, which dominate physiological responses at moderate salinity.
Feature importance analysis based on normalized information gain over ten-fold cross-validation revealed context-dependent predictive contributions (Fig 3). In alfalfa genetic material classification, DW showed the greatest influence (29.20%), followed by GE (24.60%), GP (18.00%), SV (16.90%) and FW (16.50%), whereas GI, RL and PL showed minimal influence (<5.00% combined). Features linked to physiological responses during germination, such as DW and GE, are therefore more important in alfalfa genetic material classification. Gao et al. (2023) stated that genetic variations showed stronger physiological responses with these features at the germination stage. Okumuş et al. (2024) noted that, for some ML algorithms, germination features of forage pea cultivars such as DW, FW and RL had higher feature importance than salt doses and controlled temperatures. DW, referred to as total dry mass in their study, was also among the most important features for selecting soybean genotypes under salinity and drought conditions in de Oliveira et al. (2023). The consistency in the importance of DW across different legume species under stress conditions suggests its potential as a reliable indicator of stress tolerance. This finding could have significant implications for breeding programs aimed at developing more resilient cultivars. Germination features such as GI, MGD, RL and PL had less influence on the classification of alfalfa genetic materials.
In salt ion type classification, the ranking of features by normalized information gain differed. PL (13.20%) and RL (11.70%) showed the greatest influence on salt ion types, followed by FW (11.00%) and DW (10.40%). These features reflect physiological adaptations and stress reactions caused by different salt ions. Although DW was not as dominant as in alfalfa genetic material classification, it still contributed to model decisions. In the classification of salt ion types, length measurements taken during germination are therefore more influential. These shifts in feature importance offer insight into model behavior and plant responses, showing which features matter more under different stress conditions (Molnar, 2020). The learning process prioritizes features according to context, supporting the biological interpretation of the classification results.
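To make the idea of information-gain-based feature importance concrete, the following minimal sketch computes the gain of a single threshold split on one feature and normalizes it by the class entropy. The exact normalization used in this study is not specified in this section, so dividing by the class entropy is an assumption, as are the toy feature values, the threshold and the function names.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Reduction in class entropy after splitting a feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part)
                    for part in (left, right) if part)
    return entropy(labels) - remainder


def normalized_gain(values, labels, threshold):
    """Information gain divided by class entropy (assumed normalization)."""
    h = entropy(labels)
    return information_gain(values, labels, threshold) / h if h else 0.0


# Hypothetical toy data: two classes, two features.
classes = ["A", "A", "B", "B"]
dw = [1.0, 1.2, 2.0, 2.2]   # separates the classes cleanly at 1.5
rl = [1.0, 2.0, 1.1, 2.1]   # carries no class information at 1.5

dw_score = normalized_gain(dw, classes, threshold=1.5)
rl_score = normalized_gain(rl, classes, threshold=1.5)
print("DW:", round(dw_score, 3), "RL:", round(rl_score, 3))
```

A feature that separates the classes well scores near 1 and an uninformative feature scores near 0, which is the sense in which a feature such as DW can dominate one classification task and contribute less to another.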
Heat map analysis of alfalfa genetic material classification under salt ion types and controlled temperatures
The goal of these analyses was to determine which synthetic genotypes are classified better or worse under particular stress conditions and to see how stress-specific responses affect the model. Two separate heat map analyses were performed to examine how stress factors affect the classification performance of alfalfa genetic materials (Fig 4 and 5).
Heat map analysis of the model outputs revealed that alfalfa genetic material classification performance varied according to the salt ion type applied. Gen16 reached the highest number of correct classifications (479 samples) under the CaCl2 treatment and was the synthetic genotype that the model separated most clearly under these conditions. In general, alfalfa genetic material was classified with high performance under all three salt ion types.
Alfalfa genetic material classification performance under three temperature conditions (Tmin, Tmid and Tmax) was compared, and the implications for model stability are presented in Fig 5. Gen16 achieved the highest number of correct classifications (487 samples) in the Tmid condition, indicating that Tmid is the condition under which the model classifies Gen16 most clearly. However, the number of correct classifications decreased to 410 samples in the Tmin condition, suggesting that low temperatures both weaken biological responses and limit the model's classification ability. Gen3 was classified best under Tmax (479 samples), indicating that the model could successfully classify this synthetic genotype even under high-temperature conditions. When both stress conditions were evaluated together, the classification performance varied not only with the alfalfa genetic material but also with the type of stress to which it was exposed, and some synthetic genotypes could be classified more clearly under certain conditions. This finding is consistent with studies showing that alfalfa genetic material responds to environmental stress in specific ways and that temperature tolerance varies depending on alfalfa genetic material (Parent and Tardieu, 2012; Basbag et al., 2017).
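The heat maps described above are, in essence, tables of correct-classification counts indexed by genetic material and stress condition. A minimal sketch of how such a table could be assembled from per-sample predictions is given below; the records are hypothetical and the function names (correct_counts, as_grid) are assumptions for illustration only.

```python
from collections import defaultdict


def correct_counts(records):
    """Count correct predictions per (true class, condition) pair.

    `records` is an iterable of (condition, true_class, predicted_class).
    """
    counts = defaultdict(int)
    for condition, truth, predicted in records:
        if truth == predicted:
            counts[(truth, condition)] += 1
    return dict(counts)


def as_grid(counts, classes, conditions):
    """Arrange counts as rows (classes) by columns (conditions) for plotting."""
    return [[counts.get((cls, cond), 0) for cond in conditions]
            for cls in classes]


# Hypothetical per-sample results under three temperature conditions.
records = [
    ("Tmin", "Gen16", "Gen16"),
    ("Tmin", "Gen16", "Gen3"),
    ("Tmid", "Gen16", "Gen16"),
    ("Tmid", "Gen16", "Gen16"),
    ("Tmax", "Gen3", "Gen3"),
    ("Tmin", "Gen3", "Gen9"),
]

counts = correct_counts(records)
grid = as_grid(counts, classes=["Gen3", "Gen16"],
               conditions=["Tmin", "Tmid", "Tmax"])
for name, row in zip(["Gen3", "Gen16"], grid):
    print(name, row)
```

Each cell of the resulting grid corresponds to one heat map tile, so a value such as the 487 correctly classified Gen16 samples under Tmid would appear as a single cell.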
Decision trees based on alfalfa genetic materials and salt ion types
Decision trees enable the interpretation and intuitive evaluation of model decisions by visualizing features in the classification model (
Blockeel et al., 2023;
de Oliveira et al., 2023;
Zhao et al., 2024). The decision tree structure shows how the model proceeds in alfalfa genetic material classification and which features are prioritized (Fig 6 and 7). The root node is DW, indicating that this response is the primary criterion for separating alfalfa genetic material. After the first split on DW, the model turned to features such as FW, GE, GI, MGD and SV at the second level. Together with the DW root, this indicates that weight features play a significant role in the classification of alfalfa genetic material. The Gen3 and Gen16 synthetic genotypes were separated through different sub-pathways; GE and GI were used in some branches, whereas FW and SV were used in others. At the terminal nodes, the classification accuracy was quite high (close to 100%) and was achieved with few decision rules.
The decision structure of the tree-based model for salt ion type classification (CaCl2, NaCl and KCl) is presented in Fig 7. PL at the root node indicates that this feature was the first decision criterion for salt ion type classification. After the initial PL split, the model focused on features such as FW and MGD at the second level. This structure shows that physiological responses to salt stress are important for classification ability. In the decision path on the lower left branch, samples belonging to KCl were classified with high accuracy. In contrast, more complex decision paths appear to be required for CaCl2, and the model creates deeper structures to separate this class.
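How a feature such as DW or PL comes to occupy the root node can be sketched as a search for the single split that leaves the purest child nodes. The example below uses Gini impurity as the split criterion, which is an assumption, since the criterion used for the trees in Fig 6 and 7 is not stated here; the data values and the function names are likewise hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, feature_names):
    """Return (feature, threshold, weighted_gini) of the best single split."""
    best = None
    n = len(labels)
    for j, name in enumerate(feature_names):
        values = sorted(set(row[j] for row in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            score = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
            if best is None or score < best[2]:
                best = (name, threshold, score)
    return best


# Hypothetical toy samples: (PL, FW) measurements for two salt ion types.
feature_names = ["PL", "FW"]
rows = [(1.0, 0.5), (2.0, 0.9), (4.0, 0.6), (5.0, 0.8)]
salts = ["KCl", "KCl", "NaCl", "NaCl"]

root_feature, root_threshold, root_impurity = best_split(rows, salts, feature_names)
print(root_feature, root_threshold, root_impurity)
```

A full decision tree repeats this search recursively within each child node, which is why classes that overlap on the leading features end up on longer, deeper paths.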