Diseases cause the global fruit industry to lose around 40% of its crops each year, with annual economic losses of roughly $220 billion (Hossain et al., 2024). For fruit farmers, disease can easily mean a 30-50% loss of income, worsening already difficult conditions for productivity and food security. With the global population projected to reach 9.7 billion by 2050, there is a growing need for efficient, large-scale fruit disease detection.
Conventional detection relies on visual inspection, which is slow, subjective and difficult to apply on large farms. Emerging alternatives, such as machine learning and computer vision algorithms, have shown great promise for early detection and accurate classification of fruit diseases. These techniques offer improved speed and scalability and pave the way for better agricultural practices and enhanced food security. Detecting and classifying fruit diseases is crucial for improving agricultural yield and quality, factors that directly affect farmers’ income and food availability. Early diagnosis limits damage and reduces production costs while enhancing overall productivity and profitability.
The use of machine learning and deep learning algorithms in image processing tasks has demonstrated significant potential for improving the accuracy and efficiency of fruit disease detection systems. These approaches can overcome the inherent limitations of traditional, labor-intensive techniques. Automated systems can reduce dependence on human experts and enable faster responses to disease outbreaks. When deployed effectively, such technologies can transform agriculture by enabling timely and precise diagnosis, which improves disease management and supports informed decision-making.
However, existing fruit disease datasets come with notable limitations. Although some datasets are extensive, many lack sufficient variety in disease types, lighting conditions and environmental contexts. Additionally, many datasets are limited to specific fruit varieties or a narrow set of diseases, reducing the generalizability of models trained on them. For real-world deployment, there is a pressing need to continuously diversify and expand these datasets to reflect realistic agricultural conditions.
This survey paper provides a detailed overview of current advancements in fruit disease detection and classification. It highlights the contributions of various researchers who have applied innovative machine learning (ML) and deep learning (DL) approaches to this field. The paper addresses both the opportunities and challenges involved in integrating these technologies into real-world farming systems. It advocates for continuous progress in achieving higher accuracy at reduced costs and emphasizes the importance of future research in scaling these approaches for broader application.
The paper is organized into five major sections. Section 2 presents the Materials and Methodology, describing the datasets used and offering a general methodology flowchart. It also explains traditional fruit disease detection techniques, machine learning-based methods and more advanced deep learning approaches. Section 3 provides a Comparative Analysis of various techniques, comparing traditional, ML and DL methods in terms of accuracy, effectiveness and scalability. Section 4 outlines the key challenges and future scope in fruit disease detection, emphasizing areas for further research and improvement. Finally, Section 5 concludes the paper by summarizing the key findings and contributions of this study.
All the studies were conducted at the Research Center of Sreenidhi Institute of Science and Technology, Hyderabad, during the year 2025.
Datasets
The availability of high-quality datasets is essential for advancing research in fruit disease detection and classification. Numerous specialized datasets have been created to facilitate the training and testing of machine learning and deep learning models across various fruit types and disease categories.
A widely used resource is the PlantVillage dataset (Hughes and Salathé, 2015), which contains over 54,000 images of healthy and diseased plant leaves from 14 crop species, including fruits such as apples, grapes and tomatoes. This dataset has become a standard benchmark for evaluating algorithm performance in plant pathology, especially in fruit disease detection.
For apple disease detection, the Kaggle Plant Pathology 2020 dataset (Thapa et al., 2020) holds particular importance. It comprises thousands of images grouped into four categories: healthy, multiple diseases, rust and scab. This dataset has played a key role in developing robust models for classifying apple diseases and tackling multi-class classification challenges.
Regarding citrus fruits, the Citrus Disease Image Gallery (Gottwald and Irey, 2007) provides a detailed image collection illustrating various stages of disease development in citrus plants. It includes examples of common diseases such as citrus greening and black spot, making it a valuable asset for researchers focused on citrus disease identification.
The Strawberry Disease Dataset (Barbedo, 2016) focuses on diseases affecting strawberries, including powdery mildew and leaf blight. This dataset has been used to develop models that accurately detect these specific diseases, showcasing the capability of machine learning techniques in addressing fruit-specific problems.
Similarly, the Banana Disease Dataset (Amara and Algergawy, 2017) includes images of banana leaves infected with diseases such as black sigatoka and banana bunchy top virus. Researchers have used this dataset to train models for early disease detection and effective management, which is crucial for sustaining banana crop production.
For mango-related diseases, the Mango Leaf Dataset and Mango Disease Dataset (Pujari and Byadgi, 2015) serve as valuable resources. These include images of mango leaves and fruits showing signs of diseases such as anthracnose, powdery mildew and sooty mold. These datasets have been used to develop detection and classification models crucial for protecting mango yields.
Collectively, these datasets span a wide variety of fruit types and diseases, providing a strong foundation for creating and evaluating fruit disease detection models. The continuous development, diversification and curation of such datasets are key to enhancing the accuracy, robustness and scalability of machine learning applications in agriculture.
Fruit disease detection and classification techniques
Identifying fruit diseases is a critical aspect of agricultural management, as it directly influences both the quantity and quality of production. When diseases are identified early and correctly, timely interventions can reduce crop loss and maintain produce quality. Modern algorithms, image processing techniques and feature extraction support fruit disease detection, while machine learning (ML) and deep learning (DL) further enhance and automate the process.
This section outlines the entire process of detecting fruit diseases, from capturing raw images to generating treatment recommendations. As shown in the flowchart in Fig 1, the pipeline consists of several key stages, described below.
It begins with capturing raw fruit images using cameras or sensors. These images then undergo a noise reduction step to remove distortions and improve quality. Image enhancement follows, adjusting parameters such as brightness and contrast to highlight critical features. Segmentation is then applied to distinguish diseased regions from healthy areas, isolating affected zones for detailed analysis.
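The noise-reduction and enhancement stages can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline of any cited study: it assumes a single-channel (grayscale) image with intensities in 0-255, uses a 3x3 median filter as one common choice of noise reduction, and a linear min-max stretch as one simple form of contrast enhancement. The function names (`median_filter3`, `stretch_contrast`, `preprocess`) are hypothetical.

```python
import numpy as np


def median_filter3(img: np.ndarray) -> np.ndarray:
    """3x3 median filter for a 2-D grayscale image (border pixels replicated)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the nine shifted views of every 3x3 neighbourhood, then take the median.
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)], axis=0)
    return np.median(windows, axis=0)


def stretch_contrast(img: np.ndarray) -> np.ndarray:
    """Linear min-max stretch of intensities to the full [0, 255] range."""
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:  # flat image: nothing to stretch
        return np.zeros_like(img, dtype=np.float64)
    return (img - lo) * 255.0 / (hi - lo)


def preprocess(gray: np.ndarray) -> np.ndarray:
    """Noise reduction followed by contrast enhancement."""
    return stretch_contrast(median_filter3(gray.astype(np.float64)))
```

A median filter is a common choice at this stage because it removes isolated noise pixels while blurring lesion edges less than a mean filter would.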
Feature extraction follows, through colour and texture analysis in which the system identifies visual cues, such as variations in colour and texture patterns, that indicate the presence of disease. Shape and edge detection further help outline the diseased regions accurately. The extracted features are then fed into machine learning and deep learning models that classify the disease automatically, supporting more accurate and scalable results and identifying the specific type of disease affecting the fruit.
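The colour and texture analysis stage can be illustrated with a small feature extractor. As an assumption for illustration, colour is summarized by the mean and standard deviation of each RGB channel, and texture by two statistics of a grey-level co-occurrence matrix (GLCM) built from horizontally adjacent pixels: contrast, which rises with abrupt intensity changes, and homogeneity, which is highest for uniform tissue. The quantization to 8 grey levels and the function names are hypothetical choices, not taken from the surveyed work.

```python
import numpy as np


def colour_features(rgb: np.ndarray) -> np.ndarray:
    """Mean and standard deviation of each colour channel (6 values)."""
    pixels = rgb.reshape(-1, 3).astype(np.float64)
    return np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])


def glcm_features(gray: np.ndarray, levels: int = 8) -> tuple[float, float]:
    """Contrast and homogeneity of a GLCM over horizontal neighbour pairs."""
    # Quantize 0-255 intensities into `levels` grey levels.
    q = np.clip((gray.astype(np.float64) * levels / 256.0).astype(int), 0, levels - 1)
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (left, right), 1)  # count each (left level, right level) pair
    p = glcm / glcm.sum()              # joint probabilities
    i, j = np.indices(p.shape)
    contrast = float(np.sum(p * (i - j) ** 2))
    homogeneity = float(np.sum(p / (1.0 + np.abs(i - j))))
    return contrast, homogeneity


def feature_vector(rgb: np.ndarray) -> np.ndarray:
    """Concatenate colour statistics with GLCM texture statistics (8 values)."""
    gray = rgb.astype(np.float64).mean(axis=2)  # simple unweighted grey conversion
    contrast, homogeneity = glcm_features(gray)
    return np.concatenate([colour_features(rgb), [contrast, homogeneity]])
```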
Finally, the system recommends suitable treatments or interventions to control the diagnosed diseases. These recommendations are vital for effective agricultural management. This structured methodology allows the system to deliver accurate disease diagnoses, improving response times in real-world crop health management scenarios.
Traditional techniques in fruit disease detection
Traditional fruit disease detection methods rely heavily on image processing, in which segmentation and feature extraction are crucial. Various image processing techniques based on colour, texture and shape features have been applied to detect diseases such as apple scab, powdery mildew and anthracnose. K-means clustering is commonly used for image segmentation, while artificial neural networks (ANN) are used for classification. Global and adaptive thresholding methods are widely employed to separate healthy from diseased regions based on pixel intensity. Edge detection techniques such as the Canny edge detector and the Sobel operator are also commonly used to highlight the boundaries of infected areas (Gaikwad et al., 2017; Goel and Pandey, 2022). Additionally, region-based techniques such as watershed transformation and region growing have proven effective in segmenting connected diseased areas (Gaikwad et al., 2017; Doh et al., 2019).
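Global thresholding and edge detection can be sketched as follows. The threshold is chosen with Otsu's method, one widely used global thresholding rule that picks the intensity maximizing the between-class variance of the histogram; the edges come from the Sobel operator mentioned above. The sketch assumes a grayscale image in 0-255 and, as an illustrative assumption, that lesions are darker than healthy tissue, so the diseased mask is `gray <= t`. Canny, adaptive thresholding, watershed and region growing are not shown.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the 0-255 threshold t maximizing between-class variance (class 0 is <= t)."""
    hist = np.bincount(gray.astype(np.uint8).ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # probability of class 0 for each t
    mu = np.cumsum(prob * np.arange(256))  # cumulative first moment of class 0
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    # Thresholds that leave one class empty give 0/0; treat them as zero variance.
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(sigma_b))


def sobel_magnitude(gray: np.ndarray) -> np.ndarray:
    """Gradient magnitude from 3x3 Sobel kernels (border pixels replicated)."""
    p = np.pad(gray.astype(np.float64), 1, mode="edge")
    # Horizontal derivative: right column minus left column, weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical derivative: bottom row minus top row, weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)
```

For a leaf image `gray`, `mask = gray <= otsu_threshold(gray)` marks candidate lesion pixels, and large values of `sobel_magnitude(gray)` trace their boundaries.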
Although conventional image processing methods such as threshold-based classification have successfully identified diseases such as downy mildew, early detection remains critical for minimizing disease spread and crop losses (Ayyub and Manjramkar, 2019). Feature extraction based on colour, texture and shape was a time-consuming process, and classification typically relied on metrics such as area, perimeter and compactness (Gaikwad et al., 2017). While helpful, these methods struggle to scale and to maintain accuracy when applied to large datasets or to diseases with complex visual characteristics.
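The shape metrics named above can be computed directly from a binary lesion mask. This sketch assumes the mask comes from an earlier segmentation step; it measures perimeter as the number of pixel edges between lesion and background, and uses compactness = P^2 / (4πA), one common definition under which a perfect disc scores 1 and irregular lesions score higher. Other papers define compactness differently (for example as the inverse, 4πA / P^2), and the function name is hypothetical.

```python
import numpy as np


def shape_metrics(mask: np.ndarray) -> dict[str, float]:
    """Area, perimeter and compactness of a binary lesion mask.

    Perimeter counts pixel edges shared by lesion and background pixels
    (4-connectivity); compactness is P^2 / (4 * pi * A).
    """
    m = np.pad(mask.astype(bool), 1, constant_values=False)  # background border
    area = float(m.sum())
    # Every lesion/background transition along a row or column is one boundary edge.
    perimeter = float(np.sum(m[:, 1:] != m[:, :-1]) + np.sum(m[1:, :] != m[:-1, :]))
    compactness = perimeter ** 2 / (4 * np.pi * area) if area else 0.0
    return {"area": area, "perimeter": perimeter, "compactness": compactness}
```

On a pixel grid even a square scores 4/π (about 1.27) with this perimeter estimate, so thresholds on compactness should be calibrated on real masks rather than on the ideal value of 1.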
Machine learning approaches
Machine learning (ML) techniques have substantially automated fruit disease detection, especially segmentation, feature extraction and classification. Classical ML algorithms used to segment and classify diseased regions based on colour and texture features include K-means clustering and decision trees (Ayyub and Manjramkar, 2019; Awate et al., 2015). Support Vector Machines (SVM) have also been applied to classify bacterial blight in pomegranates, highlighting the need for early intervention to prevent rapid disease spread (Pawar and Jadhav, 2017). ML has proven effective at distinguishing healthy from diseased regions in various fruits (Doh et al., 2019; Ayyub and Manjramkar, 2019). For example, diseases such as blueberry rot and anthracnose leaf spot have been detected successfully using ML algorithms such as Random Forest (RF) and SVM, supporting the view that early detection improves crop health (Sullca et al., 2019).
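K-means segmentation of diseased regions can be sketched by clustering pixel colours. This is a minimal NumPy version of Lloyd's algorithm, assuming an RGB image and a user-chosen number of clusters `k`; it uses farthest-point initialization (a simple deterministic alternative to k-means++) so that distinct colours start in distinct clusters. Which cluster corresponds to disease is application-specific; the usage note assumes, purely for illustration, that the lesion cluster is the one with the lowest green value. The function names are hypothetical.

```python
import numpy as np


def _nearest(pixels: np.ndarray, centres: np.ndarray) -> np.ndarray:
    """Index of the closest centre for every pixel (squared Euclidean distance)."""
    d = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans_segment(rgb: np.ndarray, k: int = 3, iters: int = 20, seed: int = 0):
    """Cluster pixel colours with Lloyd's k-means; returns (label image, centres)."""
    pixels = rgb.reshape(-1, 3).astype(np.float64)
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: start from a random pixel, then repeatedly
    # add the pixel farthest from all centres chosen so far.
    centres = pixels[[rng.integers(len(pixels))]]
    while len(centres) < k:
        d = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centres = np.vstack([centres, pixels[d.argmax()]])
    for _ in range(iters):
        labels = _nearest(pixels, centres)
        # Move each centre to the mean colour of its pixels (keep it if the cluster is empty).
        new = np.array([pixels[labels == c].mean(axis=0) if np.any(labels == c) else centres[c]
                        for c in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return _nearest(pixels, centres).reshape(rgb.shape[:2]), centres
```

For example, `labels, centres = kmeans_segment(img, k=2)` followed by `lesion = int(np.argmin(centres[:, 1]))` yields the candidate lesion mask `labels == lesion`.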
These models often use feature vectors derived from traditional image processing techniques. Recent work has fed these vectors into models such as decision trees and random forests, increasing the level of automation in classification (Awate et al., 2015). Combining feature extraction with ANN has also helped identify diseases such as citrus canker and apple scab; these methods usually classify fruit as healthy or diseased based on colour and texture (Awate et al., 2015). While ML reduces the need for manual inspection, its dependence on manually extracted features can limit its effectiveness across diverse datasets and environmental conditions.
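Classifying such handcrafted feature vectors with an SVM can be sketched without a machine learning library. The version below is an illustration under stated assumptions rather than the setup of the cited studies: it trains a linear soft-margin SVM by full-batch subgradient descent on the regularized hinge loss, with labels -1 for healthy and +1 for diseased. Practical work would typically use an established solver, feature scaling and, where needed, a kernel. `train_linear_svm` and `predict` are hypothetical names, and the learning rate, regularization strength and epoch count are arbitrary defaults.

```python
import numpy as np


def train_linear_svm(X, y, lam: float = 0.01, lr: float = 0.1, epochs: int = 200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))) by subgradient descent.

    X: (n_samples, n_features) feature vectors; y: labels in {-1, +1}.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    n = len(y)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w: np.ndarray, b: float, X) -> np.ndarray:
    """Return +1 (diseased) or -1 (healthy) for each row of X."""
    return np.where(np.asarray(X, dtype=np.float64) @ w + b >= 0, 1, -1)
```

The inputs could be, for instance, colour and texture statistics computed per fruit image; the SVM itself is indifferent to how the features were produced, which is exactly why its accuracy depends on the quality of the handcrafted features.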
Deep learning approaches
Deep learning (DL) has brought transformative advances to fruit disease detection, making it more autonomous and scalable. Methods such as Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks and Generative Adversarial Networks (GAN) have been used to detect diseases such as black rot, early blight and late blight. These advances have notably improved early disease detection, a critical factor in disease management (Goel and Pandey, 2022). CNN architectures such as U-Net and Mask R-CNN are particularly effective at automatically segmenting diseased areas and outlining the extent of infection in detail (Sullca et al., 2019; Yang et al., 2020). These CNNs learn colour, texture and spatial features directly from images, which greatly improves detection accuracy (Goel and Pandey, 2022; Sullca et al., 2019; Yang et al., 2020). For instance, CNNs have identified powdery mildew and gray mold in strawberries, showcasing their ability to detect complex disease patterns early (Abbas et al., 2021).
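To make concrete what learning features directly from images involves, the sketch below shows the forward pass of a single-layer CNN: convolution, ReLU, global average pooling and a dense softmax classifier. It is a didactic NumPy example, not U-Net, Mask R-CNN or any architecture from the cited studies; the kernels and dense weights are random placeholders, whereas a real network learns them by backpropagation across many stacked layers. The function names are hypothetical.

```python
import numpy as np


def conv2d_valid(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Valid cross-correlation (what CNN 'convolution' layers compute).

    x: (H, W) image; kernels: (K, kh, kw). Returns (K, H - kh + 1, W - kw + 1).
    """
    K, kh, kw = kernels.shape
    H, W = x.shape
    oh, ow = H - kh + 1, W - kw + 1
    out = np.zeros((K, oh, ow))
    for i in range(kh):
        for j in range(kw):
            # Add this kernel tap's contribution at every output position at once.
            out += kernels[:, i, j, None, None] * x[None, i:i + oh, j:j + ow]
    return out


def tiny_cnn_forward(x, kernels, dense_w, dense_b) -> np.ndarray:
    """conv -> ReLU -> global average pool -> dense -> softmax class probabilities."""
    fmap = np.maximum(conv2d_valid(x, kernels), 0.0)  # ReLU feature maps
    pooled = fmap.mean(axis=(1, 2))                   # one value per feature map
    logits = dense_w @ pooled + dense_b
    exp = np.exp(logits - logits.max())               # numerically stable softmax
    return exp / exp.sum()
```

In a trained model, each kernel would respond to a particular local pattern (for example a lesion edge or a mottled texture), and the dense layer would weigh those responses to score each disease class.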
Other DL models such as VGGNet, ResNet and DenseNet have been used for disease classification, achieving high accuracy on large datasets (Doh et al., 2019; Yang et al., 2020). Fine-tuning pre-trained models on specific datasets via transfer learning has also been shown to significantly reduce training time while enhancing performance (Goel and Pandey, 2022; Yang et al., 2020). Moreover, integrated DL architectures that combine segmentation and classification in one framework enable end-to-end solutions, streamlining the detection process (Doh et al., 2019; Pawar and Jadhav, 2017).
Newer self-supervised multi-network models have been proposed for diagnosing diseases such as early and late blight in tomatoes, offering improved accuracy and robustness (Yang et al., 2020). Real-time strawberry disease detection using CNNs is another promising application (Lamb and Chuah, 2018; Abbas et al., 2021). YOLOv8-based models have also been used to assess both the ripeness and the health of fruits, reflecting a shift from traditional manual methods toward more advanced automated systems (Xiao et al., 2020; Kazi and Kutubuddin, 2023).
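YOLO-family detectors, including YOLOv8, typically produce many overlapping candidate boxes and filter them with non-maximum suppression (NMS) before reporting detections. A minimal greedy NMS sketch follows, assuming boxes in (x1, y1, x2, y2) pixel coordinates and a hypothetical IoU threshold of 0.5; it is not the implementation used by any particular YOLO release, and the function names are hypothetical.

```python
import numpy as np


def box_area(b: np.ndarray) -> np.ndarray:
    """Area of one box (shape (4,)) or many boxes (shape (n, 4))."""
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])


def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """Intersection-over-union between one box and each row of `boxes`."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    return inter / (box_area(box) + box_area(boxes) - inter)


def nms(boxes, scores, iou_thresh: float = 0.5) -> list[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it, repeat."""
    boxes = np.asarray(boxes, dtype=np.float64)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return keep
```

For instance, two heavily overlapping boxes around the same diseased strawberry collapse to the higher-scoring one, while a box around a separate fruit survives.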
Recent studies have explored novel strategies for fruit disease detection and classification. For example, deep learning-based real-time detection of fungal disease in strawberries shows how CNNs can help track disease spread in the field (Lamb and Chuah, 2018). Combining image processing with CNN techniques has proven effective for identifying diseased strawberry leaves in real time (Abbas et al., 2021). Pattern recognition has also been used to automate disease identification for better agricultural management (Ouyang et al., 2012). CNNs have demonstrated high precision in differentiating healthy from diseased leaves even when the visual differences are minimal (Xiao et al., 2020).
Hybrid approaches, such as YOLOv8+ with image processing, have been applied to detect strawberry ripeness and disease, offering holistic agricultural solutions that address both fruit quality and disease (Wang et al., 2024). A general review of fruit disease detection techniques has confirmed the vital role these methods play in early diagnosis, disease control, increased yield and overall crop health (Kazi and Kutubuddin, 2023).
Recent studies have demonstrated the growing role of deep learning and artificial intelligence in improving disease detection and diagnosis in agricultural crops. Pakruddin and Hemavathy (2024) developed a deep learning-based model for pomegranate fruit disease detection and classification, achieving high accuracy through optimized image-based feature extraction techniques. Building on this, their subsequent work evaluated various deep transfer learning models for detecting bacterial blight disease in pomegranate fruits, highlighting the superior classification accuracy of fine-tuned architectures such as ResNet and InceptionV3 (Pakruddin and Hemavathy, 2025).
Similarly, Patil and More (2025) conducted a comparative and optimization study of deep learning models for grape leaf disease identification, emphasizing the effectiveness of hyperparameter tuning and model selection in improving diagnostic precision. Complementing these works, Mehta et al. (2025) provided a comprehensive review of AI-based disease diagnosis in crops, discussing advances in neural networks, image preprocessing and dataset augmentation that collectively enhance early disease detection and sustainable agricultural practices.
Comparative analysis of techniques
The field of fruit disease detection and classification has experienced major growth due to technological innovations. A variety of methods including conventional image processing, machine learning and deep learning have been explored in research studies. Table 1 provides a comparative overview, summarizing authors, datasets, implemented models and accuracy results achieved across different approaches.
Table 1 presents a comprehensive overview of multiple studies in the field of fruit disease detection and classification, highlighting the evolution and diversity of approaches over time. It covers both conventional image processing methods and advanced machine learning and deep learning models. Accuracy in disease identification has improved significantly as the field has moved from early methods that relied on manually engineered features such as colour and texture to more sophisticated techniques such as convolutional neural networks (CNNs) and other deep architectures.
Some notable insights drawn from the table include the effectiveness of models such as CNNs and YOLOv8+, which have achieved accuracy levels above 96% in several cases, particularly for strawberry disease diagnosis. This reflects the increasing reliance on deep learning methods, which enhance classification by automating feature extraction. The progression in dataset diversification and model sophistication demonstrates continuous efforts to improve the resilience and applicability of fruit disease detection systems in real-world agricultural environments. Overall, these advancements indicate promising directions for future research and practical deployment.
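Because the studies in Table 1 report accuracy on different datasets, it helps to recall how accuracy relates to per-class performance. The sketch below computes accuracy and macro-averaged F1 from a confusion matrix; on an imbalanced test set, a model that labels every fruit healthy can still score high accuracy while missing every diseased sample. The function name and the example counts are hypothetical.

```python
import numpy as np


def metrics_from_confusion(cm) -> dict[str, float]:
    """Accuracy and macro F1, where cm[i, j] counts true class i predicted as class j."""
    cm = np.asarray(cm, dtype=np.float64)
    tp = np.diag(cm)
    predicted, actual = cm.sum(axis=0), cm.sum(axis=1)
    # Classes that are never predicted (or never present) get precision/recall 0.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return {"accuracy": float(tp.sum() / cm.sum()), "macro_f1": float(f1.mean())}
```

With 95 healthy and 5 diseased samples all predicted healthy, `metrics_from_confusion([[95, 0], [5, 0]])` reports an accuracy of 0.95 but a macro F1 below 0.5, which is why per-class metrics are worth reporting alongside headline accuracy.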
Challenges and future directions
Developing effective fruit disease detection and classification models presents several challenges, including variability in disease symptoms, limitations in dataset quality and issues with model complexity. A major challenge is that environmental conditions influence the visual manifestation of disease symptoms, making consistent and accurate detection more difficult (Yang et al., 2020). Traditional models relying on handcrafted features such as colour and texture often fail to capture the full variability of disease patterns, which degrades their performance. Moreover, the lack of high-quality annotated datasets impairs model generalization across different fruit types and environmental conditions (Pawar and Jadhav, 2017). Deep learning models are often constrained by high computational costs and intricate architectures, which makes them difficult to deploy in real-time agricultural scenarios (Goel and Pandey, 2022). Additionally, models trained on localized data may not generalize well, and similarities in symptom appearance across different diseases increase the risk of misclassification (Pawar and Jadhav, 2017).
Overfitting remains another concern, especially when models are trained on limited datasets, as seen in strawberry disease studies (Abbas et al., 2021). Background interference and poor image quality further complicate detection, especially when symptoms are faint or resemble those of other diseases (Sullca et al., 2019). Tasks that combine multiple objectives, such as fruit grading and disease identification, demand distinct feature sets and architectures, which increases system complexity (Kazi and Kutubuddin, 2023).
Key challenges include:
• Variability in environmental conditions alters symptom appearance, making detection more complex.
• Handcrafted features often fail to represent the full diversity of disease patterns, reducing model accuracy.
• Lack of high-quality annotated datasets limits model generalizability.
• High computational requirements and complex model architectures hinder real-time application.
• Region-specific training data often results in poor generalization and misclassification.
• Overfitting is common with small datasets.
• Image noise and low quality obscure subtle disease symptoms.
• Multi-tasking (e.g., grading + detection) increases model complexity, requiring task-specific design.
To overcome these challenges, several strategies have been proposed. First, expanding datasets to include a wider variety of fruit diseases, environmental conditions and geographic regions will improve model robustness (Pawar and Jadhav, 2017). Integrating CNNs and other DL models for automated feature extraction can reduce reliance on manual inputs while boosting detection accuracy (Yang et al., 2020). Transfer learning can help minimize the need for large annotated datasets and reduce computational load (Goel and Pandey, 2022). Lightweight and adaptive models capable of responding to dynamic agricultural environments should be developed for real-time deployment, and incorporating auxiliary data sources such as weather and soil conditions can further enhance them (Sullca et al., 2019). Explainable AI (XAI) is gaining interest as a way to make predictions more transparent and trustworthy in practical applications (Goel and Pandey, 2022). Data fusion methods, such as multispectral or hyperspectral imaging, offer further accuracy improvements (Doh et al., 2019). Hybrid systems that combine traditional image processing and DL approaches show promise for addressing the limitations of existing techniques (Awate et al., 2015).
Future directions include:
• Expanding datasets with greater disease diversity, environmental variance and geographic range.
• Employing CNNs and similar models for automated, high-accuracy feature extraction.
• Using transfer learning to reduce reliance on large datasets and computational resources.
• Developing lightweight, real-time adaptive models that incorporate contextual data like weather and soil conditions.
• Promoting Explainable AI for greater model interpretability in field deployments.
• Applying data fusion (e.g., multispectral/hyperspectral imaging) to increase detection precision.
• Exploring hybrid models that integrate image processing with DL to enhance performance and flexibility.
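Pixel-level data fusion, one of the directions listed above, can be sketched by stacking an extra spectral band with the RGB channels. This example assumes a co-registered near-infrared (NIR) band and RGB channel order, and adds the normalized difference vegetation index, NDVI = (NIR - Red) / (NIR + Red), as a derived channel; healthy vegetation reflects strongly in NIR, so stressed tissue tends to show lower NDVI. Using NDVI for fruit disease is an illustrative assumption here, not a method from the cited studies, and the function names are hypothetical.

```python
import numpy as np


def ndvi(nir: np.ndarray, red: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Normalized difference vegetation index in [-1, 1]; eps avoids division by zero."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)


def fuse_bands(rgb: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Stack RGB, NIR and NDVI into one (H, W, 5) array for a downstream model."""
    index = ndvi(nir, rgb[..., 0])  # channel 0 is assumed to be red
    return np.concatenate([rgb.astype(np.float64),
                           nir[..., None].astype(np.float64),
                           index[..., None]], axis=-1)
```

The fused array can then be fed to the same segmentation or classification models as an ordinary image, simply with more input channels.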
Overall, advancing fruit disease detection requires progress in DL algorithms, dataset development and real-time applicability. Emphasis should be placed on better data, optimized models and contextual integration to ensure accurate, reliable and scalable systems for agricultural use (Lamb and Chuah, 2018; Abbas et al., 2021; Xiao et al., 2020; Ouyang et al., 2012; Wang et al., 2024).