In this section, the results obtained from the VGGNet19 model are presented in detail. The training and validation metrics were evaluated over 100 epochs and are shown in Fig 2. The VGGNet19 model improves steadily over the 100 epochs, with both training and validation losses decreasing. The training loss starts at 1.8844 and drops to 0.0011 by the final epoch, indicating effective learning on the training data. The validation loss also decreases but stabilizes around 0.1–0.2 after epoch 60, suggesting that the model does not generalize to unseen data as well as it fits the training set. Training accuracy increases steadily from 0.3776 to 0.9964, reflecting improving performance on the training set, whereas validation accuracy plateaus around 0.93–0.95, indicating limits to the model's generalization. The gap between training and validation accuracy points to potential overfitting, where the model fits the training data well but struggles on the validation set. Despite these limitations, the overall trends suggest that further adjustments, such as regularization, could improve the model's generalization.
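As an illustration of how curves like those in Fig 2 can be produced, the sketch below assumes the model was trained with Keras and that a History object named history is returned by model.fit(); the variable names are illustrative rather than taken from the study's code.

```python
# Sketch: plotting training/validation curves, assuming a Keras-style
# History object is available after model.fit(); names are illustrative.
import matplotlib.pyplot as plt

def plot_training_curves(history):
    """Plot loss and accuracy for the training and validation sets per epoch."""
    epochs = range(1, len(history.history["loss"]) + 1)

    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(12, 4))

    # Loss curves: a widening gap between the two lines hints at overfitting.
    ax_loss.plot(epochs, history.history["loss"], label="training loss")
    ax_loss.plot(epochs, history.history["val_loss"], label="validation loss")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()

    # Accuracy curves: training accuracy approaching 1.0 while validation
    # accuracy plateaus is the pattern described for Fig 2.
    ax_acc.plot(epochs, history.history["accuracy"], label="training accuracy")
    ax_acc.plot(epochs, history.history["val_accuracy"], label="validation accuracy")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()

    plt.tight_layout()
    plt.show()
```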
The confusion matrix provides useful insight into the model's performance (Fig 3). It shows both correct classifications (diagonal values) and misclassifications (off-diagonal values). The model performs well for most diseases. For example, Healthy is correctly classified in 24 out of 25 cases, with only one instance misclassified as Gill disease. Gill disease is correctly identified in 26 out of 28 instances, with two misclassifications. Red disease is correctly classified 25 times, with a few instances misclassified as Aeromoniasis and Saprolegniasis. Aeromoniasis is correctly classified 23 times, with minor confusion with Saprolegniasis and Healthy.
Some diseases are more challenging. White spot disease has three misclassifications, while Parasitic disease is confused with Red disease and Gill disease. These errors most likely arise from similarities in visible symptoms. The diagonal values are much larger than the off-diagonal values, showing that the model is generally accurate, although misclassifications persist between diseases with overlapping symptoms. The heatmap in Fig 3 highlights both the model's strengths and the areas that need improvement.
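A heatmap of the kind shown in Fig 3 can be generated with scikit-learn and seaborn. The sketch below assumes integer-encoded test labels y_true and predictions y_pred, and the class-name ordering is chosen for illustration only; it is not the study's original code.

```python
# Sketch: confusion-matrix heatmap in the style of Fig 3, assuming integer
# class labels; the class-name ordering below is an assumption.
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

CLASS_NAMES = [
    "Healthy", "Gill disease", "Red disease", "Aeromoniasis",
    "Saprolegniasis", "Parasitic disease", "White spot disease",
]

def plot_confusion_heatmap(y_true, y_pred):
    """Annotated heatmap: diagonal = correct predictions, off-diagonal = confusions."""
    cm = confusion_matrix(y_true, y_pred, labels=range(len(CLASS_NAMES)))
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
                xticklabels=CLASS_NAMES, yticklabels=CLASS_NAMES)
    plt.xlabel("Predicted class")
    plt.ylabel("True class")
    plt.tight_layout()
    plt.show()
```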
The per-class performance metrics are summarized in Table 2. The VGGNet19 model performs well across most disease categories. It achieves high precision and recall for Healthy, with an F1 score of 0.9410. Aeromoniasis has the highest precision at 0.9580, but its recall of 0.8850 means some cases are missed. Gill disease has good recall at 0.9290, but its precision of 0.8670 leads to some false positives. Saprolegniasis performs well with an F1 score of 0.8800, although its precision is slightly lower at 0.8460. Parasitic disease has a balanced precision of 0.9570 and recall of 0.9170, giving an F1 score of 0.9360. White spot disease shows strong precision (0.9500), but its lower recall of 0.8640 results in missed cases. The overall accuracy of the model is 90.96%. The macro averages for precision, recall and F1 score are around 0.91, and the weighted averages are slightly higher due to the class distribution. Overall, the model performs well, but misclassifications still occur, and fine-tuning could improve accuracy, especially for categories such as White spot disease.
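Metrics of the kind reported in Table 2, including the macro and weighted averages, correspond to the output of scikit-learn's classification_report. The sketch below assumes y_true and y_pred hold the test labels and predictions and reuses the illustrative CLASS_NAMES list defined above.

```python
# Sketch: reproducing Table 2-style metrics with scikit-learn, assuming
# y_true and y_pred hold integer labels aligned with CLASS_NAMES.
from sklearn.metrics import accuracy_score, classification_report

def summarize_performance(y_true, y_pred, class_names):
    """Print per-class precision/recall/F1 plus macro and weighted averages."""
    print(f"Overall accuracy: {accuracy_score(y_true, y_pred):.4f}")
    # digits=4 matches the four-decimal values reported in Table 2.
    print(classification_report(y_true, y_pred,
                                target_names=class_names, digits=4))
```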
The ROC AUC scores show different levels of performance across disease categories (Fig 4). The model performs well on Red disease, achieving an AUC of 0.86, and on Aeromoniasis, with an AUC of 0.89, indicating strong classification. Gill disease and Healthy both have excellent AUC scores of 0.91, showing high discriminative ability. Saprolegniasis has a lower AUC of 0.71, indicating difficulty in distinguishing it from other diseases, while Parasitic disease performs reasonably well with an AUC of 0.82. White spot disease has the lowest AUC at 0.67, suggesting the model struggles with this class. Overall, the model is effective for most diseases but needs improvement for Saprolegniasis and White spot disease.
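Per-class ROC AUC values such as those in Fig 4 are typically computed in a one-vs-rest fashion. The sketch below assumes the softmax output y_prob of shape (n_samples, n_classes) and integer labels y_true are available; the names are illustrative.

```python
# Sketch: one-vs-rest ROC AUC per class, assuming y_true holds integer
# labels and y_prob holds softmax probabilities of shape (n_samples, n_classes).
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

def per_class_auc(y_true, y_prob, class_names):
    """Compute a one-vs-rest ROC AUC score for each disease class."""
    y_bin = label_binarize(y_true, classes=range(len(class_names)))
    for i, name in enumerate(class_names):
        auc = roc_auc_score(y_bin[:, i], y_prob[:, i])
        print(f"{name}: AUC = {auc:.2f}")
```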
The results obtained from the VGGNet19 model indicate strong classification performance for fish disease detection, with an overall accuracy of 90.96%. The model effectively learns from training data, as shown by the decreasing loss and increasing accuracy over 100 epochs. However, the observed gap between training and validation accuracy suggests potential overfitting. While the training accuracy reaches 99.64%, the validation accuracy stabilizes around 93–95%, indicating that the model struggles with generalization to unseen data. This highlights the need for additional regularization techniques, such as dropout layers or data augmentation, to enhance robustness.
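A minimal sketch of how dropout and on-the-fly data augmentation could be added to a VGG19-based classifier is given below, assuming a Keras implementation with an ImageNet-pretrained backbone; the layer sizes and augmentation parameters are illustrative and not the configuration used in this study.

```python
# Sketch: VGG19 transfer learning with dropout and data augmentation as
# possible regularization; hyperparameters are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

NUM_CLASSES = 7  # seven health/disease categories in this dataset

def build_regularized_vgg19(input_shape=(224, 224, 3)):
    base = VGG19(weights="imagenet", include_top=False, input_shape=input_shape)
    base.trainable = False  # freeze convolutional features for transfer learning

    # Light augmentation applied on the fly during training.
    augment = tf.keras.Sequential([
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
    ])

    inputs = layers.Input(shape=input_shape)
    x = augment(inputs)
    x = base(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(0.5)(x)  # dropout to narrow the train/validation gap
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```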
The confusion matrix analysis reveals that most disease classes are correctly classified with high precision and recall. Notably, Healthy samples are accurately identified 96% of the time, and Aeromoniasis achieves the highest precision at 95.8%. However, certain disease classes, such as White spot disease and Saprolegniasis, present challenges, as evidenced by White spot disease's lower recall of 86.4% and Saprolegniasis' lower AUC of 0.71. These misclassifications likely arise from visual similarities between disease symptoms, suggesting the need for more discriminative features or additional training samples for these classes.
Despite the overall strong classification performance, some diseases are misclassified more frequently than others. White spot disease, for instance, is often confused with Parasitic disease and Gill disease, possibly because of overlapping visual symptoms in infected fish. Similarly, Saprolegniasis exhibits a lower AUC score of 0.71, indicating difficulty in distinguishing it from other infections. The model's limitations in these areas suggest that additional modeling techniques, such as attention mechanisms or ensemble learning, could help improve classification accuracy for the more challenging disease categories.
The ROC-AUC analysis further supports these findings, showing that most diseases have AUC scores above 0.80, indicating strong classification ability. However, White spot disease has the lowest AUC (0.67), demonstrating the need for improvement in detecting this particular condition. The imbalance in disease recognition performance suggests that additional data preprocessing, such as synthetic data generation or class-balancing techniques, could help mitigate these issues.
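One simple class-balancing option is to weight the loss inversely to class frequency. The sketch below assumes integer training labels y_train and a Keras-style fit() call; it is illustrative rather than part of the reported pipeline.

```python
# Sketch: class-balancing via per-class loss weights, assuming integer
# training labels in y_train; the weights feed into Keras' class_weight.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

def balanced_class_weights(y_train):
    """Return {class_index: weight} inversely proportional to class frequency."""
    classes = np.unique(y_train)
    weights = compute_class_weight(class_weight="balanced",
                                   classes=classes, y=y_train)
    return dict(zip(classes.tolist(), weights.tolist()))

# Illustrative usage with a compiled Keras model:
# model.fit(x_train, y_train, epochs=100, validation_data=(x_val, y_val),
#           class_weight=balanced_class_weights(y_train))
```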
Future improvements to the model could involve hyperparameter tuning, employing more advanced architectures such as transformer-based models, and leveraging domain-specific augmentation techniques to enhance generalization. Additionally, integrating multi-modal data, such as water quality parameters and behavioral analysis, may further refine disease classification accuracy.
A comparison of the results obtained in the present study with those of related works highlights the contribution of the proposed model.
Maruf et al. (2024) focused on classifying freshwater fish diseases in Bangladesh using deep learning techniques. The authors proposed two ensemble models: a baseline Averaged Ensemble (AE) model and a novel Performance Metric-Infused Weighted Ensemble (PMIWE) model. The study achieved a testing accuracy of 97.53%, with precision, recall and F1-score all at 97%. To enhance the interpretability and trustworthiness of the model, the authors employed Grad-CAM (Gradient-weighted Class Activation Mapping), an explainable artificial intelligence (XAI) technique.
Azhar et al. (2024) focused on detecting Protozoan white spot disease, caused by Cryptocaryon irritans in saltwater fish, using an intelligent system based on a Convolutional Neural Network (CNN). The study used GoogleNet, a deep learning algorithm, to identify infected fish from raw underwater images. The model achieved an accuracy of 90%, offering a promising solution for early screening of this contagious disease.
Mia et al. (2022) aimed to improve fish disease recognition to help remote farmers manage their farms. They developed expert systems that identify diseases from smartphone images. A segmentation algorithm was applied to distinguish healthy and diseased regions, and eight classification algorithms were tested, with the Random Forest algorithm achieving the highest accuracy of 88.87%.
Ahmed et al. (2021) conducted a study on fish disease detection using image-based ML techniques in aquaculture. The research had two main phases. The first phase involved image pre-processing and segmentation to reduce noise and enhance image quality. The second phase focused on feature extraction for disease classification using a Support Vector Machine (SVM) algorithm with a kernel function. The processed images were tested on a salmon fish dataset, both with and without image augmentation. The SVM model achieved accuracies of 91.42% with augmentation and 94.12% without augmentation, demonstrating its effectiveness in fish disease detection.
In summary, the VGGNet19 model demonstrates high accuracy in fish disease detection, with promising classification performance for most disease categories. However, challenges remain in classifying visually similar diseases, particularly White spot disease and Saprolegniasis. Addressing these limitations through advanced model optimization techniques could further enhance disease detection capabilities and improve practical applicability in aquaculture health monitoring systems.