Data selection
The absence of easily available data is one of the biggest problems facing researchers today. Most research relies on data that has been obtained and kept private by its authors, so data gathering was a major difficulty in the present study. The data was gathered from the relevant government agencies and nearby farms. The image data was collected under the oversight of agricultural professionals and in compliance with accepted practices. The images were captured in September and October, when cotton leaf diseases are at their peak, at different times and from different perspectives. The photographs were saved in JPEG and PNG formats. Additionally, several pictures came from a reliable government source (Durmus et al., 2017). The datasets are described in depth below. Alternaria leaf, Gray Mildew, Leaf Reddening and Healthy leaf are the four classes included in this collection (Fig 2, Fig 3, Fig 4 and Fig 5).
Data analysis
Preprocessing is the sequence of adjustments that an original picture undergoes to bring the data into a specified analytic format. Cotton pictures may vary in brightness due to weather conditions, individual cameras, or other causes. This produces an uneven distribution of intensity across pictures, which acts as noise in the data. Images of the same leaf taken by the same camera at different times of the year might look quite different from one another. To ensure that every leaf had comparable brightness, we used intensity normalization. Images are uniformly rescaled to match the input dimensions required by the models being used. Then, the disease categories are encoded as labels using the LabelEncoder utility from Scikit-learn (Ashourloo et al., 2016).
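The preprocessing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the exact pipeline of the study: the per-image zero-mean, unit-variance scheme, the nearest-neighbour resize and all function names are hypothetical, since the text does not specify them. Label encoding is shown with NumPy's `np.unique`, which assigns integers to the sorted class names, much as a label encoder does.

```python
import numpy as np


def normalize_intensity(image):
    """Rescale one image to zero mean and unit variance (assumed scheme)."""
    pixels = image.astype(np.float32)
    std = pixels.std()
    # Guard against a constant image, whose standard deviation is zero.
    return (pixels - pixels.mean()) / (std if std > 0 else 1.0)


def resize_nearest(image, height, width):
    """Resize an H x W x C image by nearest-neighbour sampling."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows][:, cols]


def encode_labels(names):
    """Map class names to integers 0..K-1 in sorted order."""
    classes, encoded = np.unique(names, return_inverse=True)
    return classes, encoded


# Two synthetic "photos" of the same leaf: the second is uniformly brighter.
dark = np.arange(8 * 8 * 3, dtype=np.uint8).reshape(8, 8, 3)
bright = dark + 60
same_after_normalization = np.allclose(
    normalize_intensity(dark), normalize_intensity(bright), atol=1e-4
)
classes, encoded = encode_labels(["Healthy leaf", "Gray Mildew", "Healthy leaf"])
```

After normalization the two synthetic images coincide, which is the effect intensity normalization is meant to have on pictures that differ only in overall brightness.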
Dataset selection, preprocessing and splitting
• Obtain a large dataset of cotton leaf images labeled by disease category.
• Verify that the dataset includes representations of a range of conditions and variations in cotton leaf diseases.
• Clean and preprocess the dataset, addressing issues such as noise, outliers and inconsistencies.
• Perform image augmentation to increase the diversity of the dataset, ensuring better model generalization.
• Divide the dataset into training, validation and testing sets to train and evaluate the models effectively.
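The final step of the list, dividing the data into training, validation and testing sets, can be sketched without external dependencies as shown here. The 70/15/15 split fractions, the fixed seed and the function name `stratified_split` are assumptions made for illustration; the study does not report its split ratios. In practice, the `train_test_split` function of Scikit-learn with its `stratify` argument serves the same purpose.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that keep class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical label list: 20 images for each of four classes.
labels = [name for name in ("A", "B", "C", "D") for _ in range(20)]
train_idx, val_idx, test_idx = stratified_split(labels)
```

Splitting class by class keeps each disease category represented in all three subsets, which matters when some categories have few images.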
Software/Languages
Python is used for programming because of its extensive libraries and its prominence in machine learning.
Modules/Libraries
Machine Learning Libraries and Deep Learning Architectures
• Use Scikit-learn for implementing traditional machine learning models such as Support Vector Machines (SVM) and Random Forest.
• Implement Convolutional Neural Networks (CNN) using TensorFlow and Keras for deep learning.
• Implement and compare deep learning architectures like Inception-v3, VGG16 and ResNet50 to evaluate their performance.
• Employ Scikit-learn metrics for evaluating model performance, including accuracy, precision, recall and F1-score.
Work flow chart of algorithm
The procedure followed for evaluating the results is presented in Fig 6.
Data transformation
Before the classification models analyze the photos, the photos are converted into NumPy arrays so that the RGB values can be normalized. One of the most typical issues with picture data is the presence of irregularities within the dataset: some images are the wrong size or shape, some are rectangular instead of square, and so on. Overfitting often occurs because the training set contains too little data. With the ImageDataGenerator preprocessing class in Keras, we address these issues by augmenting the training set with synthetic pictures to boost the model’s classification accuracy. In addition to augmentation, this study uses batch normalization and dropout layers in the CNN model to improve validation accuracy (Dutta et al., 2014; Kalpana et al., 2023). In this research, we utilise the following settings from the Keras ImageDataGenerator class to augment our data:
• Rotation range: Rotates the loaded image at random by up to the specified number of degrees.
• Width shift range: Shifts the image along the horizontal axis, with common values ranging from 0 to 1 (a fraction of the image width).
• Height shift range: Shifts the image along the vertical axis, with typical values ranging from 0 to 1 (a fraction of the image height).
• Shear range: Slants the image by fixing one axis and stretching the image according to the defined shear angle.
• Zoom range: Zooms in on portions of an image at random, allowing algorithms to better train on those highlighted features.
• Horizontal flip: Flips an image horizontally at random, which is useful for generalization.
• Fill mode: Determines how pixels left empty by a transformation are filled.
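The settings listed above map directly onto keyword arguments of the Keras `ImageDataGenerator` class. The sketch below shows one possible configuration; every numeric value is an assumption chosen for illustration, since the study does not report the values it used, and `build_augmenter` is a hypothetical helper name. Recent Keras releases treat this class as a legacy API, so the import path may need adjusting.

```python
# Illustrative values only; the study does not report the settings it used.
AUGMENTATION_SETTINGS = {
    "rotation_range": 20,        # rotate by up to +/- 20 degrees
    "width_shift_range": 0.2,    # shift horizontally by up to 20% of the width
    "height_shift_range": 0.2,   # shift vertically by up to 20% of the height
    "shear_range": 0.2,          # shear intensity (angle in degrees)
    "zoom_range": 0.2,           # zoom in or out by up to 20%
    "horizontal_flip": True,     # randomly mirror images left to right
    "fill_mode": "nearest",      # fill empty pixels with the nearest value
}


def build_augmenter(settings=AUGMENTATION_SETTINGS):
    """Create a Keras ImageDataGenerator from the settings above."""
    # Imported lazily so the settings can be inspected without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    return ImageDataGenerator(**settings)
```

Augmentation of this kind is normally applied to the training set only, so that validation and test images remain unaltered.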
In this research, we apply the image augmentation method described above to improve how well the neural networks generalize beyond the training dataset, both for transfer learning and for the convolutional neural network trained from scratch. For the convolutional neural network method, the only modification made to the test dataset is batch normalization (Bhosale et al., 2023).
Integration of ML and TL techniques
Several models were developed and compared against one another to see which one was the most effective. Two distinct approaches were used to develop the models. The first approach was to build models from scratch, which entailed carrying out procedures like segmentation and feature extraction before training SVM and RF classifiers using only the research data. The second approach involved deep learning models with numerous layers, such as Inception-v3, VGG16 and ResNet50, all of which are CNN-based. These leverage weights pre-trained on the ImageNet dataset and use transfer learning methods (Jain and Jaidka, 2023). Only the last few layers were retrained on the study data. In addition, several experiments were done to fine-tune the parameters of the adopted models to find the most effective model for detecting and classifying cotton diseases (Shreelakshmi and Raju, 2023).
Process and evaluation method
Since this study is concerned with classification, accuracy is one of the metrics used to assess the effectiveness of the models. Models developed from scratch and those developed via transfer learning are also compared using precision, recall and F1-score. What each of these indicators entails for the present study is briefly discussed below. A true positive (TP) occurs when both the observed value and the value predicted by the model are positive. A false positive (FP) occurs when the observed value is negative but the model predicts a positive. A true negative (TN) is an occurrence for which both the observed and predicted values are negative. A false negative (FN) occurs when the observed value is positive but the predicted value is negative (Ingole and Padole, 2023). In other words, TP and TN count the tuples that the classifier correctly labeled as positive and negative respectively, FP counts negative tuples incorrectly labeled as positive, and FN counts positive tuples mislabeled as negative. The accuracy ratings in Table 1 were obtained from these counts.
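The four counts defined above determine accuracy, precision, recall and F1-score through their standard formulas, sketched below. The function name and the example counts are hypothetical and do not come from the study's results. For a problem with four classes, the counts are typically computed per class in a one-vs-rest fashion and then averaged.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion counts for one class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of actual positives that the model found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1-score: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts for a single class, for illustration only.
scores = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

With these made-up counts, precision (0.80) is lower than recall (about 0.89) because the hypothetical classifier produces more false positives than false negatives.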
Random forest
Random forest is a widely used supervised machine learning method. It is popular because it is effective, easy to use and flexible. Its nonlinear character and its ability to handle both classification and regression tasks make it adaptable to many kinds of data and situations. It is called a “forest” because it is built from many decision trees. The predictions of the individual trees are then combined to give a more accurate forecast. The forest usually gives a more precise answer than a single decision tree since it considers a greater number of groupings and decisions. Furthermore, it introduces randomness into the model by picking the best feature from a subset of features chosen at random at each split. Overall, this yields a model with a great deal of diversity among its trees (Asha, 2023).
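A random forest of this kind can be sketched with Scikit-learn as follows. The feature vectors here are synthetic stand-ins generated by `make_classification`, not features extracted from the cotton images, and the hyperparameter values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for feature vectors extracted from leaf images (4 classes).
features, labels = make_classification(
    n_samples=400, n_features=20, n_informative=8, n_classes=4, random_state=0
)
x_train, x_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, stratify=labels, random_state=0
)

forest = RandomForestClassifier(
    n_estimators=100,      # number of decision trees in the forest
    max_features="sqrt",   # each split considers a random subset of features
    random_state=0,
)
forest.fit(x_train, y_train)

# Class predictions are obtained by combining the votes of all trees.
test_accuracy = forest.score(x_test, y_test)
```

The `max_features` argument controls the random feature subset considered at each split, which is the source of the diversity among trees described above.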
Support vector machine
Support vector machine (SVM) is a supervised learning method that may be used for both regression and classification problems. The primary goal of this technique is to locate a hyperplane that effectively separates the feature vectors of the various classes. The method selects the optimum hyperplane by maximizing the margin, that is, the distance between the hyperplane and the nearest data points, known as support vectors. When the classes are not linearly separable, it employs soft margins and kernel methods, with polynomial and RBF kernels being the most common, to improve outcomes, as is the case with maize disease classification (Kanaga et al., 2022).
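The benefit of a kernel when classes are not linearly separable can be illustrated with a small Scikit-learn sketch. The data are synthetic concentric circles from `make_circles`, not leaf features, and the values of `C` and `gamma` are assumptions for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two classes arranged as concentric rings: no straight line separates them.
points, labels = make_circles(
    n_samples=300, noise=0.05, factor=0.4, random_state=0
)


def fit_svm(kernel):
    """Fit a soft-margin SVM with the given kernel on standardized inputs."""
    model = make_pipeline(
        StandardScaler(),
        SVC(kernel=kernel, C=1.0, gamma="scale"),  # C sets the margin softness
    )
    return model.fit(points, labels)


linear_accuracy = fit_svm("linear").score(points, labels)
rbf_accuracy = fit_svm("rbf").score(points, labels)
```

On this data the RBF kernel separates the rings while the linear kernel cannot, which is why kernel SVMs are preferred when the decision boundary is curved.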
Transfer learning
Transfer learning is a deep learning approach in which a neural network model is first trained on a problem that is analogous to the problem at hand. Some or all of the layers of the trained model are then incorporated into a new model. Transfer learning helps because it shortens the time it takes to train a model and decreases the amount of computing power needed, since the network is already trained. By adding a few additional dense layers to the previously trained network, it can learn to recognize photos from a different dataset. The VGG16, ResNet50 and Inception-v3 transfer learning models, which reuse the features and weights learned by the pre-trained models, are applied to a cotton plant leaf image dataset (Malunao et al., 2022).
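A transfer learning model of the kind described above might be assembled in Keras as sketched below, using VGG16 as the pre-trained base. The input size, the width of the added dense layer, the dropout rate, the optimizer and the helper name `build_transfer_model` are all assumptions; the study does not report these details. Input preprocessing specific to VGG16 is omitted for brevity.

```python
NUM_CLASSES = 4               # disease categories in the dataset
INPUT_SHAPE = (224, 224, 3)   # assumed input size


def build_transfer_model(weights="imagenet"):
    """Frozen VGG16 base with a small trainable classification head."""
    # Imported lazily so the constants above load without TensorFlow.
    import tensorflow as tf

    base = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=INPUT_SHAPE
    )
    base.trainable = False  # keep the pre-trained convolutional features fixed

    inputs = tf.keras.Input(shape=INPUT_SHAPE)
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Because the base is frozen, only the two added dense layers are updated during training, which is what keeps training time and computing requirements low.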
VGG16
VGG16 is a widely used yet quite simple convolutional neural network (CNN) (Fig 7). There are a total of 16 weighted layers in this network. Only 3×3 convolutional layers are used in this deep learning architecture, with the number of filters increasing with layer depth. Max pooling, applied in 5 layers, is used to reduce the volume size, and two fully connected layers containing 4096 neurons each are followed by the softmax classifier. In this research, we use VGG architectures for image identification in the diagnosis of leaf diseases in plants (Wang, 2022).
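The layer counts quoted for VGG16 can be checked with a short calculation over the standard layer layout of the network. The 224 × 224 input size and the 1000-way final layer are those of the original ImageNet model and are stated here as assumptions, since the study replaces the final layer for its own four classes.

```python
# Filter counts of the 3x3 convolutions; "M" marks a 2x2 max-pooling layer.
VGG16_LAYOUT = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]
FULLY_CONNECTED = [4096, 4096, 1000]  # the last layer feeds the softmax

conv_layers = sum(1 for item in VGG16_LAYOUT if item != "M")
pool_layers = VGG16_LAYOUT.count("M")

# Only convolutional and fully connected layers carry weights.
weighted_layers = conv_layers + len(FULLY_CONNECTED)

# Each max-pooling layer halves the spatial size of a 224 x 224 input.
spatial_size = 224 // (2 ** pool_layers)
```

The 13 convolutional layers and 3 fully connected layers together give the 16 weighted layers after which the network is named.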
ResNet50
ResNet was described as a game-changing strategy for creating deep neural network models by stacking a large number of residual blocks without increasing the number of parameters or the computational complexity. ResNet-50 (residual neural network), a variant of ResNet with 50 layers, was trained on more than one million images from the ImageNet collection. Most ResNet models use skip connections that bypass layers containing nonlinearities (ReLU) and batch normalization. The well-known HighwayNet model, by contrast, employs an additional weight matrix to learn the skip weights. ResNet-50 consists of a sequence of convolutional blocks followed by average pooling. As the last stage of classification, softmax is used. There are a total of five convolutional stages in ResNet-50, labeled conv1, conv2_x, conv3_x, conv4_x and conv5_x. In the first stage of processing (conv1), the input picture is passed through a convolutional layer with 64 filters and a 7 × 7 kernel size. A max-pooling layer with a stride of 2 then processes the result at the start of conv2_x. Due to the way residual networks are connected, the conv2_x stage groups its layers into blocks: a layer with a kernel size of 1 × 1 and 64 filters, a further layer with a kernel size of 3 × 3 and 64 filters, and a final layer with a kernel size of 1 × 1 and 256 filters, with the block repeated three times. These layers are positioned between the pooling layer and the first layer of conv3_x (Kanaga et al., 2022).
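The figure of 50 layers follows from the standard ResNet-50 configuration, in which each residual block contains three convolutions. The short calculation below assumes that published configuration; the dictionary layout and variable names are illustrative.

```python
# Per stage: (number of blocks, filters of the 1x1, 3x3 and 1x1 convolutions).
RESNET50_STAGES = {
    "conv2_x": (3, (64, 64, 256)),
    "conv3_x": (4, (128, 128, 512)),
    "conv4_x": (6, (256, 256, 1024)),
    "conv5_x": (3, (512, 512, 2048)),
}

# Convolutions inside the residual blocks of all four stages.
block_convs = sum(
    blocks * len(filters) for blocks, filters in RESNET50_STAGES.values()
)

# conv1 (7x7, 64 filters) + block convolutions + final fully connected layer.
total_layers = 1 + block_convs + 1
```

Pooling layers and the shortcut connections carry no counted weights, so they do not contribute to the total.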
Inception-v3
The Inception-v3 model is a later generation of the Inception architecture, whose first version is known as GoogLeNet and was introduced in Going Deeper with Convolutions (Fig 8). The Inception module’s purpose is to lower the model’s computational cost by breaking down large filters into smaller convolutions, and the model applies aggressive regularization via label smoothing. Fig 8 shows that the computational cost of a 5×5 convolution filter is 25/9 = 2.78 times that of a 3×3 convolution layer. Replacing it with two layers of 3×3 filters (3×3 + 3×3 = 18 weights instead of 25) therefore decreases the number of parameters by 28% (Malunao et al., 2022).
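The arithmetic behind these figures can be reproduced in a few lines. The counts are weights per pair of input and output channels, ignoring biases; the function name is illustrative.

```python
def weights_per_channel_pair(kernel_sizes):
    """Weights in a stack of square kernels, per input/output channel pair."""
    return sum(k * k for k in kernel_sizes)


single_5x5 = weights_per_channel_pair([5])       # one 5x5 filter
stacked_3x3 = weights_per_channel_pair([3, 3])   # two stacked 3x3 filters

# Cost of a 5x5 filter relative to a single 3x3 filter.
cost_ratio = single_5x5 / weights_per_channel_pair([3])

# Fraction of weights saved by replacing the 5x5 filter with two 3x3 filters.
saving = 1 - stacked_3x3 / single_5x5
```

Two stacked 3×3 convolutions cover the same 5×5 receptive field as the filter they replace, which is why the substitution reduces parameters without narrowing the view of the input.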