An extensive review of recent studies was carried out to identify relevant methods and benchmark models for crop disease detection.
Mehta et al. (2025) provided an extensive review of plant disease recognition and categorization. Table 1 summarizes selected literature on different crops, datasets and types of machine/deep learning methods for disease detection, detailing the models employed, the accuracies obtained, the significant findings and the future directions suggested by the individual authors.
The proposed model, named CropNet, is a deep learning based framework for the detection of plant leaf diseases. It operates in the following main stages (as illustrated in Fig 1): data collection, data preprocessing, data augmentation, data splitting, CropNet model building and performance analysis.
Data collection
Soybean was chosen as the test crop. Acquiring a suitable training set of soybean leaf disease images is essential and depends greatly on the occurrence and severity of diseases, which are influenced by environmental factors such as location. The dataset was gathered over multiple growing seasons, spanning approximately three consecutive years from 2022. The images were captured in the south-west part of Maharashtra state, India (Fig 2) with a Canon EOS 1500D camera at a resolution of 24.1 megapixels. The high-resolution images (1990×1280×3) preserve the fine-grained details that are crucial for disease recognition.
Data preprocessing and augmentation
Preprocessing is critical for improving image quality and ensuring robust training of Convolutional Neural Networks (CNNs). It includes scaling, normalization and noise filtering, which enhance accuracy and speed up convergence. The normalization typically used is given in equation (1):
I_{norm} = \frac{I - \mu}{\sigma} ... (1)
Where,
I = the input image,
I_{norm} = the normalized image,
μ = the mean of the dataset,
σ = the standard deviation of the dataset.
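For illustration, a minimal NumPy sketch of equation (1) is given below; the array shapes and variable names are assumed for demonstration and are not taken from the actual CropNet pipeline.

import numpy as np

# images: a batch of RGB images, shape (num_images, height, width, 3)
images = np.random.rand(10, 128, 128, 3).astype(np.float32)

# Equation (1): subtract the dataset mean and divide by the dataset
# standard deviation
mu = images.mean()
sigma = images.std()
normalized = (images - mu) / sigma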
Data augmentation artificially increases the size of the dataset by applying transformations, such as rotation, flipping and cropping, to the original images. Feeding the CNN many different versions of the same image improves generalization and reduces overfitting. Schemes such as random zoom, brightness adjustment and affine transformations make the model more robust. Equation (2) represents an affine transformation:
\begin{bmatrix} A' \\ B' \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} + \begin{bmatrix} t_A \\ t_B \end{bmatrix} ... (2)

Where,
(A, B) = the original coordinates,
(A', B') = the transformed coordinates,
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = the matrix encoding scaling and rotation,
[t_A, t_B]^T = the translation vector.
Preprocessing and augmentation together make the data sufficiently varied, helping the trained CNN remain robust to changes in lighting, scale and orientation of the images.
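A minimal SciPy sketch of the affine transformation of equation (2) is shown below; the rotation angle, zoom factor and translation values are illustrative assumptions, not the augmentation parameters used in this work.

import numpy as np
from scipy.ndimage import affine_transform

img = np.random.rand(128, 128)  # single-channel example image

# Desired forward transform of equation (2): 15-degree rotation with 1.1x zoom
theta = np.deg2rad(15)
M = 1.1 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
t = np.array([5.0, -3.0])  # translation vector [t_A, t_B]^T

# scipy maps output coordinates back into the input image, so the inverse
# of the desired forward transform is supplied
M_inv = np.linalg.inv(M)
augmented = affine_transform(img, M_inv, offset=-M_inv @ t,
                             order=1, mode='nearest')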
Data splitting
Data splitting is crucial for training a robust CNN model that generalizes well to unseen data and helps prevent overfitting. The dataset is split into a training set (for model training), a validation set (for tuning hyperparameters) and a test set (for unbiased evaluation). A proper split prevents data leakage, ensures unbiased evaluation and leads to more reliable accuracy estimates.
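As a sketch, the three-way split can be realized with scikit-learn as below; the 70/15/15 proportions and the placeholder arrays are assumptions for illustration.

import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: X holds images, y holds integer class labels (assumed)
X = np.random.rand(1000, 128, 128, 3)
y = np.random.randint(0, 4, size=1000)

# Hold out 15% as the test set, then split the remainder so that the
# validation set is also 15% of the full data (70/15/15 overall),
# stratifying on the labels to preserve class balance in every subset
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.15 / 0.85, stratify=y_tmp, random_state=42)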
CropNet model
Constructing the CNN involves designing a deep learning framework for image classification (Pakruddin et al., 2025). The architecture typically includes convolutional layers, activation functions, pooling layers, fully connected layers and an output layer. The proposed 15-layer CropNet model is built using three convolution layers, each followed by a ReLU activation function, as shown in Fig 3(a). A detailed analysis of CropNet in terms of the number of parameters at each layer is shown in Fig 3(b).
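A Keras sketch of a CropNet-style architecture is given below; the source specifies three convolution layers with ReLU activations, while the filter counts, kernel sizes, input resolution and dense-layer width shown here are illustrative assumptions rather than the exact CropNet configuration.

from tensorflow.keras import layers, models

NUM_CLASSES = 4  # assumed number of target disease classes

# CropNet-style stack: three Conv2D+ReLU blocks with max-pooling,
# followed by fully connected layers and a softmax output
model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(NUM_CLASSES, activation='softmax'),
])

# Categorical cross-entropy expects one-hot labels; Adam matches the
# optimizer described later in this section
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()  # reports per-layer parameter counts, as in Fig 3(b)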
The forward propagation in a convolutional layer is mathematically expressed by equations (3) and (4) below.
Z^{[l]} = W^{[l]} * A^{[l-1]} + b^{[l]} ... (3)
A^{[l]} = g^{[l]}(Z^{[l]}) ... (4)

Where,
* = the convolution operation,
W^{[l]} = the learnable filters (kernels) at layer l,
A^{[l-1]} = the input activation of the preceding layer,
b^{[l]} = the bias vector at layer l,
g^{[l]} = the ReLU activation function, which is defined by equation (5):

g(z) = \max(0, z) ... (5)
Following convolution, max-pooling layers are used for downsampling, reducing spatial dimensions and computational complexity. The max-pooling operation over a given pooling window R_{i,j} is defined in equation (6):

p_{i,j,k} = \max_{(m,n) \in R_{i,j}} a_{m,n,k} ... (6)

Where,
a_{m,n,k} = the activation at position (m, n) in the k-th feature map,
p_{i,j,k} = the pooled output for window R_{i,j} in the k-th feature map.
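For illustration, equation (6) with non-overlapping 2×2 windows can be implemented in NumPy as follows; the activation shape is assumed.

import numpy as np

# Activations of shape (height, width, channels); equation (6) keeps the
# maximum of each non-overlapping 2x2 window R_ij in every feature map
a = np.random.rand(8, 8, 16)
h, w, k = a.shape
pooled = a.reshape(h // 2, 2, w // 2, 2, k).max(axis=(1, 3))
print(pooled.shape)  # (4, 4, 16)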
The output layer uses a softmax activation function to generate the probability distribution over the C target classes, as presented by equation (7):

\hat{y}_c = \frac{e^{z_c}}{\sum_{j=1}^{C} e^{z_j}} ... (7)

Where,
\hat{y}_c = the predicted probability for class c,
z = the input vector to the output layer.
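A minimal NumPy sketch of equation (7) is shown below; the logit values are illustrative.

import numpy as np

def softmax(z):
    # Equation (7); subtracting max(z) first is the usual
    # numerical-stability trick and does not change the result
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1, -1.0])  # assumed logits for C = 4 classes
print(softmax(z))  # probabilities that sum to 1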
The model is trained by minimizing the categorical cross-entropy loss function, which measures the discrepancy between the predicted probability distribution \hat{y} and the true label distribution y (typically one-hot encoded), as expressed by equation (8):

L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log(\hat{y}_{i,c}) ... (8)

Where,
N = the number of samples in the batch,
C = the number of classes.
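Equation (8) can be computed for a small assumed batch as follows; the constant added inside the logarithm only guards against log(0) and is an implementation detail, not part of equation (8).

import numpy as np

# Assumed batch of N = 2 samples over C = 3 classes: y_true is one-hot,
# y_pred holds softmax outputs
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])

# Equation (8): average negative log-likelihood of the true classes
loss = -np.mean(np.sum(y_true * np.log(y_pred + 1e-12), axis=1))
print(loss)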
Parameter optimization is carried out with the Adam optimizer, which adaptively adjusts the learning rate for every parameter. The corresponding update rules are expressed in equations (9) through (12) below:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t ... (9)
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 ... (10)
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} ... (11)
\theta_t = \theta_{t-1} - \eta \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} ... (12)

Where g_t is the gradient at step t and \theta_t are the model parameters. The hyperparameters are set to η = 0.001 (learning rate), β₁ = 0.9, β₂ = 0.999 and ε = 10⁻⁸.
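A NumPy sketch of one Adam step, equations (9) through (12), is given below with the stated hyperparameter values as defaults; the function and variable names are illustrative.

import numpy as np

def adam_step(theta, g, m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Equations (9) and (10): exponential moving averages of the gradient
    # and the squared gradient
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    # Equation (11): bias correction of both moment estimates
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Equation (12): parameter update
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# One illustrative update for a two-parameter model
theta, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
g = np.array([0.1, -0.2])  # gradient at step t = 1
theta, m, v = adam_step(theta, g, m, v, t=1)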
The research work was carried out at Bharati Vidyapeeth's College of Engineering, Kolhapur, Maharashtra, India, during the years 2022-2025.