Multispectral imaging can capture images across many wavelengths of the electromagnetic spectrum, including the visible range (400–700 nm) and the regions beyond it, such as infrared and ultraviolet. Recent advances, particularly light-emitting diodes capable of emitting the three primary colors of red, green, and blue (RGB), have greatly accelerated the development of this technology. Multispectral imaging has proven highly effective in fruit and vegetable grading and is essential for exporters, who can use it to characterize their produce more precisely, ensuring quality and commanding higher prices. It may also find further applications, such as disease prevention (a field still open to study), irrigation management, and yield optimization for agricultural produce. Multispectral data can be used to analyze the various characteristics and features of fruits, vegetables, trees, and other crops. This study investigates the potential correlation between the sugar content of apples and data obtained through multispectral imaging techniques.
Figure 1 is a diagram of the assembled multispectral imaging system. A light source capable of emitting a broad range of wavelengths, spanning the visible and non-visible spectra, is selected. This light passes through a multispectral filter disk, which isolates specific wavelengths for observation. An apple sample is exposed to each of these wavelengths in sequence, and an image is captured at each one using a dedicated camera; the disk rotates through the different spectral bands so that comprehensive data can be collected. The camera contains no infrared filter, allowing it to capture a wide range of wavelengths. The captured images are processed using a convolutional neural network (CNN) algorithm called AppleNet to grade the fruit by its multispectral properties.

Setup of multispectral imaging.
Five apple varieties were used in this study:
Red Delicious (USA)
Royal Gala
Red Delicious (New Zealand)
Washington
Kinnaur
Four apples of each variety were juiced, and the sugar content of the juice, in % Brix, was determined with a hand-held refractometer. These measurements then served as the ground truth for correlating sugar content with the spectral imaging data.
Multispectral images of all five apple varieties were captured in the imaging chamber. These images were combined into a dataset and used to train and test AppleNet, the proposed CNN. The model interpreted the spectral data to find patterns and correlations with the sugar content of the apples.
Output from AppleNet: After processing the combined multispectral images, the algorithm predicted and classified apples by sugar content with 65% accuracy. This shows that multispectral imaging combined with machine learning has considerable potential for fruit quality assessment, although there is still room to improve both accuracy and methodology.
The results demonstrate the effectiveness of multispectral imaging in assessing fruit quality, specifically the sugar content of apples. Combining advanced imaging technology with machine learning opens promising avenues for agricultural quality assurance and productivity, including further refinement of the imaging process and network model, and potentially entirely new applications.
The evaluation of fruit quality is vitally important to consumers. Traditionally, human experts are employed to inspect fruits for quality, but manual sorting by visual inspection is laborious, slow, inconsistent, and fallible. With the advent of multispectral imaging techniques, it is possible to automate the grading process, reduce labor costs, and improve efficiency and accuracy in fruit classification. Both internal and external attributes determine fruit quality. Internal qualities include aroma, taste, texture, skin firmness, disease presence, and organic residues [2], as well as nutritional aspects such as sugar content, titratable acidity (TA), starch, pH, organic acids, and total soluble solids (TSS) [3]. External attributes, on the other hand, are associated with the size, color, shape, and blemishes on the skin [4]. Ripeness and skin blemishes are critical parameters in fruit quality assessment. Both farmers and consumers benefit from automated grading techniques, which streamline the delivery of superior products. Yan et al. [5] presented a lightweight apple detection algorithm for picking robots, aimed at determining whether an apple in an image of an apple tree is graspable or ungraspable. The model is based on an enhanced YOLOv5s framework. Key changes included converting the BottleneckCSP module to BottleneckCSP-2, incorporating a Squeeze-and-Excitation (SE) module into the backbone network, and adjusting anchor box dimensions. The model achieved a recall of 91.48%, a precision of 83.83%, a mAP of 86.75%, and an F1 score of 87.49%, and it can detect apples in a range of situations while taking, on average, 0.015 seconds per image.
In their 2018 report, Chunxiao Tang et al. [6] used multispectral imaging and near-infrared spectroscopy (350–1200 nm) to predict the sugar content of Fuji apples. Sensitive wavelengths were selected by backward interval partial least squares, optimal wavelengths by stepwise multiple linear regression, and the sugar content was then predicted. The prediction model shows a strong correlation (r = 0.8861) between predicted and actual sugar levels, with a root mean square error of calibration (RMSEC) of 0.8738 Brix, indicating that it is reasonably accurate.
Jin Wang et al. [7] developed a machine vision system capable of gauging the defects, shape, and size of Red Fuji apples using image processing and near-infrared spectroscopy. Their spectral reflectance data underwent preprocessing, and a Competitive Adaptive Reweighted Sampling-Partial Least Squares (CARS-PLS) algorithm was used to build the Brix prediction model. Their method was strikingly accurate in detecting blemishes, shapes, and sizes (96.67%, 95.00%, and 94.67%, respectively), indicating a promising future for online classification.
Khodabakhshian et al. [8] developed a multispectral imaging system to test pomegranate quality, including TSS, pH, and TA readings. Their multi-linear regression shows a high coefficient of determination, with a ratio of prediction to deviation (RPD) of 6.7 and an RMSEC of 0.21 Brix, allowing quality to be predicted accurately.
In 2019, Lianou et al. [9] applied a multispectral imaging method to classify vanilla cream samples according to their microbiological quality. Their approach achieved a classification accuracy of 91.7% using spectral features that distinguished fresh from stale samples.
Liu et al. [10] used multispectral imaging to evaluate the quality and ripeness stage of strawberries. The Back Propagation Neural Network (BPNN) stood out among the compared models in predicting TSS content and firmness, though it could not beat the logistic regression tree or newer ensemble-based neural networks. It attained 100% accuracy in ripeness classification.
Santoyo et al. [11] proposed a method to monitor banana ripeness using a multispectral imaging technique. The study captured visible and near-infrared (NIR) bands to reveal brown peel spots, enabling both ripeness tracking and discrimination of peel texture and homogeneity. In 2021, Lohumi et al. [12] developed a real-time fluorescence imaging system to detect vegetable impurities; in an industrial setting, the system achieved accuracy better than 95%.
Using a multispectral imaging technique and a multilayer perceptron (MLP) grader, Naeem et al. [13] in 2021 classified medicinal plant leaves according to their spectral and texture features. Their method attained very high classification accuracy, exceeding 99% for most plant species.
For nondestructive discrimination of rice varieties, Liu et al. [10] studied the application of multiband imaging and chemometric analysis. Techniques such as Partial Least Squares Discriminant Analysis (PLS-DA) and Least Squares Support Vector Machine (LS-SVM) achieved up to 94% accuracy in variety distinctions based on spectral data alone. This body of work shows the potential for quality assessment via multispectral imaging combined with analysis of spectral fingerprints, highlights several open problems in agricultural product grading, and provides the motivation for the present work.
The proposed algorithm is AppleNet, a CNN architecture for analyzing multispectral images (Figure 2). Its focus differs fundamentally from traditional CNN architectures such as AlexNet, GoogleNet, VGG, and DenseNet, which all handle RGB images and are optimized for object detection tasks in the RGB color space. AppleNet's design instead applies CNN technology to multispectral imaging data, which is crucial for a wide range of non-destructive fruit sorting and quality control applications.

The proposed network of AppleNet.
CNNs consist of multiple interconnected layers designed to process input data hierarchically, extracting features and patterns relevant to specific tasks. AppleNet comprises a series of layers, each fulfilling a different role in turning raw image data into meaningful output values. The layers are structured so the network can effectively categorize and analyze multispectral images. Detailed descriptions of each layer in AppleNet's architecture are given below:
The architecture begins with the image input layer, which accepts 128 × 128 × 3-pixel images. While multiband images typically have more than three bands, the network treats them like a standard RGB dataset, with modifications that make the spectral band images compatible with existing CNN frameworks. This keeps the pipeline within the familiar environment of existing CNN frameworks while still integrating the multispectral data.
The convolutional layer is the backbone of AppleNet. Each convolutional layer performs a convolution operation on the input data, represented as a 3D array of numbers. Filters (kernels) are used in this layer to extract essential features from the input, such as edges, textures, and patterns. In AppleNet, each convolution uses a 5 × 5 filter that slides across the image, producing feature maps. These feature maps capture spatial hierarchies that are crucial to understanding complex patterns in multispectral images. The convolutional layer also has padding and stride settings chosen to optimize feature extraction while preserving image resolution, keeping as much relevant spatial information as possible without unnecessary complexity. With several convolutional layers stacked on top of one another, AppleNet gradually learns more abstract and complex representations of the input image data.
Each convolutional layer is followed by a rectified linear unit (ReLU) activation function. The ReLU layer operates on each element of the stacked feature maps, setting any negative value to zero. This operation introduces nonlinearity into the network, allowing it to model more complex relationships between input data and output predictions. The rectified feature maps generated by the ReLU layer are passed to the next layer, enhancing the network's ability to learn hierarchical features. This step is key to capturing subtle differences, such as variations in quality or sweetness, in multispectral data from fruit samples.
Pooling layers reduce the spatial dimensions of the feature maps through down-sampling. This lowers the computational requirements and helps prevent overfitting by discarding redundant information. Dimensionality is reduced by selecting the maximum value within each filter region of the feature maps [17]; compared with the preceding convolutional layer's output, this further reduces the number of pixels.
The max pooling layer's output is connected to a fully connected layer, which in turn feeds the output layers used for image classification and recognition. The fully connected layer connects to all neurons in the previous layer, and its result, technically termed a feature vector, drives the output prediction. It is in this layer that the network learns the relationship between the extracted features and the target labels. The output of the fully connected layer is passed to the output layers, which are arranged to assign any given image to one of the four apple classes.
The Softmax layer is an integral part of the classification process in AppleNet. It takes the output of the fully connected layer and converts it into probabilities for each class, ensuring that the probabilities sum to 1.0 [19]. In doing so, it answers a simple question: which class is most probable for the image at hand? A further benefit is that, during training, this probabilistic view speeds up convergence.
After this series of processing steps, the features are passed to the final classification layer. Based on the probabilities computed at the Softmax layer for each class, this layer assigns the input image to one of the four output classes, Class 10, Class 12, Class 13, and Class 15, corresponding to sugar content in % Brix. It is the final stage of the neural network's classification process [20].
AppleNet is specifically designed to address the unique challenges of multispectral imaging data. Its hierarchical structure captures the complex spatial and spectral relationships that can be decisive for apple sweetness grading. Furthermore, dedicated layers such as ReLU, max pooling, and Softmax ensure robust feature extraction and classification in the face of high-dimensional input data.
AppleNet's ability to adapt to different spectral features makes it a versatile tool for future fruit grading applications, and its straightforward integration with existing software and hardware pipelines makes it well-suited for real-world use. In summary, AppleNet is a CNN that grades apples by sweetness with the help of multispectral imaging, analyzing the multispectral data through a combination of convolutional layers, pooling operations, and activation functions. With its robust architecture and capable classification, AppleNet promises to be a useful component of automatic fruit grading systems, paving the way for less labor-intensive, more accurate methods.
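As a concrete illustration, the architecture described above can be expressed as a MATLAB (Deep Learning Toolbox) layer array. This is a minimal sketch under stated assumptions: the filter count (32) and the hidden fully connected width (128) follow the typical values listed in the hyperparameter section below, not a confirmed specification of the trained network.

```matlab
% Sketch of an AppleNet-style layer stack (assumed configuration).
layers = [
    imageInputLayer([128 128 3])                 % merged multispectral input
    convolution2dLayer(5, 32, 'Padding', 'same') % 5 x 5 kernels, stride 1
    reluLayer                                    % element-wise max(0, x)
    maxPooling2dLayer(4, 'Stride', 4)            % 128 x 128 -> 32 x 32
    fullyConnectedLayer(128)                     % assumed hidden width
    reluLayer
    dropoutLayer(0.5)                            % dropout on the FC stage
    fullyConnectedLayer(4)                       % four sweetness classes
    softmaxLayer                                 % class probabilities
    classificationLayer];                        % final classification
```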
Figure 3 shows the design of the multispectral imaging chamber. The experimental setup for capturing multispectral images of the apple fruit is depicted in Figure 4, while Figure 5 gives an overview of the components of the setup.

Multispectral imaging chamber.

Experimental setup for capturing the multispectral images.

Components of experimental setup.
The hardware configuration comprises a Logitech Digital HD Portable 1080p Webcam, an RGB LED Soft Ring Light MJ26, and a wooden enclosure with a door, measuring 30 cm × 45 cm × 45 cm.
These components constitute the essential elements of the hardware setup, contributing to the practical functionality of the Multispectral Imaging Chamber.
The imaging process consists of the following steps:
Place a plate or stand inside the chamber and set the apple on it.
Ensure the LED ring light is mounted at the top of the chamber, with the camera fixed at its center.
Make sure the apple is correctly positioned and does not move at any point, including during capture. The chamber must also be sealed to keep out all external light.
Connect the USB cables to the computer: one for the camera and one for the lighting system.
First, power on the LED ring light system.
Open the camera application on the computer (press the Windows key and launch the Camera app). The preview will appear completely black: with the lights off, the sealed chamber blocks any light source the camera could pick up.
Capture the initial black image as part of your imaging sequence.
Turn on the white light to illuminate the chamber, and capture the first illuminated image.
Press the RGB button to switch the lighting color and capture another image.
Continue for eight different colors, letting each color settle before taking a shot.
For automatic mode, follow the steps above; for manual mode, turn off power between each image capture.
Three positions will be used for photographing the apple: horizontal, vertical, and at rest on a table.
Repeat the entire sequence three times to average out noise and anomalies.
Check that all 27 images of the apple (3 runs × 9 photos each, comprising 1 black reference and 8 color images) are present in their designated folders.
This yields a complete dataset with minimal variation among photographs for verification purposes.
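The capture sequence above could be scripted roughly as follows. This is a hypothetical sketch assuming MATLAB with the USB Webcams support package; the eight color names and the output folder are illustrative placeholders, since the paper does not list the exact LED colors.

```matlab
% Hypothetical capture loop (assumes the MATLAB Support Package for
% USB Webcams). The eight color names below are placeholders.
if ~exist('capture', 'dir'), mkdir capture; end
cam = webcam;                                  % first detected USB camera
bands = {'black', 'white', 'red', 'green', 'blue', ...
         'yellow', 'cyan', 'magenta', 'amber'};
for k = 1:numel(bands)
    input(['Set ring light to ' bands{k} ' and press Enter: '], 's');
    img = snapshot(cam);                       % grab the current frame
    imwrite(img, sprintf('capture/a(%d).jpg', k));
end
clear cam                                      % release the camera
```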
To produce a merged image of the apple fruit across all spectral bands, follow these steps in MATLAB:
Go into MATLAB and create a Merge folder to store all nine images sequentially.
Select all nine images (Ctrl + A), right-click the first one, and rename it a(1); the remaining eight are automatically named a(2) through a(9). This is done for ease of reference.
Read images a(1) to a(9) into MATLAB and assign them to variables i through i8 using the MATLAB code sketched below, making sure the correct path to the Merge directory is specified.
Then combine all of them, from i to i8, into the variable i9, as shown in the code.
Display the concatenated image i9 using MATLAB's imshow command.
This completes the merging of the photographs in MATLAB, and the final merged image can be viewed in its entirety.
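A minimal version of the merge script referenced in these steps might look like the following; the concatenation axis is an assumption, since the text does not specify it. Horizontal tiling keeps the result directly viewable with imshow (as in Figure 6), whereas cat(3, ...) would instead stack the bands into a single multiband array for network input.

```matlab
% Read the nine renamed images from the Merge folder and concatenate.
folder = 'Merge';                      % path to the Merge folder
imgs = cell(1, 9);
for k = 1:9
    imgs{k} = imread(fullfile(folder, sprintf('a(%d).jpg', k)));
end
i9 = cat(2, imgs{:});                  % tile the nine images side by side
imshow(i9);                            % display the concatenated image
% cat(3, imgs{:}) would stack the bands depth-wise instead.
```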
Figure 6A–C depict concatenated images of Red Delicious USA, Royal Gala, and Washington Apple, respectively.

(A) Red Delicious USA, (B) Royal Gala, and (C) Washington.
The dataset is prepared with a focus on apple fruit. Eight wavelengths are chosen, with six images at each wavelength, including a black reference image across this range, so the dataset consists of 81 photos per fruit sample across these eight wavelengths. A refractometer is used to manually record each apple's sweetness in % Brix, and the fruits are then sorted by sugar content. The apple types selected for sugar-content-based grading comprised five varieties: Red Delicious USA, Red Delicious New Zealand, Royal Gala, Washington, and Kinnaur. After repeating the above procedure for all five apple types, the lookup table of sugar content shown in Table 1 is obtained.
Sugar content
Name of the Apple | Apples | Sugar content (% Brix) |
---|---|---|
Kinnaur Apple | Apple 1 to Apple 4 | 10, 10, 15, 12.5 |
Red Delicious NZ | Apple 1 to Apple 4 | 12.8, 13.8, 12.1, 10 |
Red Delicious USA | Apple 1 to Apple 4 | 10, 12, 10, 10 |
Royal Gala | Apple 1 to Apple 4 | 10, 12, 13, 13 |
Washington Apple | Apple 1 to Apple 4 | 10, 12.5, 10, 10 |
For grading by sweetness, the dataset comprises 1,539 images.
Each “image capture” in the proposed dataset consists of nine individual images, physically captured and combined via software. This approach significantly increases the effective dataset size.
Data augmentation techniques were also employed to further enhance the dataset and improve model robustness. This is standard practice in deep learning: it artificially expands the training dataset, exposes the model to varied image conditions, and improves generalization.
While the raw number of capture sessions may appear low, combining multiple images per capture with data augmentation allows the data requirements for practical deep-learning training and testing to be met. This approach balances the practical constraints of data collection with the need for a comprehensive dataset, ensuring that the models are trained on a diverse and substantial set of images.
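As a sketch of this augmentation step, assuming MATLAB's Deep Learning Toolbox: the rotation, flip, and zoom ranges below are illustrative values, not ones reported in the paper.

```matlab
% Illustrative augmentation pipeline; transform ranges are assumptions.
imds = imageDatastore('Sweetness', ...        % labeled dataset (see below)
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');
augmenter = imageDataAugmenter( ...
    'RandRotation',    [-20 20], ...          % random rotation (degrees)
    'RandXReflection', true, ...              % horizontal flips
    'RandScale',       [0.9 1.1]);            % mild zoom in/out
augimds = augmentedImageDatastore([128 128 3], imds, ...
    'DataAugmentation', augmenter);           % also resizes to 128 x 128
```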
Convolutional Layer: The core operation in the convolutional layer is described by the convolution formula:

f(i, j) = Σ_m Σ_n I(i + m, j + n) · K(m, n)

Where:
I - the input image (128 × 128 × 3)
K - the kernel (5 × 5)
f(i, j) - the output feature map

The output size of the convolutional layer is given by:

O = (W − F + 2P)/S + 1

where W is the input width (or height), F the filter size, P the padding, and S the stride.
ReLU Activation Function: Applied element-wise to the output of the convolutional layer:

f(x) = max(0, x)
Max Pooling Layer: The max pooling operation is defined as:

f(i, j) = max_{(m, n) ∈ R_ij} x(m, n)

where R_ij is the 4 × 4 pooling region. The output size after max pooling is:

O = (W − F)/S + 1

with input size W, pool size F, and stride S.
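As a concrete check with the layer sizes used here: a 5 × 5 convolution with 'same' padding and stride 1 preserves the 128 × 128 spatial size, and the subsequent 4 × 4 max pooling with stride 4 gives O = (128 − 4)/4 + 1 = 32, i.e., 32 × 32 feature maps.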
Fully Connected Layer: The operation in the fully connected layer is given by:

y = σ(W·x + b)

Where:
W - the weight matrix
x - the input vector
b - the bias vector
σ - the activation function (ReLU in this case)
Softmax Layer: The softmax function for the final classification is:

softmax(z_i) = e^{z_i} / Σ_j e^{z_j}

where z_i is the i-th element of the fully connected layer's output.
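For example, if the fully connected layer produced the hypothetical logits z = (2.0, 1.0, 0.5, 0.1) for the four classes, then Σ_j e^{z_j} = e^{2.0} + e^{1.0} + e^{0.5} + e^{0.1} ≈ 12.86, giving probabilities of approximately 0.57, 0.21, 0.13, and 0.09, which sum to 1; the first class would be selected.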
Input Layer: 128 × 128 × 3 (RGB image)
Convolutional Layer:
Number of filters: Typically, 32 or 64
Kernel size: 5 × 5
Stride: 1
Padding: “same” to maintain spatial dimensions
Max Pooling Layer:
Pool size: 4 × 4
Stride: 4
Fully Connected Layer:
Number of neurons: Typically, 128 or 256
Output Layer:
Four neurons (for the four apple classes)
Learning Rate: Typically start with 0.001 and implement learning rate decay
Batch Size: Usually 32 or 64, depending on memory constraints
Number of Epochs: Start with 100, implement early stopping
Optimizer: Adam optimizer with β1 = 0.9, β2 = 0.999, ɛ = 1e-8
Regularization: L2 regularization with λ = 0.0001
Dropout Rate: Typically, 0.5 for fully connected layers
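In MATLAB, these training hyperparameters map onto trainingOptions roughly as follows. This is a sketch under stated assumptions: the piecewise schedule and drop period stand in for the unspecified "learning rate decay", and early stopping via ValidationPatience only takes effect when validation data is supplied. The 0.5 dropout is applied as a dropoutLayer inside the network itself.

```matlab
% Assumed training configuration mirroring the listed hyperparameters.
options = trainingOptions('adam', ...
    'InitialLearnRate',           0.001, ...
    'LearnRateSchedule',          'piecewise', ... % assumed decay scheme
    'LearnRateDropFactor',        0.1, ...
    'LearnRateDropPeriod',        25, ...          % assumed period
    'MiniBatchSize',              32, ...
    'MaxEpochs',                  100, ...
    'ValidationPatience',         5, ...           % early stopping
    'GradientDecayFactor',        0.9, ...         % beta1
    'SquaredGradientDecayFactor', 0.999, ...       % beta2
    'Epsilon',                    1e-8, ...
    'L2Regularization',           0.0001);
```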
Additional Convolutional Layers: Add 1–2 more convolutional layers to capture hierarchical features. Each additional layer would follow the same mathematical formulas as the first convolutional layer.
Batch Normalization: Apply batch normalization after convolutional layers:

x̂ = (x − μ_B) / √(σ_B² + ε),   y = γ·x̂ + β

Where μ_B and σ_B² are the mean and variance of the mini-batch, respectively, and γ and β are learnable parameters.
Data Augmentation: Implement transformations like rotation, flipping, and zooming. This doesn't change the network architecture but expands the training data.
Residual Connections: To implement skip connections:

y = F(x) + x

where F(x) is the residual mapping learned by the stacked layers and x is the identity shortcut.
Deeper Architecture: Increase depth while managing vanishing gradients using techniques like residual connections.
Transfer Learning: Utilize pre-trained models as feature extractors, fine-tuning the last few layers for apple classification (see the sketch below).
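A hedged sketch of that transfer-learning route, assuming the ResNet-18 support package for MATLAB; 'fc1000' and 'ClassificationLayer_predictions' are ResNet-18's standard head layers, and the choice of backbone is illustrative.

```matlab
% Reuse a pre-trained backbone; retrain only the head for four classes.
net = resnet18;                                   % pre-trained on ImageNet
lgraph = layerGraph(net);
lgraph = replaceLayer(lgraph, 'fc1000', ...
    fullyConnectedLayer(4, 'Name', 'fc4'));       % four apple classes
lgraph = replaceLayer(lgraph, 'ClassificationLayer_predictions', ...
    classificationLayer('Name', 'apple_output'));
% trainNetwork(augimds, lgraph, options) then fine-tunes the new head.
```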
A refractometer (Figure 7) is a device that measures a substance's refractive index. In the proposed work, a refractometer is used to measure the sugar content of the apple fruit. First, a folder called Sweetness is created to hold the apple images grouped by sugar reading. Four subfolders are made, named 10, 12, 13, and 15 after the sugar content levels in % Brix, and the images of the apples with each reading are placed in the corresponding subfolder. Once the folders of labeled apple images are ready, the images are processed by AppleNet (the CNN described above) in MATLAB. The accuracy achieved for grading by apple sweetness is 65%. Figure 8 shows accuracy and loss vs. iteration, and Figure 9 shows the number of observations and loss vs. class labels.
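The folder-labeled workflow just described corresponds to the following sketch, assuming the images have already been resized to 128 × 128 (or are wrapped in an augmentedImageDatastore) and that layers and options are defined as in the earlier sketches; the 80/20 split is an assumed choice, as the paper does not state one.

```matlab
% Folder names (10, 12, 13, 15) become the class labels.
imds = imageDatastore('Sweetness', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');
[trainImds, testImds] = splitEachLabel(imds, 0.8, 'randomized');
net  = trainNetwork(trainImds, layers, options);  % see earlier sketches
pred = classify(net, testImds);
accuracy = mean(pred == testImds.Labels)          % study reports 0.65
```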

Refractometer.

Accuracy and loss vs. iteration.

Number of observations and loss vs. class labels.
The results of this study suggest that multispectral imaging, used in conjunction with the AppleNet CNN architecture, is effective for grading apple fruit by sweetness. A 65% accuracy in sweetness classification represents a meaningful step toward non-invasive, automated grading systems. Correlation studies between refractometer-measured sugar content and multispectral images give detailed, variety-specific results. However, establishing a direct correlation is difficult, because fruit sweetness involves many complex factors beyond spectral features alone. AppleNet's performance, as a network specifically designed for multispectral images, shows the potential of tailoring CNNs to the characteristics peculiar to fruit grading.
Yet the modest accuracy reflects an essential need for broader datasets: general predictions across apple varieties are unreliable when drawn from a narrow dataset. The limitations of this study, in particular its relatively small dataset and the resulting impact on model accuracy, point the way for future research. Accuracy could be improved by expanding the dataset to incorporate more apple varieties, including variations in environmental factors during image capture, and minimizing noise sources affecting the imaging system. The concrete significance of this research for the apple industry lies in enabling fully automatic grading systems. As multispectral imaging techniques continue to develop, this study deepens the understanding of their application in fruit quality assessment and introduces new farm management concepts into practice. Table 2 compares AppleNet with other prominent deep-learning techniques used in fruit classification and grading, including established architectures such as AlexNet, ResNet, Inception, VGG, MobileNet, and other custom CNNs designed for similar tasks.
Comparison of proposed technique with existing deep learning techniques
Name of the network | No. of layers | Type of network | Accuracy (%) | No. of parameters | Error rate (%) | Features |
---|---|---|---|---|---|---|
AlexNet | 8 | CNN | 84.7 | 62 million | 15.03 | Deeper |
GoogleNet | 22 | CNN | 93.33 | 4 million | 6.67 | Increased computational efficiency |
DenseNet | 5 | CNN | 93.34 | 8 million | 6.66 | Strong gradient flow |
VGG-16 | 16 | CNN | 92.3 | 138 million | 7.30 | Fixed-size kernels |
VGG-19 | 19 | CNN | 92.3 | 143 million | 7.30 | Fixed-size kernels |
InceptionNet-v3 | 48 | CNN | 93.3 | 6.4 million | 6.70 | Wider parallel kernels |
AppleNet (proposed method) | 7 | CNN | 65 (for sweetness) | 196 million | 35 (for sweetness) | Low computation requirement |
CNN, convolutional neural network; VGG, Visual Geometry Group.
The key difference lies in the meticulous network optimization for this particular use case. AppleNet represents a carefully tuned CNN configuration, in which we
a) determined the optimal number of convolutional and fully connected layers,
b) fine-tuned the max pooling operations, and
c) selected the most effective optimization solver.
Each parameter and architectural decision in AppleNet resulted from extensive experimentation and iterative refinement. Our goal was to maximize accuracy specifically for Apple grading tasks.
While AppleNet doesn't introduce fundamental modifications to the CNN structure, its value lies in its specialized configuration for Apple quality assessment, which may not be directly evident from the network architecture alone.
The main contribution of AppleNet is not in proposing a new neural network type but in demonstrating how careful optimization of a CNN can yield superior results for a specific agricultural application.
The comparison also includes detailed performance metrics, namely precision, recall, and F1-score, as shown in Table 3.
Grading by Sweetness performance metrics of apple fruit
Class name | Precision | Recall | F1-Score |
---|---|---|---|
Class-1-10% Brix sugar content | 0.6500 | 0.6500 | 0.6500 |
Class-2-12% Brix sugar content | 0.6475 | 0.6475 | 0.6475 |
Class-3-13% Brix sugar content | 0.6500 | 0.6500 | 0.6500 |
Class-4-15% Brix sugar content | 0.5638 | 0.5638 | 0.5638 |
Accuracy | 65% | ||
Misclassification rate | 0.3719 | ||
Macro-F1 | 0.6278 | ||
Weighted-F1 | 0.6281 |
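For reference, the per-class metrics in Table 3 can be derived from a confusion matrix, as in the following generic sketch (pred and testImds come from the evaluation step above; this is not the paper's own script).

```matlab
% Derive per-class precision, recall, and F1 from the confusion matrix.
C = confusionmat(testImds.Labels, pred);   % rows: true, cols: predicted
precision  = diag(C) ./ sum(C, 1)';        % TP ./ (TP + FP)
recall     = diag(C) ./ sum(C, 2);         % TP ./ (TP + FN)
f1         = 2 * (precision .* recall) ./ (precision + recall);
macroF1    = mean(f1);                     % unweighted class average
weights    = sum(C, 2) / sum(C(:));        % class support proportions
weightedF1 = sum(weights .* f1);           % support-weighted average
```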
Table 4 compares results from the proposed system with recent research conducted by various researchers.
Comparison of the proposed system with existing work
System | Year | Technology | Application | Accuracy |
---|---|---|---|---|
Proposed system (AppleNet) | 2024 | CNN-based multispectral imaging | Grading Apples by Sweetness | 65% |
Wang et al. [26] | 2023 | Near-infrared spectroscopy and multispectral imaging | Red Fuji Apples (defects, shape, sweetness) | 96.67% (defects), 94.67% (size & shape), near-perfect (sweetness) |
Yan et al. [5] | 2023 | Enhanced YOLOv5s framework | Real-time Apple detection for picking robots | Recall: 91.48%, precision: 83.83%, F1: 87.49% |
Iqbal et al. [22] | 2023 | Raman spectroscopy | Apple quality and safety | High sensitivity to quality and safety parameters |
Liu et al. [10] | 2024 | Multi-band imaging and chemometric analysis | Rice seed quality and variety discrimination | Up to 94% for spectral-based distinctions |
CNN, convolutional neural network.
This study applies a multispectral imaging system to five varieties of apples to investigate apple sweetness and variety. The five varieties are Kinnaur Apple, NZ Red Delicious, USA Red Delicious (tart), Royal Gala Apple, and Washington Apple. The first step is to design a multispectral imaging chamber to capture detailed multispectral images of the apples. The images are then organized into a single dataset and processed with the developed CNN model, AppleNet. The model achieves 65.0% accuracy when the system is used to predict the sweetness of these apples. However, this accuracy is constrained by the dataset's size, a limitation acknowledged in the project. Future work will focus mainly on expanding the dataset, since this is the clearest route to improving the system's performance.