Precision agriculture has emerged as a transformative approach in modern farming, enabling efficient resource utilization, real-time monitoring, and data-driven decision-making to improve productivity and sustainability. One of the fundamental challenges in precision agriculture is the accurate identification and classification of plant species, particularly distinguishing between crop plants and invasive weeds. Weeds compete with crops for nutrients, light, and space, often resulting in significant yield losses [1,2,3,4]. Effective weed management is crucial to maintaining agricultural output, and it depends heavily on timely and accurate detection. Traditionally, farmers have relied on manual observation and broad-spectrum herbicide application, which not only increases labor costs and chemical usage but also leads to environmental degradation and herbicide resistance. In this context, developing intelligent, automated systems for weed and crop classification can significantly enhance the efficiency and sustainability of agricultural practices.
The advent of computer vision and machine learning has opened new avenues for automating plant species recognition. Machine learning algorithms can be trained to identify specific patterns in image data, such as color, shape, texture, and spatial features, enabling them to differentiate between species with high accuracy [5,6,7,8,9]. Unlike manual techniques, these methods can process vast amounts of data in real-time, making them highly suitable for deployment in autonomous systems like drones, robots, and smart spraying machines. However, the effectiveness of such systems depends on the robustness and generalization capabilities of the underlying models. Variations in lighting, occlusion, soil background, and plant growth stages present additional challenges in real-world applications, necessitating the use of advanced algorithms that can adapt to complex field environments [10,11,12,13,14].
This study addresses these challenges by developing a comprehensive machine learning framework that leverages optimized classification techniques and ensemble learning to enhance the recognition accuracy of weed and crop species. The dataset used in this study comprises 11,500 high-resolution images, collected from two major sources: the publicly available Plant Seedlings Classification dataset on Kaggle and a custom-curated weeds dataset. It includes 10 crop species such as maize, sugar beet, and chickweed, as well as 5 weed types including thistle, crabgrass, and wild oat. Each image is labeled with the corresponding plant species and was captured under natural lighting conditions to simulate real-world scenarios. The diversity of the dataset ensures that the developed models can generalize well across different environments and plant appearances [15,16,17,18,19,20,21,22].
The proposed framework emphasizes the use of feature-rich image representations and robust machine learning algorithms. Initially, raw images are preprocessed through resizing, normalization, and background filtering to remove noise and enhance feature clarity. Hand-crafted features such as color histograms, local binary patterns (LBP), Haralick texture descriptors, and shape-based metrics are extracted to capture the unique characteristics of each plant species. These features are essential for classical machine learning algorithms, which rely on meaningful data representations rather than end-to-end learning. To further improve the efficiency of the model, feature selection techniques are employed to reduce dimensionality while retaining the most informative attributes. Recursive feature elimination (RFE) and principal component analysis (PCA) are used in combination to select optimal subsets of features that contribute most to classification accuracy.
A variety of supervised machine learning algorithms are evaluated in this study. Decision trees and support vector machines (SVM) are initially considered due to their interpretability and effectiveness in classification tasks. However, their limitations in scalability and performance under complex feature interactions motivate the use of more advanced models such as gradient boosting machines (GBM), eXtreme Gradient Boosting (XGBoost), and light gradient boosting machine (LightGBM). These ensemble methods combine multiple weak learners to form a strong predictive model, offering better generalization and higher accuracy. Additionally, ensemble stacking is explored by combining the outputs of multiple base classifiers through a meta-model, enhancing the model’s ability to capture diverse decision boundaries. The major contributions of this research are as follows:
A comprehensive machine learning framework is designed that integrates handcrafted features (color, texture, and shape), dimensionality reduction techniques, and ensemble classifiers to accurately differentiate crop and weed species.
The study applies and compares three prominent feature selection approaches, namely PCA, mutual information (MI), and RFE, to reduce redundancy and enhance classification performance, especially for minority weed classes.
A novel ensemble architecture based on stacking is implemented, combining random forest (RF), SVM, XGBoost, and LightGBM. This ensemble surpasses individual model performance in both accuracy and generalization.
The model is evaluated on a large-scale, heterogeneous dataset comprising 11,500 annotated images collected from public and custom sources, representing 15 species under real-world conditions.
The framework ensures fast inference (under 80 ms/image) and uses lightweight, interpretable algorithms, making it suitable for integration into drones, embedded systems, or edge computing platforms in precision farming.
Automated classification of weeds and crops has become a critical research focus in precision agriculture due to its potential to reduce manual labor, optimize herbicide usage, and increase overall crop productivity. Recent developments in image analysis, machine learning, and remote sensing have led to increasingly sophisticated systems capable of recognizing plant species under diverse agricultural conditions. Studies have explored both classical machine learning techniques and modern deep learning architectures, each offering unique advantages and challenges depending on the data, computational resources, and deployment environment.
Deep learning, especially convolutional neural networks (CNNs), has demonstrated exceptional performance in plant image classification tasks. Zhuang et al. [1] evaluated various deep convolutional neural networks (DCNNs) for detecting broadleaf weed seedlings in wheat. Their study highlighted the superior accuracy of DCNN models but also emphasized the challenges of computational complexity and the need for large labeled datasets. Similarly, Picon et al. [2] proposed a deep learning-based semantic segmentation approach for detecting multiple weed and corn species. Their framework utilized both synthetic and real datasets to enhance model generalization, showing that deep learning models could perform well in complex field environments when adequately trained.
Complementing image-based learning, multimodal data fusion techniques have also been introduced. Gai et al. [3] explored the fusion of color and depth images to improve crop plant detection for robotic weed control. Their findings indicate that combining spatial and spectral data enhances object separation, especially when occlusion and overlapping leaves are present. While deep learning dominates many of these approaches, studies like this demonstrate the potential for integrating handcrafted and learned features to improve model robustness.
Despite the promise of deep learning, traditional machine learning remains highly relevant, particularly when computational efficiency and interpretability are essential. Al-Badri et al. [4] provided an extensive review of machine learning techniques for weed classification, discussing the strengths and limitations of models such as SVM, RF, k-nearest neighbors (k-NN), and decision trees. The authors noted that while deep learning models tend to outperform in accuracy, classical machine learning approaches are advantageous in low-resource settings and offer greater transparency in decision-making. This review further underscores the importance of feature engineering and selection in boosting the accuracy of non-deep learning models.
Other studies have extended machine learning applications to real-time monitoring using UAVs and aerial imagery. Zhao et al. [5] applied deep learning (UNet) to extract lodging in rice fields using UAV-captured images. Although the focus was not weed detection, their work demonstrated how airborne imagery could support high-throughput agricultural monitoring. Similarly, Osorio et al. [6] used multispectral images and deep neural networks to detect weeds in lettuce crops, highlighting the value of spectrum-based data for differentiating vegetation types.
Recent research has also addressed the challenges of limited datasets and domain adaptation. Moldvai et al. [8] investigated weed detection using computer vision with a constrained image dataset. Their study showed that robust feature extraction and classifier tuning can yield acceptable performance even with relatively small data volumes. Zhu et al. [9] improved the YOLOx object detector by integrating lightweight attention modules, achieving efficient weed detection with reduced computational cost. These developments suggest a growing trend toward more lightweight, scalable models suitable for deployment on drones or embedded devices.
Another emerging strategy is the combination of strong classification with weak localization for pretraining, as seen in the work by Zhang et al. [10]. Their method enhanced object detectors in remote sensing imagery, which is particularly relevant in agriculture, where weed distribution often needs to be localized for targeted treatment. Hasan et al. [11] further contributed to this domain by providing an object-level benchmark dataset specifically for weed classification, aiding standardized model comparison and evaluation.
Semantic segmentation remains a vital technique in weed-crop discrimination. You et al. [12] proposed a deep neural network-based semantic segmentation framework to detect weeds and crops. Their model achieved high spatial resolution in plant detection, emphasizing the value of spatially aware models in field conditions. However, the reliance on pixel-level annotations remains a barrier for large-scale implementation.
Studies are also exploring the use of time-series and light condition variation in plant recognition. Sakeef et al. [13] developed a machine learning model to classify plant genotypes grown under various light conditions using multiscale time-series data. This approach is particularly useful in phenotyping and crop monitoring, offering insights into how environmental factors influence plant appearance and model performance. Razfar et al. [14] proposed custom lightweight deep learning models for weed detection in soybean crops. Their work emphasized the trade-off between model complexity and deployment feasibility in real-world applications.
While deep learning dominates much of the current literature, the need for computationally efficient, interpretable, and scalable solutions persists. Rajendran and Thirunavukkarasu [7] demonstrated that traditional machine learning algorithms could achieve strong classification performance when combined with effective feature extraction techniques. Their study, based on the same dataset as that used in the current work, showed that classifiers like RF and SVM can be competitive when trained on meaningful features derived from color, texture, and shape.
Collectively, these studies highlight several trends in the domain of weed and crop species classification. First, there is a growing emphasis on ensemble models and hybrid learning strategies, which combine the strengths of multiple algorithms to improve accuracy and generalization. Second, the importance of feature selection and dimensionality reduction is widely acknowledged, particularly in machine learning workflows where model simplicity and speed are prioritized. Techniques such as PCA and RFE have shown promise in reducing overfitting and enhancing model performance.
The existing literature demonstrates significant progress in both machine learning and deep learning applications for plant species classification. However, there remains a need for high-accuracy, low-complexity frameworks that can be easily adapted to real-world agricultural settings. This study contributes to that goal by focusing on optimized machine learning and ensemble techniques to classify weed and crop species efficiently using well-engineered features and robust validation strategies.
The dataset utilized in this study comprises a total of 11,500 high-resolution images representing 15 plant species, including both crop and weed categories. The crop classes (10 in total) are sourced from the Plant Seedlings Classification dataset available on Kaggle, while the weed classes (5 in total) are extracted from a custom-curated weed image repository. Each image is labeled with the respective plant species and belongs to one of the two major categories: crop or weed. Table 1 shows the dataset details.
Table 1. Dataset details
| Class type | Species name | Number of images | Source |
|---|---|---|---|
| Crop | Maize, sugar beet, cleavers, black-grass, etc. | 10,000 | Kaggle (plant seedlings) |
| Weed | Dandelion, thistle, bindweed, wild oat, crabgrass | 1,500 | Custom-curated weed dataset |
| Total | 15 classes | 11,500 | |
The bold values in the tables represent the best-performing results for the respective evaluation metrics.
All images are RGB and captured under natural daylight, with minimal preprocessing in their raw form. The images vary in size and orientation, prompting the need for normalization and augmentation techniques during preprocessing.
This work introduces a multi-stage machine learning framework for effective weed and crop classification in agricultural fields. The goal is to enhance recognition performance and real-time usability through the application of feature extraction, selection, and ensemble modeling. The key contributions of the proposed work include the following:
Robust feature engineering pipeline combining color, texture, and shape descriptors;
Use of dimensionality reduction techniques to mitigate the curse of dimensionality and boost generalization;
Employment of ensemble learning methods—particularly Voting, Bagging, and Stacking models—to improve classification accuracy;
Optimization of each component for real-time performance and interpretability without deep learning dependencies.
Let the dataset be defined as:
$$D = \left\{ \left( x_i, y_i \right) \right\}_{i=1}^{n}$$
where $x_i \in \mathbb{R}^d$ represents the feature vector of the $i$th image, and $y_i \in \{1, 2, \ldots, k\}$ is the corresponding class label.
The following machine learning models are applied:
SVM:
$$\min_{w,b,\xi} \frac{1}{2}\left\| w \right\|^2 + C\sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i\left( w^{T} x_i + b \right) \ge 1 - \xi_i,\; \xi_i \ge 0$$
RF:
$$H(x) = \text{majority-vote}\left\{ h_t(x) \right\}_{t=1}^{T}$$
XGBoost:
$$\hat{y}_i = \sum_{t=1}^{T} f_t(x_i), \quad f_t \in F$$
$$L(\phi) = \sum_i l\left( y_i, \hat{y}_i \right) + \sum_t \Omega(f_t), \quad \Omega(f) = \gamma T + \frac{1}{2}\lambda \left\| w \right\|^2$$
where $l$ is the per-sample loss and, inside $\Omega$, $T$ denotes the number of leaves of tree $f$ and $w$ its leaf weights.
LightGBM:
$$\mathrm{Split\_Gain} = \frac{1}{2}\left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma$$
Voting Classifier (Hard Voting):
$$\hat{y} = \arg\max_j \sum_{i=1}^{M} \mathbb{1}\left[ f_i(x) = j \right]$$
Stacking Classifier:
$$h(x) = g\left( f_1(x), f_2(x), \ldots, f_M(x) \right)$$
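To make the split-gain expression concrete, the following is a minimal sketch in plain Python. It assumes the usual second-order formulation for gradient-boosted trees, in which the score of the unsplit parent node is subtracted from the summed scores of the two children. The function name, argument names, and the gradient sums in the example are hypothetical and are not taken from the study.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Second-order split gain for a gradient-boosted tree node.

    g_* and h_* are the sums of first- and second-order loss gradients
    over the samples routed to the left/right child. lam is the L2
    penalty on leaf weights, gamma the cost of adding one more leaf.
    """
    def score(g, h):
        return g * g / (h + lam)

    parent = score(g_left + g_right, h_left + h_right)
    children = score(g_left, h_left) + score(g_right, h_right)
    return 0.5 * (children - parent) - gamma


# Hypothetical gradient sums for one candidate split.
gain = split_gain(g_left=-4.0, h_left=3.0, g_right=5.0, h_right=4.0,
                  lam=1.0, gamma=0.1)
print(round(gain, 4))
```

A split is only worth making when the gain is positive. If both children have the same gradient statistics as the parent, the bracketed term vanishes and the gain reduces to the penalty, minus gamma.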
The proposed machine learning architecture is composed of the following major components.
This is the foundational stage where raw image data is acquired and prepared for further analysis. Agricultural images often come with considerable variability in resolution, lighting conditions, orientations, and background clutter. Without careful preprocessing, this variability can degrade classification performance.
Image Acquisition:
- Images are collected from two datasets: the Kaggle Plant Seedlings Classification dataset and a curated weed image dataset.
- All images represent real-world field scenarios with varying illumination and soil conditions.
Preprocessing Steps:
- Resizing: All images are resized to a fixed dimension (e.g., 128 × 128) to maintain consistency and reduce computational burden.
- Normalization: Pixel values are normalized (e.g., scaled to [0, 1]) to reduce brightness-related bias.
- Noise Reduction: Median and Gaussian filters are optionally applied to remove graininess from soil and background.
- Augmentation: Techniques like horizontal flipping, rotation, zooming, and brightness adjustment improve model generalization, especially in cases of class imbalance.
Effective preprocessing directly enhances the quality of extracted features in the next layer. It ensures that noise and irrelevant visual information (e.g., soil or shadows) do not obscure critical plant traits.
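The resizing, normalization, and flipping steps can be illustrated with a small dependency-free sketch. It is a simplification under stated assumptions: the image is a single-channel grid stored as a list of rows, resizing uses nearest-neighbour sampling, and the helper names are hypothetical. A real pipeline would more likely use an image library and the 128 × 128 target size mentioned above.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D image stored as a list of rows."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(img, max_value=255):
    """Scale integer pixel values into the range [0, 1]."""
    return [[px / max_value for px in row] for row in img]


def hflip(img):
    """Horizontal flip, one of the augmentations listed above."""
    return [row[::-1] for row in img]


# Tiny hypothetical 2 x 4 grayscale image.
raw = [
    [0, 64, 128, 255],
    [255, 128, 64, 0],
]
small = resize_nearest(raw, 2, 2)   # keeps columns 0 and 2
scaled = normalize(small)           # values now lie in [0, 1]
flipped = hflip(small)              # mirrored copy for augmentation
print(small, flipped)
```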
This layer transforms preprocessed image data into structured numerical representations (feature vectors) that encode relevant visual characteristics. In classical machine learning, feature engineering is a crucial step because the model’s predictive performance depends heavily on the quality of the features (Figure 1).

Figure 1. Overview of the proposed work. LightGBM, light gradient boosting machine; PCA, principal component analysis; RF, random forest; RFE, recursive feature elimination; SVM, support vector machine.
Color Features:
- RGB mean and standard deviation per channel.
- HSV histograms (to capture perceptual color information under varying light).
Texture Features:
- Gray-Level Co-occurrence Matrix (GLCM) properties: contrast, homogeneity, energy, and entropy.
- LBP: local texture descriptors effective for capturing leaf surface textures.
Shape Features:
- Aspect ratio (width to height).
- Solidity (area/convex area).
- Perimeter-to-area ratio.
- Roundness and elongation metrics from contours.
Crop and weed species often differ subtly in color tone, leaf texture, or structure.
By focusing on domain-relevant descriptors, we reduce the dimensionality of input data while enhancing semantic separability between classes.
Carefully engineered features also reduce the need for large training datasets, which is beneficial when data availability is limited.
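As an illustration of one of the texture descriptors listed above, the sketch below computes the basic 3 × 3 local binary pattern (LBP) code and a normalized 256-bin histogram in plain Python. The neighbour ordering, the greater-or-equal threshold, and the helper names are assumptions chosen for the example. The study does not state which LBP variant or library it used.

```python
# Clockwise neighbour offsets, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """8-bit LBP code: bit k is set when neighbour k >= the centre pixel."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Normalized histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Hypothetical 3 x 3 grayscale patch with a single interior pixel.
patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 8, 3],
]
code = lbp_code(patch, 1, 1)
hist = lbp_histogram(patch)
print(code)
```

The histogram, rather than the raw codes, is what would enter the feature vector, since it is independent of where in the image a given texture occurs.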
In the proposed framework, dimensionality reduction and feature selection are critical steps that streamline the model’s input space by eliminating redundant, irrelevant, or noisy features. This is essential for reducing computational complexity, improving model interpretability, and enhancing the performance of machine learning classifiers—particularly in high-dimensional image feature data. The techniques employed include PCA, RFE, and MI analysis, each of which contributes uniquely to optimizing the feature space.
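PCA is an unsupervised linear transformation technique used to reduce the dimensionality of a dataset while preserving as much variance as possible. Given a mean-centered dataset $X \in \mathbb{R}^{n \times d}$, PCA computes the covariance matrix $\Sigma = \frac{1}{n-1} X^{T} X$ and projects the data onto the eigenvectors of $\Sigma$ associated with the largest eigenvalues, retaining enough components to explain a chosen share of the variance (95% in this study).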
RFE is a supervised feature selection method that eliminates features recursively based on their importance in predicting the target variable. It uses a base estimator (e.g., a linear SVM or a RF classifier) to assign importance scores to features. At each iteration, the feature with the lowest importance (e.g., smallest absolute coefficient |wj| in linear models) is removed, and the model is retrained. This process continues until the desired number of features d′ is retained. RFE is particularly useful for datasets where some features have minimal or no contribution to class discrimination. By pruning such features, RFE enhances the model’s generalization ability and reduces the risk of overfitting. Furthermore, because it is model-specific, RFE can capture complex interactions when used with tree-based classifiers or ensemble learners.
MI is a non-parametric method that quantifies the dependency between a feature and the target variable. Unlike correlation, which captures linear relationships, MI measures both linear and non-linear associations. For a discrete feature $X_j$ and target variable $Y$, the MI is defined as:
$$I(X_j; Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}$$
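A minimal sketch of the plug-in MI estimate for a discrete feature follows. It sums p(x, y) · log(p(x, y) / (p(x) p(y))) over the observed value pairs, using empirical frequencies and the natural logarithm. The function name and the toy feature values are hypothetical; continuous image features would first need to be binned or handled with a dedicated estimator.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        mi += p_xy * log(p_xy / (p_x * p_y))
    return mi


labels = ["crop", "crop", "weed", "weed"]
informative = [0, 0, 1, 1]     # perfectly separates the two classes
uninformative = [0, 1, 0, 1]   # independent of the class label

print(mutual_information(informative, labels))
print(mutual_information(uninformative, labels))
```

Features would then be ranked by this score and the top-scoring ones kept, which is the filtering strategy the MI-based configuration relies on.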
The Classifier Ensemble Module forms the central decision-making engine of the proposed system. It operates on the optimized feature vectors produced by the preceding layers and aims to accurately predict the species label (weed or crop type) by leveraging the strengths of multiple machine learning classifiers. Rather than depending on a single model, this module adopts a multi-model strategy through ensemble learning, which enhances prediction accuracy, improves generalization, and ensures robustness to dataset noise and variability. The module is composed of multiple base learners, each with unique learning mechanisms and biases, followed by ensemble strategies—Voting and Stacking—to synthesize their predictions.
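RF is an ensemble method itself, consisting of a collection of decision trees trained on different bootstrapped subsets of the training data. For a given input vector $x$, the prediction is the majority vote over the $T$ trees:
$$H(x) = \text{majority-vote}\left\{ h_t(x) \right\}_{t=1}^{T}$$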
SVMs are margin-based classifiers that seek to find the optimal hyperplane separating classes in the feature space. For a binary classification problem, the decision function is defined as:
$$f(x) = \operatorname{sign}\left( w^{T} x + b \right)$$
XGBoost and LightGBM are gradient boosting frameworks that build an ensemble of weak learners (typically decision trees) in a forward stage-wise manner. At each iteration $t$, a new model $f_t(x)$ is added to correct the errors of the previous model:
$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$$
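The voting classifier combines the predictions from multiple base models. In hard voting, the final prediction is the class label that receives the most votes from the $M$ base classifiers:
$$\hat{y} = \arg\max_j \sum_{i=1}^{M} \mathbb{1}\left[ f_i(x) = j \right]$$
In soft voting, the predicted class probabilities are averaged and the class with the highest mean probability is selected.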
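The difference between hard and soft voting can be seen in a short plain-Python sketch. The helper names and the per-classifier probabilities are hypothetical. The example is constructed so that two weakly confident classifiers outvote one highly confident classifier under hard voting, while probability averaging reverses the decision.

```python
from collections import Counter


def hard_vote(predictions):
    """Label chosen by most classifiers (ties go to the first label seen)."""
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average class probabilities across classifiers, then take the argmax."""
    classes = list(probabilities[0])
    mean = {k: sum(p[k] for p in probabilities) / len(probabilities)
            for k in classes}
    return max(mean, key=mean.get)


# Hypothetical class probabilities from three base classifiers.
probs = [
    {"maize": 0.55, "wild_oat": 0.45},
    {"maize": 0.60, "wild_oat": 0.40},
    {"maize": 0.05, "wild_oat": 0.95},
]
hard_labels = [max(p, key=p.get) for p in probs]

hard_result = hard_vote(hard_labels)
soft_result = soft_vote(probs)
print(hard_result, soft_result)
```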
Stacking is a more sophisticated ensemble technique that combines base learners by training a meta-classifier (e.g., Logistic Regression or SVM) on their outputs. If the base models produce predictions $\{f_1(x), f_2(x), \ldots, f_M(x)\}$, the meta-classifier $g$ learns a new mapping:
$$h(x) = g\left( f_1(x), f_2(x), \ldots, f_M(x) \right)$$
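The idea of learning a mapping from base-model outputs to the final label can be sketched without any external library. In this toy version the meta-classifier is a simple lookup table that remembers, for each combination of base predictions, the most frequent true label. This is an assumed stand-in for the Logistic Regression meta-classifier used in the study, and all names and data are hypothetical. The base predictions used for fitting are assumed to come from held-out data, so the meta-classifier does not learn from predictions on samples the base models were trained on.

```python
from collections import Counter, defaultdict


def fit_meta_table(base_predictions, labels):
    """Toy meta-classifier: most frequent true label per prediction tuple."""
    votes = defaultdict(Counter)
    for preds, label in zip(base_predictions, labels):
        votes[tuple(preds)][label] += 1
    return {preds: c.most_common(1)[0][0] for preds, c in votes.items()}


def stacked_predict(meta_table, preds):
    """Use the learned mapping; fall back to majority vote if unseen."""
    preds = tuple(preds)
    if preds in meta_table:
        return meta_table[preds]
    return Counter(preds).most_common(1)[0][0]


# Hypothetical held-out predictions of three base classifiers.
held_out_predictions = [
    ("crop", "crop", "weed"),
    ("crop", "crop", "weed"),
    ("crop", "crop", "crop"),
    ("weed", "weed", "weed"),
    ("crop", "weed", "weed"),
]
held_out_labels = ["weed", "weed", "crop", "weed", "weed"]

meta_table = fit_meta_table(held_out_predictions, held_out_labels)

new_case = ("crop", "crop", "weed")
stacked = stacked_predict(meta_table, new_case)
voted = Counter(new_case).most_common(1)[0][0]
unseen = stacked_predict(meta_table, ("weed", "crop", "crop"))
print(stacked, voted, unseen)
```

Here the meta-classifier has learned that the third base model is reliable when it disagrees with the other two, something a fixed majority vote cannot express.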
Given:
Image dataset $D = \left\{ \left( x_i, y_i \right) \right\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^{h \times w \times c}$ and $y_i \in \{1, 2, \ldots, K\}$
Feature extractors $F_1, F_2, \ldots, F_m$
Dimensionality reduction function $\phi: \mathbb{R}^{d} \to \mathbb{R}^{d'}$
Base classifiers $C_1, C_2, \ldots, C_M$
Meta-classifier $G$ (for stacking)
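Step 1: Preprocessing
For all $x_i \in D$, resize, normalize, and optionally denoise and augment the image to obtain $\tilde{x}_i$.
Step 2: Feature Extraction
For each preprocessed image $\tilde{x}_i$, concatenate the outputs of the feature extractors: $z_i = [F_1(\tilde{x}_i), F_2(\tilde{x}_i), \ldots, F_m(\tilde{x}_i)] \in \mathbb{R}^{d}$
Step 3: Dimensionality Reduction/Feature Selection
Reduce feature dimensionality using $\phi$ (e.g., PCA or RFE): $z'_i = \phi(z_i)$
Step 4: Training Base Classifiers
For each base model $C_j$, train on reduced features: fit $C_j$ on $\{(z'_i, y_i)\}_{i=1}^{n}$
Step 5A: Voting Ensemble (Hard Voting)
$\hat{y}_i = \arg\max_{k} \sum_{j=1}^{M} \mathbb{1}\left[ C_j(z'_i) = k \right]$
Step 5B: Stacking Ensemble
Construct meta-feature vector: $m_i = [C_1(z'_i), C_2(z'_i), \ldots, C_M(z'_i)]$
Train meta-classifier $G$: fit $G$ on $\{(m_i, y_i)\}_{i=1}^{n}$ and predict $\hat{y}_i = G(m_i)$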
Step 6: Evaluation using Accuracy, Precision, Recall and F1 score
Output:
Final predicted label yi ∈ {1,2,..., K} for each image
Evaluation metrics: Accuracy, Precision, Recall, F1-score, Confusion Matrix.
The proposed algorithm aims to accurately classify weed and crop species using a machine learning pipeline that integrates feature engineering, dimensionality reduction, and ensemble learning. The method is carefully structured into six key stages, each contributing to the robustness, scalability, and real-time applicability of the system in agricultural environments.
The first stage begins with image preprocessing, where raw images are standardized to ensure consistency. This includes resizing the images to a fixed dimension, normalizing the pixel values to handle lighting variations, and applying data augmentation techniques such as rotation, flipping, and contrast adjustment. These operations prepare the images for uniform processing in subsequent stages and improve the model’s generalization capability.
In the second stage, feature extraction is performed. Instead of using deep learning, which often requires extensive computational resources, the algorithm relies on handcrafted features derived from image properties. These features include color histograms, texture descriptors like LBP, and shape characteristics such as aspect ratio and leaf contour details. The result is a structured numerical representation of each image that captures biologically meaningful traits relevant for distinguishing different plant species.
Next, the algorithm applies dimensionality reduction and feature selection. This is a crucial stage where irrelevant or redundant features are eliminated to reduce computational load and avoid overfitting. Techniques such as PCA are used to project high-dimensional feature data into a lower-dimensional space while preserving most of the variance. Additionally, supervised methods like RFE and MI analysis help identify and retain only the most discriminative features for classification. This results in a compact and efficient feature set for the next stage.
The fourth stage involves training multiple classifiers, each using the reduced feature set. Several machine learning models are used as base learners, including RFs, which are good at handling noisy features; SVMs, which perform well in high-dimensional settings; and boosting algorithms like XGBoost and LightGBM, which build strong predictive models through iterative learning. Each classifier independently learns patterns from the training data and produces its own prediction for each image.
In the fifth stage, the system combines the outputs of the individual classifiers using ensemble techniques. Two strategies are implemented: voting and stacking. In the voting method, each classifier contributes a vote, and the final prediction is determined by the majority or by averaging probabilities. In the stacking approach, the outputs of the base classifiers are fed into a second-level model, known as a meta-classifier, which learns how to best combine these predictions. This strategy often improves accuracy by capturing complex inter-model relationships.
Finally, the sixth stage performs evaluation of the system using standard metrics such as accuracy, precision, recall, and F1-score. These metrics help assess the model’s effectiveness in classifying both majority and minority classes. A confusion matrix is also generated to analyze misclassifications across species.
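The evaluation metrics named above can all be derived from the confusion matrix. The following plain-Python sketch computes overall accuracy and per-class precision, recall, and F1-score. The function name and the three-class confusion counts are hypothetical and do not reproduce any result from the study.

```python
def evaluate(confusion, labels):
    """confusion[i][j] = samples of true class i predicted as class j."""
    k = len(labels)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    report = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fp = sum(confusion[r][i] for r in range(k)) - tp  # column minus diagonal
        fn = sum(confusion[i]) - tp                       # row minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return correct / total, report


# Hypothetical confusion matrix for three species (rows = true class).
labels = ["maize", "charlock", "wild_oat"]
confusion = [
    [8, 1, 1],
    [2, 8, 0],
    [0, 1, 9],
]
accuracy, report = evaluate(confusion, labels)
macro_f1 = sum(m["f1"] for m in report.values()) / len(report)
print(round(accuracy, 4), round(macro_f1, 4))
```

Reporting the per-class values alongside the overall accuracy matters when classes are imbalanced, because a high accuracy can hide poor recall on a minority weed class.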
To evaluate the effectiveness of the proposed ensemble-based machine learning framework for weed and crop species classification, extensive experiments were conducted using the combined dataset comprising 11,500 images across 15 classes (10 crop and 5 weed types). Various performance metrics, including accuracy, precision, recall, F1-score, and execution time, were used to assess both individual classifiers and ensemble configurations.
All experiments were conducted in Python 3.10 (Python Software Foundation), using the following tools:
scikit-learn (v1.3) (Scikit-learn community developers, sponsored by NumFOCUS) for feature extraction, PCA, RFE, SVM, RF, and ensemble models;
XGBoost (v1.7) (Distributed Machine Learning Community (initially by Tianqi Chen)) and LightGBM (v4.1) (Microsoft Research) for gradient boosting classifiers;
Pandas/NumPy for data manipulation and statistical operations;
Matplotlib (John D. Hunter, now maintained by Matplotlib Development Team)/Seaborn (Michael Waskom and contributors) for visualization;
System specifications: Intel Core i7 CPU, 32GB RAM, running on Ubuntu 22.04 with no GPU acceleration (CPU-only benchmark).
Feature selection plays a crucial role in optimizing the performance of machine learning classifiers, particularly when working with high-dimensional image-based datasets in agriculture. Redundant or irrelevant features not only increase computational overhead but can also lead to model overfitting and reduced generalizability in field environments. As highlighted in Al-Badri et al. [4], the careful selection and reduction of features are especially important in agricultural vision systems due to the complexity of background noise, lighting variation, and intra-class diversity in plant images. Accordingly, this study investigates multiple feature selection techniques—ranging from unsupervised projections to supervised relevance estimation—to assess their impact on model accuracy and training time.
Table 2 presents the comparative evaluation of four different feature selection configurations, including the use of all raw features, PCA, MI-based filtering, and RFE with an RF base estimator.
Table 2. Feature selection comparison
| Feature set | No. of features | PCA used | Accuracy (%) | Training time (s) |
|---|---|---|---|---|
| All raw features | 52 | No | 91.6 | 48.3 |
| PCA (95% variance) | 20 | Yes | 93.2 | 33.2 |
| Mutual info selected | 18 | No | 94.0 | 31.4 |
| RFE (with RF) | 15 | No | 94.7 | 29.5 |
PCA, principal component analysis; RF, random forest; RFE, recursive feature elimination.
The results clearly demonstrate the effectiveness of dimensionality reduction. Using all 52 raw features yielded the lowest accuracy (91.6%) and the highest training time (48.3 s), indicating a significant presence of noisy or redundant data. By contrast, PCA, which transforms the data into a lower-dimensional orthogonal space while retaining the highest variance, improved accuracy to 93.2% and reduced training time to 33.2 s. However, the best performance was achieved using RFE, which iteratively selects the most informative subset of features based on their predictive power. RFE with RF reduced the feature count to 15 and achieved the highest accuracy of 94.7% with the shortest training time (29.5 s), highlighting its ability to maintain model quality while improving efficiency.
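The trade-offs in Table 2 are easier to compare when expressed relative to the all-features baseline. The short sketch below recomputes accuracy gain (in percentage points), training-time saving, and the share of features retained, using only the values reported in Table 2. The variable names are illustrative.

```python
# (feature set, number of features, accuracy %, training time s) from Table 2.
rows = [
    ("All raw features", 52, 91.6, 48.3),
    ("PCA (95% variance)", 20, 93.2, 33.2),
    ("Mutual info selected", 18, 94.0, 31.4),
    ("RFE (with RF)", 15, 94.7, 29.5),
]

_, base_n, base_acc, base_time = rows[0]
summary = {}
for name, n_feat, acc, secs in rows[1:]:
    summary[name] = {
        "accuracy_gain_pts": round(acc - base_acc, 1),
        "time_saved_pct": round(100 * (base_time - secs) / base_time, 1),
        "features_kept_pct": round(100 * n_feat / base_n, 1),
    }

for name, stats in summary.items():
    print(name, stats)
```

On these figures, RFE keeps under a third of the original features while cutting training time by close to 39% and adding 3.1 percentage points of accuracy.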
Table 3 presents a detailed evaluation of the classifiers using key performance metrics: accuracy, precision, recall, F1-score, and inference time per image.
Table 3. Individual classifier performance
| Classifier | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | Inference time (ms/image) |
|---|---|---|---|---|---|
| RF | 94.1 | 93.8 | 93.6 | 93.7 | 92.4 |
| SVM (RBF) | 92.7 | 91.9 | 92.3 | 92.1 | 108.7 |
| XGBoost | 94.5 | 94.2 | 94.0 | 94.1 | 79.6 |
| LightGBM | 94.8 | 94.6 | 94.3 | 94.4 | 76.3 |
LightGBM, light gradient boosting machine; RF, random forest; SVM, support vector machines.
Among the evaluated models, LightGBM emerged as the best-performing individual classifier, achieving the highest accuracy (94.8%), precision (94.6%), and F1-score (94.4%), with the lowest inference time (76.3 ms/image). This confirms LightGBM’s strength in handling structured, tabular features with efficient gradient boosting mechanisms, as also demonstrated by Hasan et al. [11] in weed detection applications.
XGBoost closely followed, delivering high accuracy (94.5%) with slightly increased inference time. The RF classifier, though marginally less accurate (94.1%), maintained competitive scores across all metrics and offers high interpretability and robustness, making it a viable alternative in edge-deployable systems. On the other hand, SVM with an RBF kernel showed the lowest performance (92.7% accuracy) and the highest inference time, possibly due to its computational complexity when dealing with multi-class and high-dimensional input spaces.
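For deployment planning, the per-image inference times in Table 3 can be converted into an approximate single-stream throughput, and checked against the 80 ms/image target stated among the contributions. The sketch below uses only the Table 3 values. The variable names are illustrative, and the conversion assumes images are processed one at a time with no batching.

```python
# Inference time per image (ms) as reported in Table 3.
inference_ms = {
    "RF": 92.4,
    "SVM (RBF)": 108.7,
    "XGBoost": 79.6,
    "LightGBM": 76.3,
}

# Images per second when processing one image at a time.
throughput = {name: round(1000 / ms, 1) for name, ms in inference_ms.items()}

# Classifiers that individually meet the 80 ms/image budget.
under_80ms = sorted(name for name, ms in inference_ms.items() if ms < 80)

print(throughput)
print(under_80ms)
```

Only the two boosting models fall below the 80 ms budget on their own in these measurements, which is worth keeping in mind when choosing base learners for an ensemble intended for edge devices.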
Table 4 summarizes the comparative performance of these ensemble approaches using standard evaluation metrics.
Ensemble model evaluation
| Ensemble strategy | Base models | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|---|
| Hard voting | RF + SVM + LightGBM | 95.1 | 95.0 | 94.9 | 94.9 |
| Soft voting | RF + XGBoost + LightGBM | 95.3 | 95.2 | 95.0 | 95.1 |
| Stacking (LogReg) | All four classifiers | **95.6** | **95.4** | **95.3** | **95.3** |
The bold values in the tables represent the best-performing results for the respective evaluation metrics.
LightGBM, light gradient boosting machine; RF, random forest; SVM, support vector machines.
The stacking ensemble strategy, which integrates predictions from all four base classifiers (RF, SVM, XGBoost, and LightGBM) using a logistic regression meta-classifier, produced the highest overall performance, achieving 95.6% accuracy and an F1-score of 95.3%. This supports the advantage of stacking: rather than giving every base model an equal vote, the meta-classifier learns how much weight each model's output should receive when forming the final prediction. This learned weighting is a significant factor behind its superior performance.
Soft voting, which combines predicted probabilities from the base models, also demonstrated strong results with 95.3% accuracy, slightly outperforming hard voting. The improvement suggests that probabilistic averaging provides a smoother decision boundary, especially in cases of borderline predictions, which are common in multi-class agricultural classification problems.
Although hard voting yielded the lowest accuracy among the three (95.1%), it still outperformed all individual classifiers (Table 3), reinforcing the premise that even simple ensemble techniques can provide measurable gains in predictive robustness.
Fine-grained evaluation of classification performance by species is essential in agricultural applications, especially when the model must differentiate between crops and morphologically similar weeds under real-world field conditions. To understand species-specific behavior, the F1-score (the harmonic mean of precision and recall) was calculated for selected representative classes, covering both crop and weed species, as shown in Figure 2. These are Maize and Charlock (crops) and Bindweed and Wild Oat (weeds), which often look alike in early growth stages and are therefore challenging to distinguish.

F1 score analysis. SVM, support vector machine.
Table 5 shows the class-wise F1-scores achieved by three individual classifiers and the proposed stacking ensemble.
Class-wise F1-scores (selected classes)
| Class name | RF | SVM | LightGBM | Ensemble (stacked) |
|---|---|---|---|---|
| Maize | 97.1 | 96.2 | 97.3 | **98.0** |
| Charlock | 93.5 | 91.4 | 94.0 | **95.1** |
| Bindweed (weed) | 91.0 | 88.7 | 92.6 | **93.4** |
| Wild oat (weed) | 89.6 | 87.9 | 91.3 | **92.5** |
The bold values in the tables represent the best-performing results for the respective evaluation metrics.
LightGBM, light gradient boosting machine; RF, random forest; SVM, support vector machine.
As observed, the stacking ensemble consistently outperforms the individual classifiers across all selected species, reinforcing its ability to generalize across both majority (crop) and minority (weed) classes.
For Maize, a class with high intra-class consistency and strong feature definition, all classifiers perform well, with the ensemble achieving a near-perfect 98.0% F1-score.
In the case of Charlock, the ensemble improves upon the best base model (LightGBM) by over 1 percentage point, demonstrating enhanced decision boundary learning when base predictions are fused intelligently.
More importantly, weed classes, which are typically harder to identify due to feature overlaps and class imbalance, show clear gains with the ensemble. For instance, the F1-score for Bindweed rises from 88.7% with the weakest base model (SVM) and 92.6% with the strongest (LightGBM) to 93.4%, highlighting the value of ensembling for minority-class performance, a common challenge in real-time weed detection systems (Zhuang et al. [1]; Al-Badri et al. [4]).
Similarly, Wild Oat, another visually ambiguous weed, benefits from the ensemble approach, achieving a 92.5% F1-score, the highest among all models.
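The per-class gains discussed above follow directly from Table 5. The short sketch below recomputes them from the tabulated values; the dictionary layout and variable names are illustrative only.

```python
# F1-scores (%) copied from Table 5.
f1_scores = {
    "Maize":    {"RF": 97.1, "SVM": 96.2, "LightGBM": 97.3, "Ensemble": 98.0},
    "Charlock": {"RF": 93.5, "SVM": 91.4, "LightGBM": 94.0, "Ensemble": 95.1},
    "Bindweed": {"RF": 91.0, "SVM": 88.7, "LightGBM": 92.6, "Ensemble": 93.4},
    "Wild oat": {"RF": 89.6, "SVM": 87.9, "LightGBM": 91.3, "Ensemble": 92.5},
}

gains = {}
for species, scores in f1_scores.items():
    base = {model: value for model, value in scores.items() if model != "Ensemble"}
    best_model = max(base, key=base.get)    # strongest individual classifier
    worst_model = min(base, key=base.get)   # weakest individual classifier
    gains[species] = {
        "best_base": best_model,
        "gain_over_best": round(scores["Ensemble"] - base[best_model], 1),
        "gain_over_worst": round(scores["Ensemble"] - base[worst_model], 1),
    }

for species, row in gains.items():
    print(f"{species:9s} +{row['gain_over_best']} pp over {row['best_base']}, "
          f"+{row['gain_over_worst']} pp over the weakest base model")
```

With the values in Table 5, the ensemble's margin over the best base model (LightGBM in every case) ranges from 0.7 to 1.2 percentage points, while its margin over the weakest base model (SVM) is largest for the two weed classes.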
Table 6 presents a comparative overview of several prominent studies in plant classification and weed detection, covering methodology, dataset size, accuracy, feature selection approach, and whether ensemble techniques were used. These studies span both classical and deep learning approaches, highlighting the spectrum of strategies employed in agricultural image classification research; their accuracies are compared in Figure 3.

Accuracy analysis.
Comparative performance with existing works
| Study | Dataset size | Methodology | Accuracy (%) | Feature selection | Ensemble used |
|---|---|---|---|---|---|
| Rajendran and Thirunavukkarasu [7] | 11,500 | Traditional ML + textural | 94.3 | Manual selection | No |
| Gai et al. [3], JFR | 4,000 | Color + Depth Fusion + ML | 90.7 | PCA | No |
| Hasan et al. [11], Crop Prot. | 8,500 | Lightweight DL (YOLO-based) | 93.5 | CNN features | No |
| Moldvai et al. [8], Applied Sciences | 3,200 | Vision + CNN (small dataset) | 92.1 | Image Thresholding | No |
| Proposed Method (2025) | 11,500 | ML + feature engineering + stacking | **95.6** | PCA + RFE | Yes (stacking) |
The bold values in the tables represent the best-performing results for the respective evaluation metrics.
CNN, convolutional neural network; PCA, principal component analysis; RFE, recursive feature elimination.
The proposed method distinguishes itself through the exclusive use of machine learning algorithms enhanced by automated feature engineering, dimensionality reduction, and a stacking-based ensemble, enabling strong performance without the need for computationally intensive deep learning architectures.
The proposed system achieves the highest accuracy in Table 6 (95.6%), exceeding the figures reported for both classical machine learning systems and recent deep learning-based methods. For example, Hasan et al. [11] employed a YOLO-based lightweight CNN architecture on a sizable dataset and reported 93.5% accuracy, suggesting that while DL models are powerful, they may be limited by hardware constraints and data complexity in field scenarios. Similarly, Gai et al. [3] introduced a fusion-based method combining color and depth data and reported 90.7% accuracy, possibly due to over-dependence on depth sensors, which are sensitive to outdoor variability.
While the proposed ensemble model achieved a high overall accuracy of 95.6%, a deeper examination of the model’s predictions reveals meaningful patterns in the misclassified instances, which can guide future improvements in both data and model design.
Confusion matrix analysis (Table 7) revealed that misclassifications were more frequent among morphologically similar species, particularly in early growth stages. The most notable confusion occurred between the following:
Charlock vs. Common Chickweed: Both species share similar leaf textures and color hues in early seedling stages, leading to frequent misidentification by all base classifiers.
Wild Oat vs. Fat Hen: These classes were occasionally confused due to similar elongated leaf structures and overlapping pixel intensities in cluttered backgrounds.
Class-wise error rates
| Confused class pair | Misclassification rate (%) | Primary cause | Suggested solution |
|---|---|---|---|
| Charlock ↔ chickweed | 4.2 | High visual similarity | Integrate texture-based GLCM features |
| Wild oat ↔ fat hen | 3.7 | Overlapping foliage in images | Use morphological shape descriptors |
| Bindweed ↔ cleavers | 3.5 | Poor contrast under shadows | Adaptive histogram equalization (CLAHE) |
GLCM, gray-level co-occurrence matrix.
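A pairwise error summary of the kind shown in Table 7 can be derived from a confusion matrix, as sketched below. The rate definition used here (errors in both directions divided by the combined number of true samples of the two classes) is an assumption, since the text does not define how the misclassification rate was computed, and the labels are toy data.

```python
from sklearn.metrics import confusion_matrix


def top_confused_pairs(y_true, y_pred, labels, k=3):
    """Return the k class pairs with the highest symmetric confusion rate (%)."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            errors = cm[i, j] + cm[j, i]          # i predicted as j, and j as i
            total = cm[i].sum() + cm[j].sum()     # true samples of both classes
            rate = float(100.0 * errors / total) if total else 0.0
            pairs.append((labels[i], labels[j], rate))
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:k]


# Toy labels for illustration only (not the study's predictions).
labels = ["charlock", "chickweed", "maize"]
y_true = ["charlock"] * 4 + ["chickweed"] * 4 + ["maize"] * 4
y_pred = (["charlock", "charlock", "charlock", "chickweed"]
          + ["chickweed", "chickweed", "charlock", "chickweed"]
          + ["maize"] * 4)

top = top_confused_pairs(y_true, y_pred, labels)
for a, b, rate in top:
    print(f"{a} <-> {b}: {rate:.1f}%")
```

Treating the pair symmetrically matches the bidirectional arrows in Table 7; a directional analysis would instead report `cm[i, j]` and `cm[j, i]` separately.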
Lighting Variability: Some misclassifications were traced to shadows and inconsistent illumination, which negatively impacted color-based features.
Background Noise: Presence of soil, mulch, or other vegetation often introduced background clutter that misled shape-based features.
Class Imbalance: Weed species such as Bindweed and Wild Oat had relatively fewer samples compared with major crops like maize and sugar beet, affecting classifier confidence.
To address these issues, several enhancements are proposed:
Augment the dataset with synthetic samples (e.g., SMOTE or GAN-based augmentation) for underrepresented weed classes to balance training.
Improve feature extraction by incorporating advanced descriptors such as the following:
Local binary patterns (LBP) for leaf texture;
Gabor filters for edge orientation;
Morphological operations for shape isolation.
Context-aware segmentation to isolate plants from noisy backgrounds using image segmentation tools such as GrabCut or Watershed.
Ensemble refinement by incorporating error-aware meta-learners that adaptively re-weight classifier contributions based on historical confusion patterns.
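As one concrete illustration of the proposed texture descriptors, the sketch below computes a basic 3×3 local binary pattern histogram with NumPy. It is a simplified variant written for this example (fixed radius of one pixel, no rotation invariance or uniform-pattern mapping); library implementations such as scikit-image's `local_binary_pattern` offer more options. The function name and the 256-bin layout are assumptions of the sketch.

```python
import numpy as np


def lbp_histogram(gray):
    """Normalized 256-bin histogram of basic 3x3 LBP codes for a 2-D grayscale image."""
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]  # border pixels lack a full neighbourhood
    # Eight neighbours, clockwise from top-left; each one sets one bit of the code.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes = codes + (neighbour >= center) * (1 << bit)
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()


# Example: a random texture patch (stand-in for a grayscale leaf crop).
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(32, 32))
features = lbp_histogram(patch)
print(features.shape, round(float(features.sum()), 6))
```

The resulting 256-dimensional vector could be appended to the existing handcrafted features and passed through the same feature selection step, so that RFE decides which texture bins are worth keeping.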
Manual inspection of select misclassified images further confirmed that the model’s errors were not random but closely tied to image conditions and intra-class similarities—indicating that improvements in data diversity and preprocessing pipelines would likely yield higher performance than merely increasing model complexity.
To further improve interpretability and transparency, we recommend incorporating explainability techniques. Gradient-weighted Class Activation Mapping (Grad-CAM) is designed primarily for convolutional neural networks, but similar attribution approaches or surrogate model explanations (e.g., SHAP or LIME) can be applied to tree-based ensemble models to reveal which image regions or feature patterns influence decisions. This would help highlight the role of environmental factors such as lighting, shadow, and background clutter in misclassifications, thereby guiding future improvements in data preprocessing and model design.
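SHAP and LIME require additional packages, so the sketch below uses a different, simpler model-agnostic technique, permutation importance from scikit-learn, to illustrate the same idea of ranking features by their influence on a tree-based model. The data are synthetic and the feature names are hypothetical placeholders, not the study's descriptors.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical feature names, for illustration only.
feature_names = ["hue_mean", "sat_mean", "leaf_area", "aspect_ratio",
                 "glcm_contrast", "glcm_energy", "edge_density", "solidity"]

# Synthetic data: with shuffle=False the first three columns are the informative ones.
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           n_redundant=0, n_classes=3, n_clusters_per_class=1,
                           shuffle=False, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and measure the drop in accuracy.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]

for idx in ranking:
    print(f"{feature_names[idx]:14s} "
          f"{result.importances_mean[idx]:.3f} +/- {result.importances_std[idx]:.3f}")
```

Permutation importance gives a global ranking only; per-image explanations of the kind SHAP or LIME provide would be needed to attribute an individual misclassification to, for example, a shadow-affected color feature.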
This study presents a comprehensive and efficient machine learning framework for the classification of crop and weed species using handcrafted features, dimensionality reduction, and ensemble-based classification strategies. Leveraging a diverse dataset comprising 11,500 images sourced from both public (Kaggle) and custom-curated collections, the system integrates a robust pipeline that includes feature engineering, RFE, and advanced ensemble methods such as stacking. The framework emphasizes interpretability, computational efficiency, and deployment readiness, making it suitable for real-time agricultural applications such as drone-based surveillance and autonomous weed control systems. Experimental evaluations demonstrate that the proposed approach outperforms several state-of-the-art methods—including both classical machine learning and lightweight deep learning models—achieving a peak classification accuracy of 95.6% using the stacked ensemble model. Feature selection techniques such as RFE and MI not only enhanced accuracy but also significantly reduced training time and complexity. Among individual classifiers, LightGBM offered the best performance, but the stacking ensemble yielded consistent improvements across all major and minor plant classes, particularly in challenging weed categories like Bindweed and Wild Oat. Comparative analysis with existing literature confirms that this machine learning-based strategy provides a compelling alternative to deep learning approaches, especially in resource-constrained environments where computational overhead and explainability are critical considerations. Unlike black-box neural networks, the modular and transparent nature of this pipeline ensures that it can be audited, optimized, and adapted to other agricultural tasks or new species with relative ease.