Abstract
This study evaluates the efficacy of machine learning models for predicting soil nitrogen (N), phosphorus (P), and potassium (K) concentrations from near-infrared (NIR) spectral data (750–2499 nm). A comparative analysis was conducted on models from three distinct categories: linear, non-linear kernel-based, and neural network-based, using data from 145 soil samples collected across four Indonesian provinces. Regularised linear modelling performed best: Ridge Regression achieved R²/MAE of 0.999/0.00005% for N-Total, 0.868/0.01408% for P-Total, and 0.763/0.01239% for K-Total. Support Vector Regression (SVR) achieved a moderate fit for N-Total (R² = 0.821) and, despite lower explained variance for K-Total (R² = 0.419), produced the lowest MAE for that nutrient (0.00829%). Neural network-based models underperformed the linear baselines. These results demonstrate that, for this dataset, a simpler regularised linear model outperformed more complex architectures, underscoring the critical role of rigorous model selection in developing accurate spectroscopic tools for precision agriculture. Future research could explore hybrid or ensemble methods that combine the strengths of different model types, potentially improving prediction accuracy and robustness.
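The regularised-linear approach highlighted above can be sketched briefly. The snippet below is illustrative only: the abstract does not specify the preprocessing, regularisation strength, or train/test protocol used in the study, so the synthetic spectra, the number of resampled wavelengths, and the `alpha` value are all assumptions, not the authors' actual pipeline.

```python
# Minimal sketch of Ridge Regression on NIR-like spectra (synthetic data;
# the real 145-sample dataset and its preprocessing are not reproduced here).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

rng = np.random.default_rng(0)

n_samples, n_wavelengths = 145, 100      # hypothetical resampling of 750-2499 nm
X = rng.normal(size=(n_samples, n_wavelengths))          # stand-in reflectance
true_coef = rng.normal(size=n_wavelengths) * 0.01        # stand-in spectral weights
y = X @ true_coef + rng.normal(scale=0.001, size=n_samples)  # e.g. N-Total (%)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = Ridge(alpha=1.0)                 # L2-regularised linear regression
model.fit(X_train, y_train)
pred = model.predict(X_test)

print(f"R2  = {r2_score(y_test, pred):.3f}")
print(f"MAE = {mean_absolute_error(y_test, pred):.5f}")
```

The L2 penalty (`alpha`) is what distinguishes Ridge from ordinary least squares; it shrinks the many correlated wavelength coefficients typical of NIR spectra, which is one plausible reason a regularised linear model can remain competitive with more complex architectures on a small sample set.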