Accurate mode-choice forecasts are vital for effective transportation planning. Transit agencies and city planners rely on precise predictions, and unreliable forecasts can undermine even the most behaviorally grounded insights. For decades, discrete choice models (DCMs), notably Multinomial Logit (MNL) and Mixed Multinomial Logit (MMNL), have explained why travelers choose particular modes through interpretable parameters, yet they often lag behind in forecast accuracy. More recently, machine learning (ML) methods such as tree-based algorithms have proven able to capture complex, nonlinear patterns, often outperforming DCMs in point-prediction accuracy. However, they lack built-in confidence measures, which limits their use in risk-aware decision making. In this work, we help narrow this gap by wrapping our best ML model in an Inductive Mondrian Conformal Prediction (IMCP) layer with per-mode calibration at 90% nominal coverage. We use a survey of approximately 8,000 Italian employees that captures their socioeconomic attributes and travel habits. Using a tailored preprocessing pipeline, we compare XGBoost, Random Forest, and CatBoost and find that XGBoost performs best on the test set, with an overall accuracy of 89.7% and a macro-averaged F1 score of 83.6%. Our IMCP layer then produces distribution-free prediction sets that contain the true mode at least 90% of the time, both overall and within each individual mode category. Singleton prediction sets can be treated as high-confidence forecasts for capacity planning, while multi-label sets (and the occasional empty set for highly ambiguous cases) highlight where uncertainty is greatest and pinpoint exactly which individuals merit follow-up surveys or targeted incentives.
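To make the idea of per-mode (Mondrian) calibration concrete, the following is a minimal sketch of a label-conditional inductive conformal layer placed on top of any classifier's predicted class probabilities. It is an illustration under stated assumptions, not the authors' implementation: the nonconformity score 1 - p̂(y | x) is a common choice but is assumed here, and the function names `class_calibration_scores` and `mondrian_prediction_sets` are hypothetical. Each candidate mode is tested only against calibration respondents whose true mode is that same mode, which is what yields coverage within each mode category rather than only on average. Because every mode can fail its own test, an empty set is possible.

```python
import numpy as np


def class_calibration_scores(cal_probs, cal_labels, n_classes):
    """Split calibration nonconformity scores by true class (the Mondrian categories)."""
    # Assumed nonconformity score: 1 minus the predicted probability of the true class.
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    # Sorted per class so counts can be taken with a binary search.
    return [np.sort(scores[cal_labels == k]) for k in range(n_classes)]


def mondrian_prediction_sets(test_probs, class_scores, alpha=0.10):
    """Boolean matrix whose entry (i, k) is True if mode k is in the set for sample i."""
    n_test, n_classes = test_probs.shape
    in_set = np.zeros((n_test, n_classes), dtype=bool)
    for k in range(n_classes):
        cal = class_scores[k]
        n_k = len(cal)
        # Score each test sample as if its true label were k.
        test_scores = 1.0 - test_probs[:, k]
        # Number of class-k calibration scores that are >= the test score.
        n_ge = n_k - np.searchsorted(cal, test_scores, side="left")
        # Conformal p-value computed only against class k (per-mode calibration).
        # A class with no calibration samples gets p = 1 and is always included.
        p_values = (n_ge + 1) / (n_k + 1)
        in_set[:, k] = p_values > alpha
    return in_set


if __name__ == "__main__":
    # Synthetic stand-in for classifier output; this is NOT the survey data.
    rng = np.random.default_rng(0)
    n, n_classes = 20_000, 3
    probs = rng.dirichlet(np.ones(n_classes), size=n)
    # Draw labels from the probabilities so calibration and test data are exchangeable.
    u = rng.random(n)
    labels = np.minimum((u[:, None] > probs.cumsum(axis=1)).sum(axis=1), n_classes - 1)

    cal, test = slice(0, n // 2), slice(n // 2, n)
    scores = class_calibration_scores(probs[cal], labels[cal], n_classes)
    sets = mondrian_prediction_sets(probs[test], scores, alpha=0.10)

    y_test = labels[test]
    covered = sets[np.arange(len(y_test)), y_test]
    sizes = sets.sum(axis=1)
    print(f"overall coverage: {covered.mean():.3f}")
    for k in range(n_classes):
        print(f"coverage for mode {k}: {covered[y_test == k].mean():.3f}")
    print(f"singletons: {(sizes == 1).mean():.3f}, empty: {(sizes == 0).mean():.3f}")
```

In a real pipeline, `cal_probs` and `test_probs` would plausibly come from a fitted model's `predict_proba` output on a held-out calibration split and on new respondents. Rows of the result with exactly one `True` correspond to singleton (high-confidence) forecasts, and rows with several or no `True` entries flag the ambiguous cases.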
© 2025 Ramin Bohlouli, Ken Koshy Varghese, Guido Gentile, Mohamed Eldafrawi, published by Transport and Telecommunication Institute
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.