Abstract
Stroke is a major and deadly health concern worldwide, and its effective management requires fast and accurate diagnosis. This study evaluates six machine learning models for stroke diagnosis: logistic regression, k-nearest neighbors (KNN), support vector machine (SVM), decision tree, random forest, and eXtreme Gradient Boosting (XGBoost). By applying Local Interpretable Model-agnostic Explanations (LIME), the work addresses the limited interpretability of conventional machine learning models, making their predictions easier for healthcare experts to understand. The models were trained on 5,109 clinical cases with features including age, gender, hypertension, heart disease, average glucose level, and patient lifestyle details as risk variables. Logistic regression reached an accuracy of 77%, whereas KNN and SVM reached accuracies of 92% and 89%, respectively. The decision tree classifier achieved high precision and accuracy of 95%; however, the random forest and XGBoost models achieved the highest accuracy (97%) and AUC (99%), respectively, outperforming the other classifiers. LIME was used to assess the importance of each attribute for individual predictions, supporting a clear and transparent understanding of the model outputs. Case-based analyses identified age, gender, BMI, average glucose level, and stressful lifestyle conditions as the major risk factors for stroke. This study highlights the importance of explainable artificial intelligence in assisting healthcare professionals and in offering transparent, reliable, and effective personalized treatment tailored to specific patient needs.
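The results above are reported as accuracy, precision, and AUC. As a minimal sketch of how these three metrics can be computed from a binary classifier's outputs, the following uses only the Python standard library. The labels and scores are made-up toy values chosen for illustration, not data from this study, and the 0.5 decision threshold and the function names are assumptions.

```python
# Toy illustration of accuracy, precision, and ROC AUC for a binary
# stroke / no-stroke classifier. All values below are hypothetical.

def accuracy(y_true, y_pred):
    """Fraction of cases where the predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred):
    """Fraction of predicted positives that are true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

def roc_auc(y_true, scores):
    """Probability that a random positive case scores higher than a
    random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical true labels (1 = stroke) and predicted probabilities.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.70, 0.90]

# Assumed decision threshold of 0.5 to turn scores into labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
prec = precision(y_true, y_pred)
auc = roc_auc(y_true, scores)

print(f"accuracy={acc:.2f} precision={prec:.2f} auc={auc:.2f}")
```

Accuracy and precision depend on the chosen threshold, whereas AUC depends only on how the scores rank positive cases against negative ones, so the model with the highest accuracy is not necessarily the one with the highest AUC.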