Predicting Win-Loss outcomes in MLB regular season games – A comparative study using data mining methods

By: C. Soto Valero  
Open Access | Dec 2016

Language: English
Page range: 91 - 112
Published on: Dec 17, 2016
Published by: International Association of Computer Science in Sport
In partnership with: Paradigm Publishing Services
Publication frequency: 2 issues per year

© 2016 C. Soto Valero, published by International Association of Computer Science in Sport
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.