Decision-Making Enhancement in a Big Data Environment: Application of the K-Means Algorithm to Mixed Data

Open Access | Aug 2019

References

  [1] Ahmed Abbasi, Suprateek Sarker, and Roger HL Chiang. Big data research in information systems: Toward an inclusive research agenda. Journal of the Association for Information Systems, 17(2):I, 2016. doi:10.17705/1jais.00423
  [2] Ritu Agarwal and Vasant Dhar. Big data, data science, and analytics: The opportunity and challenge for IS research. Information Systems Research, 25(3):443–448, 2014. doi:10.1287/isre.2014.0546
  [3] Amir Ahmad and Lipika Dey. A k-mean clustering algorithm for mixed numeric and categorical data. Data & Knowledge Engineering, 63(2):503–527, 2007. doi:10.1016/j.datak.2007.03.016
  [4] Pavel Berkhin. A survey of clustering data mining techniques. In Grouping Multidimensional Data, pages 25–71. Springer, 2006. doi:10.1007/3-540-28349-8_2
  [5] Xiao Cai, Feiping Nie, and Heng Huang. Multi-view k-means clustering on big data. In Twenty-Third International Joint Conference on Artificial Intelligence, 2013.
  [6] Xiaoli Cui, Pingfei Zhu, Xin Yang, Keqiu Li, and Changqing Ji. Optimized big data k-means clustering using MapReduce. The Journal of Supercomputing, 70(3):1249–1259, 2014. doi:10.1007/s11227-014-1225-7
  [7] Kenneth Cukier and Viktor Mayer-Schoenberger. The rise of big data: How it’s changing the way we think about the world. Foreign Aff., 92:28, 2013.
  [8] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008. doi:10.1145/1327452.1327492
  [9] Yuri Demchenko, Canh Ngo, and Peter Membrey. Architecture framework and components for the big data ecosystem. Journal of System and Network Engineering, pages 1–31, 2013. doi:10.1109/CTS.2014.6867550
  [10] Dany Di Tullio and D Sandy Staples. The governance and control of open source software projects. Journal of Management Information Systems, 30(3):49–80, 2013. doi:10.2753/MIS0742-1222300303
  [11] Gal Engelberg, Oded Koren, and Nir Perel. Big data performance evaluation analysis using Apache Pig. International Journal of Software Engineering and Its Applications, 10(11):429–440, 2016. doi:10.14257/ijseia.2016.10.11.34
  [12] Johann Füller, Katja Hutter, Julia Hautz, and Kurt Matzler. User roles and contributions in innovation-contest communities. Journal of Management Information Systems, 31(1):273–308, 2014. doi:10.2753/MIS0742-1222310111
  [13] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google file system. In Proceedings of the 19th ACM Symposium on Operating Systems Principles, pages 20–43, Bolton Landing, NY, 2003. doi:10.1145/945445.945450
  [14] Shanshan Guo, Xitong Guo, Yulin Fang, and Doug Vogel. How doctors gain social and economic returns in online health-care communities: A professional capital perspective. Journal of Management Information Systems, 34(2):487–519, 2017. doi:10.1080/07421222.2017.1334480
  [15] Hans-Hermann Bock. Origins and extensions of the k-means algorithm in cluster analysis. Journal Électronique d’Histoire des Probabilités et de la Statistique / Electronic Journal for History of Probability and Statistics, 4:48–49, 2008.
  [16] Doug Henschen. Why Sears is going all-in on Hadoop. InformationWeek, 2012. Retrieved July 1, 2014.
  [17] Joshua Zhexue Huang, Michael K Ng, Hongqiang Rong, and Zichen Li. Automated variable weighting in k-means type clustering. IEEE Transactions on Pattern Analysis & Machine Intelligence, (5):657–668, 2005. doi:10.1109/TPAMI.2005.95 (PMID 15875789)
  [18] Zhexue Huang. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining and Knowledge Discovery, 2(3):283–304, 1998. doi:10.1023/A:1009769707641
  [19] Cisco Visual Networking Index. The zettabyte era–trends and analysis. Cisco white paper, 2013.
  [20] Anil K Jain. Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8):651–666, 2010. doi:10.1016/j.patrec.2009.09.011
  [21] Tapas Kanungo, David M Mount, Nathan S Netanyahu, Christine D Piatko, Ruth Silverman, and Angela Y Wu. An efficient k-means clustering algorithm: Analysis and implementation. IEEE Transactions on Pattern Analysis & Machine Intelligence, (7):881–892, 2002. doi:10.1109/TPAMI.2002.1017616
  [22] Daniel Kendal, Oded Koren, and Nir Perel. Pig vs. Hive use case analysis. International Journal of Database Theory and Application, 9(12):267–276, 2016. doi:10.14257/ijdta.2016.9.12.24
  [23] Oded Koren, Carina Antonia Hallin, Nir Perel, and Dror Bendet. Enhancement of the k-means algorithm for mixed data in big data platforms. In Proceedings of SAI Intelligent Systems Conference, pages 1025–1040. Springer, 2018. doi:10.1007/978-3-030-01054-6_71
  [24] Sara Landset, Taghi M Khoshgoftaar, Aaron N Richter, and Tawfiq Hasanin. A survey of open source tools for machine learning with big data in the Hadoop ecosystem. Journal of Big Data, 2(1):24, 2015. doi:10.1186/s40537-015-0032-1
  [25] James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela H Byers. Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute, 2011.
  [26] R Angelin Preethi and J Elavarasi. Big data analytics using Hadoop tools, Apache Hive vs Apache Pig. International Journal of Emerging Technology in Computer Science & Electronics, 24(3), 2017.
  [27] Arun Rai. Editor’s comments: Synergies between big data and theory. MIS Quarterly, 40(2):iii–ix, 2016.
  [28] Henri Ralambondrainy. A conceptual version of the k-means algorithm. Pattern Recognition Letters, 16(11):1147–1157, 1995. doi:10.1016/0167-8655(95)00075-R
  [29] Alok R Saboo, V Kumar, and Insu Park. Using big data to model time-varying effects for marketing resource (re)allocation. MIS Quarterly, 40(4), 2016. doi:10.25300/MISQ/2016/40.4.06
  [30] Ohn Mar San, Van-Nam Huynh, and Yoshiteru Nakamori. An alternative extension of the k-means algorithm for clustering categorical data. International Journal of Applied Mathematics and Computer Science, 14:241–247, 2004.
  [31] Prasanna Tambe. Big data investment, skills, and firm value. Management Science, 60(6):1452–1469, 2014. doi:10.1287/mnsc.2014.1899
  [32] Tom White. Hadoop: The definitive guide. O’Reilly Media, Inc., 2012.
  [33] Rui Xu and Donald C Wunsch. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645–678, 2005. doi:10.1109/TNN.2005.845141 (PMID 15940994)
Language: English
Page range: 293–302
Submitted on: May 8, 2019
Accepted on: Jul 25, 2019
Published on: Aug 30, 2019
Published by: SAN University
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2019 Oded Koren, Carina Antonia Hallin, Nir Perel, Dror Bendet, published by SAN University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.