Decision-Making Enhancement in a Big Data Environment: Application of the K-Means Algorithm to Mixed Data

Oded Koren; Carina Antonia Hallin; Nir Perel; Dror Bendet

doi:10.2478/jaiscr-2019-0010

Journal of Artificial Intelligence and Soft Computing Research

Volume 9 (2019): Issue 4 (October 2019)

By: Oded Koren, Carina Antonia Hallin, Nir Perel and Dror Bendet

Open Access

Aug 2019

DOI: https://doi.org/10.2478/jaiscr-2019-0010

Language: English

Page range: 293 - 302

Submitted on: May 8, 2019

Accepted on: Jul 25, 2019

Published on: Aug 30, 2019

Published by: SAN University

In partnership with: Paradigm Publishing Services

Keywords: Big data, mixed data, Hadoop, K-means, decision making

Related subjects: Computer sciences, Databases and data mining, Artificial intelligence

© 2019 Oded Koren, Carina Antonia Hallin, Nir Perel, Dror Bendet, published by SAN University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
