Research and Improvement of Apriori Algorithm Based on Hadoop

Pengfei, Gao; Jianguo, Wang; Pengcheng, Liu

doi:10.21307/ijanmc-2019-012

Abstract

Association rule mining can effectively uncover relationships among items in big data, and the Apriori algorithm is one of the most important association rule mining algorithms. Traditional mining based on parallel Apriori algorithms spends more and more time on data I/O as the transaction database grows. This paper improves the Apriori algorithm in three ways: compressing transactions, reducing the number of database scans, and simplifying candidate set generation. The improved algorithm is then parallelized on the Hadoop framework. Experiments show that the improved algorithm is suitable for large-scale data mining and has good scalability and effectiveness.
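The three ideas named in the abstract can be illustrated with a small single-machine sketch. It is not the authors' implementation and does not include the Hadoop parallelization; the function names `compress` and `apriori`, the parameter `min_support` (an absolute count), and the sample data are assumptions made for illustration. Transactions are compressed by dropping infrequent items and merging duplicates, later passes run over the compressed data rather than the raw database, and candidates are generated by joining frequent (k-1)-itemsets and pruning any candidate with an infrequent subset.

```python
from collections import Counter
from itertools import combinations


def compress(transactions, frequent_items):
    """Drop infrequent items and merge identical transactions.

    Returns a mapping {frozenset(items): number_of_occurrences}.
    """
    counts = Counter()
    for t in transactions:
        kept = frozenset(t) & frequent_items
        if kept:
            counts[kept] += 1
    return counts


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    # Pass 1: the only scan of the raw transactions.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = {
        frozenset([i]): c for i, c in item_counts.items() if c >= min_support
    }
    result = dict(frequent)

    # Later passes only touch the compressed database.
    frequent_items = frozenset(i for s in frequent for i in s)
    db = compress(transactions, frequent_items)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        # Prune step: keep a candidate only if all (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in prev for s in combinations(union, k - 1)
            ):
                candidates.add(union)

        # A transaction with fewer than k items cannot contain a k-itemset.
        db = {t: n for t, n in db.items() if len(t) >= k}

        counts = Counter()
        for t, n in db.items():
            for c in candidates:
                if c <= t:
                    counts[c] += n  # n identical transactions counted at once

        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical sample data, not taken from the paper.
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
result = apriori(transactions, min_support=3)
```

In a MapReduce setting, the counting loop is the natural part to distribute: each mapper could count candidates over its own split of the compressed transactions and a reducer could sum the partial counts. That mapping is a common pattern and is stated here as an assumption, not as the design used in the paper.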

