SparkNN: A Distributed In-Memory Data Partitioning for KNN Queries on Big Spatial Data

Open Access | Aug 2020

References

  1. Aji, A, Wang, F, Vo, H, Lee, R, Liu, Q, Zhang, X and Saltz, J. 2013. Hadoop gis: A high performance spatial data warehousing system over mapreduce. Proceedings of the VLDB Endowment, 6(11): 1009-1020. DOI: 10.14778/2536222.2536227
  2. Al Aghbari, Z, Bahutair, M and Kamel, I. 2019. Geosimmr: A mapreduce algorithm for detecting communities based on distance and interest in social networks. Data Science Journal, 18(1): 1-10. DOI: 10.5334/dsj-2019-013
  3. Al Aghbari, Z, Kamel, I and Awad, T. 2012. On clustering large number of data streams. Intelligent Data Analysis, 16(1): 69-91. DOI: 10.3233/IDA-2011-0511
  4. Al Aghbari, Z, Kamel, I and Elbaroni, W. 2013. Energy-efficient distributed wireless sensor network scheme for cluster detection. International Journal of Parallel, Emergent and Distributed Systems, 28(1): 1-28. DOI: 10.1080/17445760.2012.729584
  5. Al Jawarneh, IM, Bellavista, P, Corradi, A, Foschini, L, Montanari, R and Zanotti, A. 2018. In-memory Spatial-Aware Framework for Processing Proximity-Alike Queries in Big Spatial Data. In: 2018 IEEE 23rd International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD), 1-6. IEEE. DOI: 10.1109/CAMAD.2018.8514950
  6. Alsaafin, A, Khedr, AM and Al Aghbari, Z. 2018. Distributed trajectory design for data gathering using mobile sink in wireless sensor networks. AEU-International Journal of Electronics and Communications, 96: 1-12. DOI: 10.1016/j.aeue.2018.09.005
  7. Babar, M, Arif, F, Jan, M, Tan, Z and Khan, F. 2019. Urban data management system: Towards big data analytics for internet of things based smart urban environment using customized hadoop. Future Generation Computer Systems, 96: 398-409. DOI: 10.1016/j.future.2019.02.035
  8. Bae, WD, Alkobaisi, S, Kim, SH, Narayanappa, S and Shahabi, C. 2007. Supporting range queries on web data using k-nearest neighbor search. In: International Symposium on Web and Wireless Geographical Information Systems, 61-75. Springer. DOI: 10.1007/978-3-540-76925-5_5
  9. Dean, J and Ghemawat, S. 2008. MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1): 107-113. DOI: 10.1145/1327452.1327492
  10. Dinges, L, Al-Hamadi, A, Elzobi, M, Al Aghbari, Z and Mustafa, H. 2011. Offline automatic segmentation based recognition of handwritten Arabic words. International Journal of Signal Processing, Image Processing and Pattern Recognition, 4(4): 131-143.
  11. Eldawy, A and Mokbel, MF. 2015. Spatialhadoop: A mapreduce framework for spatial data. In: 2015 IEEE 31st international conference on Data Engineering, 1352-1363. IEEE. DOI: 10.1109/ICDE.2015.7113382
  12. Garaeva, A, Makhmutova, F, Anikin, I and Sattler, K-U. 2017. A framework for co-location patterns mining in big spatial data. In: 2017 XX IEEE International Conference on Soft Computing and Measurements (SCM), 477-480. IEEE. DOI: 10.1109/SCM.2017.7970622
  13. Geolite 2 free downloadable databases maxmind developer site. https://dev.maxmind.com/geoip/geoip2/geolite2/. Accessed: 2019-05-22.
  14. George, L. 2011. HBase: the definitive guide: random access to your planet-size data. O’Reilly Media, Inc.
  15. Güting, RH, Behr, T, Düntgen, C and others. 2010. SECONDO: A Platform for Moving Objects Database Research and for Publishing and Integrating Research Implementations. IEEE Data Eng. Bull. 33(2): 56-63.
  16. Hagedorn, S, Götze, P and Sattler, K-U. 2017. The STARK framework for spatio-temporal data analytics on spark. Datenbanksysteme für Business, Technologie und Web (BTW 2017).
  17. Hajebi, K, Abbasi-Yadkori, Y, Shahbazi, H and Zhang, H. 2011. Fast approximate nearest-neighbor search with k-nearest neighbor graph. In: Twenty-Second International Joint Conference on Artificial Intelligence.
  18. Hammou, B, Lahcen, A and Mouline, S. 2018. Apra: An approximate parallel recommendation algorithm for big data. Knowledge-Based Systems, 157: 10-19. DOI: 10.1016/j.knosys.2018.05.006
  19. Hanif, S, Khedr, AM, Al Aghbari, Z and Agrawal, DP. 2018. Opportunistically exploiting internet of things for wireless sensor network routing in smart cities. Journal of Sensor and Actuator Networks, 7(4): 46. DOI: 10.3390/jsan7040046
  20. Hughes, JN, Annex, A, Eichelberger, CN, Fox, A, Hulbert, A and Ronquest, M. 2015. Geomesa: A distributed architecture for spatio-temporal fusion. In: Geospatial Informatics, Fusion, and Motion Video Analytics V, 9473: 94730F. International Society for Optics and Photonics. DOI: 10.1117/12.2177233
  21. JTS Topology Suite. https://www.osgeo.org/projects/jts/. Accessed: 2019-06-18.
  22. Kubo, M, Aghbari, Z, Makinouchi, A and Oh, K-S. 2003. Content-based image retrieval technique using wavelet-based shift and brightness invariant edge feature. International Journal of Wavelets, Multiresolution and Information Processing, 1(2): 163-178. DOI: 10.1142/S0219691303000141
  23. Lu, J and Güting, RH. 2014. Parallel secondo: A practical system for large-scale processing of moving objects. In: 2014 IEEE 30th International Conference on Data Engineering, 1190-1193. IEEE. DOI: 10.1109/ICDE.2014.6816738
  24. Magellan: Geospatial Analytics on Spark. Oct. 2015. https://hortonworks.com/blog/magellan-geospatial-analytics-in-spark/. Accessed: 2019-06-18.
  25. Nishimura, S, Das, S, Agrawal, D and El Abbadi, A. 2011. MD-HBase: A scalable multi-dimensional data infrastructure for location aware services. In: 2011 IEEE 12th International Conference on Mobile Data Management, 1: 7-16. IEEE. DOI: 10.1109/MDM.2011.41
  26. OpenStreetMap. https://www.openstreetmap.org/. Accessed: 2019-06-18.
  27. Ryan LeCompte. Bounded priority queue in scala. https://gist.github.com/ryanlecompte/5746241. Accessed: 2019-06-08.
  28. Sapountzi, A and Psannis, K. 2018. Social networking data analysis tools and challenges. Future Generation Computer Systems, 86: 893-913. DOI: 10.1016/j.future.2016.10.019
  29. Sarwat, M. 2015. Interactive and Scalable Exploration of Big Spatial Data: A Data Management Perspective. In: 2015 16th IEEE International Conference on Mobile Data Management, 1: 263-270. IEEE. DOI: 10.1109/MDM.2015.67
  30. Tang, M, Yu, Y, Malluhi, QM, Ouzzani, M and Aref, WG. 2016. Locationspark: A distributed in-memory data management system for big spatial data. Proceedings of the VLDB Endowment, 9(13): 1565-1568. DOI: 10.14778/3007263.3007310
  31. Thusoo, A, Sarma, JS, Jain, N, Shao, Z, Chakka, P, Anthony, S, Liu, H, Wyckoff, P and Murthy, R. 2009. Hive: a warehousing solution over a mapreduce framework. Proceedings of the VLDB Endowment, 2(2): 1626-1629. DOI: 10.14778/1687553.1687609
  32. White, T. 2012. Hadoop: The definitive guide. O’Reilly Media, Inc.
  33. Whitman, RT, Park, MB, Ambrose, SM and Hoel, EG. 2014. Spatial indexing and analytics on Hadoop. In: Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 73-82. ACM. DOI: 10.1145/2666310.2666387
  34. Xie, D, Li, F, Yao, B, Li, G, Zhou, L and Guo, M. 2016. Simba: Efficient in-memory spatial analytics. In: Proceedings of the 2016 International Conference on Management of Data, 1071-1085. ACM. DOI: 10.1145/2882903.2915237
  35. Xie, X, Xiong, Z, Hu, X, Zhou, G and Ni, J. 2014. On massive spatial data retrieval based on spark. In: International Conference on Web-Age Information Management, 200-208. Springer. DOI: 10.1007/978-3-319-11538-2_19
  36. You, S, Zhang, J and Gruenwald, L. 2015. Large-scale spatial join query processing in cloud. In: 2015 31st IEEE International Conference on Data Engineering Workshops, 34-41. IEEE. DOI: 10.1109/ICDEW.2015.7129541
  37. Yu, J, Wu, J and Sarwat, M. 2015. Geospark: A cluster computing framework for processing large-scale spatial data. In: Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, 70. ACM. DOI: 10.1145/2820783.2820860
  38. Zaharia, M, Chowdhury, M, Das, T, Dave, A, Ma, J, McCauley, M, Franklin, MJ, Shenker, S and Stoica, I. 2012. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, 22. USENIX Association.
  39. Zaharia, M, Xin, RS, Wendell, P, Das, T, Armbrust, M, Dave, A, Meng, X, Rosen, J, Venkataraman, S, Franklin, MJ and others. 2016. Apache spark: a unified engine for big data processing. Communications of the ACM, 59(11): 56-65. DOI: 10.1145/2934664
  40. Zhang, J-D and Chow, C-Y. 2015. Geosoca: Exploiting geographical, social and categorical correlations for point-of-interest recommendations. In: SIGIR, ACM. DOI: 10.1145/2766462.2767711
  41. Zhang, Z, Jin, C, Mao, J, Yang, X and Zhou, A. 2017. Trajspark: A scalable and efficient in-memory management system for big trajectory data. In: Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint Conference on Web and Big Data, 11-26. Springer. DOI: 10.1007/978-3-319-63579-8_2
Language: English
Submitted on: Oct 9, 2019
Accepted on: Jul 28, 2020
Published on: Aug 24, 2020
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2020 Zaher Al Aghbari, Tasneem Ismail, Ibrahim Kamel, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.