Have a personal or library account? Click to login
SparkNN: A Distributed In-Memory Data Partitioning for KNN Queries on Big Spatial Data Cover

SparkNN: A Distributed In-Memory Data Partitioning for KNN Queries on Big Spatial Data

Open Access
|Aug 2020

Figures & Tables

dsj-19-1080-g1.png
Figure 1

Extension of Spark core to implement SparkNN.

dsj-19-1080-g2.png
Figure 2

Partitioning the spatial data into BBs.

dsj-19-1080-g3.png
Figure 3

Global indexing of the partitions.

Algorithm 1

kNN from SARDDs.

Input: node := root of the tree in a RDD partition, needle := {n0, n1, …} the query point, k := an integer value corresponding to the number of nearest neighbors to find
Output: bpq := A bounded priority queue containing the k-nearest neighbors
Function Knearest (node, needle, k) :
      Declare bpq as a BoundedPriorityQueue to contain can didate nearest neighbors
          Set the size of bpq to k
          Function nearest (node) :
              default ← node
                  if default == NULL then
                        return NearestPoint
                  else
                      Enqueue default into bpq
                  end
                  if ni ≤ defaulti then
                        NearestPoint ← nearest (left(node))
                  else
                        NearestPoint ← nearest (right(node))
                  end
                  if bpq is not full     OR |defaultini|    <    distance(needle, head(bpq)) then
                        if ni ≤ defaulti then
                              NearestPoint ← nearest (right(node))
                        else
                              NearestPoint ← nearest (left(node))
                        end
                        return NearestPoint
                  end
      Knearest (node, needle, k)
          return bpq
dsj-19-1080-g4.png
Figure 4

Impact of k.

dsj-19-1080-g5.png
Figure 5

Impact of data size on data load time.

dsj-19-1080-g6.png
Figure 6

Effect of no. of cores on the scalability of SpakNN.

dsj-19-1080-g7.png
Figure 7

Effect of the number of SARDD on the partitioning time.

Language: English
Submitted on: Oct 9, 2019
Accepted on: Jul 28, 2020
Published on: Aug 24, 2020
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2020 Zaher Al Aghbari, Tasneem Ismail, Ibrahim Kamel, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.