SparkNN: A Distributed In-Memory Data Partitioning for KNN Queries on Big Spatial Data

Zaher Al Aghbari; Tasneem Ismail; Ibrahim Kamel

doi:10.5334/dsj-2020-035

Figures & Tables

Extension of Spark core to implement SparkNN.

Algorithm 1

kNN from SARDDs.

Input: node := root of the tree in an RDD partition, needle := {n₀, n₁, …} the query point, k := an integer value corresponding to the number of nearest neighbors to find

Output: bpq := A bounded priority queue containing the k-nearest neighbors

Function Knearest (node, needle, k) :
    Declare bpq as a BoundedPriorityQueue to contain candidate nearest neighbors
    Set the size of bpq to k
    Function nearest (node) :
        default ← node
        if default == NULL then
            return NearestPoint
        else
            Enqueue default into bpq
        end
        if n_i ≤ default_i then
            NearestPoint ← nearest (left(node))
        else
            NearestPoint ← nearest (right(node))
        end
        if bpq is not full OR |default_i – n_i| < distance(needle, head(bpq)) then
            if n_i ≤ default_i then
                NearestPoint ← nearest (right(node))
            else
                NearestPoint ← nearest (left(node))
            end
            return NearestPoint
        end

Knearest (node, needle, k)

return bpq
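The traversal in Algorithm 1 can be illustrated with a minimal, single-machine sketch in Python. It is not the authors' Spark implementation: it searches one in-memory kd-tree, which only loosely stands in for the tree held in one RDD partition. The names `Node`, `build`, `sq_dist`, and `k_nearest` are hypothetical, the bounded priority queue is emulated with `heapq`, and the sketch assumes that `head(bpq)` denotes the farthest candidate currently kept and that the inner `nearest` function is started at the root. Squared distances are used on both sides of the pruning test, which preserves the comparison.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point                    # the point stored at this tree node
    axis: int                       # splitting dimension i used at this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a kd-tree by median split, cycling through the dimensions."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_nearest(root: Optional[Node], needle: Point, k: int) -> List[Point]:
    # Bounded priority queue emulated as a max-heap of (-squared distance, point);
    # bpq[0] is the farthest candidate kept so far (assumed meaning of head(bpq)).
    bpq: List[Tuple[float, Point]] = []

    def nearest(node: Optional[Node]) -> None:
        if node is None:
            return
        d = sq_dist(node.point, needle)
        if len(bpq) < k:
            heapq.heappush(bpq, (-d, node.point))
        elif d < -bpq[0][0]:
            heapq.heapreplace(bpq, (-d, node.point))  # evict the farthest candidate

        i = node.axis
        if needle[i] <= node.point[i]:
            near, far = node.left, node.right
        else:
            near, far = node.right, node.left
        nearest(near)  # descend first into the side that contains the needle

        # Visit the other side only if the queue is not full or the splitting
        # plane is closer to the needle than the farthest kept candidate.
        gap = node.point[i] - needle[i]
        if len(bpq) < k or gap * gap < -bpq[0][0]:
            nearest(far)

    nearest(root)
    # Return the kept candidates ordered from nearest to farthest.
    return [p for _, p in sorted(bpq, reverse=True)]


tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(k_nearest(tree, (9, 2), 2))  # [(8, 1), (7, 2)]
```

In a distributed setting, a per-partition search of this kind would presumably be followed by a merge of the partial results into a global top k; that step is outside Algorithm 1 and is not shown here.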

Effect of the number of cores on the scalability of SparkNN.

Effect of the number of SARDDs on the partitioning time.
