GeoSimMR: A MapReduce Algorithm for Detecting Communities based on Distance and Interest in Social Networks

Figures & Tables

The GeoSim Algorithm.

Input: V: set of all nodes in the social network, E: set of all edges in the network, I: set of all interests for all nodes and K: number of communities.
Output: C: set of detected communities.
1:	procedure GEOSIM(V, E, I, K)
2:	C = createRandomClusters(V, K);
3:	distances = apsp(V, E) // all-pairs shortest paths
4:	while community_variation > threshold do
5:	for c ∈ C do
6:	max_degree = 0
7:	for node ∈ c do
8:	if node.degree > max_degree then
9:	mean_nodes[c] = node
10:	max_degree = node.degree
11:	end if
12:	end for
13:	end for
14:	for v ∈ V do
15:	min_gss = ∞
16:	for mean_node ∈ mean_nodes do
17:	distances = distances(mean_node, v)
18:	calculate gss using equation 1
19:	if gss < min_gss then
20:	c_id = mean_node.community
21:	min_gss = gss
22:	end if
23:	end for
24:	C[c_id].addNode(v)
25:	end for
26:	end while
27:	return C;
28:	end procedure

Mean Nodes Selection Stage.

Input: input_file: The input file, a file containing the nodes information.
Output: output_file: A file containing a set key-value pairs, the key is the community id and the value is the node id and its interests.
1:	procedure MAP(line)
2:	node = parseNode(line)
3:	node.calculateFriendsNumber()
4:	emit(node.communityId, node)
5:	end procedure
6:
7:	procedure REDUCE(communityId, nodesList)
8:	maxFriends = 0
9:	for node ∈ nodesList do
10:	if node.friendsNumber > maxFriends then
11:	meanNode = node
12:	maxFriends = node.friendsNumber
13:	end if
14:	end for
15:	emit(communityId, node.printIdAndInterests())
16:	end procedure

Distances and Similarities Calculation Stage.

Input: input_file: The input file which contains the nodes information., meanNodes: A list of mean nodes obtained from the Mean Nodes Selection stage.
Output: output_file: A file containing the distances and the similarities from each node to all mean nodes.
1:	procedure MAP(line)
2:	node = parseNode(line)
3:	for meanNode ∈ meanNodes do
4:	if isFirstIteration then
5:	node.calculateSimilarity(meanNode)
6:	end if
7:	if node == meanNode\|\|node.v == G then
8:	if node == meanNode then
9:	node.distance = 0
10:	end if
11:	for friendinnode.friends do
12:	friend.v = G
13:	friend.distance = node.distance + 1
14:	emit({meanNode.communityId,
15:	friend.id}, friend)
16:	end for
17:	end if
18:	emit({meanNode.communityId, node.id}, node)
19:	end for
20:	end procedure
21:
22:	procedure REDUCE(communityId, nodeId, nodesList)
23:	outNode.communityId = nodeId
24:	outNode.id = communityId
25:	outNote.friends = getFriends(nodesList)
26:	outNote.interests = getInterests(nodesList)
27:	for node ∈ nodesList do
28:	if outNode.distance > node.distance then
29:	outNode.distance = node.distance
30:	end if
31:	if outNode.similarity < node.similarity then
32:	outNode.similarity = node.similarity
33:	end if
34:	if outNode.v < node.v then
35:	outNode.v = node.v
36:	end if
37:	end for
38:	emit (outNode.id, outNode.printInfo())
39:	end procedure

Clustering Stage.

Input: input_file: The output file of the Distances and Similarities Calculation.
Output: output_file: List of all nodes and their detected communities.
1:	procedure MAP(line)
2:	node = parseNode(line)
3:	Calculate GSS using equation 1
4:	emit(node, GSS)
5:	end procedure
6:
7:	procedure REDUCE(communityId, communityGSSPairs)
8:	minScore = inf
9:	for pair ∈ communityGSSPairs do
10:	if pair.GSS < minScore then
11:	minScore = pair.GSS
12:	communityId = pair.communityId
13:	end if
14:	end for
15:	node.communityId = communityId
16:	emit(node.id, node.printInfo())
17:	end procedure

DBLP authors used as a ground truth in the community detection experiment.