
Embedding-based Detection and Extraction of Research Topics from Academic Documents Using Deep Clustering

Open Access | Jun 2021

Figures & Tables

Figure 1

The overall and simplified process and objectives.

Figure 2

PV-DM model diagram.

Figure 3

DEC Autoencoder diagram. K is the number of clusters.
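As a rough illustration of what the K-way clustering layer in a DEC model computes, the sketch below follows the standard DEC formulation: a Student's t soft assignment of each embedded point to K centroids, a sharpened target distribution, and a KL-divergence loss between the two. The toy embeddings, the centroids, and the function names are assumptions made for illustration, and the autoencoder itself is omitted; this is not the authors' implementation.

```python
import math

def soft_assign(embeddings, centroids, alpha=1.0):
    """Student's t soft assignment q[i][j] of point i to cluster j."""
    q = []
    for z in embeddings:
        row = []
        for mu in centroids:
            dist2 = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q

def target_distribution(q):
    """Sharpened targets: p_ij is proportional to q_ij^2 / f_j, f_j = sum_i q_ij."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(k)]
        total = sum(w)
        p.append([v / total for v in w])
    return p

def kl_divergence(p, q):
    """KL(P || Q) summed over all points and clusters (the clustering loss)."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0
    )

# Toy 2-D embeddings (hypothetical encoder output) and K = 2 centroids.
embeddings = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.0]]
centroids = [[0.1, 0.05], [2.95, 3.05]]

q = soft_assign(embeddings, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

In a full DEC training loop the loss would be minimized with respect to both the encoder weights and the centroids, with the targets `p` recomputed periodically.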

Figure 4

Additional WoS data figure.

Figure 5

Data growth from 1970 to 2019 in the three journals, yielding about 4,300 records.

Figure 6

Diagram of clusters over the study period. Y-axis is the number of articles (symmetrical) and X-axis is the year.

Figure 7

N-gram dictionary histogram of the dataset. Y-axis is frequency and X-axis is N.

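Comparison of clustering techniques for pre-trained BERT embeddings

| Embedding | Clustering | NMI | AMI | ARI |
| --- | --- | --- | --- | --- |
| BERT (uncased A12) | kmeans (rand) | 0.19507 | 0.1948 | 0.1442 |
| BERT (uncased A12) | kmeans++ | 0.1950 | 0.1948 | 0.1442 |
| BERT (uncased A12) | Agglomerative | 0.2158 | 0.2156 | 0.1877 |
| BERT (uncased A12) | DBSCAN | 0.0042 | 0.0037 | 0.0001 |
| BERT (uncased A12) | DEC | 0.2568 | 0.2566 | 0.2377 |
| SciBERT | kmeans (rand) | 0.1498 | 0.1496 | 0.1266 |
| SciBERT | kmeans++ | 0.1492 | 0.1489 | 0.1266 |
| SciBERT | Agglomerative | 0.1903 | 0.1901 | 0.1505 |
| SciBERT | DBSCAN | 0.0042 | 0.0037 | 0.0001 |
| SciBERT | DEC | 0.1776 | 0.1774 | 0.1731 |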
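To make the evaluation metrics in this table concrete, here is a minimal pure-Python sketch of two of them, ARI and NMI, computed from the contingency counts between reference labels and predicted clusters. The label lists are toy data, the function names are illustrative, and the NMI normalizer (arithmetic mean of the two entropies) is an assumption, since the page does not state which variant was used. AMI is omitted because its chance correction term is considerably longer.

```python
import math
from collections import Counter

def _comb2(n):
    """Number of unordered pairs that can be drawn from n items."""
    return n * (n - 1) / 2.0

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index from pair counts in the contingency table."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))
    sum_ij = sum(_comb2(c) for c in pairs.values())
    sum_a = sum(_comb2(c) for c in Counter(labels_true).values())
    sum_b = sum(_comb2(c) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / _comb2(n)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. a single cluster
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

def _entropy(counts, n):
    return -sum((c / n) * math.log(c / n) for c in counts)

def normalized_mutual_info(labels_true, labels_pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(labels_true)
    pairs = Counter(zip(labels_true, labels_pred))
    a = Counter(labels_true)
    b = Counter(labels_pred)
    mi = sum(
        (c / n) * math.log(c * n / (a[t] * b[p]))
        for (t, p), c in pairs.items()
    )
    denom = 0.5 * (_entropy(a.values(), n) + _entropy(b.values(), n))
    return mi / denom if denom > 0 else 1.0

# Toy labels: a relabeled perfect clustering and one with a misplaced item.
truth = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]
noisy = [0, 0, 1, 1, 1, 1]

ari_perfect = adjusted_rand_index(truth, perfect)
ari_noisy = adjusted_rand_index(truth, noisy)
nmi_perfect = normalized_mutual_info(truth, perfect)
nmi_noisy = normalized_mutual_info(truth, noisy)
```

Both metrics ignore how cluster IDs are named, which is why the relabeled `perfect` assignment still scores as a perfect match. In practice, scikit-learn's `adjusted_rand_score`, `normalized_mutual_info_score`, and `adjusted_mutual_info_score` cover all three metrics.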

Ordered key phrases of the clusters

| Cluster # | Terms (Normalized TF-IDF score) |
| --- | --- |
| c 1 | creativity(1.00), sentiment_analysis(0.85), university(0.81), facial(0.79), insect(0.74), dreyfus(0.71), expert_system(0.67), music(0.65), indian_language(0.64), recommendation(0.63), argumentation(0.62), swarm(0.62), data_mining(0.61), face_recognition(0.61), natural_language_processing(0.60) |
| c 2 | ois(1.00), execution(0.98), sinix(0.88), perception(0.80), people(0.75), unix(0.69), team(0.66), discourse(0.62), intention(0.57) |
| c 3 | revision(1.00), contraction(0.70), postulate(0.65), horn(0.65) |
| c 4 | csp(1.00), propagation(0.80), arc_consistency(0.75), backjumping(0.59) |
| c 5 | description_logic(1.00), deep_learning(0.89), ontology(0.74), rcc(0.56) |
| c 6 | auction(1.00), equilibrium(0.74), election(0.66), coalition(0.66), bargaining(0.56) |
| c 7 | support_vector_machine(1.00), classifier(0.68), knee(0.66) |
| c 8 | document(1.00), wikipedia(0.99), wordnet(0.68), dictionary(0.63) |
| c 9 | phase_transition(1.00), minimax(0.89), voting(0.87), alpha_beta(0.75), chess(0.69), backbone(0.64), optimal_solution(0.63), heuristic_function(0.63), game_tree(0.61), ratio(0.59), heuristic_search(0.59), monte_carlo_tree_search(0.55) |
| c 10 | execution(1.00), reward(0.80), ebl(0.77), pomdp(0.68), team(0.66), heuristic_search(0.64), action_model(0.63), portfolio(0.60), monte_carlo_tree_search(0.59), mdp(0.59), conformant(0.58), mdps(0.57) |
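One plausible way to obtain scores like those in the table, where the top term of each cluster is 1.00, is to pool each cluster's tokens into a single pseudo-document, compute TF-IDF across clusters, and divide by the cluster's maximum score. The sketch below does this under those assumptions. The pooling step, the smoothed IDF formula, the max-normalization, and the toy tokens are all illustrative choices and are not confirmed by the page.

```python
import math
from collections import Counter

def cluster_key_terms(cluster_tokens, top_n=5):
    """Rank terms per cluster by TF-IDF, scaled so the best term scores 1.00.

    cluster_tokens maps a cluster ID to the pooled token list of all
    documents assigned to that cluster (an assumed preprocessing step).
    """
    n_clusters = len(cluster_tokens)

    # Document frequency: in how many clusters does each term occur?
    df = Counter()
    for tokens in cluster_tokens.values():
        df.update(set(tokens))

    result = {}
    for cid, tokens in cluster_tokens.items():
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens))
            * (math.log((1 + n_clusters) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        }
        top = max(scores.values())
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[cid] = [(term, round(s / top, 2)) for term, s in ranked[:top_n]]
    return result

# Toy pooled tokens for two hypothetical clusters.
clusters = {
    "c 4": ["csp", "csp", "csp", "propagation", "propagation", "search"],
    "c 6": ["auction", "auction", "auction", "equilibrium", "search"],
}
key_terms = cluster_key_terms(clusters)
```

Terms shared by every cluster (here `search`) receive the lowest IDF and therefore sink to the bottom of each list, which is the behavior that makes TF-IDF useful for labeling clusters.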

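Comparison of clustering techniques with various embeddings on two different training corpora

| Embedding | Clustering | NMI (KIPRIS) | AMI (KIPRIS) | ARI (KIPRIS) | NMI (WoS+KIPRIS) | AMI (WoS+KIPRIS) | ARI (WoS+KIPRIS) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| FastText (mean) | K-means (rand) | 0.379 | 0.379 | 0.312 | 0.387 | 0.387 | 0.327 |
| FastText (mean) | K-means (++) | 0.379 | 0.379 | 0.322 | 0.387 | 0.387 | 0.327 |
| FastText (mean) | Hierarchy Aggl. | 0.391 | 0.391 | 0.289 | 0.363 | 0.363 | 0.306 |
| FastText (mean) | DBSCAN | 0.006 | 0.005 | 0.000 | 0.005 | 0.005 | 0.000 |
| FastText (mean) | DEC | 0.511 | 0.511 | 0.504 | 0.459 | 0.459 | 0.400 |
| FastText (mean) | DEC (scaled) | 0.329 | 0.329 | 0.268 | 0.284 | 0.283 | 0.239 |
| FastText (w. mean) | K-means (rand) | 0.243 | 0.243 | 0.186 | 0.239 | 0.239 | 0.184 |
| FastText (w. mean) | K-means (++) | 0.243 | 0.243 | 0.186 | 0.239 | 0.239 | 0.184 |
| FastText (w. mean) | Hierarchy Aggl. | 0.260 | 0.260 | 0.140 | 0.234 | 0.234 | 0.176 |
| FastText (w. mean) | DBSCAN | 0.037 | 0.035 | 0.001 | 0.011 | 0.010 | 0.000 |
| FastText (w. mean) | DEC | 0.348 | 0.347 | 0.321 | 0.352 | 0.352 | 0.300 |
| FastText (w. mean) | DEC (scaled) | 0.201 | 0.201 | 0.169 | 0.172 | 0.172 | 0.158 |
| Doc2Vec | K-means (rand) | 0.586 | 0.586 | 0.629 | 0.712 | 0.712 | 0.742 |
| Doc2Vec | K-means (++) | 0.586 | 0.586 | 0.630 | 0.711 | 0.711 | 0.741 |
| Doc2Vec | Hierarchy Aggl. | 0.444 | 0.444 | 0.457 | 0.602 | 0.602 | 0.633 |
| Doc2Vec | DBSCAN | 0.004 | 0.004 | 0.000 | 0.004 | 0.004 | 0.000 |
| Doc2Vec | DEC | 0.600 | 0.600 | 0.629 | 0.734 | 0.734 | 0.759 |
| Doc2Vec | DEC (scaled) | 0.235 | 0.235 | 0.220 | 0.322 | 0.322 | 0.279 |

| Topic Modeling | NMI | AMI | ARI |
| --- | --- | --- | --- |
| LDA | 0.350 | 0.350 | 0.291 |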
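The "FastText (mean)" and "FastText (w. mean)" rows presumably differ in how word vectors are pooled into a single document vector. The sketch below contrasts a plain mean with a weighted mean. Reading "w. mean" as a weighted mean, the choice of per-term weights (for example TF-IDF), and the tiny two-dimensional vectors are all assumptions for illustration, not details taken from the page.

```python
def mean_pool(tokens, vectors):
    """Unweighted average of the vectors of all in-vocabulary tokens."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]

def weighted_mean_pool(tokens, vectors, weights):
    """Average of token vectors weighted by a per-term weight."""
    dim = len(next(iter(vectors.values())))
    acc = [0.0] * dim
    total = 0.0
    for t in tokens:
        if t in vectors:
            w = weights.get(t, 0.0)
            total += w
            for d in range(dim):
                acc[d] += w * vectors[t][d]
    if total == 0.0:
        return [0.0] * dim
    return [x / total for x in acc]

# Hypothetical 2-D word vectors and term weights (e.g. TF-IDF scores).
vectors = {"auction": [1.0, 0.0], "the": [0.0, 1.0]}
weights = {"auction": 3.0, "the": 1.0}
tokens = ["the", "auction"]

doc_mean = mean_pool(tokens, vectors)
doc_weighted = weighted_mean_pool(tokens, vectors, weights)
```

With these toy values the weighted vector is pulled toward the higher-weight term `auction`, while the plain mean treats both tokens equally.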
DOI: https://doi.org/10.2478/jdis-2021-0024 | Journal eISSN: 2543-683X | Journal ISSN: 2096-157X
Language: English
Page range: 99 - 122
Submitted on: Nov 30, 2020
Accepted on: Apr 26, 2021
Published on: Jun 18, 2021
Published by: Chinese Academy of Sciences, National Science Library
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2021 Sahand Vahidnia, Alireza Abbasi, Hussein A. Abbass, published by Chinese Academy of Sciences, National Science Library
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.