Towards Explainable Graph Spectral Clustering for BERT Embeddings

Figures & Tables

N-based clustering of Dataset 0 using various embeddings (vectorizers)

Vectorizer                                  F1-avg    F1-stdev
CountVectorizer                             0.255152  0.000049
TfVectorizer                                0.255146  0.000102
TfidfVectorizer                             0.396431  0.000360
GloVe@wiki                                  0.363963  0.000401
GloVe@twitter                               0.355927  0.001543
sBERT@all-MiniLM-L6-v2                      0.974979  0.000092
sBERT@all-distilroberta-v1                  0.951857  0.000382
sBERT@multi-qa-mpnet-base-dot-v1            0.942263  0.000090
BERT@bert-base-uncased#[CLS]                0.472723  0.000262
BERT@vinai/bertweet-base#[CLS]              0.675733  0.000439
BERT@distilbert-base-uncased#[CLS]          0.592854  0.000481
BERT@cardiffnlp/twitter-roberta-base#[CLS]  0.831508  0.000091
BERT@bert-base-uncased#T_AVG                0.602534  0.000423
BERT@vinai/bertweet-base#T_AVG              0.618835  0.000151
BERT@distilbert-base-uncased#T_AVG          0.558688  0.000899
BERT@cardiffnlp/twitter-roberta-base#T_AVG  0.476953  0.002215
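
The F1-avg and F1-stdev columns read like summaries of repeated clustering runs, although this page does not state how many runs were made or which standard-deviation estimator was used. A minimal sketch of such an aggregation, assuming the sample standard deviation and using invented per-run scores (the helper name `summarize_f1` and the numbers are hypothetical and do not come from the paper):

```python
from statistics import mean, stdev


def summarize_f1(run_scores):
    """Return (average, sample standard deviation) of per-run F1 scores."""
    if len(run_scores) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return mean(run_scores), stdev(run_scores)


# Hypothetical per-run F1 scores, invented for illustration only.
hypothetical_runs = [0.90, 0.92, 0.91, 0.93, 0.89]

f1_avg, f1_stdev = summarize_f1(hypothetical_runs)
print(f"{f1_avg:.6f}  {f1_stdev:.6f}")
```

If the population standard deviation was intended instead, `statistics.pstdev` would replace `stdev`; the two differ noticeably when the number of runs is small.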

N-based clustering of Dataset 2 using various embeddings (vectorizers)

Vectorizer                                  F1-avg    F1-stdev
CountVectorizer                             0.277486  0.000189
TfVectorizer                                0.277534  0.000169
TfidfVectorizer                             0.575343  0.000483
GloVe@twitter                               0.385300  0.000959
sBERT@all-MiniLM-L12-v2                     0.834372  0.000242
sBERT@multi-qa-distilbert-cos-v1            0.820115  0.000252
sBERT@multi-qa-mpnet-base-dot-v1            0.854759  0.000197
BERT@bert-base-uncased#[CLS]                0.488475  0.000369
BERT@vinai/bertweet-base#[CLS]              0.780850  0.000328
BERT@distilbert-base-uncased#[CLS]          0.464572  0.000172
BERT@cardiffnlp/twitter-roberta-base#[CLS]  0.632636  0.000423
BERT@bert-base-uncased#T_AVG                0.478280  0.000458
BERT@vinai/bertweet-base#T_AVG              0.540016  0.003658
BERT@distilbert-base-uncased#T_AVG          0.444183  0.000424
BERT@cardiffnlp/twitter-roberta-base#T_AVG  0.551978  0.000632

k-means clustering of Dataset 4 using various embeddings (vectorizers)

Vectorizer                                  F1-avg
CountVectorizer                             0.286061
TfVectorizer                                0.286051
TfidfVectorizer                             0.311880
GloVe@wiki                                  0.345985
GloVe@twitter                               0.239728
sBERT@all-MiniLM-L6-v2                      0.715959
sBERT@all-MiniLM-L12-v2                     0.708030
sBERT@all-mpnet-base-v2                     0.717453
sBERT@all-distilroberta-v1                  0.780480
sBERT@multi-qa-MiniLM-L6-cos-v1             0.683618
sBERT@multi-qa-distilbert-cos-v1            0.736226
sBERT@multi-qa-mpnet-base-dot-v1            0.761548
BERT@bert-base-uncased#[CLS]                0.410637
BERT@vinai/bertweet-base#[CLS]              0.399653
BERT@distilbert-base-uncased#[CLS]          0.516114
BERT@cardiffnlp/twitter-roberta-base#[CLS]  0.709762
BERT@bert-base-uncased#T_AVG                0.531288
BERT@vinai/bertweet-base#T_AVG              0.432609
BERT@distilbert-base-uncased#T_AVG          0.523886
BERT@cardiffnlp/twitter-roberta-base#T_AVG  0.505151

N-based clustering of Dataset 3 using various embeddings (vectorizers)

Vectorizer                                  F1-avg    F1-stdev
CountVectorizer                             0.231842  0.001354
TfVectorizer                                0.231546  0.001468
TfidfVectorizer                             0.320499  0.000485
GloVe@twitter                               0.444521  0.000620
sBERT@all-MiniLM-L6-v2                      0.985108  0.000009
sBERT@all-MiniLM-L12-v2                     0.986921  0.000000
sBERT@multi-qa-MiniLM-L6-cos-v1             0.960467  0.000130
sBERT@multi-qa-mpnet-base-dot-v1            0.978597  0.000071
BERT@bert-base-uncased#[CLS]                0.537486  0.000581
BERT@vinai/bertweet-base#[CLS]              0.641715  0.000124
BERT@distilbert-base-uncased#[CLS]          0.633663  0.000141
BERT@cardiffnlp/twitter-roberta-base#[CLS]  0.935586  0.000230
BERT@bert-base-uncased#T_AVG                0.652909  0.000174
BERT@vinai/bertweet-base#T_AVG              0.713428  0.000090
BERT@distilbert-base-uncased#T_AVG          0.611432  0.000196
BERT@cardiffnlp/twitter-roberta-base#T_AVG  0.677452  0.000217

k-means clustering of Dataset 1 using various embeddings (vectorizers)

Vectorizer                                  F1-avg
CountVectorizer                             0.325411
TfVectorizer                                0.325468
TfidfVectorizer                             0.467889
GloVe@wiki                                  0.183522
GloVe@twitter                               0.212565
sBERT@all-MiniLM-L6-v2                      0.890326
sBERT@all-MiniLM-L12-v2                     0.912272
sBERT@all-mpnet-base-v2                     0.729900
sBERT@all-distilroberta-v1                  0.821762
sBERT@multi-qa-MiniLM-L6-cos-v1             0.862951
sBERT@multi-qa-distilbert-cos-v1            0.816841
sBERT@multi-qa-mpnet-base-dot-v1            0.797446
BERT@bert-base-uncased#[CLS]                0.389700
BERT@vinai/bertweet-base#[CLS]              0.558415
BERT@distilbert-base-uncased#[CLS]          0.533903
BERT@cardiffnlp/twitter-roberta-base#[CLS]  0.707646
BERT@bert-base-uncased#T_AVG                0.505968
BERT@vinai/bertweet-base#T_AVG              0.570867
BERT@distilbert-base-uncased#T_AVG          0.529057
BERT@cardiffnlp/twitter-roberta-base#T_AVG  0.531740

N-based clustering of Dataset 4 using various embeddings (vectorizers)

Vectorizer                                  F1-avg    F1-stdev
CountVectorizer                             0.238978  0.000000
TfVectorizer                                0.238978  0.000000
TfidfVectorizer                             0.239120  0.000000
GloVe@wiki                                  0.292245  0.000981
GloVe@twitter                               0.327367  0.059270
sBERT@multi-qa-mpnet-base-dot-v1            0.670961  0.000124
BERT@bert-base-uncased#[CLS]                0.532433  0.000220
BERT@vinai/bertweet-base#[CLS]              0.450124  0.000068
BERT@distilbert-base-uncased#[CLS]          0.644404  0.000593
BERT@cardiffnlp/twitter-roberta-base#[CLS]  0.769169  0.000094
BERT@bert-base-uncased#T_AVG                0.614746  0.000177
BERT@vinai/bertweet-base#T_AVG              0.592477  0.000221
BERT@distilbert-base-uncased#T_AVG          0.663272  0.000055
BERT@cardiffnlp/twitter-roberta-base#T_AVG  0.672432  0.000529

k-means clustering of Dataset 0 using various embeddings (vectorizers)

Vectorizer                                  F1-avg
CountVectorizer                             0.249608
TfVectorizer                                0.249584
TfidfVectorizer                             0.348829
GloVe@wiki                                  0.285086
GloVe@twitter                               0.239426
sBERT@all-MiniLM-L6-v2                      0.961594
sBERT@all-MiniLM-L12-v2                     0.937874
sBERT@all-mpnet-base-v2                     0.925444
sBERT@all-distilroberta-v1                  0.929055
sBERT@multi-qa-MiniLM-L6-cos-v1             0.941966
sBERT@multi-qa-distilbert-cos-v1            0.952534
sBERT@multi-qa-mpnet-base-dot-v1            0.762779
BERT@bert-base-uncased#[CLS]                0.346147
BERT@vinai/bertweet-base#[CLS]              0.492142
BERT@distilbert-base-uncased#[CLS]          0.445644
BERT@cardiffnlp/twitter-roberta-base#[CLS]  0.540962
BERT@bert-base-uncased#T_AVG                0.401162
BERT@vinai/bertweet-base#T_AVG              0.369402
BERT@distilbert-base-uncased#T_AVG          0.400333
BERT@cardiffnlp/twitter-roberta-base#T_AVG  0.362900

k-means clustering of Dataset 2 using various embeddings (vectorizers)

Vectorizer                                  F1-avg
CountVectorizer                             0.244935
TfVectorizer                                0.245068
TfidfVectorizer                             0.470825
GloVe@wiki                                  0.211871
GloVe@twitter                               0.257164
sBERT@all-MiniLM-L6-v2                      0.811696
sBERT@all-MiniLM-L12-v2                     0.788610
sBERT@all-mpnet-base-v2                     0.631467
sBERT@all-distilroberta-v1                  0.657888
sBERT@multi-qa-MiniLM-L6-cos-v1             0.715318
sBERT@multi-qa-distilbert-cos-v1            0.586079
sBERT@multi-qa-mpnet-base-dot-v1            0.722792
BERT@bert-base-uncased#[CLS]                0.376012
BERT@vinai/bertweet-base#[CLS]              0.403862
BERT@distilbert-base-uncased#[CLS]          0.363897
BERT@cardiffnlp/twitter-roberta-base#[CLS]  0.586548
BERT@bert-base-uncased#T_AVG                0.393337
BERT@vinai/bertweet-base#T_AVG              0.363000
BERT@distilbert-base-uncased#T_AVG          0.391392
BERT@cardiffnlp/twitter-roberta-base#T_AVG  0.400820

k-means clustering of Dataset 3 using various embeddings (vectorizers)

Vectorizer                                  F1-avg
CountVectorizer                             0.239606
TfVectorizer                                0.239582
TfidfVectorizer                             0.374237
GloVe@wiki                                  0.187456
GloVe@twitter                               0.192742
sBERT@all-MiniLM-L6-v2                      0.984532
sBERT@all-MiniLM-L12-v2                     0.986152
sBERT@all-mpnet-base-v2                     0.990348
sBERT@all-distilroberta-v1                  0.974033
sBERT@multi-qa-MiniLM-L6-cos-v1             0.945448
sBERT@multi-qa-distilbert-cos-v1            0.957767
sBERT@multi-qa-mpnet-base-dot-v1            0.972517
BERT@bert-base-uncased#[CLS]                0.329817
BERT@vinai/bertweet-base#[CLS]              0.365470
BERT@distilbert-base-uncased#[CLS]          0.453654
BERT@cardiffnlp/twitter-roberta-base#[CLS]  0.672232
BERT@bert-base-uncased#T_AVG                0.442688
BERT@vinai/bertweet-base#T_AVG              0.508783
BERT@distilbert-base-uncased#T_AVG          0.438658
BERT@cardiffnlp/twitter-roberta-base#T_AVG  0.572831

N-based clustering of Dataset 1 using various embeddings (vectorizers)

Vectorizer                                  F1-avg    F1-stdev
CountVectorizer                             0.367469  0.000429
TfVectorizer                                0.367626  0.000624
TfidfVectorizer                             0.497295  0.001748
GloVe@wiki                                  0.463999  0.000942
GloVe@twitter                               0.594311  0.001918
sBERT@multi-qa-mpnet-base-dot-v1            0.906746  0.000166
BERT@bert-base-uncased#[CLS]                0.475842  0.000682
BERT@vinai/bertweet-base#[CLS]              0.573888  0.000470
BERT@distilbert-base-uncased#[CLS]          0.654812  0.000458
BERT@cardiffnlp/twitter-roberta-base#[CLS]  0.799531  0.000085
BERT@bert-base-uncased#T_AVG                0.609975  0.001175
BERT@vinai/bertweet-base#T_AVG              0.683997  0.000223
BERT@distilbert-base-uncased#T_AVG          0.694325  0.000253
BERT@cardiffnlp/twitter-roberta-base#T_AVG  0.727976  0.000198
DOI: https://doi.org/10.14313/jamris-2026-005 | Journal eISSN: 2080-2145 | Journal ISSN: 1897-8649
Language: English
Page range: 53 - 65
Submitted on: Jul 10, 2025 | Accepted on: Aug 10, 2025 | Published on: Mar 31, 2026
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2026 Mieczysław A. Kłopotek, Sławomir T. Wierzchoń, Bartłomiej Starosta, Piotr Borkowski, Dariusz Czerski, published by Łukasiewicz Research Network – Industrial Research Institute for Automation and Measurements PIAP
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.