N-based clustering of Dataset 0 using various embeddings (vectorizers)
| Vectorizer | F1-avg | F1-stdev |
|---|---|---|
| CountVectorizer | 0.255152 | 0.000049 |
| TfVectorizer | 0.255146 | 0.000102 |
| TfidfVectorizer | 0.396431 | 0.000360 |
| GloVe@wiki | 0.363963 | 0.000401 |
| GloVe@twitter | 0.355927 | 0.001543 |
| sBERT@all-MiniLM-L6-v2 | 0.974979 | 0.000092 |
| sBERT@all-distilroberta-v1 | 0.951857 | 0.000382 |
| sBERT@multi-qa-mpnet-base-dot-v1 | 0.942263 | 0.000090 |
| BERT@bert-base-uncased#[CLS] | 0.472723 | 0.000262 |
| BERT@vinai/bertweet-base#[CLS] | 0.675733 | 0.000439 |
| BERT@distilbert-base-uncased#[CLS] | 0.592854 | 0.000481 |
| BERT@cardiffnlp/twitter-roberta-base#[CLS] | 0.831508 | 0.000091 |
| BERT@bert-base-uncased#T_AVG | 0.602534 | 0.000423 |
| BERT@vinai/bertweet-base#T_AVG | 0.618835 | 0.000151 |
| BERT@distilbert-base-uncased#T_AVG | 0.558688 | 0.000899 |
| BERT@cardiffnlp/twitter-roberta-base#T_AVG | 0.476953 | 0.002215 |

N-based clustering of Dataset 2 using various embeddings (vectorizers)
| Vectorizer | F1-avg | F1-stdev |
|---|---|---|
| CountVectorizer | 0.277486 | 0.000189 |
| TfVectorizer | 0.277534 | 0.000169 |
| TfidfVectorizer | 0.575343 | 0.000483 |
| GloVe@twitter | 0.385300 | 0.000959 |
| sBERT@all-MiniLM-L12-v2 | 0.834372 | 0.000242 |
| sBERT@multi-qa-distilbert-cos-v1 | 0.820115 | 0.000252 |
| sBERT@multi-qa-mpnet-base-dot-v1 | 0.854759 | 0.000197 |
| BERT@bert-base-uncased#[CLS] | 0.488475 | 0.000369 |
| BERT@vinai/bertweet-base#[CLS] | 0.780850 | 0.000328 |
| BERT@distilbert-base-uncased#[CLS] | 0.464572 | 0.000172 |
| BERT@cardiffnlp/twitter-roberta-base#[CLS] | 0.632636 | 0.000423 |
| BERT@bert-base-uncased#T_AVG | 0.478280 | 0.000458 |
| BERT@vinai/bertweet-base#T_AVG | 0.540016 | 0.003658 |
| BERT@distilbert-base-uncased#T_AVG | 0.444183 | 0.000424 |
| BERT@cardiffnlp/twitter-roberta-base#T_AVG | 0.551978 | 0.000632 |

k-means clustering of Dataset 4 using various embeddings (vectorizers)
| Vectorizer | F1-avg |
|---|---|
| CountVectorizer | 0.286061 |
| TfVectorizer | 0.286051 |
| TfidfVectorizer | 0.311880 |
| GloVe@wiki | 0.345985 |
| GloVe@twitter | 0.239728 |
| sBERT@all-MiniLM-L6-v2 | 0.715959 |
| sBERT@all-MiniLM-L12-v2 | 0.708030 |
| sBERT@all-mpnet-base-v2 | 0.717453 |
| sBERT@all-distilroberta-v1 | 0.780480 |
| sBERT@multi-qa-MiniLM-L6-cos-v1 | 0.683618 |
| sBERT@multi-qa-distilbert-cos-v1 | 0.736226 |
| sBERT@multi-qa-mpnet-base-dot-v1 | 0.761548 |
| BERT@bert-base-uncased#[CLS] | 0.410637 |
| BERT@vinai/bertweet-base#[CLS] | 0.399653 |
| BERT@distilbert-base-uncased#[CLS] | 0.516114 |
| BERT@cardiffnlp/twitter-roberta-base#[CLS] | 0.709762 |
| BERT@bert-base-uncased#T_AVG | 0.531288 |
| BERT@vinai/bertweet-base#T_AVG | 0.432609 |
| BERT@distilbert-base-uncased#T_AVG | 0.523886 |
| BERT@cardiffnlp/twitter-roberta-base#T_AVG | 0.505151 |

N-based clustering of Dataset 3 using various embeddings (vectorizers)
| Vectorizer | F1-avg | F1-stdev |
|---|---|---|
| CountVectorizer | 0.231842 | 0.001354 |
| TfVectorizer | 0.231546 | 0.001468 |
| TfidfVectorizer | 0.320499 | 0.000485 |
| GloVe@twitter | 0.444521 | 0.000620 |
| sBERT@all-MiniLM-L6-v2 | 0.985108 | 0.000009 |
| sBERT@all-MiniLM-L12-v2 | 0.986921 | 0.000000 |
| sBERT@multi-qa-MiniLM-L6-cos-v1 | 0.960467 | 0.000130 |
| sBERT@multi-qa-mpnet-base-dot-v1 | 0.978597 | 0.000071 |
| BERT@bert-base-uncased#[CLS] | 0.537486 | 0.000581 |
| BERT@vinai/bertweet-base#[CLS] | 0.641715 | 0.000124 |
| BERT@distilbert-base-uncased#[CLS] | 0.633663 | 0.000141 |
| BERT@cardiffnlp/twitter-roberta-base#[CLS] | 0.935586 | 0.000230 |
| BERT@bert-base-uncased#T_AVG | 0.652909 | 0.000174 |
| BERT@vinai/bertweet-base#T_AVG | 0.713428 | 0.000090 |
| BERT@distilbert-base-uncased#T_AVG | 0.611432 | 0.000196 |
| BERT@cardiffnlp/twitter-roberta-base#T_AVG | 0.677452 | 0.000217 |

k-means clustering of Dataset 1 using various embeddings (vectorizers)
| Vectorizer | F1-avg |
|---|---|
| CountVectorizer | 0.325411 |
| TfVectorizer | 0.325468 |
| TfidfVectorizer | 0.467889 |
| GloVe@wiki | 0.183522 |
| GloVe@twitter | 0.212565 |
| sBERT@all-MiniLM-L6-v2 | 0.890326 |
| sBERT@all-MiniLM-L12-v2 | 0.912272 |
| sBERT@all-mpnet-base-v2 | 0.729900 |
| sBERT@all-distilroberta-v1 | 0.821762 |
| sBERT@multi-qa-MiniLM-L6-cos-v1 | 0.862951 |
| sBERT@multi-qa-distilbert-cos-v1 | 0.816841 |
| sBERT@multi-qa-mpnet-base-dot-v1 | 0.797446 |
| BERT@bert-base-uncased#[CLS] | 0.389700 |
| BERT@vinai/bertweet-base#[CLS] | 0.558415 |
| BERT@distilbert-base-uncased#[CLS] | 0.533903 |
| BERT@cardiffnlp/twitter-roberta-base#[CLS] | 0.707646 |
| BERT@bert-base-uncased#T_AVG | 0.505968 |
| BERT@vinai/bertweet-base#T_AVG | 0.570867 |
| BERT@distilbert-base-uncased#T_AVG | 0.529057 |
| BERT@cardiffnlp/twitter-roberta-base#T_AVG | 0.531740 |

N-based clustering of Dataset 4 using various embeddings (vectorizers)
| Vectorizer | F1-avg | F1-stdev |
|---|---|---|
| CountVectorizer | 0.238978 | 0.000000 |
| TfVectorizer | 0.238978 | 0.000000 |
| TfidfVectorizer | 0.239120 | 0.000000 |
| GloVe@wiki | 0.292245 | 0.000981 |
| GloVe@twitter | 0.327367 | 0.059270 |
| sBERT@multi-qa-mpnet-base-dot-v1 | 0.670961 | 0.000124 |
| BERT@bert-base-uncased#[CLS] | 0.532433 | 0.000220 |
| BERT@vinai/bertweet-base#[CLS] | 0.450124 | 0.000068 |
| BERT@distilbert-base-uncased#[CLS] | 0.644404 | 0.000593 |
| BERT@cardiffnlp/twitter-roberta-base#[CLS] | 0.769169 | 0.000094 |
| BERT@bert-base-uncased#T_AVG | 0.614746 | 0.000177 |
| BERT@vinai/bertweet-base#T_AVG | 0.592477 | 0.000221 |
| BERT@distilbert-base-uncased#T_AVG | 0.663272 | 0.000055 |
| BERT@cardiffnlp/twitter-roberta-base#T_AVG | 0.672432 | 0.000529 |

k-means clustering of Dataset 0 using various embeddings (vectorizers)
| Vectorizer | F1-avg |
|---|---|
| CountVectorizer | 0.249608 |
| TfVectorizer | 0.249584 |
| TfidfVectorizer | 0.348829 |
| GloVe@wiki | 0.285086 |
| GloVe@twitter | 0.239426 |
| sBERT@all-MiniLM-L6-v2 | 0.961594 |
| sBERT@all-MiniLM-L12-v2 | 0.937874 |
| sBERT@all-mpnet-base-v2 | 0.925444 |
| sBERT@all-distilroberta-v1 | 0.929055 |
| sBERT@multi-qa-MiniLM-L6-cos-v1 | 0.941966 |
| sBERT@multi-qa-distilbert-cos-v1 | 0.952534 |
| sBERT@multi-qa-mpnet-base-dot-v1 | 0.762779 |
| BERT@bert-base-uncased#[CLS] | 0.346147 |
| BERT@vinai/bertweet-base#[CLS] | 0.492142 |
| BERT@distilbert-base-uncased#[CLS] | 0.445644 |
| BERT@cardiffnlp/twitter-roberta-base#[CLS] | 0.540962 |
| BERT@bert-base-uncased#T_AVG | 0.401162 |
| BERT@vinai/bertweet-base#T_AVG | 0.369402 |
| BERT@distilbert-base-uncased#T_AVG | 0.400333 |
| BERT@cardiffnlp/twitter-roberta-base#T_AVG | 0.362900 |

k-means clustering of Dataset 2 using various embeddings (vectorizers)
| Vectorizer | F1-avg |
|---|---|
| CountVectorizer | 0.244935 |
| TfVectorizer | 0.245068 |
| TfidfVectorizer | 0.470825 |
| GloVe@wiki | 0.211871 |
| GloVe@twitter | 0.257164 |
| sBERT@all-MiniLM-L6-v2 | 0.811696 |
| sBERT@all-MiniLM-L12-v2 | 0.788610 |
| sBERT@all-mpnet-base-v2 | 0.631467 |
| sBERT@all-distilroberta-v1 | 0.657888 |
| sBERT@multi-qa-MiniLM-L6-cos-v1 | 0.715318 |
| sBERT@multi-qa-distilbert-cos-v1 | 0.586079 |
| sBERT@multi-qa-mpnet-base-dot-v1 | 0.722792 |
| BERT@bert-base-uncased#[CLS] | 0.376012 |
| BERT@vinai/bertweet-base#[CLS] | 0.403862 |
| BERT@distilbert-base-uncased#[CLS] | 0.363897 |
| BERT@cardiffnlp/twitter-roberta-base#[CLS] | 0.586548 |
| BERT@bert-base-uncased#T_AVG | 0.393337 |
| BERT@vinai/bertweet-base#T_AVG | 0.363000 |
| BERT@distilbert-base-uncased#T_AVG | 0.391392 |
| BERT@cardiffnlp/twitter-roberta-base#T_AVG | 0.400820 |

k-means clustering of Dataset 3 using various embeddings (vectorizers)
| Vectorizer | F1-avg |
|---|---|
| CountVectorizer | 0.239606 |
| TfVectorizer | 0.239582 |
| TfidfVectorizer | 0.374237 |
| GloVe@wiki | 0.187456 |
| GloVe@twitter | 0.192742 |
| sBERT@all-MiniLM-L6-v2 | 0.984532 |
| sBERT@all-MiniLM-L12-v2 | 0.986152 |
| sBERT@all-mpnet-base-v2 | 0.990348 |
| sBERT@all-distilroberta-v1 | 0.974033 |
| sBERT@multi-qa-MiniLM-L6-cos-v1 | 0.945448 |
| sBERT@multi-qa-distilbert-cos-v1 | 0.957767 |
| sBERT@multi-qa-mpnet-base-dot-v1 | 0.972517 |
| BERT@bert-base-uncased#[CLS] | 0.329817 |
| BERT@vinai/bertweet-base#[CLS] | 0.365470 |
| BERT@distilbert-base-uncased#[CLS] | 0.453654 |
| BERT@cardiffnlp/twitter-roberta-base#[CLS] | 0.672232 |
| BERT@bert-base-uncased#T_AVG | 0.442688 |
| BERT@vinai/bertweet-base#T_AVG | 0.508783 |
| BERT@distilbert-base-uncased#T_AVG | 0.438658 |
| BERT@cardiffnlp/twitter-roberta-base#T_AVG | 0.572831 |

N-based clustering of Dataset 1 using various embeddings (vectorizers)
| Vectorizer | F1-avg | F1-stdev |
|---|---|---|
| CountVectorizer | 0.367469 | 0.000429 |
| TfVectorizer | 0.367626 | 0.000624 |
| TfidfVectorizer | 0.497295 | 0.001748 |
| GloVe@wiki | 0.463999 | 0.000942 |
| GloVe@twitter | 0.594311 | 0.001918 |
| sBERT@multi-qa-mpnet-base-dot-v1 | 0.906746 | 0.000166 |
| BERT@bert-base-uncased#[CLS] | 0.475842 | 0.000682 |
| BERT@vinai/bertweet-base#[CLS] | 0.573888 | 0.000470 |
| BERT@distilbert-base-uncased#[CLS] | 0.654812 | 0.000458 |
| BERT@cardiffnlp/twitter-roberta-base#[CLS] | 0.799531 | 0.000085 |
| BERT@bert-base-uncased#T_AVG | 0.609975 | 0.001175 |
| BERT@vinai/bertweet-base#T_AVG | 0.683997 | 0.000223 |
| BERT@distilbert-base-uncased#T_AVG | 0.694325 | 0.000253 |
| BERT@cardiffnlp/twitter-roberta-base#T_AVG | 0.727976 | 0.000198 |
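
The N-based tables report an F1-avg and an F1-stdev per vectorizer. As a minimal sketch of how such a pair could be derived from repeated runs, the snippet below aggregates a list of per-run F1 scores. The function name `summarize_f1`, the example scores, and the use of the population standard deviation are all assumptions made for illustration; the source does not state how many runs were used or which standard deviation estimator was applied.

```python
from statistics import mean, pstdev


def summarize_f1(run_scores):
    """Return (F1-avg, F1-stdev) for a list of per-run F1 scores.

    Assumption: population standard deviation (pstdev). The tables do not
    say whether a population or sample estimator was used.
    """
    if not run_scores:
        raise ValueError("need at least one run score")
    return mean(run_scores), pstdev(run_scores)


# Hypothetical per-run scores, for illustration only (not taken from the tables).
runs = [0.50, 0.52, 0.54]
f1_avg, f1_stdev = summarize_f1(runs)

# Format the pair with six decimals, in the same layout as the table rows.
print(f"| example | {f1_avg:.6f} | {f1_stdev:.6f} |")
```

Swapping `pstdev` for `statistics.stdev` would give the sample estimator instead, which needs at least two runs.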
