Towards Explainable Graph Spectral Clustering for BERT Embeddings

Abstract

Artificial Intelligence algorithms are increasingly applied to Natural Language Processing tasks, including document clustering. As these algorithms grow more complex (for example, transformer-based embeddings such as BERT) or are “black-box” in nature (for example, Graph Spectral Clustering (GSC) algorithms), the need to explain their results becomes more urgent. In this paper, we propose a model-aware method for explaining the results of GSC applied to BERT-based embeddings. We present a novel theoretical methodology for explanation, based on the premise that document similarity in GSC is computed as the cosine similarity of the documents' BERT embeddings. We demonstrate the validity of this methodology by presenting strong GSC clustering results that recover the human-made assignment of hashtags to tweets. We show that GSC based on BERT embeddings outperforms approaches using Term Vector Space and GloVe embeddings. The resulting explanations are therefore also expected to be of higher quality.
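To make the premise concrete, the following is a minimal sketch of spectral clustering over a cosine-similarity graph. It is an illustration under stated assumptions, not the method of the paper: the synthetic vectors stand in for BERT document embeddings, the use of the symmetric normalized Laplacian and a sign split on its second eigenvector (two clusters only) is one common GSC variant chosen here for brevity, and the function names are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity of the rows of X (assumes no zero rows)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / norms
    return Xn @ Xn.T


def spectral_bipartition(X):
    """Split rows of X into two clusters using the second eigenvector of
    the symmetric normalized Laplacian of the cosine-similarity graph."""
    S = cosine_similarity_matrix(X)
    # Assumption: negative similarities are treated as "no edge".
    S = np.clip(S, 0.0, None)
    np.fill_diagonal(S, 0.0)  # no self-loops
    d = S.sum(axis=1)  # node degrees (assumed strictly positive)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # L = I - D^{-1/2} S D^{-1/2}
    L = np.eye(len(X)) - (d_inv_sqrt[:, None] * S) * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    second = eigvecs[:, 1]  # eigenvector of the second-smallest eigenvalue
    return (second > 0).astype(int)


# Synthetic stand-ins for document embeddings: two noisy groups whose
# centers overlap slightly, so the similarity graph stays connected.
rng = np.random.default_rng(0)
center_a = np.array([1.0, 0.2, 0.0, 0.0])
center_b = np.array([0.2, 1.0, 0.0, 0.0])
emb = np.vstack([
    center_a + 0.05 * rng.normal(size=(5, 4)),
    center_b + 0.05 * rng.normal(size=(5, 4)),
])

labels = spectral_bipartition(emb)
print(labels)  # the first five rows share one label, the last five the other
```

Which group receives label 0 or 1 is arbitrary, because an eigenvector is only determined up to sign. For more than two clusters, a typical extension is to run k-means on the rows of the first k eigenvectors.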

DOI: https://doi.org/10.14313/jamris-2026-005 | Journal eISSN: 2080-2145 | Journal ISSN: 1897-8649
Language: English
Page range: 53 - 65
Submitted on: Jul 10, 2025 | Accepted on: Aug 10, 2025 | Published on: Mar 31, 2026
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2026 Mieczysław A. Kłopotek, Sławomir T. Wierzchoń, Bartłomiej Starosta, Piotr Borkowski, Dariusz Czerski, published by Łukasiewicz Research Network – Industrial Research Institute for Automation and Measurements PIAP
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.