References
- Arora, S., Liang, Y.Y., & Ma, T.Y. (2017). A simple but tough-to-beat baseline for sentence embeddings. In proceedings of International Conference on Learning Representations, Toulon, France, April 24–26, 2017.
- Astrakhantsev, N. (2015). Methods and software for terminology extraction from domain-specific text collection (Unpublished doctoral dissertation). Ph. D. thesis, Institute for System Programming of Russian Academy of Sciences.
- Awan, M.N., & Beg, M.O. (2020). Top-rank: A topicalpostionrank for extraction and classification of keyphrases in text. Computer Speech & Language, 65, 101116.
- Beltagy, I., Lo, K., & Cohan, A. (2019). Scibert: A pretrained language model for scientific text. arXiv preprint arXiv:1903.10676.
- Blei, D.M., Ng, A.Y., & Jordan, M.I. (2003). Latent dirichlet allocation. Journal of machine Learning research, 3(Jan), 993–1022.
- Cagliero, L., & La Quatra, M. (2020). Extracting highlights of scientific articles: A supervised summarization approach. Expert Systems with Applications, 160, 113659.
- Curiskis, S.A., Drake, B., Osborn, T.R., & Kennedy, P.J. (2020). An evaluation of document clustering and topic modelling in two online social networks: Twitter and reddit. Information Processing & Management, 57(2), 102034.
- Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American society for information science, 41(6), 391–407.
- Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- Ester, M., Kriegel, H.-P., Sander, J., Xu, X., et al. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Kdd (Vol. 96, pp. 226–231).
- Harris, Z.S. (1954). Distributional structure. Word, 10(2–3), 146–162.
- Hou, J.H., Yang, X.C., & Chen, C.M. (2018). Emerging trends and new developments in information science: A document co-citation analysis (2009–2016). Scientometrics, 115(2), 869–892.
- Jelodar, H., Wang, Y.L., Yuan, C., Feng, X., Jiang, X.H., Li, Y.C., & Zhao, L. (2019). Latent dirichlet allocation (lda) and topic modeling: Models, applications, a survey. Multimedia Tools and Applications, 78(11), 15169–15211.
- Jones, K.S. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28, 11–21.
- Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2016). Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759.
- Kenter, T., Borisov, A., & De Rijke, M. (2016). Siamese cbow: Optimizing word embeddings for sentence representations. arXiv preprint arXiv:1606.04640.
- Kim, J., Yoon, J., Park, E., & Choi, S. (2020). Patent document clustering with deep embeddings. Scientometrics, 1–15.
- Krenn, M., & Zeilinger, A. (2020). Predicting research trends with semantic and neural networks with an application in quantum physics. Proceedings of the National Academy of Sciences, 117(4), 1910–1916.
- Kuhn, T., Perc, M., & Helbing, D. (2014). Inheritance patterns in citation networks reveal scientific memes. Physical Review X, 4(4), 041036.
- Le, Q., & Mikolov, T. (2014). Distributed representations of sentences and documents. In International conference on machine learning (pp. 1188–1196).
- Li, J.Z., Fan, Q.N., & Zhang, K., et al. (2007). Keyword extraction based on tf/idf for chinese news document. Wuhan University Journal of Natural Sciences, 12(5), 917–921.
- Liu, H.W., Kou, H.Z., Yan, C., & Qi, L.Y. (2019). Link prediction in paper citation network to construct paper correlation graph. EURASIP Journal on Wireless Communications and Networking, 2019(1), 1–12.
- Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111–3119).
- Miller, G.A. (1995). Wordnet: A lexical database for english. Communications of the ACM, 38(11), 39–41.
- Peganova, I., Rebrova, A., & Nedumov, Y. (2019). Labelling hierarchical clusters of scientific articles. In 2019 ivannikov memorial workshop (ivmem) (pp. 26–32).
- Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
- Radu, R.-G., Rădulescu, I.-M., Truică, C.-O., Apostol, E.-S., & Mocanu, M. (2020). Clustering documents using the document to vector model for dimensionality reduction. In 2020 ieee international conference on automation, quality and testing, robotics (aqtr) (pp. 1–6).
- Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic keyword extraction from individual documents. Text mining: Applications and theory, 1, 1–20.
- Rousseeuw, P.J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of computational and applied mathematics, 20, 53–65.
- Steinley, D. (2004). Properties of the hubert-arable adjusted rand index. Psychological methods, 9(3), 386.
- Vahidnia, S., Abbasi, A., & Abbass, H.A. (2020). Document clustering and labeling for research trend extraction and evolution mapping. In C. Zhang, P. Mayr, W. Lu, & Y. Zhang (Eds.), Proceedings of the 1st workshop on extraction and evaluation of knowledge entities from scientific documents co-located with the ACM/IEEE joint conference on digital libraries in 2020, eeke@jcdl 2020, virtual event, china, august 1st, 2020 (Vol. 2658, pp. 54–62). Retrieved from http://ceur-ws.org/Vol-2658/paper7.pdf
- Ward Jr, J.H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301), 236–244.
- Weber, T., Kranzlmüller, D., Fromm, M., & Tavares de Sousa, N. (2020). Using supervised learning to classify metadata of research data by field of study. Quantitative Science Studies, 1–26.
- Xie, J., Girshick, R., & Farhadi, A. (2016). Unsupervised deep embedding for clustering analysis. In International conference on machine learning (pp. 478–487).
- Xu, H.Y., Winnink, J., Yue, Z.H., Liu, Z.Q., & Yuan, G.T. (2020). Topic-linked innovation paths in science and technology. Journal of Informetrics, 14(2), 101014.
- Xu, S., Hao, L.Y., An, X., Yang, G.C., & Wang, F.F. (2019). Emerging research topics detection with multiple machine learning models. Journal of Informetrics, 13(4), 100983.
- Xu, S., Zhai, D.S., Wang, F.F., An, X., Pang, H.S., & Sun, Y.R. (2019). A novel method for topic linkages between scientific publications and patents. Journal of the Association for Information Science and Technology, 70(9), 1026–1042.
- Zeng, A., Shen, Z.S., Zhou, J.L., Wu, J.S., Fan, Y., Wang, Y.G., & Stanley, H.E. (2017). The science of science: From the perspective of complex systems. Physics Reports, 714–715, 1–73. Retrieved from https://doi.org/10.1016/j.physrep.2017.10.001 doi: 10.1016/j.physrep.2017.10.001
- Zhang, Q.R., Li, Y., Liu, J.S., Chen, Y.D., & Chai, L.H. (2017). A dynamic co-word network-related approach on the evolution of China's urbanization research. Scientometrics, 111(3), 1623–1642. doi: 10.1007/s11192-017-2314-1
- Zhang, Y., Chen, H.S., Lu, J., & Zhang, G.Q. (2017). Detecting and predicting the topic change of knowledge-based systems: A topic-based bibliometric analysis from 1991 to 2016. Knowledge-Based Systems, 133, 255–268. Retrieved from http://dx.doi.org/10.1016/j.knosys.2017.07.011 doi: 10.1016/j.knosys.2017.07.011
- Zhang, Y., Lu, J., Liu, F., Liu, Q., Porter, A., Chen, H.S., & Zhang, G.Q. (2018). Does deep learning help topic extraction? A kernel k-means clustering method with word embedding. Journal of Informetrics, 12(4), 1099–1117.
- Zhang, Y., Zhang, G.Q., Zhu, D.H., & Lu, J. (2017). Scientific evolutionary pathways: Identifying and visualizing relationships for scientific topics. Journal of the Association for Information Science and Technology, 68(8), 1925–1939. Retrieved from http://doi.wiley.com/10.1002/asi.23814 doi: 10.1002/asi.23814
- Zhou, Y., Lin, H., Liu, Y.F., & Ding, W. (2019). A novel method to identify emerging technologies using a semi-supervised topic clustering model: A case of 3d printing industry. Scientometrics, 120(1), 167–185.