References
- 1. Arthur, D., & Vassilvitskii, S. (2007). k-means++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms (pp. 1027–1035).
- 2. Advameg, Inc. (2011). Philadelphia 2035 (Houston: Foreclosure, neighborhoods, wage)—Pennsylvania (PA)—City-Data Forum. City-Data.Com. https://www.city-data.com/forum/philadelphia/1304227-philadelphia-2035-a.html
- 3. Advameg, Inc. (2012a). Official Philadelphia Metro Crime Thread (York, Chester: Apartment complexes, houses, unemployment)—Pennsylvania (PA)—Page 10—City-Data Forum [Forum]. City-Data.Com. http://www.city-data.com/forum/philadelphia/1470248-official-philadelphia-metro-crime-thread-10.html
- 4. Advameg, Inc. (2012b). Retail coming to Philadelphia (Economy, Penn: 2013, tenant, shop)—Pennsylvania (PA)—Page 3—City-Data Forum [Forum]. City-Data.Com. https://www.city-data.com/forum/philadelphia/1740992-retail-coming-philadelphia-3.html
- 5. Advameg, Inc. (2013). Official Greater Philadelphia Area Crime Thread (York, Mars: Leasing, condominium, place to live)—Pennsylvania (PA)—Page 267—City-Data Forum [Forum]. City-Data.Com. https://www.city-data.com/forum/philadelphia/1839911-official-greater-philadelphia-area-crime-thread-267.html
- 6. Advameg, Inc. (2020). How’s everyone doing amongst the Coronavirus shut down? (Philadelphia, York: Restaurants, bus)—Pennsylvania (PA)—Page 37—City-Data Forum [Forum]. City-Data.Com. https://www.city-data.com/forum/philadelphia/3137059-hows-everyone-doing-amongst-coronavirus-shut-37.html
- 7. Advameg, Inc. (n.d.a). City-Data.Com—Stats about all US cities—Real estate, relocation info, crime, house prices, cost of living, races, home value estimator, recent sales, income, photos, schools, maps, weather, neighborhoods, and more. Retrieved January 26, 2024, from https://www.city-data.com/
- 8. Advameg, Inc. (n.d.b). City-data.com Forum: Relocation, Moving, General and Local City Discussions. Retrieved January 26, 2024, from https://www.city-data.com/forum/
- 9. Advameg, Inc. (n.d.c). Terms of Service—City-Data Forum. Retrieved October 22, 2023, from https://www.city-data.com/forumtos.html
- 10. Aharoni, R., & Goldberg, Y. (2020). Unsupervised Domain Clusters in Pretrained Language Models (arXiv:2004.02105). arXiv. http://arxiv.org/abs/2004.02105. DOI: 10.18653/v1/2020.acl-main.692
- 11. Angelov, D. (2020). Top2Vec: Distributed Representations of Topics.
- 12. Bhatia, S., Lau, J. H., & Baldwin, T. (2016). Automatic Labeling of Topics with Neural Embeddings. DOI: 10.48550/arXiv.1612.05340
- 13. Bianchi, F., Terragni, S., Hovy, D., Nozza, D., & Fersini, E. (2020a). Cross-lingual Contextualized Topic Models with Zero-shot Learning (arXiv:2004.07737). arXiv. http://arxiv.org/abs/2004.07737. DOI: 10.18653/v1/2021.eacl-main.143
- 14. Bianchi, F., Terragni, S., & Hovy, D. (2020b). Pre-training is a hot topic: Contextualized document embeddings improve topic coherence. arXiv preprint arXiv:2004.03974. DOI: 10.18653/v1/2021.acl-short.96
- 15. Blei, D. M., Ng, A., & Jordan, M. (2003). Latent dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993–1022.
- 16. Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2016). Enriching Word Vectors with Subword Information. arXiv Preprint arXiv:1607.04606. DOI: 10.1162/tacl_a_00051
- 17. Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American society for information science, 41(6), 391–407. https://www.cs.csustan.edu/~mmartin/LDS/Deerwester-et-al.pdf. DOI: 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
- 18. Dieng, A. B., Ruiz, F. J. R., & Blei, D. M. (2020). Topic Modeling in Embedding Spaces. Transactions of the Association for Computational Linguistics, 8, 439–453. DOI: 10.1162/tacl_a_00325
- 19. Duan, Z., Xu, Y., Chen, B., Wang, D., Wang, C., & Zhou, M. (2021). TopicNet: Semantic Graph-Guided Topic Discovery (arXiv:2110.14286). arXiv. DOI: 10.48550/arXiv.2110.14286
- 20. El-Assady, M., Kehlbeck, R., Collins, C., Keim, D., & Deussen, O. (2019). Semantic Concept Spaces: Guided Topic Model Refinement using Word-Embedding Projections (arXiv:1908.00475). arXiv. http://arxiv.org/abs/1908.00475. DOI: 10.1109/TVCG.2019.2934654
- 21. Gerlach, M., Peixoto, T. P., & Altmann, E. G. (2018). A network approach to topic models. Science Advances, 4(7), eaaq1360. DOI: 10.1126/sciadv.aaq1360
- 22. Gourru, A., Velcin, J., Roche, M., Gravier, C., & Poncelet, P. (2018). United We Stand: Using Multiple Strategies for Topic Labeling. In M. Silberztein, F. Atigui, E. Kornyshova, E. Métais, & F. Meziane (Eds.), Natural Language Processing and Information Systems (Vol. 10859, pp. 352–363). Springer International Publishing. DOI: 10.1007/978-3-319-91947-8_37
- 23. Grootendorst, M. (2022). BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv Preprint arXiv:2203.05794. DOI: 10.48550/arXiv.2203.05794
- 24. Hinneburg, A., Rosner, F., Pessler, S., & Oberländer, C. (2014, November). Exploring document collections with topic frames. Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management (pp. 2084–2086). DOI: 10.1145/2661829.2661857
- 25. Hoffman, M., Bach, F., & Blei, D. (2010). Online learning for latent dirichlet allocation. Advances in Neural Information Processing Systems, 23. URL: https://papers.nips.cc/paper_files/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf
- 26. Jagarlamudi, J., Daumé III, H., & Udupa, R. (2012). Incorporating Lexical Priors into Topic Models. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (pp. 204–213). Association for Computational Linguistics. URL: https://aclanthology.org/E12-1021
- 27. Joulin, A., Grave, E., Bojanowski, P., Douze, M., Jégou, H., & Mikolov, T. (2016a). FastText.zip: Compressing text classification models. arXiv Preprint arXiv:1612.03651. DOI: 10.48550/arXiv.1612.03651
- 28. Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2016b). Bag of Tricks for Efficient Text Classification. arXiv Preprint arXiv:1607.01759. DOI: 10.48550/arXiv.1607.01759; 10.18653/v1/E17-2068
- 29. Li, C., Chen, S., Xing, J., Sun, A., & Ma, Z. (2018). Seed-Guided Topic Model for Document Filtering and Classification. ACM Transactions on Information Systems, 37(1), 9:1–9:37. DOI: 10.1145/3238250
- 30. Limwattana, S., & Prom-on, S. (2021). Topic Modeling Enhancement using Word Embeddings. 18th International Joint Conference on Computer Science and Software Engineering (JCSSE), 1–5. DOI: 10.1109/JCSSE53117.2021.9493816
- 31. McInnes, L., Healy, J., & Astels, S. (2017). hdbscan: Hierarchical density based clustering. J. Open Source Softw., 2(11), 205. DOI: 10.21105/joss.00205
- 32. Mimno, D., Wallach, H., Talley, E., Leenders, M., & McCallum, A. (2011). Optimizing Semantic Coherence in Topic Models. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (pp. 262–272). Association for Computational Linguistics. URL: https://aclanthology.org/D11-1024
- 33. Newman, M. E. (2009). The first-mover advantage in scientific publication. Europhysics Letters, 86(6), 68001. DOI: 10.1209/0295-5075/86/68001
- 34. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830. DOI: 10.48550/arXiv.1201.0490
- 35. Philadelphia City Planning Commission. (2023). About | Philadelphia2035. https://www.phila2035.org/
- 36. Popa, C., & Rebedea, T. (2021). BART-TL: Weakly-Supervised Topic Label Generation. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 1418–1425. DOI: 10.18653/v1/2021.eacl-main.121
- 37. Reddit Inc. (2023). Homepage—Reddit. https://www.redditinc.com/
- 38. Řehůřek, R., & Sojka, P. (2011). Gensim—statistical semantics in python. Retrieved from gensim.org. URL: https://www.fi.muni.cz/usr/sojka/posters/rehurek-sojka-scipy2011.pdf
- 39. Richardson, L. (2007). Beautiful soup documentation.
- 40. Ridolfo, J., & Hart-Davidson, W. (Eds.). (2015). Rhetoric and the digital humanities. University of Chicago Press. DOI: 10.7208/chicago/9780226176727.001.0001
- 41. Röder, M., Both, A., & Hinneburg, A. (2015a). Exploring the Space of Topic Coherence Measures. Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, 399–408. DOI: 10.1145/2684822.2685324
- 42. Röder, M., Both, A., & Hinneburg, A. (2015b). Exploring the Space of Topic Coherence Measures. Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, 399–408. DOI: 10.1145/2684822.2685324
- 43. Sobkowicz, P., & Sobkowicz, A. (2010). Dynamics of hate based networks. The European Physical Journal B, 73(4), 633–643. DOI: 10.1140/epjb/e2010-00039-0
- 44. Stevens, K., Kegelmeyer, P., Andrzejewski, D., & Buttler, D. (2012). Exploring Topic Coherence over Many Models and Many Topics. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics. URL: https://aclanthology.org/D12-1087
- 45. Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In Thomas K. Landauer, Danielle S. McNamara, Simon Dennis, & Walter Kintsch (Eds.), Handbook of latent semantic analysis, 427(7), pp. 424–440.
- 46. Terragni, S. (2023). A collection of Topic Diversity measures for topic modeling [Python]. https://github.com/silviatti/topic-model-diversity (Original work published 2020).
- 47. Terragni, S., Fersini, E., & Messina, E. (2021, June). Word embedding-based topic similarity measures. In International Conference on Applications of Natural Language to Information Systems (pp. 33–45). Cham: Springer International Publishing. DOI: 10.1007/978-3-030-80599-9_4
- 48. Tran, N. K., Zerr, S., Bischoff, K., Niederée, C., & Krestel, R. (2013). Topic Cropping: Leveraging Latent Topics for the Analysis of Small Corpora. In T. Aalberg, C. Papatheodorou, M. Dobreva, G. Tsakonas, & C. J. Farrugia (Eds.), Research and Advanced Technology for Digital Libraries (pp. 297–308). Springer. DOI: 10.1007/978-3-642-40501-3_30
- 49. Vayansky, I., & Kumar, S. A. (2020). A review of topic modeling methods. Information Systems, 94. DOI: 10.1016/j.is.2020.101582
- 50. Yang, W., Boyd-Graber, J., & Resnik, P. (2016). A Discriminative Topic Model using Document Network Structure. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 686–696. DOI: 10.18653/v1/P16-1065
- 51. Zhang, Z., Fang, M., Chen, L., & Namazi-Rad, M.-R. (2022). Is Neural Topic modeling Better than Clustering? An Empirical Study on Clustering with Contextual Embeddings for Topics (arXiv:2204.09874). arXiv. DOI: 10.48550/arXiv.2204.09874; 10.18653/v1/2022.naacl-main.285
