Albanian Text Classification: Bag of Words Model and Word Analogies

Arbana Kadriu; Lejla Abazi; Hyrije Abazi

doi:10.2478/bsrj-2019-0006

Business Systems Research Journal

Volume 10 (2019): Issue 1 (April 2019)

Open Access

May 2019

1. Antonellis, I., Bouras, C., Poulopoulos, V. (2006), “Personalized news categorization through scalable text classification”, in Zhou, X., Li, J., Shen, H. T., Kitsuregawa, M., Zhang, Y. (Eds.) Frontiers of WWW Research and Development – APWeb 2006, Springer, Berlin, Heidelberg, pp. 391-401. doi:10.1007/11610113_35
2. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T. (2017), “Enriching word vectors with subword information”, Transactions of the Association for Computational Linguistics, Vol. 5, pp. 135-146. doi:10.1162/tacl_a_00051
3. Chaudhari, S. V., Lade, S. (2013), “Classification of News and Research Articles Using Text Pattern Mining”, IOSR Journal of Computer Engineering (IOSR-JCE), Vol. 14, No. 5, pp. 120-126. doi:10.9790/0661-145120126
4. Cortes, C., Vapnik, V. (1995), “Support-vector networks”, Machine Learning, Vol. 20, No. 3, pp. 273-297. doi:10.1007/BF00994018
5. Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., Singer, Y. (2006), “Online passive-aggressive algorithms”, Journal of Machine Learning Research, Vol. 7, pp. 551-585.
6. Gui, Y., Gao, Z., Li, R., Yang, X. (2012), “Hierarchical text classification for news articles based-on named entities”, in Zhou, S., Zhang, S., Karypis, G. (Eds.) Advanced Data Mining and Applications, Springer, Berlin, Heidelberg, pp. 318-329. doi:10.1007/978-3-642-35527-1_27
7. Hartmann, N., Fonseca, E., Shulby, C., Treviso, M., Rodrigues, J., Aluisio, S. (2017), “Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks”, in Proceedings of Symposium in Information and Human Language Technology, Uberlandia, MG, Brazil, pp. 122-131.
8. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T. (2016), “Bag of tricks for efficient text classification”, in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Vol. 2, Short Papers, pp. 427-431. doi:10.18653/v1/E17-2068
9. Jurka, T. P., Collingwood, L., Boydstun, A. E., Grossman, E., van Atteveldt, W. (2013), “RTextTools: A supervised learning package for text classification”, The R Journal, Vol. 5, No. 1, pp. 6-12. doi:10.32614/RJ-2013-001
10. Liparas, D., HaCohen-Kerner, Y., Moumtzidou, A., Vrochidis, S., Kompatsiaris, I. (2014), “News Articles Classification Using Random Forests and Weighted Multimodal Features”, in Lamas, D., Buitelaar, P. (Eds.), Multidisciplinary Information Retrieval, Springer, Cham, pp. 63-75. doi:10.1007/978-3-319-12979-2_6
11. Manning, C. D., Raghavan, P., Schutze, H. (2008). Introduction to Information Retrieval, New York, Cambridge University Press. doi:10.1017/CBO9780511809071
12. Mikolov, T., Chen, K., Corrado, G., Dean, J. (2013), “Efficient estimation of word representations in vector space”, in Proceedings of the International Conference on Learning Representations (ICLR 2013), available at: https://arxiv.org/pdf/1301.3781.pdf (September 2013).
14. Natural Language Processing Group (2014). Web corpora of Bosnian, Croatian and Serbian top-level domain published, available at: http://nlp.ffzg.hr/web-corpora-of-bosniancroatian-and-serbian-top-level-domain-published/ (7 September 2014).
15. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E. (2011), “Scikit-learn: Machine Learning in Python”, Journal of Machine Learning Research, Vol. 12, pp. 2825-2830.
16. Raschka, S. (2015). Python machine learning, Birmingham, Packt Publishing Ltd.
17. Rubin, T. N., Chambers, A., Smyth, P., Steyvers, M. (2012), “Statistical topic models for multilabel document classification”, Machine Learning, Vol. 88, No. 1-2, pp. 157-208. doi:10.1007/s10994-011-5272-5
18. Scannell, K. P. (2007), “The Crúbadán Project: Corpus building for under-resourced languages”, in Fairon, C., Naets, H., Kilgarriff, A., de Schryver, G. M. (Eds.), Building and Exploring Web Corpora, Proceedings of the 3rd Web as Corpus Workshop, Vol. 4, pp. 5-15.
19. Swezey, R. M., Sano, H., Shiramatsu, S., Ozono, T., Shintani, T. (2012), “Automatic detection of news articles of interest to regional communities”, International Journal of Computer Science and Network Security, Vol. 12, No. 6, pp. 99-106.
20. Tyers, F. M., Alperen, M. S. (2010), “South-east European times: A parallel corpus of Balkan languages”, in Proceedings of the LREC Workshop on Exploitation of Multilingual Resources and Tools for Central and (South-) Eastern European Languages, pp. 49-53.
21. Zhou, D., Resnick, P., Mei, Q. (2011), “Classifying the Political Leaning of News Articles and Users from User Votes”, in 5th International AAAI Conference on Web and Social Media, North America, pp. 417-424. doi:10.1609/icwsm.v5i1.14108

DOI: https://doi.org/10.2478/bsrj-2019-0006 | Journal eISSN: 1847-9375 | Journal ISSN: 1847-8344

Language: English

Page range: 74 - 87

Submitted on: Dec 1, 2017

Accepted on: Feb 22, 2018

Published on: May 9, 2019

Published by: IRENET - Society for Advancing Innovation and Research in Economy

In partnership with: Paradigm Publishing Services

Publication frequency: 2 issues per year

Related subjects:

Business and economics,

Business management,

Management, organization, corporate governance,

Business management, other,

Mathematics and statistics for economists,

Mathematics

© 2019 Arbana Kadriu, Lejla Abazi, Hyrije Abazi, published by IRENET - Society for Advancing Innovation and Research in Economy
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
