Automatic Language Identification in Code-Switched Hindi-English Social Media Text

Open Access | Jun 2021

References

  1. Aguilar, G., & Solorio, T. (2020, July). From English to code-switching: Transfer learning with strong morphological clues. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 8033–8044). Online: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/2020.acl-main.716. DOI: 10.18653/v1/2020.acl-main.716
  2. Ahn, E., Jimenez, C., Tsvetkov, Y., & Black, A. W. (2020, January). What code-switching strategies are effective in dialogue systems? In Proceedings of the Society for Computation in Linguistics 2020 (pp. 254–264). New York, USA: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/2020.scil-1.32
  3. Anastasopoulos, A., & Neubig, G. (2020, July). Should all cross-lingual embeddings speak English? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 8658–8679). Online: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/2020.acl-main.766. DOI: 10.18653/v1/2020.acl-main.766
  4. Attia, M., Samih, Y., Elkahky, A., Mubarak, H., Abdelali, A., & Darwish, K. (2019, August). POS tagging for improving code-switching identification in Arabic. In Proceedings of the Fourth Arabic Natural Language Processing Workshop (pp. 18–29). Florence, Italy: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/W19-4603. DOI: 10.18653/v1/W19-4603
  5. Bali, K., Sharma, J., Choudhury, M., & Vyas, Y. (2014, October). “I am borrowing ya mixing?” An analysis of English-Hindi code mixing in Facebook. In Proceedings of the First Workshop on Computational Approaches to Code Switching (pp. 116–126). Doha, Qatar: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/W14-3914. DOI: 10.3115/v1/W14-3914
  6. Barman, U., Das, A., Wagner, J., & Foster, J. (2014, October). Code mixing: A challenge for language identification in the language of social media. In Proceedings of the First Workshop on Computational Approaches to Code Switching (pp. 13–23). Doha, Qatar: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/W14-3902. DOI: 10.3115/v1/W14-3902
  7. Bullock, B., Guzmán, W., Serigos, J., Sharath, V., & Toribio, A. J. (2018, July). Predicting the presence of a Matrix Language in code-switching. In Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching (pp. 68–75). Melbourne, Australia: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/W18-3208. DOI: 10.18653/v1/W18-3208
  8. Çetinoğlu, Ö., Schulz, S., & Vu, N. T. (2016, November). Challenges of computational processing of code-switching. In Proceedings of the Second Workshop on Computational Approaches to Code Switching (pp. 1–11). Austin, USA: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/W16-5801. DOI: 10.18653/v1/W16-5801
  9. Chan, J. Y. C., Ching, P. C., & Lee, T. (2005). Development of a Cantonese-English code-mixing speech corpus. In Proceedings of the Ninth European Conference on Speech Communication and Technology – Interspeech’05 (pp. 1533–1536). Lisbon, Portugal. Retrieved from https://www.isca-speech.org/archive/archive_papers/interspeech_2005/i05_1533.pdf
  10. Choudhury, M., Chittaranjan, G., Gupta, P., & Das, A. (2014). Overview of FIRE 2014 Track on Transliterated Search (Tech. Rep.). Retrieved from https://www.isical.ac.in/~fire/working-notes/2014/MSR/2014-trainslit_search-track_over.pdf
  11. Dey, A., & Fung, P. (2014, May). A Hindi-English code-switching corpus. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14). Reykjavik, Iceland: European Language Resources Association (ELRA). Retrieved from http://www.lrec-conf.org/proceedings/lrec2014/pdf/922_Paper.pdf
  12. Elfardy, H., Al-Badrashiny, M., & Diab, M. (2013). Codeswitch point detection in Arabic. In E. Métais, F. Meziane, M. Saraee, V. Sugumaran & S. Vadera (Eds.), Natural Language Processing and Information Systems (pp. 412–416). Berlin, Germany: Springer. DOI: 10.1007/978-3-642-38824-8_51
  13. Eskander, R., Al-Badrashiny, M., Habash, N., & Rambow, O. (2014, October). Foreign words and the automatic processing of Arabic social media text written in Roman script. In Proceedings of the First Workshop on Computational Approaches to Code Switching (pp. 1–12). Doha, Qatar: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/W14-3901. DOI: 10.3115/v1/W14-3901
  14. Garrido-Muñoz, I., Montejo-Ráez, A., Martínez-Santiago, F., & Ureña-López, L. A. (2021). A survey on bias in deep NLP. Applied Sciences, 11(7), 3184. Retrieved from https://www.mdpi.com/2076-3417/11/7/3184. DOI: 10.3390/app11073184
  15. Grosjean, F., & Li, P. (2013). The psycholinguistics of bilingualism. Chichester, UK: Wiley-Blackwell.
  16. Gupta, K., Choudhury, M., & Bali, K. (2012, May). Mining Hindi-English transliteration pairs from online Hindi lyrics. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12) (pp. 2459–2465). Istanbul, Turkey: European Language Resources Association (ELRA). Retrieved from http://www.lrec-conf.org/proceedings/lrec2012/pdf/365_Paper.pdf
  17. Jamatia, A., Das, A., & Gambäck, B. (2019). Deep learning-based language identification in English-Hindi-Bengali code-mixed social media corpora. Journal of Intelligent Systems, 28(3), 399–408. DOI: 10.1515/jisys-2017-0440
  18. Jamatia, A., Gambäck, B., & Das, A. (2015, September). Part-of-speech tagging for code-mixed English-Hindi Twitter and Facebook chat messages. In Proceedings of the International Conference Recent Advances in Natural Language Processing (pp. 239–248). Hissar, Bulgaria: INCOMA Ltd. Retrieved from https://www.aclweb.org/anthology/R15-1033
  19. Kaur, J., & Singh, J. (2015). Toward normalizing Romanized Gurumukhi text from social media. Indian Journal of Science and Technology, 8(27), 1–6. DOI: 10.17485/ijst/2015/v8i27/81666
  20. Lyu, D-C., Tien-Ping, T., Eng, C., & Haizhou, L. (2015). Mandarin–English codeswitching speech corpus in South-East Asia: SEAME. Language Resources and Evaluation, 49, 1986–1989. DOI: 10.1007/s10579-015-9303-x
  21. Mager, M., Çetinoğlu, Ö., & Kann, K. (2019, June). Subword-level language identification for intraword code-switching. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 2005–2011). Minneapolis, USA: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/N19-1201. DOI: 10.18653/v1/N19-1201
  22. Mave, D., Maharjan, S., & Solorio, T. (2018, July). Language identification and analysis of code-switched social media text. In Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching (pp. 51–61). Melbourne, Australia: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/W18-3206. DOI: 10.18653/v1/W18-3206
  23. Mhaiskar, R. (2015). Romanagari an alternative for modern media writings. Bulletin of the Deccan College Research Institute, 75, 195–202. Retrieved from http://www.jstor.org/stable/26264736
  24. Molina, G., AlGhamdi, F., Ghoneim, M., Hawwari, A., Rey-Villamizar, N., Diab, M., & Solorio, T. (2016, November). Overview for the second shared task on language identification in code-switched data. In Proceedings of the Second Workshop on Computational Approaches to Code Switching (pp. 40–49). Austin, USA: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/W16-5805. DOI: 10.18653/v1/W16-5805
  25. Nguyen, D., & Doğruöz, A. S. (2013, October). Word level language identification in online multilingual communication. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (pp. 857–862). Seattle, USA: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/D13-1084
  26. Nguyen, L., & Bryant, C. (2020, May). CanVEC – the Canberra Vietnamese-English code-switching natural speech corpus. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC’20) (pp. 4121–4129). Marseille, France: European Language Resources Association. Retrieved from https://www.aclweb.org/anthology/2020.lrec-1.507
  27. Roy, R. S., Choudhury, M., Majumder, P., & Agarwal, K. (2013, December). Overview of the FIRE 2013 track on transliterated search. In FIRE’12 & ’13: Post-Proceedings of the Fourth and Fifth Workshops of the Forum for Information Retrieval Evaluation (pp. 1–7). New York, USA: Association for Computing Machinery. DOI: 10.1145/2701336.2701636
  28. Sasaki, Y. (2007). The truth of the F-measure (Tech. Rep.). Manchester, UK: University of Manchester. Retrieved from https://www.cs.odu.edu/~mukka/cs795sum09dm/Lecturenotes/Day3/F-measure-YS-26Oct07.pdf
  29. Shen, H. P., Wu, C. H., Yang, Y. T., & Hsu, C. S. (2011, October). CECOS: A Chinese-English code-switching speech database. In 2011 International Conference on Speech Database and Assessments, Oriental COCOSDA 2011 – Proceedings (pp. 120–123). Hsinchu City, Taiwan. DOI: 10.1109/ICSDA.2011.6085992
  30. Si, A. (2011). A diachronic investigation of Hindi–English code-switching, using Bollywood film scripts. International Journal of Bilingualism, 15(4), 388–407. DOI: 10.1177/1367006910379300
  31. Solorio, T., & Liu, Y. (2008, October). Part-of-speech tagging for English-Spanish code-switched text. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (pp. 1051–1060). Honolulu, Hawaii: Association for Computational Linguistics. Retrieved from http://aclweb.org/anthology/D08-1110. DOI: 10.3115/1613715.1613852
  32. Soto, V., & Hirschberg, J. (2018, July). Joint part-of-speech and language ID tagging for code-switched data. In Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching (pp. 1–10). Melbourne, Australia: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/W18-3201. DOI: 10.18653/v1/W18-3201
  33. Sowmya, V. B., Choudhury, M., Bali, K., Dasgupta, T., & Basu, A. (2010, May). Resource creation for training and testing of transliteration systems for Indian languages. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10). Valletta, Malta: European Language Resources Association (ELRA). Retrieved from http://www.lrec-conf.org/proceedings/lrec2010/pdf/182_Paper.pdf
  34. Virga, P., & Khudanpur, S. (2003). Transliteration of proper names in cross-lingual information retrieval. In Proceedings of the ACL 2003 Workshop on Multilingual and Mixed-Language Named Entity Recognition – Volume 15 (pp. 57–64). Sapporo, Japan: Association for Computational Linguistics. DOI: 10.3115/1119384.1119392
  35. Voss, C., Tratz, S., Laoudi, J., & Briesch, D. (2014, May). Finding Romanized Arabic dialect in code-mixed Tweets. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14) (pp. 2249–2253). Reykjavik, Iceland: European Language Resources Association (ELRA). Retrieved from http://www.lrec-conf.org/proceedings/lrec2014/pdf/1116_Paper.pdf
  36. Xia, M. X. (2016, November). Codeswitching language identification using subword information enriched word vectors. In Proceedings of the Second Workshop on Computational Approaches to Code Switching (pp. 132–136). Austin, USA: Association for Computational Linguistics. Retrieved from https://www.aclweb.org/anthology/W16-5818. DOI: 10.18653/v1/W16-5818
DOI: https://doi.org/10.5334/johd.44 | Journal eISSN: 2059-481X
Language: English
Published on: Jun 25, 2021
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2021 Li Nguyen, Christopher Bryant, Sana Kidwai, Theresa Biberauer, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.