References
- Ali, A., & Renals, S. (2018). Word Error Rate Estimation for Speech Recognition: e-WER. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). https://doi.org/10.18653/v1/p18-2004
- Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C. L., & Parikh, D. (2015). VQA: Visual Question Answering. 2015 IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/iccv.2015.279
- Bagić Babac, M. (2023). Emotion analysis of user reactions to online news. Information Discovery and Delivery, 51(2), 179-193. https://doi.org/10.1108/IDD-04-2022-0027
- Bhatnagar, V., Sharma, S., Bhatnagar, A., & Kumar, L. (2021). Role of Machine Learning in Sustainable Engineering: A Review. IOP Conference Series: Materials Science and Engineering, 1099(1), 012036. https://doi.org/10.1088/1757-899x/1099/1/012036
- Bodnar, C. (2018). Text to image synthesis using generative adversarial networks. arXiv preprint arXiv:1805.00676.
- Čemeljić, H., & Bagić Babac, M. (2023). Preventing Security Incidents on Social Networks: An Analysis of Harmful Content Dissemination Through Applications. Police and Security (in press).
- Clark, A., Prosser, J., & Wiles, R. (2010). Ethical Issues in Image-Based Research. Arts & Health, 2(1), 81-93. https://doi.org/10.1080/17533010903495298
- Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., & Bharath, A. A. (2018). Generative adversarial networks: An overview. IEEE Signal Processing Magazine, 35(1), 53-65.
- Cvitanović, I., & Bagić Babac, M. (2022). Deep Learning with Self-Attention Mechanism for Fake News Detection. In Lahby, M., Pathan, A.S.K., Maleh, Y., Yafooz, W.M.S. (Eds.), Combating Fake News with Computational Intelligence Techniques (pp. 205-229). Springer, Switzerland.
- Dunđer, I., Seljan, S., & Pavlovski, M. (2021). What Makes Machine-Translated Poetry Look Bad? A Human Error Classification Analysis. Proceedings of the Central European Conference on Information and Intelligent Systems, Varaždin: Fakultet organizacije i informatike Sveučilišta u Zagrebu, pp. 183-191.
- Dunđer, I., Seljan, S., & Pavlovski, M. (2020). Automatic Machine Translation of Poetry and a Low-Resource Language Pair. 43rd International Convention on Information, Communication and Electronic Technology (MIPRO 2020), Opatija, Croatia, pp. 1034-1039. https://doi.org/10.23919/MIPRO48935.2020.9245342
- Elasri, M., Elharrouss, O., Al-Maadeed, S., & Tairi, H. (2022). Image Generation: A Review. Neural Processing Letters, 54(5), 4609-4646. https://doi.org/10.1007/s11063-022-10777-x
- Girshick, R. (2015). Fast R-CNN. 2015 IEEE International Conference on Computer Vision (ICCV). https://doi.org/10.1109/iccv.2015.169
- Holzinger, A., Goebel, R., Fong, R., Moon, T., Müller, K. R., & Samek, W. (2022). xxAI - Beyond Explainable Artificial Intelligence. In International Workshop on Extending Explainable AI Beyond Deep Models and Classifiers (pp. 15-47). Springer, Cham.
- Hu, Y., Liu, B., Kasai, J., Wang, Y., Ostendorf, M., Krishna, R., & Smith, N. A. (2023). TIFA: Accurate and interpretable text-to-image faithfulness evaluation with question answering. arXiv preprint arXiv:2303.11897.
- Huang, X., Li, Y., Poursaeed, O., Hopcroft, J., & Belongie, S. (2017). Stacked Generative Adversarial Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr.2017.202
- Ivasic-Kos, M. (2022). Application of Digital Images and Corresponding Image Retrieval Paradigm. ENTRENOVA - ENTerprise REsearch InNOVAtion, 8(1), 350-363. https://doi.org/10.54820/entrenova-2022-0030
- Jamwal, A., Agrawal, R., & Sharma, M. (2022). Deep learning for manufacturing sustainability: Models, applications in Industry 4.0 and implications. International Journal of Information Management Data Insights, 2(2), 100107. https://doi.org/10.1016/j.jjimei.2022.100107
- Jurafsky, D., & Martin, J.H. (2000). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall, Upper Saddle River, NJ.
- Karimian, G., Petelos, E., & Evers, S. M. A. A. (2022). The ethical issues of the application of artificial intelligence in healthcare: a systematic scoping review. AI and Ethics, 2(4), 539-551. https://doi.org/10.1007/s43681-021-00131-7
- Karras, T., Aittala, M., Hellsten, J., Laine, S., Lehtinen, J., & Aila, T. (2020). Training generative adversarial networks with limited data. Advances in Neural Information Processing Systems, 33 (NeurIPS 2020).
- Krivosheev, N., Vik, K., Ivanova, Y., & Spitsyn, V. (2021). Investigation of the Batch Size Influence on the Quality of Text Generation by the SeqGAN Neural Network. Proceedings of the 31st International Conference on Computer Graphics and Vision (GraphiCon 2021), Volume 2. https://doi.org/10.20948/graphicon-2021-3027-1005-1010
- Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84-90. https://doi.org/10.1145/3065386
- LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. https://doi.org/10.1109/5.726791
- Li, F., Ruijs, N., & Lu, Y. (2022). Ethics & AI: A Systematic Review on Ethical Concerns and Related Strategies for Designing with AI in Healthcare. AI, 4(1), 28-53. https://doi.org/10.3390/ai4010003
- Lin, C.-Y. (2004). ROUGE: A Package for Automatic Evaluation of Summaries. In Proceedings of the Workshop on Text Summarization Branches Out (WAS 2004), Barcelona, Spain, July 25-26.
- Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common Objects in Context. Computer Vision - ECCV 2014, 740-755. https://doi.org/10.1007/978-3-319-10602-1_48
- Lipovac, I., & Bagić Babac, M. (2023). Developing a Data Pipeline Solution for Big Data Processing. International Journal of Data Mining, Modelling and Management. Accepted for publication.
- Lu, J., Yang, J., Batra, D., & Parikh, D. (2018). Neural Baby Talk. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 7219-7228).
- Mao, X., Li, Q., Xie, H., Lau, R. Y. K., Wang, Z., & Smolley, S. P. (2017). Least squares generative adversarial networks. Proceedings of the IEEE International Conference on Computer Vision (pp. 2794-2802).
- Nichol, A., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., Sutskever, I., & Chen, M. (2021). GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741.
- Oliveira dos Santos, G., Colombini, E. L., & Avila, S. (2021). CIDEr-R: Robust Consensus-based Image Description Evaluation. Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021). https://doi.org/10.18653/v1/2021.wnut-1.39
- Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002). BLEU: a method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002), pp. 311-318.
- Persello, C., Wegner, J. D., Hansch, R., Tuia, D., Ghamisi, P., Koeva, M., & Camps-Valls, G. (2022). Deep Learning and Earth Observation to Support the Sustainable Development Goals: Current approaches, open challenges, and future opportunities. IEEE Geoscience and Remote Sensing Magazine, 10(2), 172-200. https://doi.org/10.1109/mgrs.2021.3136100
- Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., & Rombach, R. (2023). SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis. arXiv preprint arXiv:2307.01952.
- Puh, K., & Bagić Babac, M. (2023a). Predicting sentiment and rating of tourist reviews using machine learning. Journal of Hospitality and Tourism Insights, 6(3), 1188-1204. https://doi.org/10.1108/JHTI-02-2022-0078
- Puh, K., & Bagić Babac, M. (2023b). Predicting stock market using natural language processing. American Journal of Business, 38(2), 41-61. https://doi.org/10.1108/ajb-08-2022-0124
- Qiao, T., Zhang, J., Xu, D., & Tao, D. (2019). MirrorGAN: Learning Text-To-Image Generation by Redescription. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr.2019.00160
- Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125.
- Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., & Sutskever, I. (2021). Zero-shot text-to-image generation. In International Conference on Machine Learning (pp. 8821-8831). PMLR. arXiv preprint arXiv:2102.12092.
- Reed, S., Akata, Z., Lee, H., & Schiele, B. (2016). Learning Deep Representations of Fine-Grained Visual Descriptions. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr.2016.13
- Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137-1149. https://doi.org/10.1109/tpami.2016.2577031
- Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. Lecture Notes in Computer Science, 234-241. https://doi.org/10.1007/978-3-319-24574-4_28
- Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536. https://doi.org/10.1038/323533a0
- Sah, S., Peri, D., Shringi, A., Zhang, C., Dominguez, M., Savakis, A., & Ptucha, R. (2018). Semantically Invariant Text-to-Image Generation. 2018 25th IEEE International Conference on Image Processing (ICIP). https://doi.org/10.1109/icip.2018.8451656
- Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E., Ghasemipour, S. K. S., Ayan, B. K., Mahdavi, S. S., Lopes, R. G., Salimans, T., Ho, J., Fleet, D. J., & Norouzi, M. (2022). Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. arXiv preprint arXiv:2205.11487.
- Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training GANs. arXiv preprint arXiv:1606.03498.
- Samek, W., Wiegand, T., & Müller, K. R. (2017). Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models. arXiv preprint arXiv:1708.08296.
- Šandor, D., & Bagić Babac, M. (2023). Sarcasm detection in online comments using machine learning. Information Discovery and Delivery. https://doi.org/10.1108/idd-01-2023-0002
- Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr.2015.7298594
- Tomičić Furjan, M., Tomičić-Pupek, K., & Pihir, I. (2020). Understanding Digital Transformation Initiatives: Case Studies Analysis. Business Systems Research, 11 (1), 125-141. https://doi.org/10.2478/bsrj-2020-0009
- Tunmibi, S., & Okhakhu, D. (2022). Machine Learning for Sustainable Development. In Proceedings of the First Conference of the National Institute of Office Administrators and Information Managers (NIOAIM), 7th-10th February, Lead City University, Ibadan, Oyo State, Nigeria.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17) (pp. 6000-6010). Curran Associates, Red Hook, NY, USA.
- Vinuesa, R., & Sirmacek, B. (2021). Interpretable deep-learning models to help achieve the Sustainable Development Goals. Nature Machine Intelligence, 3(11), 926. https://doi.org/10.1038/s42256-021-00414-y
- Xian, Y., Lampert, C. H., Schiele, B., & Akata, Z. (2019). Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(9), 2251-2265. https://doi.org/10.1109/tpami.2018.2857768
- Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R., & Bengio, Y. (2015). Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning (pp. 2048-2057). PMLR.
- Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., & He, X. (2018). AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/cvpr.2018.00143
- Yildirim, E. (2022). Text-to-Image Generation A.I. in Architecture. In Kozlu, H. (Ed.), Art and Architecture: Theory, Practice and Experience (pp. 97-120). Lyon: Livre de Lyon.
- Zhang, C., Zhang, C., Zhang, M., & Kweon, I. S. (2023). Text-to-image Diffusion Models in Generative AI: A Survey. arXiv preprint arXiv:2303.07909.
- Zhang, H., Koh, J. Y., Baldridge, J., Lee, H., & Yang, Y. (2021). Cross-Modal Contrastive Learning for Text-to-Image Generation. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr46437.2021.00089
- Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., & Metaxas, D. N. (2019). StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8), 1947-1962. https://doi.org/10.1109/tpami.2018.2856256