Bridging the gap between AI and human emotion: a multimodal recognition system
Abstract
This study introduces a novel system that integrates voice and facial recognition technologies to enhance human-computer interaction by accurately interpreting and responding to user emotions. Unlike conventional approaches that analyze either voice or facial expressions in isolation, this system combines both modalities, offering a more comprehensive understanding of emotional states. By evaluating facial expressions, vocal tones, and contextual conversation history, the system generates personalized, context-aware responses, fostering more natural and empathetic AI interactions. This advancement significantly improves user engagement and satisfaction, paving the way for emotionally intelligent AI applications across diverse fields.
© 2025 Ganta Neeraja, Jakkula Sai Surya Teja, M. Ravi Kumar, J. Lakshmi Prasanna, Parvez M. Muzammil, Chella Santhosh, published by Systems Research Institute Polish Academy of Sciences
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.