
DCF–VQA: Counterfactual Structure Based on Multi–Feature Enhancement

Open Access | Oct 2024

Abstract

Visual question answering (VQA) is a pivotal topic at the intersection of computer vision and natural language processing. This paper addresses the challenges of linguistic bias and bias fusion within invalid regions, which arise in existing VQA models from insufficient representation of multi-modal features. To overcome these issues, we propose a multi-feature enhancement scheme. This scheme fuses one or more features with the original ones, incorporating discrete cosine transform (DCT) features into the counterfactual reasoning framework. The approach harnesses fine-grained information and spatial relationships within images and questions, enabling a more refined understanding of the indirect relationship between them. Consequently, it effectively mitigates linguistic bias and bias fusion within invalid regions in the model. Extensive experiments on multiple datasets, including VQA2 and VQA-CP2, with various baseline models and fusion techniques, demonstrate promising and robust performance.
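To make the idea of augmenting visual features with DCT features more concrete, the following is a minimal sketch in NumPy. It is not the authors' implementation: the 2-D DCT-II, the low-frequency coefficient selection, and the weighted-sum fusion (`fuse_features`, `alpha`) are all illustrative assumptions standing in for whatever feature extractor and fusion module the paper actually uses.

```python
import numpy as np

def dct2(block):
    """Naive orthonormal 2-D DCT-II via the DCT basis matrix."""
    n = block.shape[0]
    k = np.arange(n)
    # C[f, s] = sqrt(2/n) * cos(pi * (2s + 1) * f / (2n)), row 0 rescaled
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C @ block @ C.T

def fuse_features(visual_feat, dct_feat, alpha=0.5):
    """Hypothetical fusion: weighted sum of L2-normalised feature vectors."""
    v = visual_feat / (np.linalg.norm(visual_feat) + 1e-8)
    d = dct_feat / (np.linalg.norm(dct_feat) + 1e-8)
    return alpha * v + (1 - alpha) * d

# Toy example: an 8x8 "image region" and its DCT coefficients.
rng = np.random.default_rng(0)
region = rng.standard_normal((8, 8))
coeffs = dct2(region)

# Keep the low-frequency 4x4 corner as a compact spectral descriptor;
# the 4x4 pixel crop stands in for a CNN region feature of matching size.
dct_feat = coeffs[:4, :4].ravel()
visual_feat = region[:4, :4].ravel()
fused = fuse_features(visual_feat, dct_feat)
print(fused.shape)  # (16,)
```

The low-frequency DCT coefficients summarise coarse spatial structure in a region, which is one plausible way such spectral features could supply the fine-grained spatial cues the abstract describes; the counterfactual reasoning stage itself is beyond the scope of this sketch.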

DOI: https://doi.org/10.61822/amcs-2024-0032 | Journal eISSN: 2083-8492 | Journal ISSN: 1641-876X
Language: English
Page range: 453 - 466
Submitted on: Jan 10, 2024
Accepted on: May 20, 2024
Published on: Oct 1, 2024
Published by: Sciendo
In partnership with: Paradigm Publishing Services
Publication frequency: 4 times per year

© 2024 Guan Yang, Cheng Ji, Xiaoming Liu, Ziming Zhang, Chen Wang, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.