
DCF-VQA: Counterfactual Structure Based on Multi-Feature Enhancement

Open Access | October 2024

References

  1. Abbasnejad, E., Teney, D., Parvaneh, A., Shi, J. and van den Hengel, A. (2020). Counterfactual vision and language learning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, pp. 10044–10054.
  2. Agrawal, A., Batra, D., Parikh, D. and Kembhavi, A. (2018). Don’t just assume; look and answer: Overcoming priors for visual question answering, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, pp. 4971–4980.
  3. AlFawwaz, B.M., AL-Shatnawi, A., Al-Saqqar, F. and Nusir, M. (2022). Multi-resolution discrete cosine transform fusion technique face recognition model, Data 7(6): 80.
  4. Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S. and Zhang, L. (2018). Bottom-up and top-down attention for image captioning and visual question answering, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, pp. 6077–6086.
  5. Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C.L. and Parikh, D. (2015). VQA: Visual question answering, Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, pp. 2425–2433.
  6. Cadene, R., Dancette, C., Ben-Younes, H., Cord, M. and Parikh, D. (2019). RUBi: Reducing unimodal biases for visual question answering, Advances in Neural Information Processing Systems 32: 3197–3208.
  7. Chen, L., Zheng, Y., Niu, Y., Zhang, H. and Xiao, J. (2023). Counterfactual samples synthesizing and training for robust visual question answering, IEEE Transactions on Pattern Analysis and Machine Intelligence 45(11): 13218–13234.
  8. Clark, C., Yatskar, M. and Zettlemoyer, L. (2019). Don’t take the easy way out: Ensemble based methods for avoiding known dataset biases, arXiv: 1909.03683.
  9. Das, A., Agrawal, H., Zitnick, L., Parikh, D. and Batra, D. (2017a). Human attention in visual question answering: Do humans and deep networks look at the same regions?, Computer Vision and Image Understanding 163: 90–100.
  10. Das, A., Kottur, S., Gupta, K., Singh, A., Yadav, D., Moura, J.M., Parikh, D. and Batra, D. (2017b). Visual dialog, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, pp. 326–335.
  11. Gat, I., Schwartz, I., Schwing, A. and Hazan, T. (2020). Removing bias in multi-modal classifiers: Regularization by maximizing functional entropies, Advances in Neural Information Processing Systems 33: 3197–3208.
  12. Gokhale, T., Banerjee, P., Baral, C. and Yang, Y. (2020). Mutant: A training paradigm for out-of-distribution generalization in visual question answering, arXiv: 2009.08566.
  13. Goyal, Y., Khot, T., Summers-Stay, D., Batra, D. and Parikh, D. (2017). Making the V in VQA matter: Elevating the role of image understanding in visual question answering, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, pp. 6904–6913.
  14. Guo, Y., Nie, L., Cheng, Z., Ji, F., Zhang, J. and Del Bimbo, A. (2021). AdaVQA: Overcoming language priors with adapted margin cosine loss, arXiv: 2105.01993.
  15. Hashemi, M., Mahmoudi, G., Kodeiri, S., Sheikhi, H. and Eetemadi, S. (2023). LXMERT model compression for visual question answering, arXiv: 2310.15325.
  16. Kafle, K. and Kanan, C. (2017a). An analysis of visual question answering algorithms, Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, pp. 1965–1973.
  17. Kafle, K. and Kanan, C. (2017b). Visual question answering: Datasets, algorithms, and future challenges, Computer Vision and Image Understanding 163: 3–20.
  18. Kingma, D.P. and Ba, J. (2014). Adam: A method for stochastic optimization, arXiv: 1412.6980.
  19. Kv, G. and Mittal, A. (2020). Reducing language biases in visual question answering with visually-grounded question encoder, Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, pp. 18–34.
  20. Li, L., Gan, Z., Cheng, Y. and Liu, J. (2019). Relation-aware graph attention network for visual question answering, Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, pp. 10313–10322.
  21. Liang, Z., Jiang, W., Hu, H. and Zhu, J. (2020). Learning to contrast the counterfactual samples for robust visual question answering, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 3285–3292 (online).
  22. Metwaly, M.K., Elkalashy, N.I. and Zaky, M.S. (2017). Discrete sine and cosine transforms for signal processing spectral overlap saliencies of induction machine, IEEE Transactions on Industrial Electronics 65(1): 189–199.
  23. Niu, Y., Tang, K., Zhang, H., Lu, Z., Hua, X.-S. and Wen, J.-R. (2021). Counterfactual VQA: A cause–effect look at language bias, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, pp. 12700–12710.
  24. Park, D.H., Hendricks, L.A., Akata, Z., Rohrbach, A., Schiele, B., Darrell, T. and Rohrbach, M. (2018). Multimodal explanations: Justifying decisions and pointing to the evidence, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, pp. 8779–8788.
  25. Ramakrishnan, S., Agrawal, A. and Lee, S. (2018). Overcoming language priors in visual question answering with adversarial regularization, Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, Montreal, Canada, pp. 1548–1558.
  26. Selvaraju, R.R., Lee, S., Shen, Y., Jin, H., Ghosh, S., Heck, L., Batra, D. and Parikh, D. (2019). Taking a hint: Leveraging explanations to make vision and language models more grounded, Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, pp. 2591–2600.
  27. Shrestha, R., Kafle, K. and Kanan, C. (2020). A negative case analysis of visual grounding methods for VQA, arXiv: 2004.05704.
  28. Si, Q., Lin, Z., Zheng, M., Fu, P. and Wang, W. (2021). Check it again: Progressive visual question answering via visual entailment, arXiv: 2106.04605.
  29. Surówka, G. and Ogorzałek, M. (2022). Segmentation of the melanoma lesion and its border, International Journal of Applied Mathematics and Computer Science 32(4): 683–699, DOI: 10.34768/amcs-2022-0047.
  30. Teney, D., Abbasnejad, E. and van den Hengel, A. (2020). Learning what makes a difference from counterfactual examples and gradient supervision, Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, pp. 580–599.
  31. Teney, D. and van den Hengel, A. (2019). Actively seeking and learning from live data, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA.
  32. Wu, J. and Mooney, R. (2019). Self-critical reasoning for robust visual question answering, 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, pp. 8604–8610.
  33. Yang, C., Feng, S., Li, D., Shen, H., Wang, G. and Jiang, B. (2021). Learning content and context with language bias for visual question answering, 2021 IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China, pp. 1–6.
  34. Yang, L., Xie, T., Liu, M., Zhang, M., Qi, S. and Yang, J. (2023a). Infrared small-target detection under a complex background based on a local gradient contrast method, International Journal of Applied Mathematics and Computer Science 33(1): 33–43, DOI: 10.34768/amcs-2023-0003.
  35. Yang, P., Wang, Q., Chen, H. and Wu, Z. (2023b). Position-aware spatio-temporal graph convolutional networks for skeleton-based action recognition, IET Computer Vision 17(7): 844–854.
  36. Yang, Z., He, X., Gao, J., Deng, L. and Smola, A. (2016). Stacked attention networks for image question answering, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, pp. 21–29.
  37. Zellers, R., Bisk, Y., Farhadi, A. and Choi, Y. (2019). From recognition to cognition: Visual commonsense reasoning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, pp. 6720–6731.
  38. Zhang, L., Liu, S., Liu, D., Zeng, P., Li, X., Song, J. and Gao, L. (2020). Rich visual knowledge-based augmentation network for visual question answering, IEEE Transactions on Neural Networks and Learning Systems 32(10): 4362–4373.
  39. Zhu, X., Mao, Z., Liu, C., Zhang, P., Wang, B. and Zhang, Y. (2020). Overcoming language priors with self-supervised learning for visual question answering, arXiv: 2012.11528.
DOI: https://doi.org/10.61822/amcs-2024-0032 | Journal eISSN: 2083-8492 | Journal ISSN: 1641-876X
Language: English
Page range: 453–466
Submitted on: Jan 10, 2024
Accepted on: May 20, 2024
Published on: Oct 1, 2024
Published by: Sciendo
In partnership with: Paradigm Publishing Services
Publication frequency: 4 times per year

© 2024 Guan Yang, Cheng Ji, Xiaoming Liu, Ziming Zhang, Chen Wang, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.