Learning Abstract Visual Reasoning Via Task Decomposition: A Case Study in Raven Progressive Matrices

Kwiatkowski, Jakub; Krawiec, Krzysztof

doi:10.61822/amcs-2024-0022

Abstract

Learning to perform abstract reasoning often requires decomposing the task in question into intermediate subgoals that are not specified upfront, but need to be autonomously devised by the learner. In Raven progressive matrices (RPMs), the task is to choose one of the available answers given a context, where both the context and answers are composite images featuring multiple objects in various spatial arrangements. As this high-level goal is the only guidance available, learning to solve RPMs is challenging. In this study, we propose a deep learning architecture based on the transformer blueprint which, rather than directly making the above choice, addresses the subgoal of predicting the visual properties of individual objects and their arrangements. The multidimensional predictions obtained in this way are then directly juxtaposed to choose the answer. We consider a few ways in which the model parses the visual input into tokens and several regimes of masking parts of the input in self-supervised training. In experimental assessment, the models not only outperform state-of-the-art methods but also provide interesting insights and partial explanations about the inference. The design of the method also makes it immune to biases that are known to be present in some RPM benchmarks.
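
As a rough illustration of the final step described above, in which predicted properties are juxtaposed to choose the answer, the following minimal sketch picks the candidate whose property vector agrees with the prediction in the most positions. The integer encoding of properties, the function name choose_answer, and the agreement-count scoring are all assumptions made for this example; the abstract does not specify how the comparison is actually carried out.

```python
from typing import Sequence


def choose_answer(predicted: Sequence[int],
                  candidates: Sequence[Sequence[int]]) -> int:
    """Return the index of the candidate that matches `predicted`
    in the largest number of property positions.

    Hypothetical scoring rule: the source only states that the
    predictions are "directly juxtaposed", not how.
    """
    def agreement(candidate: Sequence[int]) -> int:
        # Count positions where predicted and candidate properties coincide.
        return sum(p == c for p, c in zip(predicted, candidate))

    scores = [agreement(c) for c in candidates]
    # On ties, max() keeps the first best-scoring candidate.
    return max(range(len(candidates)), key=scores.__getitem__)


# Hypothetical encoding: (shape, size, colour) as small integers.
predicted_properties = (2, 1, 4)
answer_properties = [(2, 0, 4), (2, 1, 4), (0, 1, 3)]
best = choose_answer(predicted_properties, answer_properties)
```

Because the choice is driven only by per-property agreement with the predicted panel, a scheme of this kind never compares the candidate answers with one another, which is one plausible reading of why such a design would be insensitive to biases in the answer sets.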

