References
- C. Liu, J. Yuen, and A. Torralba. Sift flow: Dense correspondence across scenes and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33:978–994, 2010.
- H. Bay, T. Tuytelaars, and L. Van Gool. Surf: Speeded up robust features. In European Conference on Computer Vision, pages 404–417. Springer, Berlin, Heidelberg, 2006.
- E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. Orb: An efficient alternative to sift or surf. In 2011 International Conference on Computer Vision, pages 2564–2571. IEEE, 2011.
- Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superpoint: Self-supervised interest point detection and description. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 224–236, 2018.
- Paul-Edouard Sarlin, Daniel DeTone, Tomasz Malisiewicz, and Andrew Rabinovich. Superglue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4938–4947, 2020.
- Philipp Lindenberger, Paul-Edouard Sarlin, and Marc Pollefeys. Lightglue: Local feature matching at light speed. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17627–17638, 2023.
- J. Sun, Z. Shen, Y. Wang, et al. Loftr: Detector-free local feature matching with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8922–8931, 2021.
- Khang Truong Giang, Soohwan Song, and Sungho Jo. Topicfm: Robust and interpretable topic-assisted feature matching. In Proceedings of the AAAI conference on artificial intelligence, volume 37, pages 2447–2455, 2023.
- Hongkai Chen, Zixin Luo, Lei Zhou, Yurun Tian, Mingmin Zhen, Tian Fang, David Mckinnon, Yanghai Tsin, and Long Quan. Aspanformer: Detector-free image matching with adaptive span transformer. In European conference on computer vision, pages 20–36. Springer, 2022.
- Qing Wang, Jiaming Zhang, Kailun Yang, Kunyu Peng, and Rainer Stiefelhagen. Matchformer: Interleaving attention in transformers for feature matching. In Proceedings of the Asian conference on computer vision, pages 2746–2762, 2022.
- J. Edstedt, I. Athanasiadis, M. Wadenbäck, et al. Dkm: Dense kernelized feature matching for geometry estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17765–17775, 2023.
- J. Edstedt, Q. Sun, G. Bökman, et al. Roma: Revisiting robust losses for dense feature matching. arXiv preprint arXiv:2305.15404, 2023.
- Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE transactions on neural networks, 20(1):61–80, 2008.
- Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, and Andre Araujo. Omniglue: Generalizable feature matching with foundation model guidance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19865–19875, 2024.
- Johan Edstedt, Georg Bökman, Mårten Wadenbäck, and Michael Felsberg. Dedode: Detect, don’t describe—describe, don’t detect for local feature matching. In 2024 International Conference on 3D Vision (3DV), pages 148–157. IEEE, 2024.
- Yifan Wang, Xingyi He, Sida Peng, Dongli Tan, and Xiaowei Zhou. Efficient loftr: Semi-dense local feature matching with sparse-like speed. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 21666–21675, 2024.
- Michał Tyszkiewicz, Pascal Fua, and Eduard Trulls. Disk: Learning local features with policy gradient. In Advances in Neural Information Processing Systems, volume 33, pages 14254–14265, 2020.
- Shitao Tang, Jiahui Zhang, Siyu Zhu, and Ping Tan. Quadtree attention for vision transformers. arXiv preprint arXiv:2201.02767, 2022.
- P. Truong, M. Danelljan, L. Van Gool, et al. Learning accurate dense correspondences and when to trust them. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5714–5724, 2021.
- Tao Xie, Kun Dai, Ke Wang, Ruifeng Li, and Lijun Zhao. Deepmatcher: a deep transformer-based network for robust and accurate local feature matching. Expert Systems with Applications, 237:121361, 2024.
- Yongxian Zhang, Chaozhen Lan, Haiming Zhang, Guorui Ma, and Heng Li. Multimodal remote sensing image matching via learning features and attention mechanism. IEEE Transactions on Geoscience and Remote Sensing, 62:1–20, 2024.
- Wang Zhang, Tingting Li, Yuntian Zhang, Gensheng Pei, Xiruo Jiang, and Yazhou Yao. Ltformer: A light-weight transformer-based self-supervised matching network for heterogeneous remote sensing images. Information Fusion, 109:102425, 2024.
- Xin Hu, Yan Wu, Zhikang Li, Zhifei Yang, and Ming Li. Multi-feature alignment and matching network for sar and optical image registration. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024.
- Yongjun Zhang, Peihao Wu, Yongxiang Yao, Yi Wan, Wenfei Zhang, Yansheng Li, and Xiaohu Yan. Multi-modal remote sensing image robust matching based on second-order tensor orientation feature transformation. IEEE Transactions on Geoscience and Remote Sensing, 2025.
- Albert Gu, Karan Goel, and Christopher Ré. Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396, 2021.
- Albert Gu and Tri Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.
- Tri Dao and Albert Gu. Transformers are ssms: Generalized models and efficient algorithms through structured state space duality. arXiv preprint arXiv:2405.21060, 2024.
- Opher Lieber, Barak Lenz, Hofit Bata, Gal Cohen, Jhonathan Osin, Itay Dalmedigos, Erez Safahi, Shaked Meirom, Yonatan Belinkov, Shai Shalev-Shwartz, et al. Jamba: A hybrid transformer-mamba language model. arXiv preprint arXiv:2403.19887, 2024.
- Weihao Yu and Xinchao Wang. Mambaout: Do we really need mamba for vision? arXiv preprint arXiv:2405.07992, 2024.
- Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, and Xinggang Wang. Vision mamba: Efficient visual representation learning with bidirectional state space model. arXiv preprint arXiv:2401.09417, 2024.
- Thomas Roßberg and Michael Schmitt. Estimating ndvi from sentinel-1 sar data using deep learning. In IGARSS 2022-2022 IEEE International Geoscience and Remote Sensing Symposium, pages 1412–1415. IEEE, 2022.
- Y. Di, Y. Liao, H. Zhou, K. Zhu, Y. Zhang, Q. Duan, et al. Femip: detector-free feature matching for multimodal images with policy gradient. Applied Intelligence, 2023.
- Matthew Brown and Sabine Süsstrunk. Multi-spectral sift for scene category recognition. In CVPR 2011, pages 177–184. IEEE, 2011.
- Xue Li, Guo Zhang, Hao Cui, Shasha Hou, Shun-yao Wang, Xin Li, Yujia Chen, Zhijiang Li, and Li Zhang. Mcanet: A joint semantic segmentation framework of optical and sar images for land use classification. International Journal of Applied Earth Observation and Geoinformation, 106:102638, 2022.
- Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. Indoor segmentation and support inference from rgbd images. In European conference on computer vision, pages 746–760. Springer, 2012.
- I. Loshchilov and F. Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
- I. Loshchilov and F. Hutter. Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.
- Yide Di, Yun Liao, Hao Zhou, Kaijun Zhu, Qing Duan, Junhui Liu, and Mingyu Lu. Ufm: Unified feature matching pre-training with multi-modal image assistants. PLoS ONE, 20(3):e0319051, 2025.
- Yun Liao, Xuning Wu, Junhui Liu, Peiyu Liu, Zhixuan Pan, and Qing Duan. Fmcfa: a feature matching method for critical feature attention in multimodal images. Scientific Reports, 15(1):6640, 2025.
- Yide Di, Yun Liao, Kaijun Zhu, Hao Zhou, Yijia Zhang, Qing Duan, Junhui Liu, and Mingyu Lu. Mivi: Multi-stage feature matching for infrared and visible image. The Visual Computer, 40(3):1839–1851, 2024.
- Vassileios Balntas, Edgar Riba, Daniel Ponsa, and Krystian Mikolajczyk. Learning local feature descriptors with triplets and shallow convolutional neural networks. In Proceedings of the British Machine Vision Conference (BMVC), pages 119.1–119.11. BMVA Press, 2016. Available from: https://dx.doi.org/10.5244/C.30.119.
- Y. Liao, Y. Di, H. Zhou, A. Li, J. Liu, M. Lu, et al. Feature matching and position matching between optical and sar with local deep feature descriptor. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 15:448–462, 2022.
- X. Han, T. Leung, Y. Jia, R. Sukthankar, and A. C. Berg. Matchnet: Unifying feature and metric learning for patch-based matching. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3279–3286, 2015.
- Mihai Dusmanu, Ignacio Rocco, Tomas Pajdla, Marc Pollefeys, Josef Sivic, Akihiko Torii, and Torsten Sattler. D2-net: A trainable cnn for joint description and detection of local features. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8092–8101, 2019.