Monocular depth estimation is an essential task in computer vision, as it recovers depth information from single 2D images and benefits applications such as autonomous driving and robot navigation. Monocular depth estimation has improved significantly in recent years, and deep learning-based methods have surpassed traditional and machine learning-based approaches. Deep learning-based methods have been further enhanced through transformer and hybrid architectures. This paper first discusses the sensors used for depth estimation and their limitations, then briefly reviews the evolution of depth estimation. We then examine deep learning methods, including transformer and CNN-transformer hybrid methods, and their limitations, followed by several methods that address challenging weather conditions. Finally, we discuss the current trends, challenges, and future directions of transformer and hybrid methods.
© 2025 Lakindu Kumara, Nipuna Senanayake, Guhanathan Poravi, published by Riga Technical University
This work is licensed under the Creative Commons Attribution 4.0 License.