Shufflemono: Rethinking Lightweight Network for Self-Supervised Monocular Depth Estimation

Feng, Yingwei; Hong, Zhiyong; Xiong, Liping; Zeng, Zhiqiang; Li, Jingmin

doi:10.2478/jaiscr-2024-0011

References

Tomasz Szmuc, Rafał Mrówka, Marek Brańka, Jakub Ficoń, Piotr Pieta, A Novel Method for Fast Generation of 3D Objects from Multiple Depth Sensors., Journal of Artificial Intelligence and Soft Computing Research, 2023, 13(2): 95-105.
Search in Google Scholar Back to article
Martin-Gomez, A., Li, H., Song, T., Yang, S., Wang, G., Ding, H., Navab, N., Zhao, Z., Armand, M., Sttar: surgical tool tracking using off-the-shelf augmented reality head-mounted displays., IEEE Transactions on Visualization and Computer Graphics, 2023, 1-16.
Search in Google Scholar Back to article
Rodrigues, R.T., Miraldo, P., Dimarogonas, D.V., Aguiar, A.P., A framework for depth estimation and relative localization of ground robots using computer vision., IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019, 3719-3724.
Search in Google Scholar Back to article
Silva, R., Cielniak, G., Gao, J., Leaving the Lines Behind: Vision-Based Crop Row Exit for Agricultural Robot Navigation., Preprint at https://arxiv.org/abs/2306.05869, 2023.
Search in Google Scholar Back to article
Sharma, A., Nett, R., Ventura, J., Unsupervised learning of depth and ego-motion from cylindrical panoramic video with applications for virtual reality., International Journal of Semantic Computing, 2020, 14(03): 333-356.
Search in Google Scholar Back to article
Rasla, A., Beyeler, M., The relative importance of depth cues and semantic edges for indoor mobility using simulated prosthetic vision in immersive virtual reality., Proceedings of the 28th ACM Symposium on Virtual Reality Software and Technology, 2022, 1-11.
Search in Google Scholar Back to article
Patakin, N., Vorontsova, A., Artemyev, M., Konushin, A., Single-stage 3d geometry-preserving depth estimation model training on dataset mixtures with uncalibrated stereo data., Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 1705-1714.
Search in Google Scholar Back to article
Peng, R., Wang, R., Wang, Z., Lai, Y., Wang, R., Rethinking depth estimation for multi-view stereo: A unified representation., Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 8645-8654.
Search in Google Scholar Back to article
Choe, J., Joo, K., Imtiaz, T., Kweon, I.S., Volumetric propagation network: Stereo-lidar fusion for long-range depth estimation., IEEE Robotics and Automation Letters, 2021, 6(3): 4672-4679.
Search in Google Scholar Back to article
Hirschmuller, H., Accurate and efficient stereo processing by semi-global matching and mutual information., IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, 2: 807-814.
Search in Google Scholar Back to article
Chang, J.-R., Chen, Y.-S., Pyramid stereo matching network., Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, 5410-5418.
Search in Google Scholar Back to article
Liu, P., King, I., Lyu, M.R., Xu, J.,Flow2stereo: Effective self-supervised learning of optical flow and stereo matching., Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, 6648-6657.
Search in Google Scholar Back to article
Ullman, S., The interpretation of structure from motion., Proceedings of the Royal Society of London. Series B. Biological Sciences, 1979, 203(1153): 405–426.
Search in Google Scholar Back to article
Zhou, T., Brown, M., Snavely, N., Lowe, D.G., Unsupervised learning of depth and ego-motion from video., Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017, 1851–1858.
Search in Google Scholar Back to article
Godard, C., Mac Aodha, O., Firman, M., Brostow, G.J., Digging into self-supervised monocular depth estimation., Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, 3828–3838.
Search in Google Scholar Back to article
Zhou, Z., Fan, X., Shi, P., Xin, Y., R-msfm: Recurrent multi-scale feature modulation for monocular depth estimating., Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, 12777–12786.
Search in Google Scholar Back to article
Zhang, N., Nex, F., Vosselman, G., Kerle, N., Litemono: A lightweight cnn and transformer architecture for self-supervised monocular depth estimation., Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, 18537–18546.
Search in Google Scholar Back to article
Zhang, X., Zhou, X., Lin, M., Sun, J., Shufflenet: An extremely efficient convolutional neural network for mobile devices., Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, 6848–6856.
Search in Google Scholar Back to article
Eigen, D., Fergus, R., Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture., Proceedings of the IEEE/CVF International Conference on Computer Vision, 2015, 2650–2658.
Search in Google Scholar Back to article
Hui, T.-W., Rm-depth: Unsupervised learning of recurrent monocular depth in dynamic scenes., Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 1675–1684.
Search in Google Scholar Back to article
Yan, J., Zhao, H., Bu, P., Jin, Y., Channel-wise attention-based network for self-supervised monocular depth estimation., 2021 International Conference on 3D Vision (3DV), 2021, 464–473.
Search in Google Scholar Back to article
Zhao, C., Zhang, Y., Poggi, M., Tosi, F., Guo, X., Zhu, Z., Huang, G., Tang, Y., Mattoccia, S.: Monovit: Self-supervised monocular depth estimation with a vision transformer., 2022 International Conference on 3D Vision (3DV), 2022, 668–678 .
Search in Google Scholar Back to article
He, M., Hui, L., Bian, Y., Ren, J., Xie, J., Yang, J., Ra-depth: Resolution adaptive self-supervised monocular depth estimation., European Conference on Computer Vision, 2022, 565–581.
Search in Google Scholar Back to article
Shim, D., Kim, H.J., Swindepth: Unsupervised depth estimation using monocular sequences via swin transformer and densely cascaded network., arXiv preprint arXiv:2301.06715, 2023.
Search in Google Scholar Back to article
Jaderberg, M., Vedaldi, A., Zisserman, A., Speeding up convolutional neural networks with low rank expansions., Proceeding of the British Machine Vision Conference 2014. British Machine Vision Association, 2014.
Search in Google Scholar Back to article
Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H., Mobilenets: Efficient convolutional neural networks for mobile vision applications., arXiv preprint arXiv:1704.04861, 2017.
Search in Google Scholar Back to article
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.-C., Mobilenetv2: Inverted residuals and linear bottlenecks., Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, 4510–4520.
Search in Google Scholar Back to article
Howard, A., Sandler, M., Chu, G., Chen, L.-C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., Vasudevan, V., et al., Searching for mobilenetv3., Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, 1314–1324.
Search in Google Scholar Back to article
Ma, N., Zhang, X., Zheng, H.-T., Sun, J., Shufflenet v2: Practical guidelines for efficient cnn architecture design., Proceedings of the European Conference on Computer Vision, 2018,116–131.
Search in Google Scholar Back to article
Mehta, S., Rastegari, M., Mobilevit: light-weight, general-purpose, and mobile-friendly vision transformer., International Conference on Learning Representations., 2021.
Search in Google Scholar Back to article
Yang, R., Ma, H., Wu, J., Tang, Y., Xiao, X., Zheng, M., Li, X., Scalablevit: Rethinking the context-oriented generalization of vision transformer., Proceedings of the European Conference on Computer Vision, 2022, 480–496.
Search in Google Scholar Back to article
Chen, Y., Dai, X., Chen, D., Liu, M., Dong, X., Yuan, L., Liu, Z.: Mobile-former: Bridging mobilenet and transformer., Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 5270–5279.
Search in Google Scholar Back to article
Ho, J., Kalchbrenner, N., Weissenborn, D., Salimans, T., Axial attention in multidimensional transformers., arXiv preprint arXiv:1912.12180, 2019.
Search in Google Scholar Back to article
Mehta, S., Rastegari, M., Separable self-attention for mobile vision transformers., Transactions on Machine Learning Research, 2022.
Search in Google Scholar Back to article
Ronneberger, O., Fischer, P., Brox, T., U-net: Convolutional networks for biomedical image segmentation., Medical Image Computing and Computer-Assisted Intervention–MICCAI, 2015, 234–241.
Search in Google Scholar Back to article
Krizhevsky, A., Sutskever, I., Hinton, G.E., Imagenet classification with deep convolutional neural networks., Advances in neural information processing systems, 2012, 5.
Search in Google Scholar Back to article
Xie, S., Girshick, R., Dollar, P., Tu, Z., He, K., Aggregated residual transformations for deep neural networks., Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017, 1492–1500.
Search in Google Scholar Back to article
Glorot, X., Bordes, A., Bengio, Y., Deep sparse rectifier neural networks., Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, 315–323.
Search in Google Scholar Back to article
Maas, A.L., Hannun, A.Y., Ng, A.Y., et al., Rectifier nonlinearities improve neural network acoustic models., Proc. Icml, 2013, 30(1): 3.
Search in Google Scholar Back to article
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P., Image quality assessment: from error visibility to structural similarity., IEEE transactions on image processing, 2004, 600–612.
Search in Google Scholar Back to article
Girshick, R., Fast r-cnn., Proceedings of the IEEE/CVF International Conference on Computer Vision, 2015, 1440–1448.
Search in Google Scholar Back to article
Zhou, H., Greenwood, D., Taylor, S., Self-supervised monocular depth estimation with internal feature fusion, arXiv preprint arXiv:2110.09482, 2021.
Search in Google Scholar Back to article
Geiger, A., Lenz, P., Stiller, C., Urtasun, R., Vision meets robotics: The kitti dataset., The International Journal of Robotics Research., 2013, 32(11): 1231-1237.
Search in Google Scholar Back to article
Saxena, A., Sun, M., Ng, A.Y., Make3d: Learning 3d scene structure from a single still image., IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 31(5): 824-840.
Search in Google Scholar Back to article
Eigen, D., Puhrsch, C., Fergus, R., Depth map prediction from a single image using a multi-scale deep network., Advances in neural information processing systems, 2014, 27.
Search in Google Scholar Back to article
Wang, C., Buenaposada, J.M., Zhu, R., Lucey, S., Learning depth from monocular videos using direct methods., Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, 2022–2030.
Search in Google Scholar Back to article

Shufflemono: Rethinking Lightweight Network for Self-Supervised Monocular Depth Estimation

References

Paradigm

My account