
Monocular 3D Object Localization Using 2D Estimates for Industrial Robot Vision System

Open Access | Sep 2025

Figures & Tables

Figure 1. Overview of the industrial robot vision system

Figure 2. Illustration of the industrial robot vision system: the green point is the initial estimated center point, and the red point is the actual center point

Figure 3. A block diagram of our proposed calibration method. The translation vector between the initial estimated center point (green point) and the calibrated center point (red point) is computed using deep learning and our novel calibration method

Figure 4. The process of calculating the object position in real-world coordinates

Figure 5. The process of object segmentation and edge extraction

Figure 6. Illustration of the estimated translation vector

Figure 7. Visualized examples of experimental results: in (b), the orange point is the Yolo center; in (d), the dark red point is the upper-part center. The vector formed by the blue points is the translation vector, and the light blue point is the corrected center
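Figures 2, 3, 6, and 7 together describe the core correction step: an initial 2D center estimate from the detector is shifted by an estimated translation vector and then mapped to a real-world position. The sketch below illustrates that idea only; it is not the authors' implementation, and the function names, the pinhole projection, the intrinsic matrix K, the plane depth Z, and all numeric values are assumptions made for illustration.

```python
import numpy as np

def correct_center(detected_center, translation_vector):
    """Shift the detector's 2D center estimate (the green point) by the
    estimated translation vector to obtain the corrected center."""
    return np.asarray(detected_center, dtype=float) + np.asarray(translation_vector, dtype=float)

def pixel_to_world(pixel, K, Z):
    """Back-project a pixel onto a working plane at known depth Z using a
    simple pinhole camera model (an assumption; the paper may use a
    different projection / calibration chain)."""
    u, v = pixel
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    return np.array([X, Y, Z])

# Example with made-up numbers (not taken from the paper)
K = np.array([[900.0,   0.0, 640.0],
              [  0.0, 900.0, 360.0],
              [  0.0,   0.0,   1.0]])
detected_center = (652.0, 371.0)       # e.g., the orange point in Figure 7(b)
translation_vector = (-4.0, 6.5)       # e.g., the blue-point vector in Figure 7(d)
corrected = correct_center(detected_center, translation_vector)
print(pixel_to_world(corrected, K, Z=0.55))  # object position on the plane, in metres
```

In the paper's pipeline the translation vector is produced by the deep-learning-based calibration stage (Figure 3) rather than hard-coded as in this sketch.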

Experimental results evaluating the position error (mm) of the traditional method, the regression method [30], and our proposed method

| Fold | Sample | Traditional Δx | Traditional Δy | Traditional Err | Δx | Δy | Err | Regression [30] Δx | Regression [30] Δy | Regression [30] Err | Proposed Δx | Proposed Δy | Proposed Err |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 9.69 | 5.51 | 11.15 | 1.34 | 1.32 | 1.88 | 0.38 | 1.14 | 1.20 | 0.95 | 0.76 | 1.22 |
| 1 | 2 | 8.36 | 8.74 | 12.09 | 1.92 | 2.50 | 3.15 | 3.04 | 1.33 | 3.32 | 1.71 | 1.33 | 2.17 |
| 1 | 3 | 5.13 | 8.93 | 10.30 | 1.23 | 1.52 | 1.96 | 1.14 | 0.95 | 1.48 | 1.14 | 0.95 | 1.48 |
| 1 | 4 | 10.07 | 8.36 | 13.09 | 1.79 | 1.97 | 2.66 | 1.14 | 1.33 | 1.75 | 1.52 | 1.33 | 2.02 |
| 1 | 5 | 8.55 | 10.26 | 13.36 | 0.75 | 1.12 | 1.35 | 0.76 | 0.19 | 0.78 | 0.38 | 0.57 | 0.69 |
| 1 | 6 | 9.31 | 9.31 | 13.17 | 0.96 | 1.41 | 1.71 | 0.19 | 1.52 | 1.53 | 0.57 | 0.76 | 0.95 |
| 2 | 1 | 4.18 | 5.89 | 7.22 | 1.45 | 3.07 | 3.40 | 1.33 | 2.09 | 2.48 | 0.95 | 1.14 | 1.48 |
| 2 | 2 | 10.07 | 13.11 | 16.53 | 3.12 | 3.41 | 4.62 | 2.47 | 1.71 | 3.00 | 2.47 | 1.33 | 2.81 |
| 2 | 3 | 3.42 | 7.03 | 7.82 | 1.21 | 2.78 | 3.03 | 1.52 | 1.90 | 2.43 | 0.76 | 0.57 | 0.95 |
| 2 | 4 | 10.07 | 8.74 | 13.33 | 2.13 | 1.51 | 2.61 | 0.95 | 0.19 | 0.97 | 1.90 | 0.95 | 2.12 |
| 2 | 5 | 11.02 | 12.16 | 16.41 | 2.94 | 2.67 | 3.97 | 2.28 | 1.71 | 2.85 | 2.47 | 0.57 | 2.53 |
| 2 | 6 | 8.93 | 11.02 | 14.18 | 0.43 | 1.34 | 1.41 | 1.33 | 0.38 | 1.38 | 0.19 | 0.19 | 0.27 |
| 3 | 1 | 9.12 | 7.60 | 11.87 | 1.39 | 1.42 | 1.99 | 0.95 | 0.19 | 0.97 | 1.14 | 0.38 | 1.20 |
| 3 | 2 | 4.18 | 13.87 | 14.49 | 2.54 | 3.36 | 4.21 | 3.23 | 0.19 | 3.24 | 2.28 | 0.57 | 2.35 |
| 3 | 3 | 9.88 | 4.18 | 10.73 | 0.76 | 1.12 | 1.35 | 0.57 | 0.76 | 0.95 | 0.57 | 0.76 | 0.95 |
| 3 | 4 | 7.60 | 8.93 | 11.73 | 1.89 | 2.17 | 2.88 | 1.71 | 0.95 | 1.96 | 1.71 | 0.95 | 1.96 |
| 3 | 5 | 10.45 | 4.94 | 11.56 | 0.88 | 2.34 | 2.50 | 1.71 | 1.52 | 2.29 | 0.57 | 0.76 | 0.95 |
| 3 | 6 | 6.08 | 7.60 | 9.73 | 0.36 | 0.57 | 0.67 | 0.19 | 0.57 | 0.60 | 0.19 | 0.57 | 0.60 |
| 4 | 1 | 13.87 | 10.26 | 17.25 | 1.37 | 2.84 | 3.15 | 0.76 | 2.47 | 2.58 | 0.95 | 2.09 | 2.30 |
| 4 | 2 | 11.97 | 7.98 | 14.39 | 2.84 | 1.45 | 3.19 | 2.28 | 1.33 | 2.64 | 2.28 | 1.33 | 2.64 |
| 4 | 3 | 5.32 | 4.56 | 7.01 | 0.35 | 0.81 | 0.88 | 0.19 | 0.38 | 0.42 | 0.19 | 0.57 | 0.60 |
| 4 | 4 | 5.51 | 16.34 | 17.24 | 0.32 | 1.32 | 1.36 | 0.76 | 1.14 | 1.37 | 0.19 | 1.71 | 1.72 |
| 4 | 5 | 10.26 | 8.36 | 13.23 | 0.92 | 0.92 | 1.30 | 2.28 | 0.76 | 2.40 | 0.57 | 1.14 | 1.27 |
| 4 | 6 | 6.27 | 11.40 | 13.01 | 1.47 | 1.63 | 2.19 | 1.90 | 2.85 | 3.43 | 1.33 | 1.33 | 1.88 |
| Average | | 8.30 | 8.96 | 12.54 | 1.43 | 1.86 | 2.34 | 1.38 | 1.15 | 1.92 | 1.12 | 0.94 | 1.55 |
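Across the rows above, the reported Err is consistent with the Euclidean combination of the per-axis errors, Err = √(Δx² + Δy²); for example, in fold 1, sample 1 of the first column group, √(9.69² + 5.51²) ≈ 11.15 mm.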

Performance comparison of various object segmentation models

| Algorithm | mAP | Precision | Recall | Model size (MB) |
|---|---|---|---|---|
| Yolov5 [40] | 98.7% | 97.1% | 96.2% | 7.4 |
| RCNN [41] | 97.8% | 98.1% | 96.4% | 16.8 |
| Yolov7 [38] | 99.0% | 99.0% | 97.8% | 37.9 |
| Yolov8 [39] | 99.2% | 98.7% | 97.4% | 11.8 |
| Our | 99.8% | 99.1% | 97.9% | 28.9 |

Experiment setup details

| Parameter | Specification |
|---|---|
| Processor | Intel Xeon Processor with two cores @ 2.3 GHz |
| GPU | NVIDIA Tesla T4 |
| RAM | 13 GB |
| OS | Ubuntu 20.04 LTS |

Performance comparison of various object detection models

| Algorithm | mAP | Precision | Recall | Model size (MB) |
|---|---|---|---|---|
| RTMDet [37] | 96.9% | 94.5% | 93.1% | 52.3 |
| MobileNet [35] | 94.8% | 93.8% | 93.4% | 4.6 |
| Fast R-CNN [34] | 97.0% | 93.4% | 94.1% | 12.9 |
| Yolov3 [33] | 96.3% | 95.8% | 95.7% | 8.7 |
| Yolov4 [36] | 96.8% | 96.6% | 95.4% | 60.0 |
| Yolov7 [38] | 97.1% | 95.7% | 93.1% | 37.2 |
| Yolov8 [39] | 97.8% | 95.5% | 94.4% | 11.1 |
| Our | 98.7% | 98.6% | 97.0% | 7.0 |

Processing time of our proposed method (milliseconds)

| Phase | Processing Time |
|---|---|
| Object Detection | 15 ± 2 |
| Object Segmentation | 40 ± 5 |
| Object Calibration | 300 ± 10 |
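Assuming the three phases run sequentially on the hardware listed above, the end-to-end latency is roughly 15 + 40 + 300 = 355 ms per object, i.e. on the order of 2–3 objects per second, with the calibration phase dominating.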
DOI: https://doi.org/10.14313/jamris-2025-025 | Journal eISSN: 2080-2145 | Journal ISSN: 1897-8649
Language: English
Page range: 53 - 65
Submitted on: Jun 13, 2024
Accepted on: Sep 13, 2024
Published on: Sep 10, 2025
Published by: Łukasiewicz Research Network – Industrial Research Institute for Automation and Measurements PIAP
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2025 Thanh Nguyen Canh, Du Trinh Ngoc, Xiem HoangVan, published by Łukasiewicz Research Network – Industrial Research Institute for Automation and Measurements PIAP
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.