Combined YOLOv5 and HRNet for High Accuracy 2D Keypoint and Human Pose Estimation

Open Access | Oct 2022

Language: English
Page range: 281 - 298
Submitted on: Jun 15, 2022
Accepted on: Oct 18, 2022
Published on: Oct 29, 2022
Published by: SAN University
In partnership with: Paradigm Publishing Services
Publication frequency: 4 issues per year

© 2022 Hung-Cuong Nguyen, Thi-Hao Nguyen, Jakub Nowak, Aleksander Byrski, Agnieszka Siwocha, Van-Hung Le, published by SAN University
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.