3D point-cloud-based manipulator grasping must process large volumes of data, which makes it difficult to meet the real-time requirements of industrial applications. To address this, this paper proposes a novel visual grasping method based on negative space analysis of the point cloud bird's-eye view (BEV). First, the YOLOv8 network performs fast and accurate 2D localization of targets in RGB images, a 3D frustum is constructed to preliminarily filter the scene point cloud, and the random sample consensus (RANSAC) algorithm then robustly segments the desktop support plane. The core innovation is a geometric manifold projection strategy that projects the sparse 3D point cloud onto a 2D BEV plane, reducing its dimensionality. Based on the theory of image moments, the contour of the "negative space" occupied by the object is parsed analytically, which yields the target's six-degree-of-freedom (6-DoF) grasping pose with linear computational complexity $O(N)$. Experimental results demonstrate that, compared with the baseline method combining the Single Shot MultiBox Detector (SSD) and PointNetGPD, the proposed method improves the total system success rate by 5 percentage points (from 65% to 70%). Moreover, the average computation time per grasp falls from 550 ms to 210 ms, a speed-up of more than 2.6 times. This work verifies the feasibility of replacing complex 3D deep-learning models with lightweight geometric analysis in specific structured scenes.
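As a rough illustration of why a moment-based analysis of the BEV projection runs in $O(N)$, the sketch below estimates a planar grasp centre and long-axis yaw from first- and second-order moments of the projected points. It is not the authors' implementation: the function name `bev_moment_grasp`, the parameters `plane_z` and `min_height`, and the assumption that the cloud is already expressed in a frame whose z axis is the support-plane normal are all hypothetical. It also computes moments directly on the projected points rather than on a rasterised BEV image, and it does not reproduce the paper's negative-space contour analysis.

```python
import numpy as np


def bev_moment_grasp(points, plane_z=0.0, min_height=0.005):
    """Estimate a top-down grasp centre and long-axis yaw from BEV moments.

    Assumptions (not from the paper): `points` is an (N, 3) array in a frame
    whose z axis is the support-plane normal, and the plane sits at `plane_z`.
    Every step is a single pass over the points, so the cost is O(N).
    """
    pts = np.asarray(points, dtype=float)

    # Discard the support plane: keep only points clearly above it.
    obj = pts[pts[:, 2] > plane_z + min_height]
    if obj.shape[0] < 3:
        raise ValueError("too few object points above the support plane")

    # BEV projection: drop the height coordinate.
    xy = obj[:, :2]

    # First-order moments give the centroid of the footprint.
    cx, cy = xy.mean(axis=0)
    dx = xy[:, 0] - cx
    dy = xy[:, 1] - cy

    # Second-order central moments describe the footprint's spread.
    mu20 = np.mean(dx * dx)
    mu02 = np.mean(dy * dy)
    mu11 = np.mean(dx * dy)

    # Orientation of the principal (long) axis, from the standard
    # image-moment formula; the result lies in (-pi/2, pi/2].
    long_axis_yaw = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)

    # Use the mean object height as the z coordinate of the grasp centre.
    center = np.array([cx, cy, obj[:, 2].mean()])
    return center, long_axis_yaw
```

A parallel-jaw gripper would typically close across the short axis, that is, perpendicular to the returned yaw. How the remaining rotational degrees of freedom are fixed (for example, a vertical approach along the plane normal) is a design choice this sketch leaves open.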
Citation: Baoju Wu, Yancheng Li, Nanmu Hui, Xiaowei Han. Lightweight manipulator grasping method based on manifold projection and negative space analysis in point cloud bird's-eye view[J]. AIMS Electronics and Electrical Engineering, 2026, 10(3): 395-421. doi: 10.3934/electreng.2026016