
With the rapid development of 3D sensing technology, 3D vision has gradually become a research hotspot [1]. Three-dimensional data plays an extremely important role in various fields, including remote sensing [2], unmanned aerial vehicles [3], autonomous driving [4] and image segmentation [5]. In the medical image segmentation field, for instance, optical coherence tomography (OCT) is an important standard for clinical retinal diagnosis [6] and is widely used in imaging, cardiology, dermatology, etc. [7]. Traditional medical image segmentation methods suffer from information deficiency and low accuracy [8]. Therefore, enhancing the accuracy and speed of medical image segmentation is a primary research focus. Currently, mainstream representations of 3D data include voxels, meshes and point clouds [9]. Voxel representation can be regarded as a quantized, fixed-size point cloud, similar to pixels in three-dimensional space [10]. However, it suffers from drawbacks such as insufficient resolution and high memory cost. Mesh representation is composed of polygons constructed from adjacent points of an object [11]; it is non-Euclidean data that is not directly related to the output of 3D sensors. Compared with voxels and meshes, 3D point cloud data avoids the complexity of data description [12] and has become the preferred representation for 3D data, as it effectively describes geometric information in 3D space, is easy to acquire and requires no discretization. However, point cloud data possesses characteristics such as disorder, lack of structure and scene complexity [13], making analysis tasks based on point clouds challenging.
In the early stages of point cloud research, hand-crafted feature extraction methods, such as point feature histograms and shapeme histograms, were used to extract point cloud features [14]. However, hand-crafted methods are usually limited to specific problems. With the increasing variety and volume of 3D point cloud data, manual extraction methods suffer from low efficiency and accuracy, making them difficult to extend to other point cloud tasks.
With the widespread application of deep learning, especially the extensive use of convolutional neural networks (CNNs) in 2D vision [15], 2D vision research has reached a relatively mature stage. However, regular 2D convolutional kernels cannot be applied directly to unordered and unstructured 3D point clouds. To address this challenge, early research attempted to transform unordered point clouds into regular structures (such as multi-view images or voxel grids) that CNNs can handle [16,17]. Although these methods achieved acceptable feature extraction results, they often incur partial information loss during the transformation and also face issues of computational complexity and high workload [18].
In 2016, the PointNet network model was proposed to process point clouds directly [19]; it effectively extracts global features of point clouds through multi-layer perceptrons (MLPs) and performs well in point cloud classification and segmentation tasks. Consequently, direct processing of point clouds became a new research focus [20], and improved methods have been continually proposed, such as PointNet++ [21], PointCNN [22] and DGCNN [23].
New methods for directly processing point cloud data have emerged, and they mainly focus on reducing processing losses and enhancing classification accuracy. MFNet [24] introduces an RBF layer to extract point cloud features, incorporating spatial location, the geometric features of neighboring points and feature correlations; this comprehensive approach effectively captures rich point cloud features. CSANet [25] constructs a point cloud feature extraction network by stacking cross self-attention (CSA) modules and incorporates a multi-scale fusion (MF) module for feature fusion in the segmentation process. The multiscale receptive fields graph attention network (MRFGAT) [26] employs a multi-scale graph attention network to capture an extensive range of point cloud features, demonstrating its effectiveness in partial feature extraction. Methods based on convolutional neural networks tend to lose original point cloud information during data transformation and involve complex computations. Methods based on PointNet can handle point cloud data directly, avoiding such loss, but PointNet and most of its derivatives overlook information from neighboring points. Graph-based methods improve point cloud processing but mainly focus on feature extraction, with limited consideration of joint global and local feature extraction and of potential feature loss during local feature extraction.
This study proposes a novel graph attention convolutional network that combines attention mechanisms with graph convolution for local feature extraction from point clouds. It also introduces a global attention module for capturing global contextual features of the point cloud and employs a feedback feature fusion module for local feature fusion, enabling point cloud classification and segmentation tasks. Our main contributions can be summarized as follows:
1) This study designs a graph attention convolution operation that focuses on both the center point and its neighboring points. By incorporating attention mechanisms into graph convolution, features can be extracted more effectively.
2) This study develops a feature fusion method employing an error feedback mechanism, which effectively reduces the biases introduced during feature fusion.
3) This study introduces a global attention module for extracting global contextual features from the point cloud.
Experimental results conducted on standard datasets demonstrate that our method performs better than state-of-the-art methods in terms of model training, point cloud classification and segmentation.
The rest of this paper is organized as follows: Section 2 reviews existing work on different methods. Section 3 provides a detailed description of our method's implementation. Section 4 discusses the experimental results and introduces the details of the ablation experiments. Finally, Section 5 gives the conclusion.
In this section, this paper reviews some of the current methods for 3D point cloud analysis, which can be broadly categorized into two types: methods based on original point clouds and methods based on graph structures.
PointNet is a deep neural network that can directly process disordered point cloud data [19]. It applies shared MLPs to each point for feature extraction and then uses max-pooling to aggregate all learned point features into a global feature. Experiments have shown that PointNet has good computational performance and speed in 3D point cloud tasks and is lightweight and efficient, but it ignores the local structure of point clouds in 3D space.
Building upon PointNet, PointNet++ [21] introduces a hierarchical structure for feature extraction. It utilizes PointNet as a feature extractor and iteratively extracts features for small sets of local point clouds. This enables PointNet++ to learn more comprehensive features compared to PointNet. Despite its improved feature extraction performance in point cloud tasks, PointNet++ still processes individual points without considering context information between points, leading to high memory usage and computational burden. To address these limitations, Kd-Net[27] proposes using kd-tree indexing to represent point cloud sets and capture potential feature relationships between local points. Although Kd-Net shows significant performance improvements, it lacks robustness against rotated point clouds.
PointCNN [22] introduces hierarchical convolution to "regularize" the disordered point cloud by learning an X-transform (a K×K matrix) on the point cloud and combines it with the traditional convolution operation to construct an X-Conv block that acts on local regions for feature extraction; PointCNN thus fully considers the local information of the point cloud. PPFNet [28] represents the point cloud's coordinates, normals and point-pair features as point sets. It encodes different point sets and inputs them into a small PointNet to extract local feature information. By generalizing the contrastive loss to an n-tuple loss, PPFNet maximizes the use of available correspondences, thus improving invariance.
SO-Net [29] simulates the spatial distribution of point clouds by building a self-organizing map (SOM). It designs a point cloud autoencoder to hierarchically extract features for individual points and SOM nodes, and it controls the degree of overlap of the receptive fields with a tunable parameter K: larger K values result in more overlap and richer learned feature information. This improves network performance in point cloud classification tasks.
The graph-based methods primarily treat the points in the point cloud as vertices of a graph, constructing graph data from the point cloud [30], which allows the features of the point cloud to be learned directly in the spatial or spectral domain.
KCNet [31] first defines the point-set kernel as a set of learnable 3D points and then leverages a kernel correlation layer to learn the local 3D geometric structure of the point cloud by computing the affinity between each point's neighborhood and the kernel points. Its graph pooling layer utilizes the high-dimensional feature structure of local regions and performs multiple rounds of feature aggregation on the local neighborhood graph. KCNet enables learning of local geometric information while maintaining the permutation invariance of 3D point clouds.
Most methods based on original points do not consider the topological structure of point clouds. To address this problem, DGCNN [23] designs an edge convolution operation that dynamically updates the graph structure of the point set in each convolutional layer. It performs convolution-like operations on point edges to learn local structures, and edge convolution can be embedded into existing point cloud processing frameworks. LDGCNN [32] optimizes the DGCNN network by connecting hierarchical features from different dynamic graphs. It adds shortcut connections between different edge convolution layers to learn local features, and cascading layer-wise features can alleviate the vanishing gradient problem.
Res2Net [33] addresses the challenge of information loss during dimension reduction by building a multi-scale CNN for point cloud classification. Chen et al. [34] propose a graph-attention-based point neural network called GAPointNet, which uses graph attention and multi-head mechanisms to capture local features. FatNet [35] stands out for applying the same attention mechanism to two distinct forms of feature map aggregation; it also incorporates residual connections for enhanced information transfer between layers, enabling the training of deeper and more stable network architectures.
While previous studies have made significant contributions to enhancing local feature acquisition from point cloud data, they often concentrated on either global or local feature extraction without addressing feature loss during fusion. In the proposed method, this study simultaneously extracts global and local feature information from 3D point clouds. Furthermore, this study designs a feature fusion method based on error feedback to capture additional local features.
In this section, this paper first presents the network framework and then introduces the global attention feature module, the attention graph convolution module and the error feedback feature fusion module. The notation used in this paper is defined as follows. The original point cloud instance is regarded as a collection of points, which carries attributes such as coordinates, colors and normal vectors [36]. In this paper, points are represented by their 3D coordinate attributes $(x_i, y_i, z_i)$. Given a 3D point cloud $P$ with $N$ points, $P=\{p_1,p_2,\ldots,p_N\}\in\mathbb{R}^{N\times 3}$, the point cloud is expressed as an $N\times 3$ matrix and a single point is denoted $p_i=(x_i,y_i,z_i)$, $i=1,2,\ldots,N$. The framework of our classification and segmentation model is shown in Figure 1.
Figure 1(a) shows the classification model with an $n\times 3$ point cloud as input. First, the k-nearest neighbors (KNN) algorithm computes the k closest neighbors of each input point. Then, four AGConv layers are employed for local feature extraction (note that some convolution processes are omitted). After each convolutional layer, the EFFM is applied for feature fusion. Additionally, a GAFM is used to learn global features of the point cloud. In the final EFFM module, local and global features are fused, generating classification scores for c categories. Figure 1(b) depicts the segmentation model, which is an extension of the classification model. Unlike the classification model, it applies the EFFM for feature fusion after three convolutional layers. The symbol ⊕ denotes feature concatenation. The segmentation model outputs classification scores for p semantic labels for each point.
For each point $p_i$ in the input 3D point cloud instance $P$, this study uses the KNN algorithm to select the k nearest neighbors of $p_i$, denoted $p_{i,j}=\{p_{i,1},p_{i,2},\ldots,p_{i,k}\}$, $j\in\{1,\ldots,k\}$, to construct a local neighborhood graph $G(V,E)$ with $p_i$ as the center point, vertex set $V=\{1,2,\ldots,k\}$ and edge set $E=\{e_{i1},e_{i2},\ldots,e_{ik}\}$. The process of constructing the KNN neighborhood graph is shown in Figure 2.
Figure 2 illustrates the process of constructing local neighborhood graphs with KNN within a point cloud instance. The Euclidean distance between each pair of points is calculated to select the k closest neighbors of each point. The parameter k = 5; the center node $p_i$ is marked in yellow and the neighborhood points $p_{i,j}$ are marked in blue.
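To make this construction concrete, below is a minimal PyTorch sketch (not the authors' code) of building a KNN neighborhood graph from pairwise Euclidean distances; the function name and the (batch, points, coordinates) tensor layout are illustrative assumptions.

```python
import torch

def knn_neighbors(points: torch.Tensor, k: int = 20) -> torch.Tensor:
    """Return indices of the k nearest neighbors of each point.

    points: (B, N, 3) coordinates; returns (B, N, k) neighbor indices.
    """
    # Pairwise Euclidean distances between all points, shape (B, N, N).
    dist = torch.cdist(points, points)
    # Exclude each point from its own neighborhood.
    n = points.shape[1]
    dist = dist + torch.eye(n, device=points.device) * 1e9
    # Indices of the k smallest distances per point.
    _, idx = dist.topk(k, dim=-1, largest=False)
    return idx

# Example matching Figure 2: k = 5 neighbors per point.
pts = torch.rand(1, 1024, 3)
idx = knn_neighbors(pts, k=5)  # shape (1, 1024, 5)
```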
In the global attention feature module, to better aggregate effective feature information from the point cloud, this work considers the spatial relationships among all points. In the given instance, the Euclidean matrix is obtained by calculating the Euclidean distance between points along the three-dimensional spatial directions. The global attention feature module is shown in Figure 3.
This work calculates the Euclidean distance between points along each coordinate direction of the point cloud and then applies softmax normalization to obtain the global normalized distance matrix. For a point cloud with N points, Eq (3.1) gives the distance matrix between points $p_i$ and $p_j$ in each coordinate direction:
$$D=\sum_{i=1}^{N}\sum_{j=1}^{N}\left[\,p_i^x-p_j^x,\ p_i^y-p_j^y,\ p_i^z-p_j^z\,\right]^{T}, \tag{3.1}$$
where $p_i^x$, $p_i^y$ and $p_i^z$ are the coordinates of the point $p_i$ in the x-, y- and z-axis directions, respectively.
This work uses the softmax function to normalize the distance matrix calculated in Eq (3.1), and the normalized distance matrix is obtained according to Eq (3.2).
$$D'_{ij}=\mathrm{softmax}(D_{ij})=\frac{\exp(D_{ij})}{\sum_{j=1}^{N}\exp(D_{ij})}, \tag{3.2}$$
where $D_{ij}$ is the distance between the points $p_i$ and $p_j$, and $D'_{ij}$ is the normalized distance between $p_i$ and $p_j$.
This work utilizes an MLP as a non-linear transformation function to compute the global features, as described in Eq (3.3).
$$F^g_{ij}=\mathrm{MLP}(D'_{ij})=\sum_{j=1}^{N}W^g_{ij}*D'_{ij}+b_i, \tag{3.3}$$
where $F^g_{ij}$ is the global attention weight of points $p_i$ and $p_j$, $W^g_{ij}$ is the weight matrix and $b_i$ is the bias.
This work uses an MLP to learn a mapping of the local point set $L_p$ and obtain $\tilde{L}_p$, multiplies the mapped result $\tilde{L}_p$ with the global attention weights, and finally concatenates the product with $L_p$ to obtain the global features, as defined by Eq (3.4).
$$F_g=(\tilde{L}_p\otimes F^g_{ij})\oplus L_p, \tag{3.4}$$
where $F_g$ denotes the global features.
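As an illustration of Eqs (3.1)-(3.4), the following PyTorch sketch implements one plausible form of the global attention feature module. The MLP widths and depth, and the exact tensor layout, are assumptions not specified in the paper.

```python
import torch
import torch.nn as nn

class GAFM(nn.Module):
    """A minimal sketch of the global attention feature module."""
    def __init__(self, in_dim: int = 64):
        super().__init__()
        # MLP of Eq (3.3): maps normalized per-axis distances to one weight per pair.
        self.attn_mlp = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))
        # Mapping of the local features Lp to \tilde{L}_p used in Eq (3.4).
        self.feat_mlp = nn.Linear(in_dim, in_dim)

    def forward(self, xyz: torch.Tensor, lp: torch.Tensor) -> torch.Tensor:
        # Eq (3.1): per-axis coordinate differences, shape (B, N, N, 3).
        diff = xyz.unsqueeze(2) - xyz.unsqueeze(1)
        # Eq (3.2): row-wise softmax normalization (applied per axis here).
        d_norm = torch.softmax(diff, dim=2)
        # Eq (3.3): global attention weights, shape (B, N, N).
        fg = self.attn_mlp(d_norm).squeeze(-1)
        # Eq (3.4): weight the mapped features, then concatenate with Lp.
        lp_tilde = self.feat_mlp(lp)           # (B, N, C)
        global_feat = torch.bmm(fg, lp_tilde)  # matrix product, (B, N, C)
        return torch.cat([global_feat, lp], dim=-1)  # (B, N, 2C)

# Example: fuse 64-dim local features with global attention over 1024 points.
gafm = GAFM(in_dim=64)
out = gafm(torch.rand(2, 1024, 3), torch.rand(2, 1024, 64))  # (2, 1024, 128)
```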
Our attention graph convolution module consists of multiple attention graph convolution layers (AGConv). The AGConv layer is responsible for extracting local features from the point cloud while considering both the central point $p_i$ and the features of its k neighboring points during the aggregation process. In the graph convolution, this work introduces an attention mechanism to obtain attention coefficients for each neighborhood point with respect to the central point. This allows us to learn more valuable information about local features. The process of the attention graph convolution operation is illustrated in Figure 4.
According to the attention mechanism, this work converts the neighborhood points $p_{i,j}$ and edges into a high-dimensional feature space to obtain enhanced representational capacity. A non-linear function $g(\cdot)$ represents the neighborhood relationship of the central node $p_i$, as defined by Eq (3.5).
$$e_{i,j}=g(p_{i,j},\theta), \tag{3.5}$$
where $e_{i,j}$ is the neighborhood relationship of the central node $p_i$, $p_{i,j}$ is the $j$-th neighbor of $p_i$, $\theta$ is a set of learnable filter parameters and $g(\cdot)$ is a single-layer neural network.
According to the neighborhood relationship calculated in Eq (3.5), this work applies a nonlinear activation function to obtain the attention coefficient of the relationship $e_{i,j}$ between the center node $p_i$ and the neighborhood node $p_{i,j}$, as defined by Eq (3.6).
$$\alpha_{ij}=\sigma(e_{i,j})=\sigma(g(e_{i,j},\theta))=\mathrm{LeakyReLU}(g(e_{i,j},\theta)), \tag{3.6}$$
where $\alpha_{ij}$ is the attention coefficient of $p_i$ and $p_{i,j}$, and $\sigma(\cdot)$ is a non-linear activation function; here, LeakyReLU is chosen as the activation function.
According to the attention mechanism, the attention coefficients assigned to different neighbors need to be made comparable. This work uses softmax to normalize the attention coefficients $\alpha_{ij}$ of the different $e_{i,j}$ to obtain the final attention coefficients, as defined by Eq (3.7).
$$\alpha'_{ij}=\mathrm{softmax}(\alpha_{ij})=\frac{\exp(\alpha_{ij})}{\sum_{k\in\mathcal{N}}\exp(\alpha_{ik})}, \tag{3.7}$$
where $\alpha'_{ij}$ is the attention coefficient after normalization.
This work uses the normalized coefficients to calculate the contextual features of each point, as defined by Eq (3.8).
$$\hat{F}_i=\sigma\Big(\sum_{j\in\mathcal{N}}\alpha'_{ij}e_{ij}\Big), \tag{3.8}$$
where $\hat{F}_i$ denotes the contextual features of point $p_i$ and $\sigma(\cdot)$ is the non-linear activation function; this work again chooses LeakyReLU.
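The following PyTorch sketch assembles Eqs (3.5)-(3.8) into one attention graph convolution layer. The neighbor-gathering step, the [center, neighbor − center] edge-feature construction and the channel sizes are illustrative assumptions, and the attention coefficients are computed per channel here rather than as a single scalar per edge.

```python
import torch
import torch.nn as nn

class AGConv(nn.Module):
    """A minimal sketch of one attention graph convolution layer."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        # g(.) of Eq (3.5): a single-layer network on [center, neighbor - center].
        self.g = nn.Linear(2 * in_dim, out_dim)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, feats: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:
        """feats: (B, N, C) point features; idx: (B, N, k) neighbor indices."""
        B, N, C = feats.shape
        k = idx.shape[-1]
        # Gather neighbor features p_{i,j}, shape (B, N, k, C).
        nbrs = torch.gather(
            feats.unsqueeze(1).expand(B, N, N, C), 2,
            idx.unsqueeze(-1).expand(B, N, k, C))
        center = feats.unsqueeze(2).expand(B, N, k, C)
        # Eq (3.5): edge features e_{i,j}.
        e = self.g(torch.cat([center, nbrs - center], dim=-1))
        # Eq (3.6): raw attention coefficients via LeakyReLU.
        alpha = self.act(e)
        # Eq (3.7): softmax normalization over the k neighbors.
        alpha = torch.softmax(alpha, dim=2)
        # Eq (3.8): attention-weighted aggregation of edge features.
        return self.act((alpha * e).sum(dim=2))  # (B, N, out_dim)

# Example: one layer over 1024 points with k = 20 neighbors.
layer = AGConv(in_dim=3, out_dim=64)
out = layer(torch.rand(2, 1024, 3),
            torch.randint(0, 1024, (2, 1024, 20)))  # (2, 1024, 64)
```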
In previous research on point cloud classification, the common approach was to concatenate the extracted local features to form the global feature representation of the point cloud. However, simple feature concatenation may lead to the loss of some important information, resulting in classification performance deviating from the expected results. To address this issue, this work proposes a feature fusion module called the error feedback feature fusion module (EFFM) based on the error feedback mechanism[37]. The EFFM module is designed to better learn the contextual relationships among points in high-dimensional space. The feature fusion module is illustrated in Figure 5.
In the feature fusion module, this work first concatenates the local features extracted by two AGConv layers to obtain the initial fused features. This process is defined by Eqs (3.9) and (3.10).
$$F_s=\mathrm{concat}\{F_{l-1},\,F_l\}, \tag{3.9}$$

$$F_s=\mathrm{concat}\{F_1,\ldots,F_l\}, \tag{3.10}$$
where $F_s$ is the initial fused feature, $F_l$ is the graph feature extracted by the $l$-th layer and $F_{l-1}$ is the input feature from the previous layer. Equation (3.10) describes the EFFM fusion of multiple extracted features in the segmentation model.
Next, this work maps $F_s$ to a high-dimensional feature space to obtain the geometric relationships of the local region in that space, and then restores the initial fused feature via a reverse projection, which maps the feature back to the original feature space. After obtaining the reconstructed feature, this work computes the difference between the original fused feature and the reconstructed feature. This difference serves as the error signal, i.e., the information lost during the initial fusion. The EFFM uses it to constrain the feature learning process and minimize the deviation of the output feature as training progresses. This process is defined by Eq (3.11).
$$F'_s=f_a(F_s),\quad F_r=f_b(F'_s),\quad \Delta F=F_s-F_r, \tag{3.11}$$
where $F'_s$ is the feature of $F_s$ after the projection operation, $f_a(\cdot)$ is the projection operation, $f_b(\cdot)$ is the reverse projection, $F_r$ is the reconstructed feature and $\Delta F$ is the difference between the original fused feature and the reconstructed feature.
Finally, a projection operation is applied to the error signal, and the result is combined with the projected $F'_s$ to obtain the final fused feature $F_c$, as defined by Eq (3.12).
$$F_c=F'_s+f_a(\Delta F). \tag{3.12}$$
Please note that in the feature fusion module EFFM, this work keeps the size of the point cloud constant.
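A minimal PyTorch sketch of the EFFM following Eqs (3.9), (3.11) and (3.12) is given below. Modeling the projection $f_a$ and the reverse projection $f_b$ as single linear layers is an assumption for illustration.

```python
import torch
import torch.nn as nn

class EFFM(nn.Module):
    """A minimal sketch of the error feedback feature fusion module."""
    def __init__(self, dim_a: int, dim_b: int, out_dim: int):
        super().__init__()
        fused = dim_a + dim_b
        self.f_a = nn.Linear(fused, out_dim)   # projection f_a(.)
        self.f_b = nn.Linear(out_dim, fused)   # reverse projection f_b(.)

    def forward(self, f_prev: torch.Tensor, f_curr: torch.Tensor) -> torch.Tensor:
        # Eq (3.9): initial fusion by concatenation.
        f_s = torch.cat([f_prev, f_curr], dim=-1)
        # Eq (3.11): project, back-project, and measure the reconstruction error.
        f_s_proj = self.f_a(f_s)           # F'_s
        f_r = self.f_b(f_s_proj)           # reconstructed feature F_r
        delta = f_s - f_r                  # error signal ΔF
        # Eq (3.12): feed the projected error back into the fused feature.
        return f_s_proj + self.f_a(delta)  # F_c

# Example: fuse two 64-dim feature maps over 1024 points.
effm = EFFM(dim_a=64, dim_b=64, out_dim=64)
fc = effm(torch.rand(2, 1024, 64), torch.rand(2, 1024, 64))  # (2, 1024, 64)
```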
After the local features of the last AGConv layer pass through the EFFM, this work fuses them with the global features extracted by the GAFM to form the final point cloud features for the classification task, as defined by Eq (3.13).
$$F_{all}=\mathrm{EFFM}(F_c,F_g), \tag{3.13}$$
where $F_{all}$ is the final feature of the point cloud.
In order to verify the performance of the proposed model, this study compares it with several state-of-the-art methods for point cloud classification and segmentation tasks. Extensive experiments are conducted on the two main datasets, ModelNet40 and ShapeNet[38]. The ModelNet40 dataset is well-structured and commonly used for point cloud classification tasks. On the other hand, the ShapeNet dataset contains point cloud data with complex backgrounds and missing points, making it suitable for part segmentation tasks in more challenging and complex scenes. Finally, to validate the effectiveness of each module in our proposed approach, this work also conducted ablation experiments accordingly. All experiments are conducted on an NVIDIA GTX 3060 GPU.
The experiments on point cloud shape classification are performed on the ModelNet40 dataset, which contains 40 object categories such as airplanes, chairs and cars, with 9843 samples in the training set and 2468 samples in the test set. During the experiments, this work represents each point cloud by its 3-dimensional coordinates (x, y, z) and samples 1024 points from the surface of each object model as the input data.
In our classification model, this work employs four AGConv layers to extract local features from the point cloud. The output of AGConv1 is used as the input of AGConv2; the output feature F1 of AGConv1 and the output feature F2 of AGConv2 are combined in the EFFM module for feedback feature fusion to obtain the fused feature F12. Next, F12 is used as the input of AGConv3, and this process is repeated until the final fused local feature Fc is obtained.
During the local feature learning process, this work utilizes the KNN algorithm to construct local neighborhood graphs for the input unordered point cloud. Experiments are conducted with different values of k to evaluate its impact on the classification results. Experimental results imply that setting k to 20 for the number of neighboring points around the center point yields favorable results.
The input consists of a set of 1024 points in three-dimensional space, which are transformed into a 32-dimensional embedding space. Multiple layers of AGConv are used for high-dimensional feature learning, and in the final EFFM module, the high-dimensional features learned are fused with the point cloud features extracted by the global attention feature module (GAFM). Then, a max pooling operation is performed to obtain the global features, which are then fed into an MLP layer for classification.
In the classification experiments, the initial learning rate is set to 0.001, the batch size to 32 and the number of training epochs to 250. The Adam optimizer is used for network training.
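For reference, a minimal PyTorch sketch of this training configuration (Adam, learning rate 0.001, batch size 32, 250 epochs) is shown below. The model is a stand-in placeholder and the data is random, not the network or dataset described in this paper.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the full AGConv/GAFM/EFFM network.
model = nn.Sequential(nn.Flatten(), nn.Linear(1024 * 3, 40))
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Dummy batch: 32 point clouds of 1024 points each, 40 classes.
points = torch.rand(32, 1024, 3)
labels = torch.randint(0, 40, (32,))

for epoch in range(250):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(points), labels)
    loss.backward()
    optimizer.step()
```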
In point cloud classification experiments on the ModelNet40 dataset, our network model achieves superior performance compared to other methods under the same conditions, as shown in Table 1. This work uses the 1024 points represented by three-dimensional coordinates (x, y, z) as input. Extensive experimental results demonstrate that our overall accuracy of classification reaches 93.1%, indicating the effectiveness of our proposed model.
| Method | Points | Mean class accuracy (%) | Overall class accuracy (%) |
|---|---|---|---|
| Kd-Net [27] | 1024 | 86.3 | 90.6 |
| PointNet [19] | 1024 | 86.0 | 89.2 |
| PointCNN [22] | 1024 | 88.1 | 92.2 |
| PointConv [39] | 1024 | - | 92.5 |
| KC-Net [31] | 1024 | - | 91.0 |
| PointNet++ [21] | 5000 | - | 91.9 |
| SO-Net [29] | 5000 + normal | 90.8 | 90.4 |
| SpiderCNN [40] | 1024 + normal | - | 92.4 |
| DGCNN [23] | 1024 | 88.9 | 91.7 |
| Ours | 1024 | 91.2 | 93.1 |
To provide a comprehensive visual representation of our experimental results, this work plots a bar chart in Figure 6(a) to compare the classification performance of PointNet, DGCNN and our proposed method. Additionally, this work includes a red dashed line as a reference, which represents our method's mAcc (mean accuracy) as the baseline for comparison.
In Figure 6(b), this work shows the training curves of PointNet, DGCNN and our method. The classification accuracy of the three models on the ModelNet40 dataset is tracked over 250 epochs, and the results show that our accuracy reaches a relatively stable state within the first 50 epochs. Moreover, our method not only achieves higher accuracy than the other models but also fluctuates less overall, indicating consistently good performance and a strong classification effect.
To further validate the rationality of our method, this work conducts additional experiments by varying crucial parameters that could influence the experimental results.
This work constructs the local neighborhood graph of the point cloud using the KNN algorithm, where the value of k determines the construction of the neighborhood graph and affects the acquisition of local features. To evaluate the optimal k value within a certain range, while keeping other parameters constant, this work conducts experiments by varying the k value. The experimental results are presented in Table 2. Additionally, to provide a more intuitive understanding of the impact of different k values on our results, this work compares the classification accuracy with DGCNN in Figure 7. The experiments demonstrate that our method achieves better results when k=20, and in general, our method outperforms DGCNN when k∈[15,30].
| k | oAcc (%) |
|---|---|
| 5 | 87.92 |
| 10 | 89.49 |
| 15 | 91.74 |
| 20 | 93.15 |
| 25 | 92.16 |
| 30 | 90.58 |
In the experiment of changing the k value, this work finds that the model achieves the best results when k=20. Therefore, in the following experiments, k is set to 20 by default. In addition to the influence of the k value on feature extraction, the number of input points also affects feature extraction. With k=20 and the other parameters unchanged, this work varies the number of input points over {256, 512, 768, 1024}; the classification accuracy is shown in Figure 8.
This work evaluates the performance of our model on the ShapeNet dataset [38] for point cloud part segmentation. The ShapeNet dataset, a popular benchmark for point cloud part segmentation, consists of 16 major categories annotated with 50 parts, totaling 16,881 models divided into training and testing sets. Unlike the classification experiments, on the ShapeNet dataset 2048 points are sampled from each model as the input.
In the segmentation model, this work employs three AGConv layers for feature extraction. Unlike the classification model, after the three AGConv layers, this work performs an additional feature fusion step. This fusion process includes features extracted by the three AGConv layers and the GAFM module. Except for the number of sampled points, the other parameters, such as batch size and activation functions, remain the same as in the classification model.
Similar to DGCNN, this work uses mIoU to evaluate the performance of our network for point cloud part segmentation. To better understand the part segmentation performance of different methods, this work categorizes the other compared methods into two types: CNNs and GNNs. The experimental results shown in Table 3 demonstrate that compared to five other state-of-the-art models[19,21,23,41,42], our model achieves a higher mIoU value and shows more competitive results on the ShapeNet dataset. Figure 9 presents some part segmentation results, where black circles are used to highlight areas containing obvious segmentation errors.
| Category | Method | mIoU | Airplane | Bag | Cap | Car | Chair | Earphone | Guitar | Knife | Lamp | Laptop | Motorbike | Mug | Pistol | Rocket | Skateboard | Table |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CNNs | PointNet | 83.7 | 83.4 | 78.7 | 82.5 | 74.9 | 89.6 | 73.0 | 91.5 | 85.9 | 80.8 | 95.3 | 65.2 | 93.0 | 81.2 | 57.9 | 72.8 | 80.6 |
|  | PointNet++ | 85.1 | 82.4 | 79.0 | 87.7 | 77.3 | 90.8 | 71.8 | 91.0 | 85.9 | 83.7 | 95.3 | 71.6 | 94.1 | 81.3 | 58.7 | 76.4 | 82.6 |
|  | DGCNN | 85.2 | 84.0 | 83.4 | 86.7 | 77.8 | 90.6 | 74.7 | 91.2 | 87.5 | 82.8 | 95.7 | 66.3 | 94.9 | 81.1 | 63.5 | 74.5 | 82.6 |
| GNNs | 3D-GCN | 85.1 | 83.1 | 84.0 | 86.6 | 77.5 | 90.3 | 74.1 | 90.9 | 86.4 | 83.8 | 95.6 | 66.8 | 94.8 | 81.3 | 59.6 | 75.7 | 82.8 |
|  | RGCNN | 84.3 | 80.2 | 82.8 | 92.6 | 75.3 | 89.2 | 73.7 | 91.3 | 88.4 | 83.3 | 96.0 | 63.9 | 95.7 | 60.9 | 44.6 | 72.9 | 80.4 |
|  | Ours | 85.4 | 83.1 | 86.6 | 86.5 | 80.3 | 90.4 | 72.9 | 91.2 | 86.0 | 81.7 | 95.4 | 67.2 | 94.3 | 75.4 | 65.2 | 74.9 | 82.8 |
In order to better assess the feasibility of our approach, this work conducts an ablation experiment on the ModelNet40 benchmark to analyze the effectiveness of each individual module in the 3D point cloud classification task. This work samples 1024 points from each point cloud instance as the input. Please note that this work uses the optimal parameters determined from the classification experiments to evaluate the impact of different modules on classification accuracy. The experimental results presented in Table 4 provide insights into how each module affects the overall classification performance.
| AGConv | GAFM | EFFM | oAcc (%) |
|---|---|---|---|
| √ |  |  | 90.2 |
| √ | √ |  | 90.7 |
| √ |  | √ | 92.4 |
| √ | √ | √ | 93.1 |
In this section, this work assesses the complexity of our model by analyzing parameters, FLOPs (floating-point operations) and execution time. To ensure fair and comparable experiments, all evaluations were conducted on the same hardware configuration: an NVIDIA GTX 3060 GPU and an Intel i5-12400 CPU with 6 cores running at 2.50 GHz. During the experiments, this work used 1024 points as input, a batch size of 32 and an initial learning rate of 0.001. Based on prior experimentation, the optimal value k = 20 is used to construct the local neighborhood graph. Our model, which additionally performs global feature extraction from point clouds, has slightly more parameters than PointNet++ and DGCNN; in the broader context, however, it offers several advantages. For a detailed comparison of model complexity, including parameters, FLOPs and execution time, please refer to Table 5. In the table, 'M' denotes million, 'ms' milliseconds and 'B' billion.
| Method | Params (M) | Time (ms) | FLOPs (B) | Accuracy (%) |
|---|---|---|---|---|
| PointNet | 3.48 | 16.6 | 14.7 | 89.2 |
| PointNet++ | 1.48 | 163.2 | 26.94 | 91.9 |
| DGCNN | 1.84 | 27.2 | 42.71 | 92.2 |
| Ours | 1.97 | 26.0 | 24.4 | 93.1 |
This work proposed a novel AGConv, which incorporates attention mechanisms into graph convolutional layers for learning local features of points. Additionally, this work introduced a GAFM module to capture global features of the point cloud, effectively considering both local and global information. Unlike previous methods that fuse features by simple concatenation, which can lead to feature loss, this work introduced an EFFM module that learns an error signal to mitigate this issue. Extensive experiments demonstrated the effectiveness of the proposed method in learning both local and global features of point clouds. Compared with state-of-the-art methods, it effectively reduces feature loss during the fusion process and exhibits excellent performance in point cloud classification and segmentation tasks. Furthermore, our method converges quickly during training. Experimental comparisons show that the proposed method can improve the classification and segmentation accuracy of point clouds to some extent. When creating the local neighborhood graph of a point cloud, this work experimented repeatedly and selected the graph construction scale with the best experimental results. This work uses only the coordinates of the point cloud as input, ignoring the effects of other point cloud attributes on the experimental results. In future work, we will focus on adaptive graph construction methods and consider applying our methods to the classification and segmentation of large-scale point clouds.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
This work was supported by the National Natural Science Foundation of China under Grant No. 62362016, the Foundation for Enhancement of the Basic Research Ability of Young and Middle-aged Teachers in Guangxi Universities and Colleges under Grant No. 2022KY0788 and the Natural Science Foundation of Guilin University of Aerospace Technology under Grant No. XJ21KT21.
The authors declare there is no conflict of interest.