
[1] Nilkanth Mukund Deshpande, Shilpa Gite, Biswajeet Pradhan, Ketan Kotecha, Abdullah Alamri. Improved Otsu and Kapur approach for white blood cells segmentation based on LebTLBO optimization for the detection of Leukemia. Mathematical Biosciences and Engineering, 2022, 19(2): 1970-2001. doi: 10.3934/mbe.2022093
[2] Kun Lan, Jianzhen Cheng, Jinyun Jiang, Xiaoliang Jiang, Qile Zhang. Modified UNet++ with atrous spatial pyramid pooling for blood cell image segmentation. Mathematical Biosciences and Engineering, 2023, 20(1): 1420-1433. doi: 10.3934/mbe.2023064
[3] Macarena Boix, Begoña Cantó. Using wavelet denoising and mathematical morphology in the segmentation technique applied to blood cells images. Mathematical Biosciences and Engineering, 2013, 10(2): 279-294. doi: 10.3934/mbe.2013.10.279
[4] Tingxi Wen, Hanxiao Wu, Yu Du, Chuanbo Huang. Faster R-CNN with improved anchor box for cell recognition. Mathematical Biosciences and Engineering, 2020, 17(6): 7772-7786. doi: 10.3934/mbe.2020395
[5] Xi Lu, Zejun You, Miaomiao Sun, Jing Wu, Zhihong Zhang. Breast cancer mitotic cell detection using cascade convolutional neural network with U-Net. Mathematical Biosciences and Engineering, 2021, 18(1): 673-695. doi: 10.3934/mbe.2021036
[6] Hua Wang, Weiwei Li, Wei Huang, Jiqiang Niu, Ke Nie. Research on land use classification of hyperspectral images based on multiscale superpixels. Mathematical Biosciences and Engineering, 2020, 17(5): 5099-5119. doi: 10.3934/mbe.2020275
[7] Qingwei Wang, Xiaolong Zhang, Xiaofeng Li. Facial feature point recognition method for human motion image using GNN. Mathematical Biosciences and Engineering, 2022, 19(4): 3803-3819. doi: 10.3934/mbe.2022175
[8] Mohamed A Elblbesy, Mohamed Attia. Optimization of fractal dimension and shape analysis as discriminators of erythrocyte abnormalities. A new approach to a reproducible diagnostic tool. Mathematical Biosciences and Engineering, 2020, 17(5): 4706-4717. doi: 10.3934/mbe.2020258
[9] Katrine O. Bangsgaard, Morten Andersen, Vibe Skov, Lasse Kjær, Hans C. Hasselbalch, Johnny T. Ottesen. Dynamics of competing heterogeneous clones in blood cancers explains multiple observations - a mathematical modeling approach. Mathematical Biosciences and Engineering, 2020, 17(6): 7645-7670. doi: 10.3934/mbe.2020389
[10] Rafsanjany Kushol, Md. Hasanul Kabir, M. Abdullah-Al-Wadud, Md Saiful Islam. Retinal blood vessel segmentation from fundus image using an efficient multiscale directional representation technique Bendlets. Mathematical Biosciences and Engineering, 2020, 17(6): 7751-7771. doi: 10.3934/mbe.2020394
White blood cells (WBCs), also known as leukocytes, play a pivotal role in the immune system's defense against various infections. Accurate quantification and classification of WBCs can provide valuable insights for diagnosing a wide range of diseases, including infections, leukemia [1] and AIDS [2]. In the laboratory, the traditional examination of blood smears was a laborious and time-consuming manual task. With the advent of computer-aided automatic cell analysis systems, however, rapid and high-throughput image analysis can now be accomplished [3]. An automatic WBC recognition system typically entails three major steps: Image acquisition, cell segmentation and cell type classification. Among these steps, cell segmentation is widely recognized as the most crucial and challenging one, as it significantly influences the accuracy and computational complexity of subsequent processes [4]. Some segmentation-free methods instead take the whole image as input to the classifier without extracting a region of interest (ROI) [5,6,7].
Accurately segmenting WBCs in cell images, thereby distinguishing between lymphocytes, monocytes, eosinophils, basophils and neutrophils as shown in Figure 1, provides a wealth of crucial information for hematological diagnostics [8,9]. However, achieving high-quality images requires careful consideration of various factors, including image resolution, exposure duration, illumination levels and the proper use of optical filters. Inappropriate choices in the imaging process can degrade image quality, thereby complicating the analysis of WBC images.
Deep learning, particularly convolutional neural networks (CNNs), has revolutionized medical image segmentation [10]. The U-shaped network (U-Net) [11], a symmetrical encoder-decoder convolutional network featuring skip connections, stands as a prime example. The U-Net has gained significant popularity in medical image processing, especially for datasets with limited samples. Extensive research has demonstrated the effectiveness of this architecture in extracting multi-scale image features [12].
Subsequent iterations, such as U-Net++ [13] and U-Net3+ [14], have been proposed to further enhance performance. U-Net++ introduces nested and dense skip connections to address the semantic gap and incorporates deep supervision to improve segmentation performance [13]. The regularized U-Net (RU-Net) [15] proposed a new activation function with a piecewise smoothing effect to solve the under-segmentation problem. There have been various advancements in CNN-based architectures for image segmentation. One notable example is the multiscale information fusion network (MIF-Net), which incorporates boundary splitter and information fusion mechanisms using strided convolutions [16]. These techniques contribute to improved segmentation accuracy. In parallel, Transformer-based approaches, such as the Vision Transformer (ViT) [17] and the Swin Transformer [18], have emerged, leveraging self-attention mechanisms for better feature extraction. Specifically, the ViT partitions images into nonoverlapping patches and treats the patches as sequence data, where the self-attention mechanism is subsequently used to extract long-range information among patches. Furthermore, the Swin Transformer applies a shifted window to make the ViT more computationally efficient. Though the original ViT model exhibits significantly better performance on large objects, it obtains lower performance on small objects [19]. This limitation might arise from the fixed scale of the patches generated by transformer-based methods. A potential solution to enhance the small object detection (SOD) capability is to explore more refined patch sizes, which might increase the computational cost. Notably, U-Net architectures have also incorporated transformers, yielding the U-Net Transformer [20], the Medical Transformer [21] and the Swin-Unet [22], all of which have set new performance benchmarks in medical image segmentation. However, these architectures, rooted in pixel-based learning, demand substantial memory resources, leading to inefficiencies, especially when available training samples are scarce [23]. In such regimes, the expressivity gap between large models and parameter-efficient counterparts narrows. To mitigate the computational burden, prior knowledge can be embedded into the model.
Graph-structured data provides an elegant way of describing the geometry of data and contains abundant relational information. For example, diverse types of relational systems or structured entities can be described by graphs that encode their interior connections; typical examples include particle system analysis [24,25], social networks [26] and molecular property prediction [27]. Correspondingly, graph neural networks (GNNs) are specifically designed to process graph data [28,29,30,31,32]; researchers have developed graph convolutional networks (GCNs) and various variants that update node features by aggregating information from neighboring nodes.
The transformation of image data, particularly data without an inherent geometric structure, into graph data represents a substantial challenge. This challenge is twofold: Encoding Euclidean-space data into graph representations and decoding them back to the original image domain. A prevalent approach involves a patch-graph method, where image patches are treated as graph nodes. For example, Graph-FCN [33] applies a fully convolutional network (FCN) to extract image features, and the graph structure is constructed with the k-nearest-neighbor (kNN) method, where the weighted adjacency matrix is generated with a Gaussian kernel function. In [34], the dual graph convolution network (DGCNet) constructs the graph structure not only in the spatial domain but also in the feature domain. In the semantic segmentation task, a bilinear interpolation upsampling operation is applied to the downsampled output of DGCNet to recover the same image size as the label. There has also been recent work aiming to combine the local feature extraction ability of CNNs with the long-range interaction ability of GNNs. The vision graph U-Net (VGU-Net) model was proposed to construct multi-scale graph structures, enhancing the model's learning capacity [35].
However, while the patch-based method offers convenience in graph construction, it has limitations. The fixed structure of image patches can lead to the omission of critical boundary details. An alternative lies in the superpixel approach. Since superpixels can significantly reduce the computational and memory overhead of image processing tasks, superpixel methods are commonly implemented as a preprocessing step before deep reasoning models [36,37,38,39,40,41,42,43]. Superpixel methods over-segment the image into multiple nonoverlapping regions based on pixel features, grouping homogeneous pixels inside a single superpixel. Traditional superpixel generation methods can be roughly divided into graph-based [44,45,46] and clustering-based [47,48,49,50] methods. These methods generate high-quality superpixels efficiently and quickly, require no human labels and use little memory in computing. Recently, deep learning-based approaches have been employed for superpixel sampling [51,52,53,54,55]. These methods are accurate but not memory efficient, since learning high-level image features requires a relatively large number of convolution kernel parameters. Based on a pre-computed superpixel graph and a GNN, [56] captures global feature interactions for brain tumor segmentation. In [57], superpixel-based graph data and an edge-labeling graph neural network (EGNN) [58] are employed for biological image segmentation.
Medical image segmentation has long depended on the precision achieved by supervised learning methods. Yet a perennial issue remains: the paucity of richly labeled datasets in clinical contexts. This limitation has driven the pivot to metric learning, which disrupts the entrenched belief that robust intelligence is the sole preserve of abundant labeled data [59,60,61,62]. Metric learning approaches, such as contrastive methods, learn representations in a discriminative manner by contrasting positive sample pairs against negative pairs. By tapping into vast reservoirs of unlabeled samples, they set the stage for pretraining deep learning models. The subsequent phase involves meticulous fine-tuning using just a fraction of labeled samples. Remarkably, the resulting model performance stands shoulder to shoulder with traditional supervised strategies.
Notably, there is burgeoning interest in supervised metric learning methods specifically tailored to unravel cross-image intricacies. These techniques use sample labels as the blueprint for categorizing samples into positive and negative sets [63]. The confluence of metric learning methods offers deep learning models a unique advantage: by bridging labeled and unlabeled data, they are empowered to deliver strong results even when navigating the constraints of scantily labeled samples.
In this work, we propose a novel approach for WBC segmentation, namely the superpixel metric graph neural network (SMGNN). The core strength of SMGNN lies in its dual promise: delivering high accuracy while simultaneously optimizing memory efficiency. The foundation of our technique is a superpixel graph constructed from the image data. This restructuring drastically diminishes the problem's dimensionality and serves as a conduit for infusing abundant prior information into the graph data. In addition to leveraging prior knowledge within a single training sample, our proposed approach introduces superpixel metric learning to capture "global" context across training samples. In clinical image segmentation scenarios with limited training samples, we believe incorporating this "global" context can enhance the expressivity of deep learning models. Our proposed metric learning operates on the superpixel embeddings rather than the vast number of pixel embeddings, which offers the advantage of memory saving.
The contributions of this paper can be summarized as follows.
● Our proposed lightweight SMGNN reduces the number of learnable parameters by up to 10,000 times compared with mainstream segmentation models.
● Our proposed superpixel-based model reduces the problem size and encodes rich prior knowledge into the rarely considered graph-structured data, which helps SMGNN achieve state-of-the-art (SOTA) performance on WBC images.
● We innovatively propose superpixel metric learning based on the definition of the superpixel metric score, which is more efficient than pixel-level metric learning.
● The whole deep learning-based nucleus and cytoplasm segmentation and cell type classification system is accurate and efficient enough to run in hematological laboratories.
The remaining sections of this article are structured as follows. Section 2 describes our methodology in depth. Section 3 depicts the workflow and architecture of our proposed model. In Section 4, through extended segmentation experiments, the SMGNN model achieves SOTA segmentation performance in terms of both accuracy and memory efficiency. The cell type classification task is conducted with a lightweight residual network (ResNet) based on the segmentation result. The whole procedure of the proposed automatic recognition system is shown in Figure 3.
Deep learning methods for medical image processing have predominantly concentrated on discerning the local context, that is, inter-pixel dependencies within individual images [63]. However, this misses an opportunity: capturing the "global" context that exists between training samples. While pixel-level contrast or metric learning provides a way to bridge this gap, the sheer computational and memory overheads, due to contrast or metric computations spanning every pixel pair, render such methods less feasible. We propose an efficient superpixel-level metric learning scheme with a superpixel metric loss, which captures the desired global context while drastically cutting computational and memory costs.
The utility of superpixel methods in image data preprocessing is well acknowledged, particularly for their ability to condense data and reduce computational demands. Consequently, this study capitalizes on these advantages, transforming the image data into a more compact graph representation. Within this framework, each superpixel becomes a graph node. The interconnectedness of these nodes, whether driven by spatial positioning or feature similarity, determines the graph's topology. The node features are not rigid; their definition can range from basic five-dimensional attributes encompassing color (three dimensions) and location (two dimensions) to more intricate descriptors such as histograms, positional variance and variations in pixel values.
Given that the adjacency matrix is sparse, it is predominantly the node features that dictate memory consumption. Opting for the more straightforward features facilitates significant data compression. Specifically, for three-channel red-green-blue (RGB) images, we obtain a compression ratio of roughly $c=\frac{\text{graph data}}{\text{image data}}\approx\frac{5K}{3N}$. Importantly, this efficient compression does not compromise quality. Our subsequent numerical experiments demonstrate that this ratio is concomitant with optimal segmentation outcomes.
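As an illustrative back-of-the-envelope check (our numbers, using the dataset-1 image size and the superpixel scale adopted later): a $120\times120$ RGB image stores $3N = 43{,}200$ values; with a mean superpixel scale of $S=16$, we obtain $K = N/S = 900$ superpixels carrying $5K = 4{,}500$ values, giving $c \approx 4{,}500/43{,}200 \approx 0.104$, i.e., roughly a tenfold compression.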
The segmentation task relies on the quality of the generated superpixels, and good superpixel results cohere with the boundaries of the labeled images. Suppose $X\in\mathbb{R}^{H\times W\times 3}$ is the input image. Let $V$ be the set of all superpixels, $N=H\times W$ be the number of pixels, $K=|V|$ be the number of superpixels, and $Q\in\mathbb{R}^{N\times K}$ be the association matrix between pixels and superpixels; then we have
$$Q_{i,j}=\begin{cases}1, & \text{if } \hat{X}_i\in V_j,\\ 0, & \text{otherwise},\end{cases}\qquad \hat{X}=\mathrm{Flatten}(X)\in\mathbb{R}^{N\times 3},$$
where $V_j$ is the $j$th superpixel and the $\mathrm{Flatten}$ operation converts a two-dimensional image matrix into a one-dimensional vector. The association matrix serves as the bridge between image space and graph space. For computational convenience, we define the column-normalized association matrix as
$$\bar{Q}_{i,j}=\begin{cases}1/|S_j|, & \text{if } \hat{X}_i\in V_j,\\ 0, & \text{otherwise},\end{cases}\qquad |S_j|=\sum_{x\in V_j}1.$$
Let $Y\in\mathbb{R}^{N}$ be the label of the image pixels and $\mathcal{Y}\in\mathbb{R}^{K}$ be the superpixel metric, or metric score, of the graph data. Using the pixel labels, we can formulate the metric score as
$$\mathcal{Y}_j=\frac{\sum_{x\in V_j}Y_x}{\sum_{x\in V_j}1}, \tag{2.1}$$
for the $j$th superpixel. $\mathcal{Y}$ can also be computed efficiently with the column-normalized association matrix, i.e., $\mathcal{Y}=\bar{Q}^{T}Y$. We can back-project the superpixel label to image space via the association matrix, i.e., $\bar{Y}=Q\mathcal{Y}$. We define the intersection over union (IoU) reconstruction score to evaluate the quality, which reads
$$\mathrm{IoU}_r=\frac{1}{c}\sum_{i=0}^{c}\frac{\left|\{x\,|\,\bar{Y}_x=i\}\cap\{x\,|\,Y_x=i\}\right|}{\left|\{x\,|\,\bar{Y}_x=i\}\cup\{x\,|\,Y_x=i\}\right|}, \tag{2.2}$$
where $\{x\,|\,Y_x=i\}$ denotes the set of pixels whose class label equals $i$ and $c$ is the number of classes.
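These constructions are straightforward to realize in code. The sketch below (helper names are ours) builds a sparse $Q$ and $\bar{Q}$ from a superpixel label map, from which the metric score, back-projection, and the IoU reconstruction score of Eq (2.2) follow directly:

```python
import numpy as np
from scipy.sparse import csr_matrix

def association_matrices(seg):
    """Q (N x K) and column-normalized Q_bar from a superpixel label map
    `seg` of shape (H, W) with integer labels 0..K-1."""
    flat = seg.ravel()                      # pixel ordering matches Flatten(X)
    N, K = flat.size, int(flat.max()) + 1
    Q = csr_matrix((np.ones(N), (np.arange(N), flat)), shape=(N, K))
    sizes = np.asarray(Q.sum(axis=0)).ravel()        # |S_j|: pixels per superpixel
    Q_bar = Q.multiply(1.0 / sizes.reshape(1, -1))   # divide column j by |S_j|
    return Q, Q_bar.tocsr()

def iou_reconstruction(y_back, y_pix, num_classes):
    """IoU reconstruction score (Eq 2.2) between back-projected labels y_back
    and pixel labels y_pix, both flat integer arrays."""
    scores = []
    for i in range(num_classes):
        pred, gt = (y_back == i), (y_pix == i)
        union = np.logical_or(pred, gt).sum()
        if union > 0:  # skip classes absent from both masks (our convention)
            scores.append(np.logical_and(pred, gt).sum() / union)
    return float(np.mean(scores))

# Usage: y_sp = Q_bar.T @ y_pix reproduces Eq (2.1); y_back = Q @ y_sp back-projects
# to image space (round y_back, e.g., with np.rint, before the hard-label comparison).
```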
GNNs have the advantage of being lightweight compared to other deep learning models that often require deeper and larger networks to achieve higher performance. GNNs leverage relational information and utilize shallow layers to achieve satisfactory results.
Given an undirected attributed graph $G=(V,E,S)$, $G$ consists of a nonempty finite set $V$ of $K=|V|$ nodes and a set $E$ of edges between node pairs. Denote by $A\in\mathbb{R}^{K\times K}$ the graph adjacency matrix and by $S\in\mathbb{R}^{K\times d}$ the node attributes. A graph convolution learns a matrix representation $H$ that embeds the structure $A$ and the feature matrix $S=\{S_j\}_{j=1}^{K}$, with $S_j$ the attribute vector of node $j$. Most graph convolutions follow the message passing [64] update scheme, which finds a central node's smooth representation by aggregating its 1-hop neighbor information. At layer $\ell$, the propagation for the $i$th node reads
$$H^{\ell}_i=\gamma\Big(H^{\ell-1}_i,\ \square_{j\in\mathcal{N}(i)}\,\phi\big(H^{\ell-1}_i,H^{\ell-1}_j,A_{ij}\big)\Big),\qquad H^{0}=S, \tag{2.3}$$
where $\square(\cdot)$ is a differentiable and permutation-invariant aggregation function, such as summation, average, or maximization. The set $\mathcal{N}(i)$ includes $V_i$ and its 1-hop neighbors. Both $\gamma(\cdot)$ and $\phi(\cdot)$ are differentiable functions, such as multilayer perceptrons (MLPs).
In our approach, we construct graph data using superpixels and their predefined relationships. GNNs learn features from the graph space to enhance the segmentation capability. We employ the graph isomorphism network (GIN) [32] as the backbone graph representation network. At each layer ℓ, the GIN model updates the ith node representation as follows:
$$H^{(\ell)}_i=\mathrm{MLP}^{(\ell)}\Big(\big(1+w^{(\ell)}\big)\cdot H^{(\ell-1)}_i+\sum_{j\in\mathcal{N}(i)}H^{(\ell-1)}_j\Big), \tag{2.4}$$
where $w^{(\ell)}$ is a learnable weight.
GIN has been proven to possess expressive power equivalent to the 2-Weisfeiler-Lehman test [65]. By utilizing GIN, we can effectively extract informative features from the superpixel graph, enabling accurate and efficient segmentation performance.
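As a concrete sketch, a three-layer GIN backbone of this kind can be instantiated with PyTorch Geometric's GINConv, which implements Eq (2.4) with a learnable self-weight when train_eps=True. The hidden sizes below are illustrative choices of ours, not the paper's exact configuration:

```python
import torch
from torch import nn
from torch_geometric.nn import GINConv

class SuperpixelGIN(nn.Module):
    """Three-layer GIN over the superpixel graph; each layer follows Eq (2.4)."""
    def __init__(self, in_dim=5, hidden=32, out_dim=16):
        super().__init__()
        def mlp(i, o):
            return nn.Sequential(nn.Linear(i, hidden), nn.ReLU(), nn.Linear(hidden, o))
        self.convs = nn.ModuleList([
            GINConv(mlp(in_dim, hidden), train_eps=True),   # learnable (1 + w) self-term
            GINConv(mlp(hidden, hidden), train_eps=True),
            GINConv(mlp(hidden, out_dim), train_eps=True),
        ])

    def forward(self, x, edge_index):
        # x: (K, 5) node features; edge_index: (2, E) adjacency in COO format
        for conv in self.convs:
            x = torch.relu(conv(x, edge_index))
        return x  # (K, out_dim) superpixel embeddings
```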
Although pixel-wise contrast can learn the global context to form a good segmentation embedding space [63], computing the contrastive loss requires operating over all training image pixels, which incurs a significant amount of computation and memory cost. In this study, we propose superpixel-based methods that significantly reduce the number of data samples from $N$ to $K$, and we introduce a memory-efficient, distance-based metric loss function.
The fundamental concept of metric learning is to bring similar samples closer together in the embedding space while pushing dissimilar samples further apart. However, pixel-wise contrast methods, which set a large number of anchor pixels and use tensor multiplication to compute positive similarity, incur high memory costs. To address this, we define the superpixel metric loss as the mean square error (MSE) between the similarity and the metric score of the embeddings, as follows:
$$\mathcal{L}_{SM}(A,R)=\mathrm{MSE}\big(\mathrm{SIM}(A,R),\ \mathrm{Metric}(A,R)\big), \tag{2.5}$$
$$\mathrm{SIM}_{i,j}(A,R)=a\cdot\cos(A_i,R_j)+b,\qquad \mathrm{SIM}\in\mathbb{R}^{n_1\times n_2}, \tag{2.6}$$
$$\mathrm{Metric}_{i,j}(A,R)=\frac{\mathcal{Y}(A_i)+\mathcal{Y}(R_j)}{2},\qquad \mathrm{Metric}\in\mathbb{R}^{n_1\times n_2}, \tag{2.7}$$
where $n_1$ and $n_2$ represent the number of anchor samples $A\in\mathbb{R}^{n_1\times d}$ and reference samples $R\in\mathbb{R}^{n_2\times d}$, respectively. Here, $d$ denotes the dimension of the embedding, and $a$ and $b$ are learnable parameters. Additionally, $\mathcal{Y}(x)$ represents the superpixel metric score of node $x$, as defined by Eq (2.1). It is important to note that the anchor and reference samples are not restricted to come from the same image. The objective of Eq (2.5) is to bring the embeddings of similar superpixel samples closer together and push dissimilar ones apart.
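A minimal PyTorch sketch of Eqs (2.5)-(2.7), assuming precomputed superpixel embeddings and metric scores (all tensor names are ours):

```python
import torch
import torch.nn.functional as F

def superpixel_metric_loss(anchor, ref, y_anchor, y_ref, a, b):
    """Superpixel metric loss of Eqs (2.5)-(2.7).
    anchor: (n1, d) and ref: (n2, d) superpixel embeddings;
    y_anchor: (n1,), y_ref: (n2,) metric scores from Eq (2.1);
    a, b: learnable scalar parameters."""
    # Pairwise scaled cosine similarity, Eq (2.6), via broadcasting: (n1, n2)
    sim = a * F.cosine_similarity(anchor.unsqueeze(1), ref.unsqueeze(0), dim=-1) + b
    # Pairwise metric score, Eq (2.7): (n1, n2)
    metric = 0.5 * (y_anchor.unsqueeze(1) + y_ref.unsqueeze(0))
    return F.mse_loss(sim, metric)   # Eq (2.5)
```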
We use parameter-free methods to generate superpixels [47,66], on which we construct the graph structure with two alternative strategies. Each superpixel is treated as a node, and the mean RGB values together with the mean position values constitute the five-dimensional node features. Suppose the mean scale of the superpixels is $S=\frac{H\times W}{K}$. We can define the adjacency between nodes according to their positional relation as
$$E_{i,j}=\begin{cases}1, & |x_i-x_j|+|y_i-y_j|<\alpha\sqrt{S},\\ 0, & \text{otherwise},\end{cases}$$
where $(x_i,y_i)$ and $(x_j,y_j)$ are the positions of superpixels $i$ and $j$, respectively, and $\alpha\geq 2$ is a hyperparameter that controls the number of neighboring nodes. This definition imposes strong local connectivity on the graph data. For batched images, we can use parallel computing to accelerate graph generation and combine multiple subgraphs into one large graph, in which only connected nodes exchange messages through the GNNs.
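The positional rule can be realized directly from superpixel centroids. The sketch below (function name and the dense pairwise computation are our choices, adequate for the moderate $K$ used here) drops self-loops since GIN already handles the self-term in Eq (2.4):

```python
import numpy as np

def build_adjacency(centroids, mean_scale, alpha=2.0):
    """Adjacency from the positional rule above: connect two superpixels when
    the L1 distance of their centroids is below alpha * sqrt(S).
    centroids: (K, 2) array of (x, y); mean_scale: S = H*W/K."""
    thresh = alpha * np.sqrt(mean_scale)
    d = np.abs(centroids[:, None, :] - centroids[None, :, :]).sum(-1)  # (K, K) L1 distances
    A = (d < thresh).astype(np.float32)
    np.fill_diagonal(A, 0.0)  # no self-loops; the GIN self-term covers node i itself
    return A
```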
Regarding the model architecture, we utilize three layers of GIN to generate the embeddings of superpixels, upon which metric learning is performed. In addition to the transformed graph data derived from superpixels, we retain the original image data. We concatenate the features of superpixels and pixels and pass them through a lightweight CNN to smooth out small pixel groups in the output of the GNNs. This process helps enhance the segmentation accuracy by incorporating nondegraded image information. The overall architecture of our proposed model, named SMGNN, is illustrated in Figure 5.
To tackle clinical image segmentation, we employ the Dice loss [67], a structure-aware and widely used loss function for medical image segmentation. This loss function measures the similarity between predicted and ground-truth segmentation masks. We also use the Dice coefficient, a widely used metric in image segmentation, to evaluate the performance of different models [67]. To be more specific, given a set $G$, we define its characteristic/label function by $\iota_G(i)=\begin{cases}1, & i\in G,\\ 0, & \text{otherwise}.\end{cases}$ The Dice coefficient of two sets $G$ and $\hat{G}$ is defined as
$$\mathrm{Dice}(G,\hat{G})=\frac{2\sum_{i\in\Omega}\iota_G(i)\cdot\iota_{\hat{G}}(i)}{\sum_{i\in\Omega}\big(\iota_G(i)+\iota_{\hat{G}}(i)\big)}, \tag{3.1}$$
where Ω indicates the domain containing the two sets. The Dice metric is also directly used as a loss function to train a supervised segmentation task. The Dice loss function is formulated as
$$\mathcal{L}_{\mathrm{Dice}}(G,\hat{G})=-\frac{2\sum_{i\in\Omega}\iota_G(i)\cdot\iota_{\hat{G}}(i)}{\sum_{i\in\Omega}\big(\iota_G(i)+\iota_{\hat{G}}(i)\big)}. \tag{3.2}$$
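For reference, a soft version of Eqs (3.1) and (3.2) in PyTorch (the epsilon guard against empty masks is our addition):

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    """Negative soft Dice, Eq (3.2). pred: predicted probabilities; target:
    one-hot ground-truth mask of the same shape. The summation over all
    elements plays the role of the domain Omega."""
    num = 2.0 * (pred * target).sum()
    den = pred.sum() + target.sum() + eps
    return -num / den  # many libraries use the equivalent 1 - Dice instead
```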
The joint loss function for both the superpixel metric learning and segmentation tasks is defined as a combination of the Dice loss and the superpixel metric loss. This joint loss function enables us to optimize the model parameters simultaneously for both tasks, effectively leveraging the benefits of both superpixel-based metric learning and pixel-wise segmentation. The joint loss function is formulated as
$$\mathcal{L}_{\mathrm{Joint}}=\mathcal{L}_{\mathrm{Dice}}+\lambda\,\mathcal{L}_{SM}, \tag{3.3}$$
where $\lambda\geq 0$ is a hyperparameter that trades off learning in image space and graph space. Empirically, as shown in Figure 4, we set $\lambda=0.1$ in the numerical experiments for better performance, based on extended experiments over the search space $\{0, 0.1, 0.5, 1, 10\}$.
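Putting the pieces together, reusing the sketches above and, for simplicity, using the same batch of superpixel embeddings as both anchors and references (our simplification; the paper does not impose this):

```python
# Joint objective of Eq (3.3); lambda = 0.1 follows the reported search.
lam = 0.1
loss = dice_loss(pred, target) + lam * superpixel_metric_loss(H, H, y_sp, y_sp, a, b)
loss.backward()  # gradients flow to both the GNN and the lightweight CNN head
```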
We employed widely used WBC datasets to evaluate the effectiveness of our proposed recognition system. We used the widely acclaimed Adam optimizer [68] for model training during the backpropagation phase. The implementations are programmed with PyTorch-Geometric (version 2.2.0) and PyTorch (version 1.12.1) and executed on an NVIDIA Tesla A100 GPU with 6,912 CUDA cores and 80 GB of HBM2 memory, installed on an HPC cluster.
We verify the robustness of our methods on two WBC image datasets. The first dataset (dataset-1) originates from the Jiangxi Tecom Science Corporation of China [69] and contains 300 color images of size 120×120 (176 neutrophils, 22 eosinophils, 1 basophil, 48 monocytes and 53 lymphocytes). The second dataset (dataset-2), publicly available on the CellaVision blog and widely used in leukocyte research, contains 100 color images of size 300×300 (30 neutrophils, 12 eosinophils, 3 basophils, 18 monocytes and 37 lymphocytes). Both datasets consist of three-channel RGB images, which are processed by the neural networks in an end-to-end training regimen. Each WBC image is manually labeled, marking three primary regions: Nuclei (represented in white), cytoplasm (depicted in gray) and the surrounding peripheral blood (captured in black). The training/validation/testing split is 80/10/10% of the total, and the Dice loss function is applied to train the segmentation model. The datasets comprise five different cell types with varying staining effects and illumination conditions, which causes large variations in the sample distribution.
To efficiently over-segment our input images, we adopted the simple non-iterative clustering (SNIC) superpixel generation methodology [66]. A distinctive feature of SNIC is its ability to visit each pixel just once—with the sole exception being those situated on superpixel boundaries. The computational traversal is characterized by the total number of pixels, N, augmented by a variable dictated by the desired superpixel count, K. Such a design renders SNIC more computationally nimble compared to alternatives like simple linear iterative clustering (SLIC) [47].
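To make the preprocessing concrete: the paper uses SNIC [66]; the sketch below substitutes scikit-image's SLIC [47] purely because it is widely available, and builds the five-dimensional node features (mean RGB plus centroid) described earlier. Function and parameter choices here are ours:

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_features(img, n_segments):
    """Over-segment an RGB image and build 5-D node features per superpixel.
    img: (H, W, 3) array. Returns the label map and a (K, 5) feature matrix."""
    seg = slic(img, n_segments=n_segments, compactness=10, start_label=0)
    K = int(seg.max()) + 1
    feats = np.zeros((K, 5), dtype=np.float32)
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    for j in range(K):
        mask = seg == j
        feats[j, :3] = img[mask].mean(axis=0)                 # mean RGB
        feats[j, 3:] = [xs[mask].mean(), ys[mask].mean()]     # centroid (x, y)
    return seg, feats
```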
The superpixel quality, and its potential ramifications on segmentation, is an aspect we delve deeply into. By modulating the number of superpixels, we could ascertain its influence. For instance, the WBCs dataset results (illustrated by the red trajectory in Figure 6) signify that as the granularity of the superpixel method amplifies, there's a corresponding upswing in the mIoU (mean Intersection over Union) score. Balancing optimal segmentation outcomes with computational practicality, we've pegged the mean scale of superpixels (S) at 16 for all ensuing experiments.
A particularly intricate aspect of the WBC dataset segmentation is the differentiation between the cytoplasm and the nuclei. Several images from dataset-2 present nuclei that are suboptimally stained, leading to a coloration reminiscent of the cytoplasm. Ideally, accurate segmentation demands that the representation of the nucleus be a cohesive, uninterrupted region. This color overlap often ensnares traditional CNN-based segmentation models, such as U-Net, Attention U-Net and MIF-Net, resulting in predicted segmentations marred by interruptions or holes. The RU-Net demonstrates the ability to preserve piecewise constant nuclei regions; however, it tends to over-segment the cytoplasm region, which hinders overall segmentation performance. On the other hand, the Swin U-Net leverages the self-attention mechanism of the Transformer model and shows promising segmentation results. Nevertheless, the Swin U-Net's large number of parameters hampers training efficiency and requires significant computational resources. In comparison, the VGU-Net achieves a balance between efficiency and effectiveness, although it still uses a considerable number of parameters compared to the SMGNN. The proposed SMGNN model uses only about 7,000 parameters and takes an innovative approach by bolstering the connectivity of adjacent superpixels. Therefore, it is efficient in learnable parameters and proficient in preserving the integrity of the nucleus region, a capability vividly showcased in Figure 8. Compared to those end-to-end segmentation models, the SMGNN model takes a preprocessing step to cluster homogeneous pixels into superpixels and construct graph-structured data. We adopt the SNIC algorithm, which is non-iterative, requires less memory and is faster, while remaining a simpler superpixel segmentation algorithm. This step costs some computational time, but the SMGNN segmentation model itself is very efficient. We compare the computational time of different methods in Table 3.
In Figure 9, we show the Dice performance and the number of learnable parameters of different deep learning models. Our SMGNN model reaches SOTA performance while using far fewer parameters. In Tables 1 and 2, we show a quantitative comparison of these mainstream baseline segmentation models using the Dice coefficient, Hausdorff distance, positive predictive value (PPV), accuracy and sensitivity as metrics. The proposed SMGNN model achieves SOTA segmentation performance while using remarkably fewer parameters.
Model | Parameters↓ | Dice↑ | Hausdorff ↓ | PPV↑ | Accuracy↑ | Sensitivity↑ |
Unet | 5.43M | 0.9405 | 4.214 | 0.9468 | 0.9913 | 0.9325 |
RUnet | 31.03M | 0.9452 | 4.137 | 0.9528 | 0.9915 | 0.9377 |
Att-Unet | 34.88M | 0.9410 | 4.287 | 0.9413 | 0.9908 | 0.9301 |
Swin-Unet | 73.46M | 0.9501 | 4.056 | 0.9578 | 0.9918 | 0.9385 |
MIF-Net | 2.67M | 0.9523 | 4.035 | 0.9556 | 0.9911 | 0.9373 |
VGU-Net | 4.99M | 0.9604 | 3.987 | 0.9643 | 0.9923 | 0.9414 |
SMGNN (ours) | 7e−3M | 0.9572 | 4.017 | 0.9545 | 0.9914 | 0.9401 |
Model | Parameters↓ | Dice↑ | Hausdorff ↓ | PPV↑ | Accuracy↑ | Sensitivity↑ |
Unet | 5.43M | 0.9420 | 4.5018 | 0.9419 | 0.9910 | 0.9204 |
RUnet | 31.03M | 0.9462 | 4.8409 | 0.9303 | 0.9876 | 0.9116 |
Att-Unet | 34.88M | 0.9482 | 4.7425 | 0.9333 | 0.9914 | 0.9307 |
Swin-Unet | 73.46M | 0.9503 | 4.7886 | 0.9438 | 0.9919 | 0.9232 |
MIF-Net | 2.67M | 0.9517 | 4.224 | 0.9428 | 0.9909 | 0.9311 |
VGU-Net | 4.99M | 0.9520 | 4.209 | 0.9493 | 0.9913 | 0.9378 |
SMGNN (ours) | 7e−3M | 0.9495 | 4.4531 | 0.9431 | 0.9903 | 0.9373 |
Model | Time Cost |
Unet | 0.1094 |
RUnet | 0.1145 |
Att-Unet | 0.2681 |
Swin U-Net | 0.7794 |
MIF-Net | 0.0986 |
VGU-Net | 0.1756 |
SMGNN (ours) | 0.8173 |
To optimize node embeddings within our methodology, we selected the GIN model for the GNN module due to its superior discriminative capacity. Comparative tests with popular GNN models like GCN and graph attention network (GAT) revealed that a configuration using three layers of GIN demonstrated an enhanced performance, significantly improving node classification accuracy.
Beyond the conventional setup detailed in Figure 5, which incorporates pixel-level embedding, we ventured into an approach that solely leverages a GNN, transitioning from image-level segmentation components to a strictly node-based classification method, as illustrated in Figure 10. The training for this node classification is driven by a cross-entropy loss function, delineated as follows:
$$\mathcal{L}_{CE}(S,\hat{S})=-\frac{1}{K}\sum_{i=1}^{K}\sum_{c=1}^{C}S_{ic}\log(\hat{S}_{ic}), \tag{4.1}$$
where we use the majority voting rule to define the superpixel label as
$$S=\lfloor \mathcal{Y}+0.5\rfloor, \tag{4.2}$$
guiding the supervised learning in graph space. In Eq (4.2), $\lfloor\cdot\rfloor$ is the floor function, $\hat{S}$ is the predicted probability of a superpixel, and $C$ is the number of semantic classes. Although such a rule may mislabel pixels whose class is not the majority within a superpixel, pure GNN methods can achieve good performance when the superpixel scale is small.
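A sketch of this pure-GNN variant's objective, combining Eqs (4.1) and (4.2); the helper name is ours, and the rounding step reproduces Eq (4.2) as stated (it coincides with majority voting when the pixel labels are binary):

```python
import torch
import torch.nn.functional as F

def node_classification_loss(logits, metric_scores):
    """Cross-entropy of Eq (4.1) on majority-vote superpixel labels.
    logits: (K, C) raw GNN node outputs (softmax applied inside cross_entropy);
    metric_scores: (K,) superpixel metric scores from Eq (2.1)."""
    labels = torch.floor(metric_scores + 0.5).long()   # Eq (4.2)
    return F.cross_entropy(logits, labels)
```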
Our ablation studies, as depicted in Figure 11, offer keen insights into the performance nuances of pure GNN-based segmentation models. These models manifest commendable segmentation outcomes when oriented to small-scale superpixel environments. However, as we scale the superpixels, the model's performance is inversely impacted by its heightened sensitivity to superpixel quality, leading to notable performance drops. Introducing convolutional filters via CNN feature embedding for image-level segmentation does augment the model with additional parameters. Nevertheless, the significance of these filters is evident in the stability they confer upon the model, especially when navigating varying superpixel scales.
This investigation underpins a critical takeaway: The scale of superpixels and a model's sensitivity to their quality must be harmoniously calibrated. Relying exclusively on GNN-driven segmentation models may prove suboptimal when maneuvering larger superpixel frameworks.
To understand the impact of metric learning on the embedding space, we visually represent the spatial relationships of superpixels. We assign different colors to these superpixels based on their labels, as determined by Eq (4.2). Figure 12, created using uniform manifold approximation and projection (UMAP) [70], demonstrates the distances among embeddings derived from the GNN. A notable observation from this visualization is the pronounced separation between superpixels with distinct labels—a testament to the efficacy of incorporating metric learning. Furthermore, there's a heightened cosine similarity between samples that are alike, while distinct samples exhibit reduced similarity. This distinction underscores the model's ability to effectively differentiate and group superpixels in the embedding space.
Though the segmentation of cell salient regions, such as the nucleus and cytoplasm, is fundamental and challenging, there are various off-the-shelf methods available for subsequent cell type classification [71,72,73]. Segmentation may provide a direct means to obtain distinguishing characteristics for cell type classification, as cell morphology is closely related to cell type.
Algorithm 1 Training SMGNN Segmentation Model and ResNet Classification Model

Input: white blood cell image dataset $D=\{X_i,Y_i,C_i\}_{i=1}^{n}$ // $X$, $Y$, $C$ denote the image, segmentation label and cell type label, respectively
Ensure: optimal $\theta_{GNN}$, $\theta_{CNN}$ and $\theta_{ResNet}$

1) Preprocessing:
  $X, Y, C \leftarrow$ random mini-batch from $D$
  $S, Q = \mathrm{SNIC}(X)$ // SNIC algorithm [66] generates superpixels $S$ and association matrix $Q$
  $\mathcal{Y} = \mathrm{GSML}(Y, Q)$ // generation of superpixel metric labels (GSML) via Eq (2.1)
  $A = \mathrm{CSG}(S)$ // construction of the superpixel graph (CSG): adjacency matrix from positional relationships

2) Training the SMGNN segmentation model:
  $\theta_{GNN}, \theta_{CNN} \leftarrow$ initialize network parameters
  repeat:
    $H = \mathrm{GNN}(S, A)$
    $h = QH$ // convert embeddings from graph space to image space with the association matrix $Q$
    $h_{out} = \mathrm{CNN}(h, X)$; $h_{label} = \mathrm{Softmax}(h_{out})$
    $\mathcal{L}_{\mathrm{Joint}} = \mathcal{L}_{\mathrm{Dice}}(h_{label}, Y) + \lambda\,\mathcal{L}_{SM}(H, \mathcal{Y})$ // joint loss of Eq (3.3)
    $\theta_{GNN} \mathrel{+}= -\nabla_{\theta_{GNN}}\mathcal{L}_{\mathrm{Joint}}$; $\theta_{CNN} \mathrel{+}= -\nabla_{\theta_{CNN}}\mathcal{L}_{\mathrm{Joint}}$ // update parameters along the gradients
  until deadline

3) Training the lightweight ResNet:
  $\theta_{ResNet} \leftarrow$ initialize network parameters
  repeat:
    $h_{label} = \mathrm{SMGNN}(X)$; $\hat{Y} = \arg\max h_{label}$ // segmentation result from the trained SMGNN model
    $\hat{C} = \mathrm{ResNet}(\hat{Y}, X)$
    $\mathcal{L}_{\text{cross-entropy}} = \mathcal{L}_{\text{cross-entropy}}(\hat{C}, C)$ // compute the cross-entropy loss
    $\theta_{ResNet} \mathrel{+}= -\nabla_{\theta_{ResNet}}\mathcal{L}_{\text{cross-entropy}}$
  until deadline
In this part, we employ a lightweight ResNet [74,75] to train a classifier on the outputs of the segmentation networks, as shown in Figure 13. The overall recognition algorithm is given in Algorithm 1. We sample about one third of the images of dataset-2 from each class for training and leave the remainder for testing. We train the segmentation and classification models separately. The classification results are shown in Table 4; our classification method achieves about 96.72% overall accuracy. In addition to our proposed segmentation-based cell type recognition system, we also implemented a baseline method that predicts cell types without utilizing segmented cell regions. The corresponding results are presented in Table 5: such methods, lacking extraction of salient cell regions, achieve only 72.13% overall accuracy. There are also traditional methods that employ handcrafted features extracted from segmented regions combined with machine learning classifiers such as the support vector machine (SVM) [71]; such methods attain overall accuracies ranging from 89.69% to 96%. Our proposed deep learning-based automatic recognition system demonstrates high efficiency, and its accuracy can be further improved as the number of available training samples increases. To compare the overall accuracy of the cell type classification task, we take the segmented regions produced by different models as input and compare the classification accuracy, as shown in Table 6. Our proposed method achieves SOTA performance in the WBC recognition workflow.
True Class | Recognized Basophil | Recognized Eosinophil | Recognized Lymphocyte | Recognized Monocyte | Recognized Neutrophil | Accuracy
Basophil | 2 | 0 | 0 | 0 | 0 | 100% |
Eosinophil | 0 | 8 | 0 | 0 | 0 | 100% |
Lymphocyte | 0 | 0 | 23 | 1 | 0 | 92.00% |
Monocyte | 0 | 0 | 0 | 10 | 0 | 100% |
Neutrophil | 1 | 0 | 0 | 0 | 16 | 94.12% |
Overall Accuracy | - | - | - | - | - | 96.72% |
True Class | Recognized Basophil | Recognized Eosinophil | Recognized Lymphocyte | Recognized Monocyte | Recognized Neutrophil | Accuracy
Basophil | 1 | 1 | 0 | 0 | 0 | 50% |
Eosinophil | 2 | 5 | 0 | 0 | 1 | 62.50% |
Lymphocyte | 0 | 1 | 20 | 2 | 1 | 83.33% |
Monocyte | 0 | 0 | 3 | 6 | 1 | 60% |
Neutrophil | 2 | 1 | 2 | 0 | 12 | 70.58% |
Overall Accuracy | - | - | - | - | - | 72.13% |
Model | Parameters↓ | Dice↑ | CA (classification accuracy)↑
Unet | 5.43M | 0.9420 | 0.9344 |
RUnet | 31.03M | 0.9462 | 0.9672 |
Att-Unet | 34.88M | 0.9482 | 0.9508 |
Swin-Unet | 73.46M | 0.9503 | 0.9508 |
MIF-Net | 2.67M | 0.9517 | 0.9508 |
VGU-Net | 4.99M | 0.9520 | 0.9672 |
SMGNN(ours) | 7e−3M | 0.9495 | 0.9672 |
In this research paper, we proposed a deep learning-based automatic recognition system for the challenging WBC image recognition task. In the first part, we proposed the SMGNN segmentation model, which combines superpixel methods and a lightweight GIN to significantly reduce memory usage while preserving segmentation capability. We innovatively proposed superpixel metric learning to capture cross-image global context information, making the model highly suitable for medical images with limited training samples. Compared to other mainstream deep learning models, we achieved comparable segmentation performance with up to 10,000 times fewer parameters. Through extended numerical experiments, we further investigated the effectiveness of metric learning and the quality of superpixels. In the second part, the segmentation-based cell type classification exhibited satisfactory results, indicating that the overall automatic recognition algorithm is accurate and efficient enough for execution in hematological laboratories. We have made our code publicly available at https://github.com/jyh6681/SPXL-GNN, and we encourage its deployment on hematologists' portable devices and in remote rural areas.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
The authors declare there is no conflict of interest.