Optimal nutritional intake for fetal growth

  • The regular nutritional intake of an expectant mother clearly affects the weight development of the fetus. Assuming that fetal growth follows a deterministic growth law, such as a logistic equation, albeit dependent on the nutritional intake, the ideal solution is usually determined by the birth weight being pre-assigned, for example, as a percentage of the mother's average weight. This problem can then be posed as an optimal control problem with the daily intake as the control, which enters through a Michaelis-Menten relationship, for which there are well-developed procedures to follow. The best solution is determined by requiring the minimum total intake under which the pre-assigned birth weight is reached. The algorithm has been generalized to the case where the fetal weight depends in a detailed way on the cumulative intake, suitably discounted according to the history. The optimality system is derived and then solved numerically using an iterative method for specific parameter values. The procedure is generic and can be adapted to any growth law and any parameterisation obtained from the detailed physiology.

    Citation: Chanakarn Kiataramkul, Graeme Wake, Alona Ben-Tal, Yongwimon Lenbury. Optimal nutritional intake for fetal growth. Mathematical Biosciences and Engineering, 2011, 8(3): 723-732. doi: 10.3934/mbe.2011.8.723
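
    As a worked illustration of the kind of problem the abstract describes, the following is a minimal sketch with hypothetical notation (the paper's actual growth law, discounting of the intake history, and parameterisation may differ):

```latex
% Hypothetical notation: w(t) fetal weight, u(t) daily nutritional intake
% (the control), r growth rate, K carrying capacity, k_m a Michaelis-Menten
% half-saturation constant, T gestation length, w_T the pre-assigned birth weight.
\min_{u(\cdot)\ge 0} \; \int_0^T u(t)\,\mathrm{d}t
\quad \text{subject to} \quad
\frac{\mathrm{d}w}{\mathrm{d}t}
  = r\,\frac{u(t)}{k_m + u(t)}\,w\!\left(1-\frac{w}{K}\right),
\qquad w(0)=w_0,\quad w(T)=w_T .
```

    Here the Michaelis-Menten factor $u/(k_m+u)$ saturates the growth response to intake, and minimising the integral of $u$ selects the cheapest intake schedule that still reaches the target birth weight $w_T$.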



    Accurate medical image segmentation is fundamental and crucial for medical image processing and analysis [1,2]. Traditionally, the targets in medical images are segmented by sketching their outlines manually, but this is time-consuming and requires the professional knowledge of physicians. Many morphology-based automatic segmentation methods have been proposed in the past, including edge detection, area detection, and template matching [3]. However, it is difficult to design task-specific, easily deformable models for the various segmentation tasks [4]. The significant variations in the scale and shape of segmented targets add to the difficulty [5].

    With the great development of deep learning, deep convolutional neural networks (DCNNs) have achieved excellent performance in the medical image segmentation field [3,6,7]. Compared with traditional methods, DCNNs automatically extract features and show higher accuracy and robustness. Many architectures [8,9,10] were built on the Fully Convolutional Network (FCN), and U-Net [10] and its variants have been widely applied to many tasks, such as skin lesions [11,12], thyroid gland [13,14], lung [7,15], and nuclei [16,17,18,19]. U-Net adopts an encoder-decoder structure: the encoder captures feature information using continuously stacked convolutional layers, and the decoder recovers the category of each pixel. Multiple skip connections also propagate feature information for the final segmentation. Variants such as MultiResUNet [12] and UNet++ [20] rethought the U-Net architecture and achieved better performance in segmentation tasks. Nevertheless, U-Net and its variants still have some shortcomings. First, the pooling operation may lose important features that are conducive to improving segmentation accuracy. Second, these methods cannot dynamically adjust to variations in features, such as shape and size. Third, continuously stacked convolution layers deepen the network and enhance its feature extraction capability, but a major critique of such models is their large parameter count [21].

    The attention mechanism helps the network focus on what is needed. By imitating the way humans allocate attention, attention feature vectors or maps dynamically assign larger weights to critical information and suppress useless information. Using the squeeze-and-excitation module, SE-Net [22] showed the effectiveness of modeling the inter-channel relationship. The ECA module [23], inspired by SE-Net, generated feature weights with a 1D convolution and applied them to the output. [24] adopted the attention mechanism for image classification based on an RNN model and achieved good performance. [25] first applied the attention mechanism to the field of NLP in machine translation tasks, and [26] proposed the self-attention mechanism. In the medical image segmentation field, Attention U-Net [27] applied attention gates to capture richer contextual information, and CA-Net [11] proposed a comprehensive attention network to emphasize significant features in multi-scale feature maps. These methods take advantage of context information and achieve higher accuracy, but at a relatively high parameter cost.

    Depthwise separable convolution has shown great efficiency and requires far fewer training parameters than regular convolution [28,29,30]. It separates the standard convolution operation into two layers: depthwise convolution and pointwise convolution. Each input channel is first convolved spatially, and the pointwise convolution then projects the channels into a new channel space. In the MobileNets architecture [29], depthwise separable convolution was employed to build lightweight networks embedded in mobile vision applications. DeepLabV3+ [31] applied it to the ASPP module, which achieved a faster and more powerful network for semantic image segmentation. X-Net [32] adopted it to scale the network size down and performed well. MobileNetV3-UNet [33] created a lightweight encoder and decoder architecture based on depthwise separable convolution, which achieved high accuracy on medical image segmentation tasks.
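
    For illustration, the sketch below shows how a depthwise separable convolution can be written in PyTorch. It is a generic example rather than this paper's exact layer; the kernel size and the 64-to-128 channel counts are placeholder assumptions chosen only to show the parameter saving.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """A depthwise 3x3 convolution followed by a pointwise 1x1 convolution."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        # groups=in_ch: each input channel is convolved with its own spatial kernel
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch)
        # the 1x1 convolution mixes channels into a new channel space
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Parameter comparison for 64 -> 128 channels with a 3x3 kernel:
std = nn.Conv2d(64, 128, 3, padding=1)
dsc = DepthwiseSeparableConv(64, 128)
print(sum(p.numel() for p in std.parameters()))  # 73856
print(sum(p.numel() for p in dsc.parameters()))  # 8960, roughly 8x fewer
```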

    Combining the advantages of the attention mechanism and depthwise separable convolution in a U-shaped architecture, a lightweight DSCA-Net is proposed in this paper for medical image segmentation. Three novel attention modules are proposed and integrated into the encoder and decoder of U-Net. The chief contributions of our work are summarized as follows:

    1) A Pooling Attention module is proposed to reduce the feature loss caused by down-sampling.

    2) A Context Attention module is designed to exploit the concatenation feature maps from the encoder and decoder, which combines the spatial and channel attention mechanisms to focus on useful position features.

    3) To make better use of multi-scale information from different level stages of the decoder, a Multiscale Edge Attention module is proposed to deal with combined features for the final prediction.

    4) We integrate all proposed modules into DSCA-Net for medical image segmentation and all convolution operations are implemented by depthwise separable convolution. The proposed network was evaluated on four public datasets and the experimental results reveal that our proposed network outperforms previous state-of-the-art frameworks.

    The remainder of this paper is structured as follows. Section 2 details the proposed DSCA-Net architecture, and Section 3 describes the experimental settings and results. Finally, discussion and conclusions are given in Sections 4 and 5.

    By combining the attention mechanism and depthwise separable convolution with the architecture of U-Net, we propose DSCA-Net, shown in Figure 1. The network is composed of an encoding part, a decoding part, and a multiscale edge part. Firstly, we replace the stacked $ 3\times 3 $ convolution layers of U-Net with the DC module. The channel depth of the encoder is 128, which enables the model to extract abundant features while reducing the parameter count. Secondly, to reduce feature loss, the PA module is embedded in place of each maximum pooling layer, which has almost no effect on the number of parameters. Then, long-range skip connections transfer feature maps from the encoder to the symmetrical decoder stage after passing through the CA module, which fuses and recalibrates context information at five different resolution levels. Finally, the MEA module re-emphasizes salient scale information from the concatenated multiscale feature maps, which makes the last CNN layer aware of the edges of the segmentation target.

    Figure 1.  The overall architecture of DSCA-Net.

    Recent studies show that extending the network depth leads to better segmentation performance [28,34]. Based on the depthwise separable convolution operation [28] and DenseNet [35], we propose the dense convolution (DC) module. We utilize it in the encoder to extract high-dimensional feature information and in the decoder to recover segmented target details. As shown in Figure 2, every depthwise separable convolution layer is followed by one group normalization [36] and a LeakyReLU [37], which improves the nonlinear expressive capability of the model. For convenience, we denote the input as $ {x}_{input}\in {\mathbb{R}}^{C\times H\times W} $, where $ C, H, W $ denote channel, height, and width, respectively. At the beginning, one $ 1\times 1 $ convolution layer $ {F}_{1\times 1}^{conv} $ doubles the number of channels of $ {x}_{input} $. Then, residual connections from former layers are summed into each subsequent layer, which consists of two consecutive $ 3\times 3 $ convolution layers $ {F}_{3\times 3}^{conv} $. The element-wise summation fuses the extracted information without adding parameters. The DC module is described by the following equations:

    $ {x}_{0} = {F}_{1\times 1}^{conv}\left({x}_{input}\right) $ (1)
    $ {x}_{i} = {F}_{3\times 3}^{conv}\left(SUM\left({x}_{0};{x}_{1};{x}_{2};\dots ;{x}_{i-1}\right)\right) $ (2)
    Figure 2.  Dense convolution module.

    where $ {x}_{0}\in {\mathbb{R}}^{2C\times H\times W} $ denotes the channel-expanded input feature map and $ {x}_{i}\in {\mathbb{R}}^{2C\times H\times W} $ represents the feature map of layer $ i $.
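
    A PyTorch sketch of a DC-style block following Eqs (1)-(2) is given below. The number of layers, the GroupNorm group count, and the exact block composition are assumptions made for the example, not the paper's verified configuration.

```python
import torch
import torch.nn as nn

class DSConv(nn.Module):
    """Depthwise separable conv + GroupNorm + LeakyReLU, as in Figure 2."""
    def __init__(self, ch, gn_groups=8):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, groups=ch),  # depthwise 3x3
            nn.Conv2d(ch, ch, 1),                        # pointwise 1x1
            nn.GroupNorm(gn_groups, ch),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class DenseConvModule(nn.Module):
    """Sketch of the DC module: a 1x1 channel expansion (Eq (1)) followed by
    layers acting on the element-wise sum of all previous outputs (Eq (2))."""
    def __init__(self, in_ch, num_layers=3):
        super().__init__()
        ch = in_ch * 2  # the 1x1 convolution doubles the channel count
        self.expand = nn.Conv2d(in_ch, ch, 1)
        # each layer is two consecutive 3x3 depthwise separable convolutions
        self.layers = nn.ModuleList(
            nn.Sequential(DSConv(ch), DSConv(ch)) for _ in range(num_layers))

    def forward(self, x):
        outputs = [self.expand(x)]            # x_0
        for layer in self.layers:
            s = torch.stack(outputs).sum(0)   # SUM(x_0, ..., x_{i-1})
            outputs.append(layer(s))          # x_i
        return outputs[-1]
```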

    Consecutive pooling operations in the encoder enlarge the receptive field of the convolution operations but lose certain features. Therefore, we rethink SE-Net [22] and ECA-Net [23] and propose the PA module to replace the original pooling layer, as shown in Figure 3. The PA module adopts a two-branch structure: one branch obtains a channel attention vector, and the other rescales the height and width of the feature maps. First, a 1D convolution $ {F}_{5\times 1\times 1}^{conv} $ with a shared kernel of size $ 5 $ extracts richer feature information after the adaptive maximum pooling $ {P}_{1\times 1}^{max} $ and average pooling $ {P}_{1\times 1}^{avg} $ layers. The two branch outputs are then summed element-wise and activated by the $ Sigmoid $ function to give the vector $ {V}_{sum} $. Finally, the output $ {y}_{out} $ is obtained by multiplying $ {V}_{sum} $ with the rescaled feature maps $ {M}_{scaled} $. The PA module can be expressed as follows:

    $ {V}_{sum} = Sigmoid\left(GN\left({F}_{5\times 1\times 1}^{conv}\left({P}_{1\times 1}^{max}\left({x}_{input}\right)\right) \oplus {F}_{5\times 1\times 1}^{conv}\left({P}_{1\times 1}^{avg}\left({x}_{input}\right)\right)\right)\right) $ (3)
    $ {M}_{scaled} = {P}_{\frac{h}{2}\times \frac{w}{2}}^{max}\left({x}_{input}\right) \oplus {P}_{\frac{h}{2}\times \frac{w}{2}}^{avg}\left({x}_{input}\right) $ (4)
    $ {y}_{out} = {V}_{sum}\otimes {M}_{scaled} $ (5)
    Figure 3.  Pooling attention module.

    where $ {x}_{input}\in {\mathbb{R}}^{C\times H\times W} $ denotes the input feature maps, and $ \oplus $ and $ \otimes $ denote element-wise summation and element-wise multiplication, respectively.
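
    A sketch of the PA module in PyTorch follows. The GroupNorm group count and the use of global max/mean reductions as the adaptive poolings are assumptions consistent with Eqs (3)-(5).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoolingAttention(nn.Module):
    """Sketch of the PA module (Eqs (3)-(5)): a channel attention vector built
    from global max/avg pooling and a shared 1D convolution, multiplied with
    the element-wise sum of 2x2 max- and average-pooled feature maps."""
    def __init__(self, channels, k=5, gn_groups=1):
        super().__init__()
        # 1D convolution with shared kernel weights of size 5 over the channel axis
        self.conv1d = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.gn = nn.GroupNorm(gn_groups, channels)

    def forward(self, x):
        b, c, h, w = x.shape
        gmp = torch.amax(x, dim=(2, 3)).view(b, 1, c)   # adaptive max pooling to 1x1
        gap = torch.mean(x, dim=(2, 3)).view(b, 1, c)   # adaptive avg pooling to 1x1
        v = self.conv1d(gmp) + self.conv1d(gap)          # shared weights, summed
        v_sum = torch.sigmoid(self.gn(v.view(b, c, 1, 1)))       # Eq (3)
        m_scaled = F.max_pool2d(x, 2) + F.avg_pool2d(x, 2)       # Eq (4)
        return v_sum * m_scaled                                   # Eq (5)
```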

    In the process of context information extraction, the simple concatenation of U-Net is not sufficient to gradually restore the needed information. Drawing on dynamic weight similarity calculation, we propose the CA module to fuse context information, as shown in Figure 4. $ {x}_{low}\in {\mathbb{R}}^{C\times H\times W} $ and $ {x}_{high}\in {\mathbb{R}}^{C\times \frac{H}{2}\times \frac{W}{2}} $ represent feature maps from the encoder and decoder, respectively. First, we obtain $ {x}_{input}\in {\mathbb{R}}^{2C\times H\times W} $ by concatenating $ {x}_{low} $ with the upsampled $ {x}_{high} $ from the upper decoder layer. Then, to capture detailed context information, the CA module adopts a three-branch structure, comprising a spatial attention branch, a channel attention branch, and a convolution branch, each preserving the dimensions of $ {x}_{input} $. The learned feature maps from the spatial attention branch $ {x}_{spatial}\in {\mathbb{R}}^{2C\times H\times W} $ and the channel attention branch $ {x}_{channel}\in {\mathbb{R}}^{2C\times H\times W} $ multiply the convolutional feature maps $ {x}_{conv}\in {\mathbb{R}}^{2C\times H\times W} $ separately. Finally, the feature maps are concatenated and one $ 1\times 1 $ convolution $ {F}_{1\times 1}^{conv} $ reconstructs $ {y}_{out}\in {\mathbb{R}}^{C\times H\times W} $. The relevant formulas are as follows:

    $ {x}_{input} = Cat\left[{x}_{low};U\left({x}_{high}\right)\right] $ (6)
    $ {x}_{channel} = {x}_{input} \otimes Sigmoid\left({F}_{5\times 1\times 1}^{conv}\left({P}_{1\times 1}^{max}\left({x}_{input}\right)\right) \oplus {F}_{5\times 1\times 1}^{conv}\left({P}_{1\times 1}^{avg}\left({x}_{input}\right)\right)\right) $ (7)
    $ {x}_{spatial} = {x}_{input} \otimes Sigmoid\left({F}_{1\times 1}^{conv}\left(Cat\left[{P}_{H\times W}^{max}\left({x}_{input}\right);{P}_{H\times W}^{avg}\left({x}_{input}\right)\right]\right)\right) $ (8)
    $ {y}_{out} = {F}_{1\times 1}^{conv}\left(Cat\left[{x}_{spatial} \otimes {x}_{conv};{x}_{channel} \otimes {x}_{conv};{x}_{input}\right]\right) $ (9)
    Figure 4.  Context attention module.

    where $ Cat[ \cdot ] $ denotes concatenation along the channel dimension, $ {P}_{H\times W}^{avg} $ denotes adaptive average pooling, and $ {P}_{H\times W}^{max} $ denotes adaptive maximum pooling. For processing the bottom (bottleneck) feature information, we use a variant of the CA module that takes a single input.
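
    A sketch of the CA module follows. Two interpretive assumptions are made where the figure would disambiguate: the spatial branch is read as CBAM-style pooling along the channel dimension, and the convolution branch is realized as a depthwise 3x3 convolution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextAttention(nn.Module):
    """Sketch of the CA module (Eqs (6)-(9)). The channel branch reuses the
    shared-1D-convolution idea of the PA module; the spatial branch pools
    along the channel dimension (a CBAM-style reading of Eq (8))."""
    def __init__(self, channels, k=5):
        super().__init__()
        c2 = channels * 2                          # channels after concatenation
        self.conv1d = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=1)
        self.feat_conv = nn.Conv2d(c2, c2, 3, padding=1, groups=c2)  # x_conv branch
        self.out_conv = nn.Conv2d(3 * c2, channels, kernel_size=1)   # reconstructs y_out

    def forward(self, x_low, x_high):
        x_high = F.interpolate(x_high, scale_factor=2, mode='bilinear',
                               align_corners=False)                  # U(x_high)
        x = torch.cat([x_low, x_high], dim=1)                        # Eq (6)
        b, c, _, _ = x.shape
        # channel attention branch, Eq (7)
        v = self.conv1d(torch.amax(x, dim=(2, 3)).view(b, 1, c)) \
          + self.conv1d(torch.mean(x, dim=(2, 3)).view(b, 1, c))
        x_channel = x * torch.sigmoid(v.view(b, c, 1, 1))
        # spatial attention branch, Eq (8)
        s = torch.cat([x.amax(1, keepdim=True), x.mean(1, keepdim=True)], dim=1)
        x_spatial = x * torch.sigmoid(self.spatial_conv(s))
        # convolution branch and reconstruction, Eq (9)
        x_conv = self.feat_conv(x)
        return self.out_conv(torch.cat(
            [x_spatial * x_conv, x_channel * x_conv, x], dim=1))
```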

    U-Net uses the decoder to restore the category of each pixel. However, segmented objects with large scale variation and blurred edges increase the difficulty of accurate segmentation. The pixel positions of target edges differ slightly across the decoder feature maps once unified to a common scale, and the high-level feature maps in the decoder contain richer edge information. It is therefore desirable to learn scale-dynamic weights over all fused feature map pixels for calibrating object edges. To utilize the multiscale feature maps, we propose the MEA module, as shown in Figure 5. First, we use bilinear up-sampling layers with different scale factors ($ s = 1, 2, 4, 8 $) to unify the feature maps obtained from the decoder to the final output size and concatenate them. Then, to learn scale-dynamic features, one $ 1\times 1 $ convolution and a $ Sigmoid $ function generate calibrated weights, which are multiplied with the original input to obtain $ {y}_{out}\in {\mathbb{R}}^{C\times H\times W} $. The MEA module can be described as follows:

    $ {x}_{input} = Cat\left[s\left({x}_{1}\right);s\left({x}_{2}\right);\dots ;s\left({x}_{i}\right)\right] $ (10)
    $ {y}_{out} = {x}_{input}\otimes Sigmoid\left(GN\left({F}_{1\times 1}^{conv}\left({x}_{input}\right)\right)\right) $ (11)
    Figure 5.  Multiscale edge attention module.

    where $ s\left(\cdot \right) $ denotes the resampling function with different scale factors, $ {x}_{input} $ denotes the concatenated feature map, and $ GN $ denotes Group Normalization.
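
    A sketch of the MEA module consistent with Eqs (10)-(11); the GroupNorm group count and the use of size-targeted bilinear interpolation are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiscaleEdgeAttention(nn.Module):
    """Sketch of the MEA module (Eqs (10)-(11)): decoder outputs from four
    stages are upsampled to the output resolution, concatenated, and
    reweighted per pixel by calibrated weights."""
    def __init__(self, total_ch, gn_groups=1):
        super().__init__()
        # total_ch: channel count after concatenating all upsampled stages
        self.weight_conv = nn.Conv2d(total_ch, total_ch, kernel_size=1)
        self.gn = nn.GroupNorm(gn_groups, total_ch)

    def forward(self, feats):
        # feats: decoder outputs ordered from full scale down to 1/8 scale
        target = feats[0].shape[2:]
        ups = [F.interpolate(f, size=target, mode='bilinear',
                             align_corners=False) for f in feats]
        x = torch.cat(ups, dim=1)                              # Eq (10)
        weights = torch.sigmoid(self.gn(self.weight_conv(x)))  # calibrated weights
        return x * weights                                     # Eq (11)
```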

    To assess our proposed network, we validated DSCA-Net and compared it with other state-of-the-art methods on four public datasets: the ISIC 2018 dataset [5,38], a thyroid gland segmentation dataset [39], a lung segmentation (LUNA) dataset, and a nuclei segmentation (TNBC) dataset [17]. Each dataset poses its own challenges, and corresponding samples are shown in Figure 6. On each task, we compared the results with state-of-the-art networks and conducted ablation studies to demonstrate the effectiveness of the proposed modules, which are discussed in Sections 3.3–3.6.

    Figure 6.  Samples of four datasets.

    All models in this paper were implemented in PyTorch, and the experimental platform ran Ubuntu 18.04, equipped with an Intel Xeon CPU @2.30 GHz and 27 GB RAM. The GPU was a 16 GB Nvidia Tesla P100-PCIE. The Adam optimizer [40] was used with learning rate $ {10}^{-4} $ and weight decay $ {10}^{-8} $. The learning rate was decayed by a factor of 0.5 every 100 epochs. We utilized the soft Dice loss for model training and kept the model with the best result on the validation dataset. Quantitative results were obtained on the test set.
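
    The training configuration can be sketched as below. The soft Dice formulation and smoothing constant are common choices assumed here, not taken from the paper, and the model is a placeholder.

```python
import torch
import torch.nn as nn

class SoftDiceLoss(nn.Module):
    """A common soft Dice loss (a sketch; the paper's exact variant and
    smoothing constant are not specified)."""
    def __init__(self, eps=1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, logits, target):
        prob = torch.sigmoid(logits)                     # binary segmentation
        inter = (prob * target).sum(dim=(1, 2, 3))
        union = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
        dice = (2 * inter + self.eps) / (union + self.eps)
        return 1 - dice.mean()

# optimizer and schedule matching the stated hyperparameters
model = nn.Conv2d(3, 1, 1)  # placeholder; DSCA-Net would go here
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-8)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)
```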

    To maximize the use of the GPU, the batch sizes were set to 8, 4, 12, and 2 for the ISIC, thyroid gland, LUNA, and TNBC datasets, respectively. To better fit the data, the number of epochs was 500 for the TNBC dataset and 300 for the others; training stopped automatically after the maximum epoch. We utilized five-fold cross-validation to assess the stability and effectiveness of DSCA-Net. Every input image was normalized from $ \left[0, \;255\right] $ to $ \left[0, \;1\right] $. During model training, random flipping and random rotation with angles in $ (-\frac{\pi }{9}, \frac{\pi }{9}) $, each with probability 0.5, were applied for data augmentation.
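
    A paired-augmentation sketch consistent with this description ($ \pi/9 $ rad is 20 degrees; the flip axis and fill behavior are assumptions):

```python
import random
import torchvision.transforms.functional as TF

def augment(image, mask, max_angle=20.0, p=0.5):
    """Apply the same random flip/rotation to an image and its mask.
    max_angle=20 degrees corresponds to pi/9 radians; the image is
    assumed already normalized to [0, 1]."""
    if random.random() < p:
        image, mask = TF.hflip(image), TF.hflip(mask)
    if random.random() < p:
        angle = random.uniform(-max_angle, max_angle)
        image, mask = TF.rotate(image, angle), TF.rotate(mask, angle)
    return image, mask
```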

    In this paper, the Dice coefficient (Dice), Intersection over Union (IoU), accuracy (Acc), specificity (Spec), sensitivity (Sens), and average symmetric surface distance (ASSD) are used as evaluation metrics. Their formulas are as follows:

    $ Dice = \frac{2TP}{2TP+FN+FP} $ (12)
    $ IoU = \frac{TP}{TP+FN+FP} $ (13)
    $ Accuracy = \frac{TP+TN}{TP+FP+TN+FN} $ (14)
    $ Specificity = \frac{TN}{TN+FP} $ (15)
    $ Sensitivity = \frac{TP}{TP+FN} $ (16)

    where $ TP, TN, FP, FN $ represent the numbers of predicted pixels that are true positive, true negative, false positive, and false negative, respectively. Assuming $ {S}_{a} $ and $ {S}_{b} $ are the sets of border points of the prediction result and the corresponding label, respectively, $ ASSD $ is defined as:

    $ ASSD = \frac{\sum _{a\in {S}_{a}}d\left(a, {S}_{b}\right)+\sum _{b\in {S}_{b}}d\left(b, {S}_{a}\right)}{\left|{S}_{a}\right|+\left|{S}_{b}\right|} $ (17)

    where $ d\left(v, S\right) = {min}_{x\in S}\left(\left\Vert v-x\right\Vert \right) $ represents the shortest Euclidean distance between a point $ v $ and the point set $ S $.
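
    For reference, a NumPy sketch of these metrics on binary masks follows; computing ASSD via distance transforms of the border maps is an implementation assumption, not taken from the paper.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def confusion_metrics(pred, label):
    """Dice, IoU, accuracy, specificity, and sensitivity (Eqs (12)-(16));
    pred and label are boolean arrays of the same shape."""
    tp = np.sum(pred & label)
    tn = np.sum(~pred & ~label)
    fp = np.sum(pred & ~label)
    fn = np.sum(~pred & label)
    return {
        "dice": 2 * tp / (2 * tp + fn + fp),
        "iou": tp / (tp + fn + fp),
        "acc": (tp + tn) / (tp + tn + fp + fn),
        "spec": tn / (tn + fp),
        "sens": tp / (tp + fn),
    }

def assd(pred, label):
    """Average symmetric surface distance (Eq (17)) computed with
    Euclidean distance transforms of the border maps."""
    pred_border = pred ^ binary_erosion(pred)     # S_a: prediction border points
    label_border = label ^ binary_erosion(label)  # S_b: label border points
    d_to_label = distance_transform_edt(~label_border)  # distance to S_b
    d_to_pred = distance_transform_edt(~pred_border)    # distance to S_a
    return ((d_to_label[pred_border].sum() + d_to_pred[label_border].sum())
            / (pred_border.sum() + label_border.sum()))
```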

    The skin lesion segmentation dataset contains 2594 images and their corresponding labels, released in 2018 [5,38]. We randomly divided the dataset in the ratio 7:1:2 into 1815, 261, and 520 images for training, validation, and testing, respectively. The original image size varies from $ 720\times 540 $ to $ 6708\times 4439 $. To facilitate the training of our proposed network, all images and corresponding masks were cropped to $ 256\times 256 $.

    Some skin lesion segmentation samples from our proposed network and U-Net are shown in Figure 7. U-Net performs unsatisfactorily compared with DSCA-Net even on regular skin lesion images. When the skin lesion has a color similar to its surroundings or is occluded by hair and tissue fluid, U-Net produces erroneous segmentation results; the more blurred the lesion boundary, the more incorrect U-Net's segmentation becomes. Comparatively, DSCA-Net performs better.

    Figure 7.  Visualization results of skin lesion segmentation dataset.

    To fully confirm the validity of our method, we compared DSCA-Net with U-Net [10], Attention U-Net [27], RefineNet [41], EOC-Net [42], CA-Net [11], DeepLabV3+ [31], MobileNetV3-UNet [33], and IBA-U-Net [44] on this dataset. The results are listed in Table 1. Our proposed model achieves an Acc of 0.9532, 0.0755 higher than U-Net and 0.0053 higher than the second-place method, MobileNetV3-UNet. Although its Dice is 0.0002 lower than that of DeepLabV3+, the difference is not significant. Our model has 3.53, 15.85, 24.86, 1.27, 3.77, and 6.32 times fewer parameters than U-Net, Attention U-Net, DeepLabV3+, CA-Net, MobileNetV3-UNet, and IBA-U-Net, respectively, with better segmentation performance.

    Table 1.  Comparisons of segmentation performance and number of parameters between DSCA-Net and other networks on skin lesion segmentation.
    Methods Dice Acc Params
    U-Net [10] 0.8739 0.8777 7.76 M
    Attention-UNet [27] 0.8846 0.8846 34.88 M
    RefineNet [41] 0.9155 0.9155 46.3 M
    EOC-Net [42] 0.8611 0.8401 -
    CA-Net [11] 0.9208 0.9268 2.8 M
    DeepLabV3+ [43] 0.9221 0.9179 54.7 M
    MobileNetV3-UNet [33] 0.9098 0.9479 8.3 M
    IBA-U-Net [44] - 0.9440 13.91 M
    Ours 0.9282 0.9532 2.2 M


    Table 2 lists the ablation results. The lightweight U-Net (backbone) was obtained by replacing the original stacked convolution layers of U-Net with depthwise separable convolutions, and DSCA-Net is the network with all designed modules added. The quantitative results show that the proposed modules strengthen the feature extraction ability, and every proposed module improves segmentation performance. Moreover, Backbone + DC + PA and Backbone + DC + PA + CA already show better segmentation results than U-Net.

    Table 2.  Quantitative evaluation of ablation study on skin lesion segmentation dataset.
    Methods Dice IoU ASSD
    Lightweight U-Net (Backbone) 0.8232 0.7164 1.6189
    Backbone + DC 0.8612 0.7672 1.0833
    Backbone + DC + PA 0.8941 0.8157 0.8364
    Backbone + DC + PA + CA 0.9219 0.8711 0.6753
    DSCA-Net 0.9282 0.8733 0.6318


    The thyroid public dataset [39] was acquired with a GE Logiq E9 XDclear 2.0 system equipped with a GE ML6-15 ultrasound probe and Ascension driveBay electromagnetic tracking. The data come from healthy thyroid records, and the volumes were taken straight from the ultrasound imaging instrument and recorded in DICOM format. The matching labels, produced by a medical expert, include the isthmus as part of the segmented region. To train our model, we split the volumes into 3998 labeled slices and randomly used 2798 images for training, 400 for validation, and 800 for testing, a ratio of 7:1:2. Inputs were randomly cropped to $ 256\times 256 $.

    Figure 8 presents several test segmentation results on the thyroid gland dataset. The thyroid gland edge and the background usually contain outliers and visually similar regions that are not relevant to the target. The results show that U-Net under-segments the thyroid isthmus, while DSCA-Net handles it better.

    Figure 8.  Visualization results of thyroid gland segmentation dataset.

    We tested DSCA-Net against three methods: SegNet [8], SUMNet [14], and Attention-UNet [27]. Quantitative evaluation results are presented in Table 3. Compared with U-Net, the Dice increases from 0.9332 to 0.9727 (by 4.2%), the Sens increases from 0.9526 to 0.9873 (by 3.6%), and the Spec increases from 0.9169 to 0.9921 (by 8.2%). Our model has 51.05 times fewer parameters than SegNet and performs better on the evaluation metrics.

    Table 3.  Comparisons of segmentation performance and number of parameters between DSCA-Net and other networks on thyroid gland segmentation.
    Methods Dice Sens Spec Params
    U-Net [10] 0.9332 0.9526 0.9169 7.76 M
    SegNet [8] 0.8401 0.9811 0.8437 112.32 M
    SUMNet [14] 0.9207 0.9830 0.8911 -
    Attention-UNet [27] 0.9582 0.9801 0.9444 34.88 M
    DSCA-Net 0.9727 0.9873 0.9921 2.2 M


    Additionally, Table 4 presents the quantitative ablation results on thyroid segmentation. DSCA-Net achieved the best score on every metric. The Dice increased significantly after adding the CA module, which indicates that the CA module can efficiently extract context information for thyroid segmentation.

    Table 4.  Quantitative evaluation of ablation study on thyroid gland segmentation dataset.
    Methods Dice IoU ASSD
    Lightweight U-Net (Backbone) 0.8837 0.8079 1.0703
    Backbone + DC 0.9017 0.8325 0.8331
    Backbone + DC + PA 0.9113 0.8469 0.6557
    Backbone + DC + PA + CA 0.9687 0.9422 0.1072
    DSCA-Net 0.9727 0.9544 0.0953


    The lung segmentation task requires segmenting the lung structure from CT images in the Lung Nodule Analysis (LUNA) competition. The dataset contains 534 2D CT samples with corresponding labels. The original resolution of the images is $ 512\times 512 $, and we randomly cropped them to $ 256\times 256 $. 70, 10, and 20% of the dataset were allocated for training, validation, and testing, corresponding to 374, 53, and 107 images, respectively.

    As the visualization results in Figure 9 show, DSCA-Net performs better than U-Net in detailed edge processing. Affected by the noise in lung CT images, U-Net produces some erroneously segmented areas, whereas DSCA-Net has a greater tolerance to noise. Achieving a promising improvement despite the relatively simple task further validates our approach.

    Figure 9.  Visualization results of lung segmentation dataset.

    For quantitative analysis, we assessed DSCA-Net against four methods: U-Net [10], CE-Net [4], RU-Net [15], and R2U-Net [15]. Table 5 shows that all methods achieve excellent performance on the four metrics; our network reached 0.9828 in Dice, 0.9920 in Acc, 0.9836 in Sens, and 0.9895 in Spec, better than U-Net. In spite of DSCA-Net's slightly lower Spec than R2U-Net, our model has 1.9 times fewer parameters while scoring higher on the other three metrics. Note that $ t $ in Table 5 denotes the recurrent convolution time-step.

    Table 5.  Comparisons of segmentation performance and number of parameters between DSCA-Net and other networks on lung segmentation.
    Methods Dice Acc Sens Spec Params
    U-Net [10] 0.9675 0.9768 0.9441 0.9869 7.76 M
    CE-Net [4] - 0.9900 0.9800 - -
    ResU-Net (t = 2) [15] 0.9690 0.9849 0.9555 0.9945 -
    RU-Net (t = 2) [15] 0.9638 0.9836 0.9734 0.9866 4.2 M
    R2U-Net (t = 3) [15] 0.9826 0.9918 0.9826 0.9944 4.2 M
    DSCA-Net 0.9828 0.9920 0.9836 0.9895 2.2 M


    Table 6 shows the ablation results on the lung segmentation dataset. With the designed modules added in sequence, each proposed module improved the segmentation performance. Backbone + DC + PA + CA exceeds U-Net by 0.0138 in Dice, and DSCA-Net shows the best performance on the Dice, IoU, and ASSD metrics.

    Table 6.  Quantitative evaluation of ablation study on lung segmentation dataset.
    Methods Dice IoU ASSD
    Lightweight U-Net (Backbone) 0.6160 0.4562 7.7274
    Backbone + DC 0.8597 0.7661 1.0903
    Backbone + DC + PA 0.9160 0.8471 1.3105
    Backbone + DC + PA + CA 0.9813 0.9631 0.1192
    DSCA-Net 0.9828 0.9662 0.0882


    The last application is nuclei segmentation on the Triple-Negative Breast Cancer (TNBC) dataset. It has 50 images of size $ 512\times 512 $ from 11 patients. To avoid overfitting during training, we used data augmentation to expand the dataset to a total of 500 images, including random flipping, random cropping, and random rotation with angles in $ (-\frac{\pi }{6}, \frac{\pi }{6}) $; each augmentation is triggered with probability 0.5. As before, we adopted the same 7:1:2 split, with 350, 50, and 100 images for training, validation, and testing.

    Figure 10 illustrates some comparative prediction results of our designed network and U-Net on the TNBC dataset. It can be seen that DSCA-Net performs better than U-Net. However, incorrect results still occur in some areas, as shown in the second row: obscure color transition areas and overlapping nuclei increase the segmentation difficulty. For relatively easy segmentation targets, our network performs better.

    Figure 10.  Visualization results of nuclei segmentation dataset.

    Additionally, we compared DSCA-Net with other networks: U-Net [10], DeconvNet [18], Ensemble [17], Kang et al. [19], DeepLabV3+ [43], and Up-Net-N4 [16]. The comparison results are shown in Table 7. Although the Sens is 0.0456 lower than that of Ensemble, the combination of the attention mechanism and data augmentation allows DSCA-Net to score higher than the state-of-the-art methods in Dice and Acc. Our model has 233.13 and 3.36 times fewer parameters than DeconvNet and Up-Net-N4, respectively.

    Table 7.  Comparisons of segmentation performance and number of parameters between DSCA-Net and other networks on nuclei segmentation.
    Methods Dice Acc Sens Params
    U-Net [10] 0.8087 0.9344 0.7915 7.76 M
    DeconvNet [18] 0.8151 0.9541 0.7731 512.9 M
    Ensemble [17] 0.8083 0.9441 0.9000 -
    Kang et al. [19] 0.8343 - 0.8330 -
    DeepLabV3+ [43] 0.8014 0.9549 - 54.7 M
    Up-Net-N4 [16] 0.8369 0.9604 - 7.4 M
    DSCA-Net 0.8995 0.9583 0.8544 2.2 M


    The quantitative ablation results in Table 8 demonstrate the effectiveness of our proposed modules. After adding the MEA module, the network performs better, which indicates that the segmented edges are closer to the label with less error.

    Table 8.  Quantitative evaluation of ablation study on nuclei segmentation dataset.
    Methods Dice IoU ASSD
    Lightweight U-Net (Backbone) 0.6261 0.4562 7.7274
    Backbone + DC 0.7587 0.6471 1.7348
    Backbone + DC + PA 0.7691 0.6680 0.8761
    Backbone + DC + PA + CA 0.8337 0.8025 0.6065
    DSCA-Net 0.8995 0.8231 0.5597


    To lighten the network parameters while maintaining performance, we take full advantage of U-Net and integrate the designed modules into DSCA-Net for 2D medical image segmentation. First, the DC module replaces the stacked convolutional layers of U-Net for feature extraction and restoration. Second, the PA module is designed to recover the feature loss of down-sampling. Third, the CA module substitutes for the simple concatenation operation in U-Net to extract richer context information. In addition, the MEA module is proposed to delineate target edges from multi-scale decoder information for the final prediction. Evaluations against other state-of-the-art networks showed that DSCA-Net performs better.

    Multi-group visualized experimental results are shown in Figures 7-10. They show that our model is more robust than U-Net. Even with blurred edge details and occlusions in microscope images, our network can still delineate the segmentation target correctly. For the most challenging task, TNBC, the similarity of adherent nuclei and subtle color changes combined with great morphological variation increase the difficulty of segmentation. Our proposed network achieved better results than the other networks; however, it still needs further development.

    The goal of this study is to lighten the parameters of the network while maintaining good performance. We designed a lightweight depthwise separable convolutional neural network with attention mechanisms, named DSCA-Net, for accurate medical image segmentation. Compared with U-Net, our proposed network extracts richer feature information and reduces feature loss during segmentation. We assessed the network on four datasets and compared the segmentation results against state-of-the-art networks under various metrics. The visualized and quantitative results show that our network segments better. We intend to extend DSCA-Net to 3D image segmentation in the future.

    This study was supported by the Shanghai Jiao Tong University Medical-industrial Cross-key Project under Grant ZH2018ZDA26 and the Jiangsu Provincial Key Research and Development Fund Project under Grant BE2017601.

    The authors have no conflicts of interest to declare.

  • © 2011 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)