
Brain tumors are abnormal cell growths located in or near brain tissue that damage the nervous system, causing symptoms such as headaches, dizziness, dementia, seizures, and other neurological signs [1]. Magnetic resonance imaging (MRI)—including T1-weighted (T1), post-contrast T1-weighted (T1CE), T2-weighted (T2), and fluid-attenuated inversion recovery (FLAIR) sequences—is a prevalent diagnostic tool for brain tumors due to its sensitivity to soft tissue and high image contrast, as shown in Figure 1. Physicians utilize MRI for lesion diagnosis, but accuracy can be hindered by factors such as fatigue and emotional state. Automated methods have garnered extensive attention in the medical field due to their capability to objectively and accurately analyze imaging information.
Most multimodal approaches assume complete data availability; however, in reality, missing modalities are common. As illustrated in Figure 2, various missing scenarios can occur during both training and inference stages. When certain MRI sequences are absent, tumor characteristics may not be fully captured, limiting a comprehensive understanding of the tumor [2]. Therefore, it is crucial for multimodal learning methods to maintain robustness in the presence of missing modalities during inference.
Currently, a prevalent approach to segmentation with missing modalities is knowledge distillation [3,4], where information is transferred from a teacher network to a student network to recover the missing data, but this can be computationally intensive. Another method is image synthesis [5], which leverages generative models to reconstruct the missing data; however, synthetic images may introduce noise into the task. Additionally, mapping the available modalities into a common latent subspace aims to compensate for or recover the missing information [6,7,8]. However, existing approaches often require training multiple sets of parameters to address the various missing-modality scenarios, thereby escalating the model's complexity and computational overhead.
With the expansion of data scale and the enhancement of computational resources, researchers favor general neural networks for diverse tasks, minimizing the need for task-specific model design and training. Recently, the transformer [9] has shown great potential in natural language processing, visual recognition, and dense prediction. However, its complex architecture and high computational demands limit comprehensive fine-tuning for downstream tasks, especially accurate segmentation, potentially leading to overfitting and reduced generalization ability.
Inspired by recent advancements in prompt learning [10,11,12] and efficient fine-tuning techniques [13,14,15], we introduce a novel brain tumor segmentation framework, called DPONet. This framework employs an encoder-decoder structure for the segmentation network, enhancing performance in both incomplete and complete modality scenarios. Specifically, we leverage image frequency information as frequency filtering prompt (FFP) to facilitate the pre-trained model in extracting discriminative features. Furthermore, by learning a series of spatial perturbation prompt (SPP), we map these discriminative features into a common latent space, mitigating the challenges of modality fusion in the decoder. Finally, we validate the robustness of our approach on two commonly used public datasets. To sum up, our main contributions are as follows:
● We propose a new framework for incomplete-modal image segmentation that effectively handles common cases of missing modalities. This approach requires only 7% of the trainable parameters to adjust the pre-trained model, thereby avoiding the heavy fine-tuning typically necessary for transformers.
● We introduce a frequency filtering prompt to extract spatial frequency components from images. This method addresses the model's oversight of target domain features and enhances its adaptation to brain tumor datasets.
● We propose a spatial perturbation prompt that incorporates learnable parameters into a spatial modulation model. This aims to achieve consistent multimodal feature embeddings even in the presence of missing partial modalities.
Incomplete multimodal learning refers to scenarios in multimodal learning tasks where partial modality information is missing or incomplete. This issue is particularly prominent in brain tumor segmentation, where medical imaging data typically comprise multiple MRI sequences, and the absence of any one modality leads to learning from incomplete modality information. Many studies [16,17,18] are devoted to solving this problem and demonstrate impressive performance on various incomplete multimodal learning tasks. Zhou et al. [16] showed that there exists a certain correlation within the latent representations of modalities, which can be exploited to describe missing modalities by computing inter-modality correlations in a latent space. Ting et al. [17] combine available modality information to estimate the latent features of missing modalities. Liu et al. [18] explicitly consider the relationship between modalities and regions, assigning different attention to different modalities for each region. However, these models require full fine-tuning of the pre-trained model, which increases the computational burden and reduces generalization ability.
Most neural networks aim to approximate an objective function. The Fourier transform establishes a correspondence between a function's spatial-domain and frequency-domain representations, allowing a function to be analyzed through its frequency components and thus approximated more effectively [19]. The frequency content of an image reflects the rate of grayscale variation, and the Fourier transform characterizes features through the coefficients of each frequency component [20]. The performance of computer vision models is significantly affected by the Fourier statistical properties of the training data; models show a certain sensitivity to the Fourier basis directions, and their robustness can be improved by exploiting this sensitivity [21]. For example, Fang et al. [22] and Xu et al. [23] argued that different parts of the same organ in MRI images exhibit regularity and that high-frequency structural information can more effectively capture these similarities and regularities.
Prompt learning is an effective transfer learning approach in natural language processing [10,24,25], which fine-tunes pre-trained models on source tasks by embedding contextual prompts. Recently, prompts have also been employed in computer vision tasks [26,27,28] and multimodal learning tasks [11,29,30], introducing self-adaptation in the input space to optimize the target task. For instance, Jia et al. [26] proposed visual prompt tuning (VPT), achieving downstream performance comparable to full fine-tuning by adding a small number of learnable prompt embeddings to the patch embeddings. Going further than VPT, Bahng et al. [27] proposed learning a single perturbation to adjust the pixel space and affect the model output. These studies suggest that continuously adjusting and optimizing prompts can enhance the adaptability of a model. Lee et al. [29] treat different missing-modality scenarios as different types of inputs and employ learnable prompts to guide the model's predictions under various missing conditions. Qiu et al. [30] utilize an intermediate classifier to generate a prompt for each missing scenario based on intermediate features for segmentation prediction. In contrast, our work does not require learning a set of prompts for each missing scenario but aims to learn generic visual prompts that generalize to modulate the feature space in missing-modality scenes.
In this paper, we focus on brain tumor segmentation under common missing-modality scenarios. We simulate real-world data incompleteness by assuming the absence of one or multiple modalities (Figure 2). Additionally, because fully training a pre-trained transformer is difficult with limited computational resources, we design a discriminative prompt optimization network that avoids fine-tuning the entire pre-trained model. In this section, we elaborate on the framework and its components.
The pyramid vision transformer (PVT) [31] introduces a progressive shrinking strategy within the transformer block to control the scale of feature maps for dense prediction tasks. We choose PVT as the backbone, initialized with weights pre-trained on ImageNet. PVT comprises four stages, each consisting of a patch embedding layer and $l$ transformer encoder layers, which generate feature maps of different scales. Given an input image $X\in\mathbb{R}^{H\times W\times C}$, the patch embedding layer divides $X$ into $\frac{HW}{p_i^2}$ non-overlapping patches, where $p_i$ denotes the patch size of the $i$-th stage. As the stages progress, the patch size decreases accordingly. The flattened patches are fed into a linear projection to obtain embedded patches, which, together with positional embeddings, are input into the transformer encoder to produce a feature map $x$ of size $\frac{H}{p_i}\times\frac{W}{p_i}\times C$. This process can be described as follows:
$x_l=\mathrm{MLP}\big(\mathrm{LN}(\mathrm{SRA}(x_{l-1}))\big),\qquad(3.1)$
where $x_{l-1}$ represents the feature map output by the previous layer, $\mathrm{SRA}(\cdot)$ denotes the spatial-reduction attention proposed in PVT, and $\mathrm{LN}(\cdot)$ and $\mathrm{MLP}(\cdot)$ refer to layer normalization and multi-layer perceptron operations, respectively. SRA is similar to multi-head attention and is formulated as follows:
$\mathrm{SRA}(Q,K,V)=\mathrm{Attention}\big(QW^{Q},\ \mathrm{SR}(K)W^{K},\ \mathrm{SR}(V)W^{V}\big),\qquad(3.2)$
where $W^{Q}$, $W^{K}$, and $W^{V}$ are the parameters of the linear projections, and $\mathrm{SR}(\cdot)$ is used to reduce the spatial dimension. This can be expressed as:
$\mathrm{SR}(x)=\mathrm{LN}\big(\mathrm{Reshape}(x_i,r_i)W^{S}\big),\qquad(3.3)$
where $r_i$ represents the feature map reduction rate of stage $i$. The $\mathrm{Reshape}(\cdot)$ operation reshapes the input $x\in\mathbb{R}^{h_i\times w_i\times c_i}$ to $\frac{h_i w_i}{r_i^2}\times(r_i^2 c_i)$, and $W^{S}$ is a linear projection that reduces the dimensionality of the input. The attention is computed as follows:
$\mathrm{Attention}(q,k,v)=\mathrm{Softmax}\!\left(\frac{qk^{T}}{\sqrt{d}}\right)v,\qquad(3.4)$
where $q$, $k$, and $v$ are the query, key, and value matrices, and $d$ is their dimension.
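For concreteness, the following PyTorch sketch illustrates an SRA block of the form in Eqs (3.2)-(3.4); the layer names, head count, and reduction ratio are illustrative assumptions rather than the exact PVT implementation.

```python
import torch
import torch.nn as nn

class SRA(nn.Module):
    """Minimal spatial-reduction attention sketch (Eqs 3.2-3.4)."""
    def __init__(self, dim, num_heads=8, sr_ratio=2):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.q = nn.Linear(dim, dim)       # W^Q
        self.kv = nn.Linear(dim, dim * 2)  # W^K and W^V
        self.proj = nn.Linear(dim, dim)
        self.sr_ratio = sr_ratio
        if sr_ratio > 1:
            # SR(.): spatial reduction via strided conv, then LN (Eq 3.3)
            self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
            self.norm = nn.LayerNorm(dim)

    def forward(self, x, h, w):
        # x: (B, N, C) token sequence with N = h * w
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        if self.sr_ratio > 1:
            x_ = x.transpose(1, 2).reshape(B, C, h, w)
            x_ = self.sr(x_).reshape(B, C, -1).transpose(1, 2)
            x_ = self.norm(x_)
        else:
            x_ = x
        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, self.head_dim)
        k, v = kv.permute(2, 0, 3, 1, 4)
        # Attention(q, k, v) = softmax(q k^T / sqrt(d)) v  (Eq 3.4)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```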
We consider a multimodal dataset with $N$ ($N=4$) modalities, $M=\{\text{FLAIR},\ \text{T1ce},\ \text{T1},\ \text{T2}\}$. The dataset is denoted as $D=\{D_{14},D_{13},\dots,D_i,\dots,D_0\}$, where $D_{14}$ represents the complete modality set and the other sets represent missing-modality subsets; for example, $D_0=\{X_0^{F},X_0^{T1c},X_0^{T1},X_1^{T2}\}$ indicates that only the T2 modality is available. $X_k^{m}$ represents an input sample, where $m$ denotes the modality type and $k$ the modality state ($k=1$ available, $k=0$ missing). Since the model is unaware of which specific modality is missing, we introduce placeholder values (set to 0) for the missing modality data $\{X_0^{F},X_0^{T1c},X_0^{T1},X_0^{T2}\}$ to preserve the format of the multimodal input.
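The placeholder scheme can be sketched as follows; the modality ordering, tensor shapes, and helper name are assumptions for illustration.

```python
import torch

MODALITIES = ["flair", "t1ce", "t1", "t2"]

def build_input(sample: dict, shape=(1, 224, 224)) -> torch.Tensor:
    """sample maps a modality name to its tensor, or to None if missing."""
    slices = []
    for m in MODALITIES:
        x = sample.get(m)
        # k = 0 (missing): fill with zeros; k = 1 (available): use the image
        slices.append(torch.zeros(shape) if x is None else x)
    return torch.stack(slices, dim=0)  # (4, 1, H, W)

# e.g., the D_0 subset where only T2 is available:
inp = build_input({"t2": torch.rand(1, 224, 224)})
```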
We propose a novel discriminative prompt optimization network, as shown in Figure 3, which provides natural insertion points for the network's intermediate features while preserving the integrity of the pre-trained model and enabling fine-tuning for downstream tasks. We adopt a pre-trained transformer as the feature extractor and keep it frozen during training. Multimodal images $D=\{X_k^{m}\}_{k\in\{0,1\}}$ are fed into four extractors, and task-relevant information is aggregated through discriminative prompts to fully exploit the discriminative features. Next, a spatial perturbation prompt module hierarchically fuses the discriminative features of the available modalities and maps them into a shared feature representation space to learn cross-modal shared information. The fused features are then mapped back to the original input size through up-sampling in the decoder, and segmentation masks are obtained from these feature maps. Notably, during training, the trainable parameters are confined to the prompt components and the decoder.
The frequency filtering prompt method, as illustrated in Figure 4, utilizes Fourier transform to extract frequency features and jointly modulates the intermediate features with image embeddings. The frequency processing method decomposes images into different frequency components, which are distributed across different spatial locations of the image, encouraging the model to focus on critical information of the image [21]. The core idea is to remodulate the intermediate features using frequency domain information, shifting the distribution from the pre-trained dataset to the target dataset. Furthermore, since there may be commonalities between features of different modalities, even if image data from a particular modality is missing, the remaining modalities still contain corresponding frequency information, which enhances the robustness of the information to a certain extent. Taking a single branch as an example, for a given image, we apply the fast Fourier transform (FFT) along the spatial dimension to obtain frequency components corresponding to different spatial locations. FFT is applied to each channel to convert the spatial domain representation into a frequency representation in the frequency domain, and filtering operations are performed in the frequency domain. Then, an attention mask is learned in the frequency domain to analyze the dominant frequency components in the feature map. Finally, the feature representation is transformed back to the spatial domain using inverse FFT (iFFT). The transformation from the spatial domain to the frequency domain is expressed as follows:
$F(x)(\mu,\nu)=\sum_{h=0}^{H-1}\sum_{w=0}^{W-1}x(h,w)\,e^{-i2\pi\left(\frac{h\mu}{H}+\frac{w\nu}{W}\right)}.\qquad(3.5)$
After obtaining the frequency representation, different frequency components are modulated by filtering through the attention mechanism. Specifically, the attention mechanism compresses information across channels through convolution and a sigmoid function. The expression of the frequency filtering mechanism is as follows:
$F'(x)=F(x)\otimes\sigma\big(\mathrm{conv}([\mathrm{AvgPool}(F(x)),\ \mathrm{MaxPool}(F(x))])\big),\qquad(3.6)$
where $\sigma$ denotes the sigmoid function, and $\mathrm{AvgPool}(\cdot)$ and $\mathrm{MaxPool}(\cdot)$ represent the average pooling and max pooling operations, respectively.
Finally, the inverse FFT transforms the filtered representation back to the spatial domain:
$x'(h,w)=\frac{1}{H\cdot W}\sum_{\mu=0}^{H-1}\sum_{\nu=0}^{W-1}F'(x)(\mu,\nu)\,e^{i2\pi\left(\frac{h\mu}{H}+\frac{w\nu}{W}\right)}.\qquad(3.7)$
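A minimal PyTorch sketch of this frequency filtering step (Eqs (3.5)-(3.7)) is given below; the channel-wise average/max pooling and the conv-sigmoid mask follow the description above, while the kernel size and the choice to compute the mask from the spectrum magnitude are our assumptions.

```python
import torch
import torch.nn as nn

class FrequencyFilter(nn.Module):
    """Sketch of the frequency filtering mechanism (Eqs 3.5-3.7)."""
    def __init__(self, kernel_size=7):
        super().__init__()
        # 2-channel input: channel-wise average- and max-pooled spectra
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                        # x: (B, C, H, W), spatial domain
        f = torch.fft.fft2(x, dim=(-2, -1))      # per-channel FFT (Eq 3.5)
        mag = f.abs()
        # attention mask over frequency components (Eq 3.6)
        avg = mag.mean(dim=1, keepdim=True)
        mx, _ = mag.max(dim=1, keepdim=True)
        mask = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        f_filtered = f * mask                    # modulate dominant frequencies
        # back to the spatial domain (Eq 3.7); keep the real part
        return torch.fft.ifft2(f_filtered, dim=(-2, -1)).real
```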
Inspired by AdaptFormer [32], we employ a frequency enhancement adaptor, a bottleneck structure that limits the number of parameters. It takes the combination of filtered frequency features and image features as input and generates relevant frequency prompts through a down-projection layer, a lightweight multi-layer perceptron, and an up-projection layer. Formally, this process can be expressed as:
$p_f^{i}=\mathrm{MLP}_{up}\big(\mathrm{GELU}(\mathrm{MLP}_{down}^{i}(x'+x))\big).\qquad(3.8)$
Finally, the generated prompts are appended to the transformer layers to help the model learn more representative and discriminative image features.
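The adaptor of Eq (3.8) can be sketched as a small bottleneck module; the bottleneck width and the exact form of the lightweight MLP are assumed hyperparameters.

```python
import torch.nn as nn

class FreqAdaptor(nn.Module):
    """Bottleneck adaptor generating the frequency prompt p_f (Eq 3.8)."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)                       # MLP_down
        self.mid = nn.Sequential(nn.GELU(),
                                 nn.Linear(bottleneck, bottleneck))  # lightweight MLP
        self.up = nn.Linear(bottleneck, dim)                         # MLP_up

    def forward(self, x, x_freq):
        # x: image token features; x_freq: filtered frequency features
        return self.up(self.mid(self.down(x + x_freq)))
```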
To enable the model to handle missing modalities, we fill them with null values; however, such null values are likely to disturb the feature space and cause modal feature fusion to fail. Therefore, we propose learnable spatial perturbation prompts, as shown in Figure 5, aiming to learn a task-specific visual prompt ($P$) within a latent space that encourages the sharing of cross-modal information. The prompts interact dynamically with the input features, facilitating adaptive modal fusion rather than simply injecting fixed information.
First, the extracted discriminative features are combined as $f_c^{i}=[f_f^{i},f_{t1c}^{i},f_{t1}^{i},f_{t2}^{i}]$ and passed through a 3 × 3 convolutional layer followed by a sigmoid activation function to generate prompt weights $\omega_i\in[0,1]$, which describe the importance of each spatial data point in the input. Inspired by EVP [27], we add random visual embeddings of the same size as the transformer tokens, train only these random embeddings during the training phase, and use the trained visual prompts as guidance for the model, denoted as $F_i=(F_{token}^{i},p_m^{i})$. The process can be described as:
$\omega_i=\sigma\big(\mathrm{conv}([f_f^{i},f_{t1c}^{i},f_{t1}^{i},f_{t2}^{i}])\big),\qquad(3.9)$

$p_m^{i}=\mathrm{conv}\!\left(\sum_{c=1}^{N}\omega_i\,p_c^{i}\right),\qquad(3.10)$

$F_i=\mathrm{transformer}(f_c^{i}+p_m^{i}),\qquad(3.11)$
where $\sigma$ is the sigmoid function. Finally, the cross-modal information features ($F$) are fed into the transformer encoder block to establish cross-modal long-range dependencies.
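The following sketch condenses Eqs (3.9)-(3.11); the number of prompt embeddings, their initialization, and the tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpatialPerturbationPrompt(nn.Module):
    """Sketch of the spatial perturbation prompt (Eqs 3.9-3.11)."""
    def __init__(self, dim, n_modal=4, n_prompts=4):
        super().__init__()
        # learnable random visual embeddings, trained while the backbone is frozen
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim, 1, 1) * 0.02)
        self.weight_conv = nn.Conv2d(n_modal * dim, n_prompts, 3, padding=1)
        self.fuse_conv = nn.Conv2d(dim, dim, 3, padding=1)

    def forward(self, feats):                    # list of per-modality (B, C, H, W)
        cat = torch.cat(feats, dim=1)
        w = torch.sigmoid(self.weight_conv(cat))  # omega in [0, 1] (Eq 3.9)
        # weighted sum of prompts -> perturbation prompt p_m (Eq 3.10)
        p = sum(w[:, c:c + 1] * self.prompts[c]
                for c in range(self.prompts.shape[0]))
        f_c = sum(feats)                           # fused features f_c
        # f_c + p_m is then fed to the transformer encoder (Eq 3.11)
        return self.fuse_conv(p) + f_c
```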
We introduce a consistency loss to optimize the prompts so that they capture task-shared knowledge and transform it into representations that benefit the task. Specifically, we map the feature maps obtained from the transformer encoder stages to the same size as the input image and use a mean squared error to ensure that the model learns coherent and consistent information at each stage. Note that, since shallower layers may lack sufficient semantic information, we apply the consistency loss only in the last two stages of the transformer encoder.
$L_m=\frac{1}{N}\sum_{i=1}^{N}\sum_{m=1}^{M}\big(\hat{f}_i-f_i^{m}\big)^2,\qquad(3.12)$
where $N$ is the number of samples, $M$ is the number of supervised layers, $f_i^{m}$ denotes the rescaled features of image $i$ in transformer layer $m$, and their average is denoted as $\hat{f}_i=\frac{1}{M}\sum_{k=1}^{M}f_i^{k}$.
In addition, we map the feature maps into segmentation maps and calculate a Dice loss against the ground truth to prompt the model to capture consistent feature representations.
$L_d=\frac{1}{N}\sum_{i=1}^{N}\sum_{m=1}^{M}\mathrm{Dice}\big(y_i,f(x_i^{m})\big),\qquad(3.13)$
where $y_i$ denotes the ground-truth labels of the image $x_i$, and $f(x_i^{m})$ denotes the prediction corresponding to the $m$-th layer features of the image.
The feature consistency loss and prediction consistency loss are combined to supervise prompt generation.
$L_c=\gamma L_m+(1-\gamma)L_d,\qquad(3.14)$
where $\gamma$ is a weight parameter balancing the two losses. We experimented with different values of $\gamma$ and found that $\gamma=0.3$ gives the best result.
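A compact sketch of the consistency supervision (Eqs (3.12)-(3.14)) follows; the Dice helper and the assumption that stage features are already rescaled to a common size are ours.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss per sample; pred and target in [0, 1], same shape."""
    p, t = pred.flatten(1), target.flatten(1)
    return 1 - (2 * (p * t).sum(1) + eps) / (p.sum(1) + t.sum(1) + eps)

def consistency_loss(stage_feats, stage_preds, target, gamma=0.3):
    # stage_feats: list of (B, C, H, W) features rescaled to the same size
    mean_feat = torch.stack(stage_feats).mean(dim=0)            # \hat{f}
    l_m = sum(F.mse_loss(f, mean_feat) for f in stage_feats)    # Eq 3.12
    l_d = sum(dice_loss(torch.sigmoid(p), target).mean()
              for p in stage_preds)                             # Eq 3.13
    return gamma * l_m + (1 - gamma) * l_d                      # Eq 3.14
```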
The convolutional decoder gradually restores the fused features to the spatial resolution of the original segmentation space. It employs skip connections that merge multimodal features from specific encoder levels into the decoder, preserving more low-level details. The overall processing step is as follows:
$D_i=\mathrm{conv}\big(\mathrm{upsample}(\mathrm{conv}(f_c^{i},D_{i-1}))\big),\qquad(3.15)$
where $D_i$ is the feature map from the $i$-th layer of the convolutional decoder, and $f_c^{i}$ is the combined feature from the corresponding encoder layer.
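One decoder stage of Eq (3.15) can be sketched as follows; the channel widths and the bilinear upsampling choice are assumptions.

```python
import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    """Sketch of one decoder stage with a skip connection (Eq 3.15)."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.merge = nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1)  # conv(f_c, D_{i-1})
        self.refine = nn.Conv2d(out_ch, out_ch, 3, padding=1)

    def forward(self, d_prev, skip):
        x = self.merge(torch.cat([d_prev, skip], dim=1))
        x = nn.functional.interpolate(x, scale_factor=2, mode="bilinear",
                                      align_corners=False)       # upsample
        return self.refine(x)                                    # D_i
```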
We employ a hybrid loss to measure the difference between the predictions and the ground truth. The Dice loss calculates the similarity between the predicted and true segmentations, while the cross-entropy loss measures prediction performance by quantifying the difference between the predicted and true probability distributions. Gradients are computed from the sum of the two losses to update the parameters. They are defined as follows:
$L_{Dice}=-\frac{2\sum_{i}^{N}y_i f(x_i)}{\sum_{i}^{N}y_i+\sum_{i}^{N}f(x_i)},\qquad(3.16)$

$L_{CE}=-\sum_{i}^{N}y_i\log p(f(x_i)),\qquad(3.17)$
where $f(x_i)$ and $y_i$ represent the prediction and ground-truth labels, respectively, $N$ is the number of pixels, and $p(\cdot)$ is the softmax of the prediction. Finally, our hybrid loss function $L_{seg}$ is given by
$L_{seg}=L_c+L_{Dice}+L_{CE}.\qquad(3.18)$
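The hybrid objective (Eqs (3.16)-(3.18)) can be sketched as follows; we write the Dice term in its common 1 − Dice form and assume multi-class logits with integer labels.

```python
import torch
import torch.nn.functional as F

def seg_loss(logits, target, l_c):
    """Hybrid loss L_seg = L_c + L_Dice + L_CE (Eqs 3.16-3.18).
    logits: (B, K, H, W); target: (B, H, W) integer labels; l_c: consistency loss."""
    prob = logits.softmax(dim=1)
    one_hot = F.one_hot(target, logits.shape[1]).permute(0, 3, 1, 2).float()
    inter = (prob * one_hot).sum(dim=(0, 2, 3))
    l_dice = 1 - (2 * inter / (prob.sum(dim=(0, 2, 3))
                               + one_hot.sum(dim=(0, 2, 3)) + 1e-6)).mean()  # Eq 3.16
    l_ce = F.cross_entropy(logits, target)                                    # Eq 3.17
    return l_c + l_dice + l_ce                                                # Eq 3.18
```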
We use two public datasets from the Multimodal Brain Tumor Segmentation Challenge (BraTS), BraTS 2018 and BraTS 2020 [33,34,35], to demonstrate the effectiveness of the proposed method. BraTS 2018 contains 285 training cases, while BraTS 2020 includes 369 training cases and 125 validation cases. In these datasets, each case comprises four MRI modalities: FLAIR, T1ce, T1, and T2. The volume of each modality is 240 × 240 × 155, aligned within the same spatial space. Medical experts provide manual pixel-level annotations of three mutually inclusive tumor regions in each image, namely, whole tumor (WT), tumor core (TC), and enhancing tumor (ET). WT encompasses all tumor tissues, while TC comprises ET, necrosis, and the non-enhancing tumor core.
Data preprocessing is performed on the two datasets before training. For each dataset, we slice along the axial plane of the 3D medical images. To eliminate non-informative slices and irrelevant background regions, thereby improving training efficiency and saving time, we use the central slices as training data and reshape each 2D slice to 224 × 224. We design a simulation method for missing modalities: MRI modalities are randomly removed from the input, the missing modalities can be any one or several of the four, and the missing rate for each modality is random. This simulates the missing-modality scenarios that may occur in real-world situations.
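The random modality removal can be sketched as follows; capping the number of dropped modalities so that at least one remains is our assumption.

```python
import random
import torch

def random_modality_dropout(x, max_missing=3):
    """x: (4, 1, H, W) stacked modalities; zero out a random subset,
    always keeping at least one modality available."""
    n_missing = random.randint(0, max_missing)
    for m in random.sample(range(4), n_missing):
        x[m].zero_()  # placeholder value for a missing modality
    return x
```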
In this study, our method is implemented in PyTorch utilizing a single NVIDIA Tesla V100 32 GB GPU. We adopt a U-Net architecture composed of transformer blocks as the benchmark, with the transformer pre-trained on ImageNet-1K. After extensive experiments and parameter tuning, we train the model for 100 epochs using the SGD optimizer with an initial learning rate of 0.01 and a batch size of 12. For the segmentation task, we use the Dice coefficient (which computes the similarity of two sets), the Hausdorff distance (HD95, which measures the distance between two sets), and the sensitivity (the ratio of positive samples correctly identified by the model to all true positive samples) as performance metrics to evaluate the various methods.
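The parameter-efficient setup described above can be sketched as follows; the module name `model.backbone` is a placeholder for the frozen PVT encoder, not the authors' exact attribute naming.

```python
import torch

def configure(model, lr=0.01):
    """Freeze the pre-trained backbone; optimize only prompts and decoder."""
    for p in model.backbone.parameters():
        p.requires_grad = False
    trainable = [p for p in model.parameters() if p.requires_grad]
    ratio = (sum(p.numel() for p in trainable)
             / sum(p.numel() for p in model.parameters()))
    print(f"trainable fraction: {ratio:.1%}")   # ~7% in our setting
    return torch.optim.SGD(trainable, lr=lr, momentum=0.9)
```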
We focus on exploring the robustness of the discriminative prompt optimization network to general incompleteness in multimodal images without fine-tuning the entire pre-trained model. In this section, we first present the results obtained by our method, followed by a series of ablation experiments on the proposed components. Considering that the BraTS 2020 dataset contains many patient cases and is representative, we use it for the ablation study.
As shown in Table 1, our method achieves remarkable Dice scores in both the modality-complete and modality-missing scenarios. For example, our proposed approach attains significantly better mean Dice scores for whole tumor, tumor core, and enhancing tumor than the next-best approaches. From the experimental results in Table 2, we observed that the baseline models generally exhibited unsatisfactory performance on the T1 modality; our model achieved significant improvements in this respect, effectively enhancing performance under the T1 modality. Figures 6 and 7 present visualizations of the segmentation results. Furthermore, Table 3 clearly shows that our method outperforms other approaches in terms of HD95 and sensitivity under complete-modality testing, further validating its superior performance.
Modalities | Dice (%) ↑ | |||||||||||||||||
Complete | Core | Enhancing | ||||||||||||||||
F | T1 | T1c | T2 | D | Z | T | Q | Our | D | Z | T | Q | Our | D | Z | T | Q | Our |
✓ | 86.1 | 86.1 | 86.5 | 86.7 | 93.9 | 71.0 | 70.9 | 71.5 | 71.0 | 93.3 | 46.3 | 46.3 | 45.6 | 47.2 | 76.1 | |||
✓ | 76.8 | 78.5 | 77.4 | 79.5 | 91.6 | 81.5 | 84.0 | 83.4 | 84.3 | 95.3 | 74.9 | 80.1 | 78.9 | 81.4 | 88.4 | |||
✓ | 77.2 | 78.0 | 78.1 | 79.5 | 89.1 | 66.0 | 65.9 | 66.8 | 67.7 | 91.9 | 37.3 | 38.0 | 41.3 | 39.1 | 71.6 | |||
✓ | 87.3 | 87.4 | 89.1 | 86.9 | 95.2 | 69.2 | 68.8 | 69.3 | 69.9 | 93.5 | 38.2 | 42.4 | 43.6 | 42.8 | 74.6 | |||
✓ | ✓ | 87.7 | 87.8 | 88.4 | 88.4 | 94.5 | 83.5 | 84.8 | 86.4 | 86.3 | 95.8 | 75.9 | 79.4 | 81.7 | 80.1 | 88.9 | ||
✓ | ✓ | 81.1 | 81.8 | 81.2 | 83.1 | 92.1 | 83.4 | 83.6 | 85.2 | 85.8 | 95.4 | 78.0 | 80.1 | 79.2 | 81.7 | 88.3 | ||
✓ | ✓ | 89.7 | 89.8 | 89.9 | 89.8 | 95.5 | 73.1 | 73.8 | 73.9 | 74.4 | 94.3 | 41.0 | 45.9 | 48.2 | 46.8 | 77.3 | ||
✓ | ✓ | 87.7 | 87.8 | 88.0 | 87.9 | 94.4 | 73.1 | 73.4 | 73.3 | 72.9 | 94.1 | 45.7 | 46.8 | 50.1 | 47.3 | 77.5 | ||
✓ | ✓ | 89.9 | 89.9 | 90.5 | 90.1 | 95.5 | 74.1 | 74.6 | 75.5 | 74.5 | 94.1 | 49.3 | 48.6 | 48.6 | 49.5 | 76.6 | ||
✓ | ✓ | 89.9 | 89.3 | 90.0 | 90.0 | 95.6 | 84.7 | 84.8 | 85.5 | 86.6 | 95.9 | 76.7 | 81.9 | 81.8 | 81.2 | 88.9 | ||
✓ | ✓ | ✓ | 90.7 | 90.1 | 90.7 | 90.6 | 95.6 | 85.1 | 85.2 | 86.5 | 86.7 | 95.8 | 76.8 | 82.1 | 81.8 | 81.8 | 88.8 | |
✓ | ✓ | ✓ | 90.6 | 90.6 | 90.3 | 90.6 | 95.7 | 75.2 | 75.6 | 75.9 | 75.8 | 94.7 | 49.9 | 50.3 | 52.5 | 51.1 | 78.0 | |
✓ | ✓ | ✓ | 90.7 | 90.4 | 90.6 | 90.8 | 95.8 | 85.0 | 85.3 | 86.4 | 86.4 | 96.0 | 77.1 | 78.7 | 81.0 | 80.0 | 88.9 | |
✓ | ✓ | ✓ | 88.3 | 88.2 | 88.7 | 88.9 | 94.6 | 83.5 | 84.2 | 86.5 | 86.5 | 95.8 | 77.0 | 79.3 | 78.5 | 82.1 | 88.9 | |
✓ | ✓ | ✓ | ✓ | 91.1 | 90.6 | 90.6 | 91.0 | 95.9 | 85.2 | 84.6 | 87.4 | 86.4 | 95.9 | 78.0 | 79.9 | 81.6 | 81.0 | 88.9 |
Average | 87.0 | 87.1 | 87.3 | 87.6 | 94.3 | 78.2 | 78.6 | 79.6 | 79.7 | 94.8 | 61.5 | 64.0 | 64.9 | 64.9 | 82.8 |
Modalities | Dice (%) ↑ | |||||||||||||||||
Complete | Core | Enhancing | ||||||||||||||||
F | T1 | T1c | T2 | Z | Y | T | L | Our | Z | Y | T | L | Our | Z | Y | T | L | Our |
✓ | 81.2 | 76.3 | 86.6 | 84.8 | 94.3 | 64.2 | 56.7 | 68.8 | 69.4 | 94.4 | 43.1 | 16.0 | 41.4 | 47.6 | 76.2 | |||
✓ | 72.2 | 42.8 | 77.8 | 75.8 | 92.6 | 75.4 | 65.1 | 81.5 | 82.9 | 95.4 | 72.6 | 66.3 | 75.7 | 73.7 | 89.2 | |||
✓ | 67.5 | 15.5 | 78.7 | 74.4 | 90.9 | 56.6 | 16.8 | 65.6 | 66.1 | 93.2 | 32.5 | 8.1 | 44.5 | 37.1 | 74.7 | |||
✓ | 86.1 | 84.2 | 88.4 | 88.7 | 95.2 | 61.2 | 47.3 | 66.7 | 66.4 | 94.2 | 39.3 | 8.1 | 40.5 | 35.6 | 74.8 | |||
✓ | ✓ | 83.0 | 84.1 | 88.2 | 86.3 | 95.0 | 78.6 | 80.3 | 84.8 | 84.2 | 96.1 | 74.5 | 68.7 | 77.7 | 75.3 | 90.0 | ||
✓ | ✓ | 74.4 | 62.1 | 81.8 | 77.2 | 93.1 | 78.6 | 78.2 | 83.5 | 83.4 | 95.7 | 74.0 | 70.7 | 77.1 | 74.7 | 89.5 | ||
✓ | ✓ | 87.1 | 87.3 | 89.7 | 89.0 | 95.6 | 65.9 | 61.6 | 72.0 | 70.8 | 95.2 | 43.0 | 9.5 | 44.4 | 41.2 | 77.9 | ||
✓ | ✓ | 82.2 | 84.2 | 88.4 | 88.7 | 94.9 | 61.2 | 47.3 | 66.7 | 66.4 | 95.1 | 45.0 | 16.5 | 47.7 | 48.7 | 77.7 | ||
✓ | ✓ | 87.6 | 87.9 | 90.3 | 89.9 | 95.9 | 69.8 | 62.6 | 71.8 | 70.9 | 95.1 | 47.5 | 17.4 | 48.3 | 45.4 | 78.1 | ||
✓ | ✓ | 87.1 | 87.5 | 89.5 | 89.7 | 95.6 | 77.9 | 80.8 | 84.8 | 84.4 | 96.1 | 75.1 | 64.8 | 76.8 | 75.0 | 90.0 | ||
✓ | ✓ | ✓ | 87.3 | 87.7 | 90.4 | 88.9 | 95.7 | 79.8 | 80.9 | 85.2 | 84.1 | 96.2 | 75.5 | 65.7 | 77.4 | 74.0 | 90.0 | |
✓ | ✓ | ✓ | 87.8 | 88.4 | 89.7 | 89.9 | 96.0 | 71.5 | 63.7 | 74.1 | 72.7 | 95.5 | 47.7 | 19.4 | 50.0 | 44.8 | 78.7 | |
✓ | ✓ | ✓ | 88.1 | 88.8 | 90.6 | 90.4 | 96.0 | 79.6 | 80.7 | 85.8 | 84.6 | 96.3 | 75.7 | 66.4 | 76.6 | 73.8 | 90.1 | |
✓ | ✓ | ✓ | 82.7 | 80.9 | 88.4 | 86.1 | 95.1 | 80.4 | 79.0 | 85.8 | 84.4 | 96.2 | 74.8 | 68.3 | 78.5 | 75.4 | 90.1 | |
✓ | ✓ | ✓ | ✓ | 89.6 | 88.8 | 90.6 | 90.1 | 96.1 | 85.8 | 80.1 | 85.9 | 84.5 | 96.3 | 77.6 | 68.4 | 80.4 | 75.5 | 90.0 |
Average | 82.9 | 76.4 | 87.3 | 86.0 | 94.8 | 72.4 | 65.4 | 77.5 | 77.0 | 95.4 | 59.9 | 42.3 | 62.5 | 59.9 | 83.8 |
Method | Dice ↑ | HD95 ↓ | Sensitivity ↑ | |||||||||
WT | TC | ET | Avg | WT | TC | ET | Avg | WT | TC | ET | Avg | |
Ding et al. | 86.13 | 71.93 | 58.98 | 72.35 | - | - | - | - | - | - | - | - |
Zhang et al. | 87.08 | 78.69 | 64.08 | 76.62 | 2.90 | 6.21 | 44.64 | 17.92 | 99.60 | 99.81 | 99.82 | 99.74 |
Ting et al. | 90.71 | 84.60 | 79.07 | 84.79 | 4.05 | 5.78 | 33.77 | 14.53 | 90.98 | 83.90 | 77.68 | 84.18 |
Qiu et al. | 87.58 | 79.67 | 64.87 | 77.37 | 2.82 | 5.71 | 43.92 | 17.48 | 99.66 | 99.83 | 99.81 | 99.77 |
baseline(fine-tune) | 77.63 | 78.94 | 70.85 | 75.81 | 2.61 | 2.09 | 2.39 | 2.36 | 86.28 | 86.50 | 82.74 | 85.17
baseline(frozen) | 58.11 | 61.09 | 40.88 | 53.36 | 2.83 | 2.29 | 2.97 | 2.70 | 81.41 | 84.68 | 85.90 | 84.00
our | 94.96 | 94.12 | 89.98 | 93.02 | 2.58 | 2.09 | 2.21 | 2.29 | 96.81 | 96.32 | 93.01 | 95.38 |
We further conducted experiments to analyze the robustness of our proposed method to varying missing-modality rates between the training and testing phases. As shown in Figure 8(a), we trained the model with a 70% missing rate and randomly removed multiple modalities to simulate missing-modality scenarios for testing. We found that, compared to the baseline, our DPONet method was robust to different missing rates during testing. Moreover, in Figure 8(b), we used 10%, 70%, and 90% to represent the degree of missingness during training (through many experiments, we found these missing rates to be representative) and observed that training with more complete modality data yields significantly higher performance when testing with low missing rates. The experiments in this paper are based on the general reality that collecting complete modality data cannot be guaranteed. Nevertheless, there are still publicly available datasets with complete modalities. We therefore also trained the models using complete data; as shown in Figure 8(c), while the baseline model could not handle missing data, our method consistently improved upon the baseline.
We explored the effects of the frequency filtering prompt and the spatial perturbation prompt; as the results in Table 4 show, our method achieved a higher Dice score of 93.02. The term baseline (fine-tune) refers to a pre-trained transformer comprehensively fine-tuned on the BraTS dataset, and baseline (frozen) refers to a baseline model whose pre-trained backbone parameters are frozen.
Method | Dice ↑ | HD95 ↓ | Sensitivity ↑ | |||||||||
WT | TC | ET | Avg | WT | TC | ET | Avg | WT | TC | ET | Avg | |
baseline (fine-tune) | 77.63 | 78.94 | 70.85 | 75.81 | 2.61 | 2.09 | 2.39 | 2.36 | 86.28 | 86.50 | 82.74 | 85.17 |
baseline (frozen) | 58.11 | 61.09 | 40.88 | 53.36 | 2.83 | 2.29 | 2.97 | 2.70 | 81.41 | 84.68 | 85.90 | 84.00 |
baseline + FFP | 93.65 | 92.40 | 85.08 | 90.38 | 2.45 | 2.04 | 2.16 | 2.22 | 96.54 | 96.11 | 91.26 | 94.64 |
baseline + SPP | 94.56 | 94.40 | 87.37 | 92.11 | 2.47 | 2.05 | 2.22 | 2.25 | 96.59 | 96.07 | 90.53 | 94.40 |
baseline + FFP + SPP | 94.96 | 94.12 | 89.98 | 93.02 | 2.58 | 2.09 | 2.21 | 2.29 | 96.81 | 96.32 | 93.01 | 95.38 |
When we introduced frequency filtering prompts into the baseline model, the model achieved performance comparable to the fine-tuned model, demonstrating the efficiency of the proposed component. Furthermore, as shown in Figure 9, when training with complete modalities and a significant portion of modalities was absent during inference (i.e., only one modality retained), the baseline model suffered severe performance degradation. Excitingly, when prompts were introduced, the model was able to perform image segmentation normally even with a single modality input, indicating that the proposed visual prompts help the encoder learn discriminative features across modalities.
When we introduced the spatial perturbation prompt module into the baseline, the overall robustness of the model improved. As shown in Table 4, our method achieved a higher Dice score of 93.02, exceeding the fine-tuned baseline by 17.21. Furthermore, the Dice score for the ET region increased significantly, indicating that the spatial perturbation prompt facilitates the fusion of inter-modal information and preserves more edge details and small-scale information. Figure 10 visualizes the segmentation results before and after using the spatial perturbation prompt, clearly showing that more small-scale lesion areas are preserved.
Additionally, Table 5 reports the parameter counts before and after adding each module. Our method introduces only approximately 7% of the total parameters as trainable yet achieves excellent segmentation performance. When extended to large models with billions of parameters, the proposed method should be even more favorable for multimodal downstream tasks with missing modalities, achieving a good trade-off between computational cost and performance.
Method | Param (M) | Tunable Param (M) |
baseline (fine-tune) | 194.82 | 194.82 |
baseline (frozen) | 194.82 | 49.30 |
baseline + FFP | 160.42 | 58.97 |
baseline + SPP | 173.93 | 48.69 |
baseline + FFP + SPP | 153.43 | 10.58 |
In this paper, we introduce a parameter-efficient and discriminatively optimized segmentation network that exhibits robust adaptability to generalized missing modality inputs. Our model filters frequency features to generate discriminative visual cues and introduces learnable spatial perturbation prompts into shared feature representations, effectively addressing the challenge of incomplete multimodal brain tumor segmentation. Compared to fine-tuning the entire transformer model, our approach requires only 7% of the trainable parameters while demonstrating superior performance in handling real-world scenarios with missing modality data. Extensive experiments and ablation studies on the publicly available BraTS2018 and BraTS2020 datasets validate the effectiveness of our proposed method.
In this work, we investigated a parameter-efficient incomplete-modality image segmentation method for brain tumors. Although our model successfully captures consistent features by mapping robust multimodal features into the same latent space, we must point out that it cannot recover information about the missing modalities from the available multimodal inputs. Therefore, we next plan to study how to use the available multimodal images to estimate the missing modality information and thereby obtain richer image information.
This work was supported by the National Natural Science Foundation of China (No. U24A20231, No. 62272283) and New Twentieth Items of Universities in Jinan (No. 2021GXRC049).
All authors declare no conflicts of interest in this paper.
Dice (%) ↑ by available modality combination (✓ = one available modality, out of F, T1, T1c and T2), for the Complete, Core and Enhancing tumor regions:

| Modalities | Complete D | Complete Z | Complete T | Complete Q | Complete Our | Core D | Core Z | Core T | Core Q | Core Our | Enhancing D | Enhancing Z | Enhancing T | Enhancing Q | Enhancing Our |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ✓ | 86.1 | 86.1 | 86.5 | 86.7 | 93.9 | 71.0 | 70.9 | 71.5 | 71.0 | 93.3 | 46.3 | 46.3 | 45.6 | 47.2 | 76.1 |
| ✓ | 76.8 | 78.5 | 77.4 | 79.5 | 91.6 | 81.5 | 84.0 | 83.4 | 84.3 | 95.3 | 74.9 | 80.1 | 78.9 | 81.4 | 88.4 |
| ✓ | 77.2 | 78.0 | 78.1 | 79.5 | 89.1 | 66.0 | 65.9 | 66.8 | 67.7 | 91.9 | 37.3 | 38.0 | 41.3 | 39.1 | 71.6 |
| ✓ | 87.3 | 87.4 | 89.1 | 86.9 | 95.2 | 69.2 | 68.8 | 69.3 | 69.9 | 93.5 | 38.2 | 42.4 | 43.6 | 42.8 | 74.6 |
| ✓ ✓ | 87.7 | 87.8 | 88.4 | 88.4 | 94.5 | 83.5 | 84.8 | 86.4 | 86.3 | 95.8 | 75.9 | 79.4 | 81.7 | 80.1 | 88.9 |
| ✓ ✓ | 81.1 | 81.8 | 81.2 | 83.1 | 92.1 | 83.4 | 83.6 | 85.2 | 85.8 | 95.4 | 78.0 | 80.1 | 79.2 | 81.7 | 88.3 |
| ✓ ✓ | 89.7 | 89.8 | 89.9 | 89.8 | 95.5 | 73.1 | 73.8 | 73.9 | 74.4 | 94.3 | 41.0 | 45.9 | 48.2 | 46.8 | 77.3 |
| ✓ ✓ | 87.7 | 87.8 | 88.0 | 87.9 | 94.4 | 73.1 | 73.4 | 73.3 | 72.9 | 94.1 | 45.7 | 46.8 | 50.1 | 47.3 | 77.5 |
| ✓ ✓ | 89.9 | 89.9 | 90.5 | 90.1 | 95.5 | 74.1 | 74.6 | 75.5 | 74.5 | 94.1 | 49.3 | 48.6 | 48.6 | 49.5 | 76.6 |
| ✓ ✓ | 89.9 | 89.3 | 90.0 | 90.0 | 95.6 | 84.7 | 84.8 | 85.5 | 86.6 | 95.9 | 76.7 | 81.9 | 81.8 | 81.2 | 88.9 |
| ✓ ✓ ✓ | 90.7 | 90.1 | 90.7 | 90.6 | 95.6 | 85.1 | 85.2 | 86.5 | 86.7 | 95.8 | 76.8 | 82.1 | 81.8 | 81.8 | 88.8 |
| ✓ ✓ ✓ | 90.6 | 90.6 | 90.3 | 90.6 | 95.7 | 75.2 | 75.6 | 75.9 | 75.8 | 94.7 | 49.9 | 50.3 | 52.5 | 51.1 | 78.0 |
| ✓ ✓ ✓ | 90.7 | 90.4 | 90.6 | 90.8 | 95.8 | 85.0 | 85.3 | 86.4 | 86.4 | 96.0 | 77.1 | 78.7 | 81.0 | 80.0 | 88.9 |
| ✓ ✓ ✓ | 88.3 | 88.2 | 88.7 | 88.9 | 94.6 | 83.5 | 84.2 | 86.5 | 86.5 | 95.8 | 77.0 | 79.3 | 78.5 | 82.1 | 88.9 |
| ✓ ✓ ✓ ✓ | 91.1 | 90.6 | 90.6 | 91.0 | 95.9 | 85.2 | 84.6 | 87.4 | 86.4 | 95.9 | 78.0 | 79.9 | 81.6 | 81.0 | 88.9 |
| Average | 87.0 | 87.1 | 87.3 | 87.6 | 94.3 | 78.2 | 78.6 | 79.6 | 79.7 | 94.8 | 61.5 | 64.0 | 64.9 | 64.9 | 82.8 |
Dice (%) ↑ under the same modality combinations for a second set of comparison methods (Z, Y, T, L) against ours:

| Modalities | Complete Z | Complete Y | Complete T | Complete L | Complete Our | Core Z | Core Y | Core T | Core L | Core Our | Enhancing Z | Enhancing Y | Enhancing T | Enhancing L | Enhancing Our |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ✓ | 81.2 | 76.3 | 86.6 | 84.8 | 94.3 | 64.2 | 56.7 | 68.8 | 69.4 | 94.4 | 43.1 | 16.0 | 41.4 | 47.6 | 76.2 |
| ✓ | 72.2 | 42.8 | 77.8 | 75.8 | 92.6 | 75.4 | 65.1 | 81.5 | 82.9 | 95.4 | 72.6 | 66.3 | 75.7 | 73.7 | 89.2 |
| ✓ | 67.5 | 15.5 | 78.7 | 74.4 | 90.9 | 56.6 | 16.8 | 65.6 | 66.1 | 93.2 | 32.5 | 8.1 | 44.5 | 37.1 | 74.7 |
| ✓ | 86.1 | 84.2 | 88.4 | 88.7 | 95.2 | 61.2 | 47.3 | 66.7 | 66.4 | 94.2 | 39.3 | 8.1 | 40.5 | 35.6 | 74.8 |
| ✓ ✓ | 83.0 | 84.1 | 88.2 | 86.3 | 95.0 | 78.6 | 80.3 | 84.8 | 84.2 | 96.1 | 74.5 | 68.7 | 77.7 | 75.3 | 90.0 |
| ✓ ✓ | 74.4 | 62.1 | 81.8 | 77.2 | 93.1 | 78.6 | 78.2 | 83.5 | 83.4 | 95.7 | 74.0 | 70.7 | 77.1 | 74.7 | 89.5 |
| ✓ ✓ | 87.1 | 87.3 | 89.7 | 89.0 | 95.6 | 65.9 | 61.6 | 72.0 | 70.8 | 95.2 | 43.0 | 9.5 | 44.4 | 41.2 | 77.9 |
| ✓ ✓ | 82.2 | 84.2 | 88.4 | 88.7 | 94.9 | 61.2 | 47.3 | 66.7 | 66.4 | 95.1 | 45.0 | 16.5 | 47.7 | 48.7 | 77.7 |
| ✓ ✓ | 87.6 | 87.9 | 90.3 | 89.9 | 95.9 | 69.8 | 62.6 | 71.8 | 70.9 | 95.1 | 47.5 | 17.4 | 48.3 | 45.4 | 78.1 |
| ✓ ✓ | 87.1 | 87.5 | 89.5 | 89.7 | 95.6 | 77.9 | 80.8 | 84.8 | 84.4 | 96.1 | 75.1 | 64.8 | 76.8 | 75.0 | 90.0 |
| ✓ ✓ ✓ | 87.3 | 87.7 | 90.4 | 88.9 | 95.7 | 79.8 | 80.9 | 85.2 | 84.1 | 96.2 | 75.5 | 65.7 | 77.4 | 74.0 | 90.0 |
| ✓ ✓ ✓ | 87.8 | 88.4 | 89.7 | 89.9 | 96.0 | 71.5 | 63.7 | 74.1 | 72.7 | 95.5 | 47.7 | 19.4 | 50.0 | 44.8 | 78.7 |
| ✓ ✓ ✓ | 88.1 | 88.8 | 90.6 | 90.4 | 96.0 | 79.6 | 80.7 | 85.8 | 84.6 | 96.3 | 75.7 | 66.4 | 76.6 | 73.8 | 90.1 |
| ✓ ✓ ✓ | 82.7 | 80.9 | 88.4 | 86.1 | 95.1 | 80.4 | 79.0 | 85.8 | 84.4 | 96.2 | 74.8 | 68.3 | 78.5 | 75.4 | 90.1 |
| ✓ ✓ ✓ ✓ | 89.6 | 88.8 | 90.6 | 90.1 | 96.1 | 85.8 | 80.1 | 85.9 | 84.5 | 96.3 | 77.6 | 68.4 | 80.4 | 75.5 | 90.0 |
| Average | 82.9 | 76.4 | 87.3 | 86.0 | 94.8 | 72.4 | 65.4 | 77.5 | 77.0 | 95.4 | 59.9 | 42.3 | 62.5 | 59.9 | 83.8 |
| Method | Dice WT ↑ | Dice TC ↑ | Dice ET ↑ | Dice Avg ↑ | HD95 WT ↓ | HD95 TC ↓ | HD95 ET ↓ | HD95 Avg ↓ | Sensitivity WT ↑ | Sensitivity TC ↑ | Sensitivity ET ↑ | Sensitivity Avg ↑ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ding et al. | 86.13 | 71.93 | 58.98 | 72.35 | - | - | - | - | - | - | - | - |
| Zhang et al. | 87.08 | 78.69 | 64.08 | 76.62 | 2.90 | 6.21 | 44.64 | 17.92 | 99.60 | 99.81 | 99.82 | 99.74 |
| Ting et al. | 90.71 | 84.60 | 79.07 | 84.79 | 4.05 | 5.78 | 33.77 | 14.53 | 90.98 | 83.90 | 77.68 | 84.18 |
| Qiu et al. | 87.58 | 79.67 | 64.87 | 77.37 | 2.82 | 5.71 | 43.92 | 17.48 | 99.66 | 99.83 | 99.81 | 99.77 |
| baseline (fine-tune) | 77.63 | 78.94 | 70.85 | 75.81 | 2.61 | 2.09 | 2.39 | 2.36 | 86.28 | 86.50 | 82.74 | 85.17 |
| baseline (frozen) | 58.11 | 61.09 | 40.88 | 53.36 | 2.83 | 2.29 | 2.97 | 2.70 | 81.41 | 84.68 | 85.90 | 84.00 |
| Ours | 94.96 | 94.12 | 89.98 | 93.02 | 2.58 | 2.09 | 2.21 | 2.29 | 96.81 | 96.32 | 93.01 | 95.38 |
| Method | Dice WT ↑ | Dice TC ↑ | Dice ET ↑ | Dice Avg ↑ | HD95 WT ↓ | HD95 TC ↓ | HD95 ET ↓ | HD95 Avg ↓ | Sensitivity WT ↑ | Sensitivity TC ↑ | Sensitivity ET ↑ | Sensitivity Avg ↑ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| baseline (fine-tune) | 77.63 | 78.94 | 70.85 | 75.81 | 2.61 | 2.09 | 2.39 | 2.36 | 86.28 | 86.50 | 82.74 | 85.17 |
| baseline (frozen) | 58.11 | 61.09 | 40.88 | 53.36 | 2.83 | 2.29 | 2.97 | 2.70 | 81.41 | 84.68 | 85.90 | 84.00 |
| baseline + FFP | 93.65 | 92.40 | 85.08 | 90.38 | 2.45 | 2.04 | 2.16 | 2.22 | 96.54 | 96.11 | 91.26 | 94.64 |
| baseline + SPP | 94.56 | 94.40 | 87.37 | 92.11 | 2.47 | 2.05 | 2.22 | 2.25 | 96.59 | 96.07 | 90.53 | 94.40 |
| baseline + FFP + SPP | 94.96 | 94.12 | 89.98 | 93.02 | 2.58 | 2.09 | 2.21 | 2.29 | 96.81 | 96.32 | 93.01 | 95.38 |
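The two tables above report Dice, 95th-percentile Hausdorff distance (HD95) and sensitivity for the whole tumor (WT), tumor core (TC) and enhancing tumor (ET) regions. As a point of reference only, the sketch below shows one common way to compute these three metrics from binary segmentation masks; it assumes NumPy/SciPy arrays and treats HD95 as a voxel-level approximation (pooling voxel-to-mask distances in both directions) rather than a strict surface-based distance, so it is illustrative and not necessarily the implementation used for these results.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice = 2 |P ∩ G| / (|P| + |G|), on boolean masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0


def sensitivity(pred: np.ndarray, gt: np.ndarray) -> float:
    """Sensitivity (recall) = TP / (TP + FN)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    return tp / gt.sum() if gt.sum() else 1.0


def hd95(pred: np.ndarray, gt: np.ndarray) -> float:
    """95th-percentile symmetric Hausdorff distance (voxel approximation).

    Each foreground voxel of one mask is assigned its distance to the
    nearest foreground voxel of the other mask; HD95 is the 95th
    percentile of the pooled distances from both directions.
    """
    pred, gt = pred.astype(bool), gt.astype(bool)
    if not pred.any() or not gt.any():
        return np.inf  # undefined when either mask is empty
    d_gt_to_pred = distance_transform_edt(~pred)[gt]  # gt voxels -> pred
    d_pred_to_gt = distance_transform_edt(~gt)[pred]  # pred voxels -> gt
    return float(np.percentile(np.hstack([d_gt_to_pred, d_pred_to_gt]), 95))
```

Taking the 95th percentile rather than the maximum makes the Hausdorff term robust to a handful of stray voxels, which is the usual reason HD95 is reported instead of the plain Hausdorff distance.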
| Method | Param (M) | Tunable Param (M) |
| --- | --- | --- |
| baseline (fine-tune) | 194.82 | 194.82 |
| baseline (frozen) | 194.82 | 49.30 |
| baseline + FFP | 160.42 | 58.97 |
| baseline + SPP | 173.93 | 48.69 |
| baseline + FFP + SPP | 153.43 | 10.58 |
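The table above separates total parameters from tunable ones: weights that are frozen still count toward the total but not toward the tunable figure. Assuming a PyTorch implementation (the framework is not specified in this excerpt), both columns can be read off a model as in the sketch below; `param_counts_m` and `freeze` are illustrative helper names, not functions from the original work.

```python
import torch.nn as nn


def param_counts_m(model: nn.Module) -> tuple[float, float]:
    """Return (total, tunable) parameter counts in millions (M).

    Parameters frozen with requires_grad=False count toward the total
    but not toward the tunable figure, matching the table's two columns.
    """
    total = sum(p.numel() for p in model.parameters())
    tunable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total / 1e6, tunable / 1e6


def freeze(module: nn.Module) -> None:
    """Freeze a submodule (e.g., a pretrained backbone) in place."""
    for p in module.parameters():
        p.requires_grad = False
```

Freezing the backbone and training only small add-on modules is what would account for a drop in tunable parameters such as the 194.82 M to 10.58 M difference between the first and last rows.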