
Personalized cardiac modelling has been used for the non-invasive diagnosis and treatment of heart rhythm disorders, including risk classification of patients with heart attacks [1,2,3], prediction of re-entry locations [4], and guidance of clinical ablation [5]. The key to the clinical application of heart models is the accurate construction of the personalized model, which is currently performed mostly by experienced experts through manual segmentation. Manual segmentation is subjective, irreproducible and time consuming, and since model simulation already takes a great deal of time in a time-critical clinical setting, minimizing the modeling time is essential; an automated segmentation method is therefore extremely important for the clinical application of personalized heart modelling.
Before the rise of deep learning, classical medical image segmentation algorithms, such as region-based, grayscale-based and edge-based methods, were well established [6,7,8,9], and traditional machine learning techniques such as model-based methods (e.g., active contour and deformable models) and atlas-based methods (e.g., single-atlas and multi-atlas) achieved good performance [10,11,12,13]. Nevertheless, both classical segmentation algorithms and traditional machine learning techniques usually require prior knowledge or hand-crafted feature annotation to achieve good results. In contrast, deep learning-based algorithms do not rely on these operations: they automatically discover and learn complex features from the data for target segmentation and detection. These features are learned directly from the data through generic, end-to-end learning procedures, which allows deep learning-based algorithms to be transferred easily to other domains. Deep learning-based segmentation algorithms are gradually surpassing the previously state-of-the-art traditional methods and are gaining popularity in research, not only because of advances in computer hardware such as graphics processing units (GPUs) and tensor processing units (TPUs), but also because of the growth of publicly available datasets and open-source code. This trend can be observed in Figure 1: the number of deep learning-based cardiac image segmentation papers has grown considerably in the last few years, especially after 2018. We searched the Web of Science database for the keywords "cardiac image segmentation" and "deep learning", counting articles of all types.
Accurate localization and segmentation of medical images is a necessary prerequisite for the diagnosis and treatment planning of cardiac diseases [14]. With the development of deep learning, various deep learning algorithms have been introduced into medical image processing and analysis with good results [15,16]. Convolutional neural networks (CNNs) are among the most common neural networks in medical image analysis; they are computationally fast, simple, and require no major adjustments to the network architecture [17]. CNNs have been used with great success for medical image classification and segmentation, but a major drawback of the patch-based approach is that a separate forward pass must be run for each patch at inference time; because patches overlap heavily, this results in a large amount of redundant computation and wasted resources. To solve this problem, fully convolutional networks (FCNs) [18] were introduced, designed with an encoding-decoding structure that accepts inputs of arbitrary size and produces outputs of the same size. However, this encoder-decoder structure has its own limitations, such as the loss of some features, so many FCN variants have been proposed; the most famous is U-Net [19], which uses skip connections to recover feature information from the down-sampling path, reducing the loss of spatial context and thus yielding more accurate results. U-Net subsequently came to dominate medical image processing, but it and its variants [20,21,22] lack the ability to model long-range correlations, mainly because of the inherent locality of the convolution operation [23].
On the other hand, the recent success of the Transformer, which captures long-range dependencies, has made a solution to this problem possible. The Transformer was designed for sequence modeling and transduction tasks, and it is known for its ability to model long-range dependencies in data. Its great success in the language domain has motivated researchers to investigate its adaptability to computer vision, especially since it has achieved good results on recent image classification and segmentation tasks [24,25,26]. ViT [25] first introduced the Transformer to computer vision by splitting an image into non-overlapping 16 × 16 patches and feeding them, with positional embeddings, into a standard Transformer; compared with CNN-based approaches, ViT achieved fairly good performance and broke the monopoly of CNNs in computer vision. With the advent of ViT, more and more Transformer-based image processing methods became popular: Swin-Transformer [24] proposed a hierarchical Transformer with shifted-window attention, while the pyramid vision transformer (PVT) [26] proposed a progressive shrinking strategy to control the scale of feature maps and a spatial-reduction attention (SRA) layer to replace the traditional multi-head attention (MHA) layer in the encoder, both designed mainly to reduce computational complexity. However, these Transformer-based networks cannot extract low-level features the way convolutional operations can [27], so some detailed features are ignored.
To solve the above problem, we propose TF-Unet, a medical image segmentation framework that combines the Transformer and U-Net. To fully exploit the advantages of both, we use two convolutional layers to learn high-resolution features and spatial location information in the feature-learning stage, and use Transformer blocks to establish long-range dependencies in the subsequent encoding and decoding stages. Structurally, inspired by the U-Net architecture, we divide the network into encoder and decoder blocks, and the self-attention features of the encoder blocks are combined with high-resolution decoder features at different scales through skip connections to reduce information loss. The results show that this design allows our framework to retain the advantages of both convolution and Transformer while facilitating the segmentation of medical images. Experimental results show that our proposed hybrid network has better performance and robustness than previous methods based on pure convolution or pure Transformer.
The ACDC dataset: (1) raw NIfTI images of 100 patients were used as the training set, with manual reference segmentations of the ED and ES phases by clinical experts serving as the segmentation ground truth, where the trabeculae and papillary muscles were included in the ventricular blood pool; (2) raw NIfTI images of another 50 patients were used as the test set, for which only basic patient information (height and weight) and the ED and ES phases were provided. ACDC data were acquired on 1.5 T and 3.0 T MRI scanners with retrospectively or prospectively gated balanced steady-state free precession sequences. The scan parameters were as follows: slice thickness of 5-8 mm, inter-slice gap of 5 mm (slice thickness plus gap typically 5-10 mm), matrix size of 256 × 256, FOV of 300 × 330 mm2, and 28-40 phases covering one complete cardiac cycle.
The Synapse dataset: (1) raw NIfTI images of 30 patients were used as the training set; (2) raw NIfTI images of another 20 patients were used as the test set. These 50 scans were acquired during the portal venous phase, with varying volumes (512 × 512 × 85 to 512 × 512 × 198 voxels) and fields of view (approximately 280 × 280 × 280 mm3 to 500 × 500 × 650 mm3). The in-plane resolution varied from 0.54 × 0.54 mm2 to 0.98 × 0.98 mm2, while the slice thickness ranged from 2.5 to 5.0 mm.
The general architecture of TF-Unet is shown in Figure 2. It maintains a U-shape similar to that of U-Net [19] and consists of two main branches, the encoder and the decoder. Specifically, the encoder comprises the feature extraction block, the Transformer block and the down-sampling block, while the decoder comprises the Transformer block, the up-sampling block and the deconvolution block that maps the final output. In addition, to recover image details in the prediction, we add residual connections [28] between the corresponding feature pyramids of the encoder and decoder in a symmetric manner.
The feature extraction block is mainly responsible for converting each input image I into a high-dimensional tensor $ I\in {R}^{\frac{H}{4}\times \frac{W}{4}\times C} $, where H and W are the height and width of the input and C is the channel (feature) dimension. Unlike Jieneng Chen et al. [27], who flatten the input image directly and then preprocess it in one dimension, we use a feature extraction layer that extracts low-level but high-resolution 3D features directly from the image and retains more accurate pixel-level spatial information.
We use two consecutive convolutional layers with kernel size 3 and strides of 2 and 1, each followed by LayerNorm and a LeakyReLU nonlinear activation. This not only encodes spatial information more accurately than the positional encoding used in the Transformer, but also helps to reduce computational complexity while providing equally sized receptive fields.
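To make the stem concrete, a minimal PyTorch sketch is given below (shown in 2D for brevity; the channel count of 48 and the channels-last permutation for LayerNorm are our assumptions, not the released implementation):

```python
import torch
import torch.nn as nn

class ConvNormAct(nn.Module):
    """3x3 convolution followed by LayerNorm and LeakyReLU."""
    def __init__(self, cin, cout, stride):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, kernel_size=3, stride=stride, padding=1)
        self.norm = nn.LayerNorm(cout)   # normalizes over the channel dimension
        self.act = nn.LeakyReLU(inplace=True)

    def forward(self, x):
        x = self.conv(x).permute(0, 2, 3, 1)  # to channels-last for LayerNorm
        x = self.act(self.norm(x))
        return x.permute(0, 3, 1, 2)          # back to channels-first

# Two consecutive layers with kernel size 3 and strides 2 and 1.
stem = nn.Sequential(ConvNormAct(1, 48, stride=2),
                     ConvNormAct(48, 48, stride=1))
print(stem(torch.randn(1, 1, 224, 224)).shape)  # torch.Size([1, 48, 112, 112])
```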
After the feature extraction block, we pass the high-dimensional tensor I through two consecutive Transformer blocks. The Transformer's ability to establish long-range dependencies is fully exploited to connect the high-resolution features extracted in the upper layer with the multi-scale features obtained by convolutional down-sampling in the next layer. Unlike the traditional multi-head self-attention module, this paper uses the Swin-Transformer module [24], which is constructed on shifted windows. Since the window-based self-attention module lacks cross-window connections, its modeling capability is limited. To introduce cross-window connections while maintaining the efficient computation of non-overlapping windows, Ze Liu et al. [24] proposed a shifted window partitioning approach. Figure 3 shows two consecutive Transformer modules. Each Swin-Transformer block consists of a LayerNorm (LN) layer, a multi-head self-attention module, a skip connection, and an MLP (multilayer perceptron) with a LeakyReLU nonlinearity. The window-based multi-head self-attention (W-MSA) module and the shifted-window multi-head self-attention (SW-MSA) module are applied in the two consecutive Transformer blocks, respectively. Based on this window partitioning mechanism, the consecutive shifted-window Transformer blocks can be expressed as Eqs (1)-(4).
$ {y'}^{l} = \text{W-MSA}\left(LN\left({y}^{l-1}\right)\right)+{y}^{l-1} $ | (1) |
$ {y}^{l} = MLP\left(LN\left({y'}^{l}\right)\right)+{y'}^{l} $ | (2) |
$ {y'}^{l+1} = \text{SW-MSA}\left(LN\left({y}^{l}\right)\right)+{y}^{l} $ | (3) |
$ {y}^{l+1} = MLP\left(LN\left({y'}^{l+1}\right)\right)+{y'}^{l+1} $ | (4) |
where l is the layer index, and $ {y'}^{l} $ and $ {y}^{l} $ denote the outputs of the (S)W-MSA module and the MLP module of layer l, respectively. W-MSA and SW-MSA denote window-based multi-head self-attention and its shifted-window version. The computational complexity of (S)W-MSA on a volume of H × W × D patches is $ 4HWDC^{2}+2{S}_{H}{S}_{W}{S}_{D}HWDC $, where $ {S}_{H} $, $ {S}_{W} $ and $ {S}_{D} $ are the height, width and depth of the window, whereas the complexity of naïve multi-head self-attention (MSA) is $ 4HWDC^{2}+2{\left(HWD\right)}^{2}C $. (S)W-MSA therefore greatly reduces the computational complexity of MSA, making our proposed algorithm more efficient. The shifted window partitioning introduces connections between adjacent non-overlapping windows of the previous layer and has been found effective in image classification, object detection and semantic segmentation [24].
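A simplified, self-contained sketch of this two-block pattern is given below. It uses 2D windows for readability and omits two details of the full Swin module, the attention mask across shifted-window boundaries and the relative position bias of Eq (5) below; window size, head count and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

def window_partition(x, ws):
    # (B, H, W, C) -> (num_windows * B, ws * ws, C)
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

def window_reverse(wins, ws, H, W, C):
    # inverse of window_partition
    B = wins.shape[0] // ((H // ws) * (W // ws))
    x = wins.view(B, H // ws, W // ws, ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

class WindowBlock(nn.Module):
    """Pre-norm block: y' = (S)W-MSA(LN(y)) + y;  y = MLP(LN(y')) + y'."""
    def __init__(self, dim, heads, ws, shift=0):
        super().__init__()
        self.ws, self.shift = ws, shift
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim),
                                 nn.LeakyReLU(inplace=True),
                                 nn.Linear(4 * dim, dim))

    def forward(self, y):                          # y: (B, H, W, C)
        B, H, W, C = y.shape
        x = self.norm1(y)
        if self.shift:                             # cyclic shift -> SW-MSA
            x = torch.roll(x, (-self.shift, -self.shift), dims=(1, 2))
        wins = window_partition(x, self.ws)
        wins, _ = self.attn(wins, wins, wins)      # attention within each window
        x = window_reverse(wins, self.ws, H, W, C)
        if self.shift:                             # undo the cyclic shift
            x = torch.roll(x, (self.shift, self.shift), dims=(1, 2))
        y = y + x                                  # Eq (1) / Eq (3)
        return y + self.mlp(self.norm2(y))         # Eq (2) / Eq (4)

# Two consecutive blocks: regular windows, then shifted windows.
pair = nn.Sequential(WindowBlock(dim=96, heads=3, ws=4, shift=0),
                     WindowBlock(dim=96, heads=3, ws=4, shift=2))
out = pair(torch.randn(1, 16, 16, 96))             # -> (1, 16, 16, 96)
```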
In computing the self-attention, we follow Han Hu et al. [29,30] and add a relative position bias; the self-attention is computed as follows:
$ Attention(Q, K, V) = SoftMax\left(\frac{Q{K}^{T}}{\sqrt{d}}+B\right)V $ | (5) |
where Q, K and V denote the query, key and value matrices, respectively; d is taken as the dimension of Q or K, and $ B\in {R}^{\left(2H-1\right)\times \left(2W-1\right)} $ is the relative position bias.
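Eq (5) can be written out directly in a few lines of code (a sketch; the shapes are illustrative):

```python
import math
import torch

def attention_with_bias(Q, K, V, B):
    """Eq (5): SoftMax(Q K^T / sqrt(d) + B) V, with relative position bias B."""
    # Q, K, V: (windows, tokens, d); B: (tokens, tokens), broadcast over windows
    d = Q.shape[-1]
    logits = Q @ K.transpose(-2, -1) / math.sqrt(d) + B
    return torch.softmax(logits, dim=-1) @ V
```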
Instead of merging concatenated features with linear layers as in Swin-Unet [23], we directly use a convolution with stride 2. The reason is that the hierarchical features generated by convolutional down-sampling help to model the target object at multiple scales. After this processing, the feature resolution is down-sampled by a factor of 2 and the feature dimension is doubled.
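As a sketch (2D, with assumed channel counts), the down-sampling step is a single strided convolution:

```python
import torch
import torch.nn as nn

# Stride-2 convolution replacing Swin-Unet's linear patch merging: it halves
# the spatial resolution and doubles the channel dimension (96 -> 192 here is
# an assumed channel count).
down = nn.Conv2d(96, 192, kernel_size=3, stride=2, padding=1)
print(down(torch.randn(1, 96, 56, 56)).shape)  # torch.Size([1, 192, 28, 28])
```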
Corresponding to the convolutional down-sampling, we also modify the up-sampling layer. We use strided deconvolution to up-sample low-resolution feature maps into high-resolution ones, i.e., each feature map is reconstructed into a feature map of twice the resolution (2× up-sampling) while the feature dimension is halved; skip connections then merge the features extracted in the encoder's down-sampling path with the decoder's up-sampled features. A deconvolution operation is also performed in the last patch expansion block to produce the final result.
As in U-Net [19], skip connections are used to fuse the multi-scale features from the encoder with the up-sampled features from the decoder. We concatenate shallow and deep features to reduce the loss of spatial information caused by down-sampling, as in the sketch below.
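A matching sketch of the decoder step: 2× up-sampling via transposed convolution that halves the channels, followed by concatenation with the encoder's skip feature (channel counts again assumed).

```python
import torch
import torch.nn as nn

# 2x up-sampling via transposed convolution; channels 192 -> 96.
up = nn.ConvTranspose2d(192, 96, kernel_size=2, stride=2)

dec = torch.randn(1, 192, 28, 28)         # deep decoder features
enc = torch.randn(1, 96, 56, 56)          # shallow encoder features (skip path)
fused = torch.cat([up(dec), enc], dim=1)  # concatenate along channels
print(fused.shape)                        # torch.Size([1, 192, 56, 56])
```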
For a fair comparison of experimental results, we run each experiment on the ACDC dataset three times and report the average; to verify the robustness of our algorithm, we do the same on the Synapse dataset.
We ran all experiments with Python 3.6, PyTorch 1.8.1 and Ubuntu 20.04. All training was executed on an NVIDIA 2080 GPU with 11 GB of memory. The initial learning rate was set to 0.01, and we used the "poly" decay strategy [31] by default, as described in Eq (6):
$ lr = initial\_lr\times {\left(1-\frac{tem\_epoch}{max\_epoch}\right)}^{\gamma } $ | (6) |
where $ max\_epoch $ is the total number of training epochs (1000 by default), $ tem\_epoch $ is the current epoch, and γ is a hyperparameter (0.9 by default).
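In code, Eq (6) is simply (names mirror the equation):

```python
def poly_lr(initial_lr, tem_epoch, max_epoch=1000, gamma=0.9):
    """'Poly' learning rate decay of Eq (6)."""
    return initial_lr * (1 - tem_epoch / max_epoch) ** gamma

poly_lr(0.01, 0)    # 0.01 at the start of training
poly_lr(0.01, 500)  # ~0.0054 halfway through
```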
The default optimizer is stochastic gradient descent (SGD), with momentum set to 0.99 and weight decay set to 3e-5. We use a weighted sum of the cross-entropy loss and the Dice loss as the loss function. Training runs for 1000 epochs, each containing 250 iterations.
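A minimal sketch of this training objective and optimizer setup is given below; the equal 0.5/0.5 weighting is our assumption, since the paper only states a weighted sum.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def ce_dice_loss(logits, target, w_ce=0.5, w_dice=0.5, eps=1e-5):
    """Weighted sum of cross-entropy and soft Dice loss."""
    # logits: (B, C, H, W); target: (B, H, W) integer class labels
    ce = F.cross_entropy(logits, target)
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(target, logits.shape[1]).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(0, 2, 3))
    denom = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    dice = 1 - ((2 * inter + eps) / (denom + eps)).mean()
    return w_ce * ce + w_dice * dice

model = nn.Conv2d(1, 4, kernel_size=1)  # placeholder network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.99, weight_decay=3e-5)
```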
All images in the same dataset are first resampled to the same target spacing and then cropped to the same size. Since there are not enough training samples, data augmentation operations such as rotation, scaling, Gaussian blur, Gaussian noise, and brightness and contrast adjustment are performed during training.
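The augmentations listed above could be assembled, for example, with torchvision (parameter ranges are illustrative assumptions; in practice the spatial transforms must be applied jointly to the image and its label map):

```python
import torch
import torchvision.transforms as T

augment = T.Compose([
    T.RandomRotation(degrees=15),                        # rotation
    T.RandomAffine(degrees=0, scale=(0.85, 1.25)),       # scaling
    T.GaussianBlur(kernel_size=3),                       # Gaussian blur
    T.ColorJitter(brightness=0.25, contrast=0.25),       # brightness/contrast
    T.Lambda(lambda x: x + 0.01 * torch.randn_like(x)),  # Gaussian noise
])
```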
For the experiments on the ACDC dataset, we designed two scenarios. In the first, to make full use of the dataset, we use all 100 training cases as the training set and the 50 test cases as the test set. In the second, to evaluate our results quantitatively, we divide the 100 labeled training cases into 70 training, 10 validation and 20 test cases; the ground-truth labels of the 20 test cases were not used in training.
Figures 4 and 5 show the results of the first and second scenario, respectively. We randomly selected several patients' results for visualization [32].
For the second scenario, Table 1 shows the quantitative evaluation and comparison of the RV, MYO and LV Dice coefficients, and Figure 5 shows, for several randomly selected patients, the raw images, ground truth and predicted results. Because the data partitioning is random, the results of the other methods in Table 1 are taken from the corresponding papers.
Methods | DSC(avg) | RV | MYO | LV |
R50-Unet [27] | 87.55 | 87.10 | 80.36 | 94.92 |
R50-Attn Unet [33] | 86.75 | 87.58 | 79.20 | 93.47 |
VIT [25] | 81.45 | 81.46 | 70.71 | 92.18 |
R50-VIT [25] | 87.57 | 86.07 | 81.88 | 94.75 |
TransUNet [27] | 89.71 | 88.86 | 84.54 | 95.73 |
Swin-Unet [23] | 90.00 | 88.55 | 85.62 | 95.83 |
Ours | 91.72 | 90.16 | 89.40 | 95.60 |
For the Synapse data, we only use the second scenario from the ACDC experiments: we set aside part of the labeled training set for testing, with a training : validation : test split of 14 : 7 : 9. We used the mean Dice similarity coefficient (DSC) over eight abdominal organs (aorta, gallbladder, spleen, left kidney, right kidney, liver, pancreas and stomach) to evaluate model performance. Figure 6 shows results for different slices of patients from the Synapse dataset, with different colors representing different organs, as shown in the legend of Figure 6.
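For reference, a per-class DSC between two label maps can be computed as follows (a sketch):

```python
import numpy as np

def dsc(pred, gt, label):
    """Dice similarity coefficient of one class between two label maps."""
    p, g = (pred == label), (gt == label)
    denom = p.sum() + g.sum()
    return 2.0 * np.logical_and(p, g).sum() / denom if denom else 1.0
```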
In this section, we examine the importance of the learning rate strategy. To verify the effect of different strategies on the results, we ran controlled experiments with four decay functions: inv, multistep, poly and step; the results of the four methods are shown in Figure 7.
In this section, we discuss in detail the experimental results obtained by our algorithm and explore the impact of different factors on model performance, with comparisons on the ACDC and Synapse datasets. Specifically, we also discuss the effect of different learning rate strategies on network performance.
From a quantitative perspective (Table 1), the best Transformer-based model is Swin-Unet, with an average Dice coefficient of 90%, and the best convolution-based model is R50-U-Net, with an average Dice coefficient of 87.55%; our proposed TF-Unet is 1.72 percentage points higher than Swin-Unet and 4.17 points higher than R50-U-Net. Considering that the accuracy of these networks is already very high, this improvement is still substantial, suggesting that our method achieves better edge prediction. From a qualitative perspective (Figure 5), the middle column shows the patient's ground truth and the rightmost column our prediction. Comparing layer by layer, the results obtained with our method are very close to the ground truth, and very good results are achieved even for the right ventricle, which is difficult to segment. In this work, we demonstrate that combining the Transformer with convolutional operations learns better global and long-range semantic interactions, resulting in better segmentation.
It is well known that most deep learning networks do not predict well on a test set for which no labels are available, but the TF-Unet-based model obtains good results. Observing Figure 4, we conclude that the results obtained with our method are generally quite accurate for all layers except the apical layer. However, in the lower right corner of Figure 4, i.e., the apical layer, our method fails to segment it. On the one hand, the apical layer is under-represented in the training set, making it difficult for the network to learn features in this region; on the other hand, the true areas of both the RV and LV in the apical layer are small and easily confused with surrounding vessels or tissues, making segmentation difficult.
Figure 8 summarizes the learning process of our proposed network. The training loss and validation loss decrease as iterations increase and reach a stable state at about 200 epochs, without overfitting, while the Dice coefficient of the validation set increases with the number of iterations and plateaus at about 800 epochs.
Quantitatively, as shown in Table 2, we performed experiments on Synapse and compared TF-Unet with various Transformer-based and U-Net-based baselines, with the Dice coefficient as the main evaluation metric. The best-performing Transformer-based approach is Swin-Unet, with an average score of 79.13; DualNorm-UNet reports the best CNN-based result, with an average of 80.37, slightly higher than Swin-Unet. Our TF-Unet outperforms Swin-Unet and DualNorm-UNet by 6.33 and 5.09 percentage points on average, respectively, a considerable improvement on Synapse. Qualitatively, as seen in Figure 6, the middle column shows the ground truth and the rightmost column the prediction. For multi-organ segmentation our proposed TF-Unet still performs well, but there are shortcomings for the stomach, as shown by the red boxes in the four lower-right panels of Figure 6: the prediction is not smooth enough and contains many spurious fragments, and complex boundaries remain difficult to segment.
Methods | DSC(avg) | Aorta | Gallbladder | Kidney(L) | Kidney(R) | Liver | Pancreas | Spleen | Stomach |
R50-Unet [27] | 74.68 | 87.74 | 63.66 | 80.60 | 78.19 | 93.74 | 56.90 | 85.87 | 74.16 |
DualNorm-UNet [34] | 80.37 | 86.52 | 55.51 | 88.64 | 86.29 | 95.64 | 55.91 | 94.62 | 79.80 |
VIT [25] | 67.86 | 70.19 | 45.10 | 74.70 | 67.40 | 91.32 | 42.00 | 81.75 | 70.44 |
R50-VIT [25] | 71.29 | 73.73 | 55.13 | 75.80 | 72.20 | 91.51 | 45.99 | 81.99 | 73.95 |
SQNet [35] | 73.76 | 83.55 | 61.17 | 76.87 | 69.40 | 91.53 | 56.55 | 85.82 | 65.24 |
TransUNet [27] | 77.48 | 87.23 | 63.16 | 81.87 | 77.02 | 94.08 | 55.86 | 85.08 | 75.62 |
Swin-Unet [23] | 79.13 | 85.47 | 66.53 | 83.28 | 79.61 | 94.29 | 56.58 | 90.66 | 76.60 |
Ours | 85.46 | 87.45 | 63.10 | 92.44 | 93.05 | 96.21 | 79.06 | 88.80 | 83.57 |
Observing Figure 7, it is easy to see that the results of all the functions are close except for the inv function. From Figure 9, we speculate that this is because the learning rate of the inv function decreases too quickly at the beginning of training; although this can speed up the search for an optimum, it also makes it easy to skip over the optimum and fall into a local optimum, leading to relatively poor results. The other three schedules all decrease gradually, and although they differ considerably in the intermediate stages, their results do not differ much. These experiments show that the learning rate strategy has some influence on the results, but it is generally enough to choose a schedule with an appropriate rate of decrease; the particular functional form matters little.
In this paper, we propose a new medical image segmentation network, TF-Unet. TF-Unet is built on an intertwined backbone of convolution and self-attention, which makes good use of the underlying features of CNNs to build hierarchical object concepts at multiple scales through a U-shaped hybrid architectural design. In addition, it exploits the Transformer's powerful self-attention mechanism, entangling long-range dependencies with convolutionally extracted features to capture the global context. Based on this hybrid structure, TF-Unet improves considerably on previous Transformer-based segmentation methods. In the future, we hope that TF-Unet can replace manual segmentation for cardiac modeling, effectively improve the efficiency of personalized modeling, and accelerate the adoption of personalized cardiac models in clinical applications.
This study was supported by the Natural Science Foundation of China (NSFC) under grant numbers 62171408 and 81901841, the Key Research and Development Program of Zhejiang Province under grant number 2020C03016, the Major Scientific Project of Zhejiang Lab under grant number 2020ND8AD01, and the Fundamental Research Funds for the Central Universities under grant number DUT21YG102.
The authors declare that there are no conflicts of interest.