Research article

Torque and d-q axis current dynamics of an inverter fed induction motor drive that leverages computational intelligent techniques


  • Emphasizing the significance of Model Predictive Control (MPC) in modern control-system optimization, this work highlights its predictive capability through the use of current state variables and well-structured mathematical models. We introduce a Predictive Current Control (PCC) strategy for a three-phase inverter-fed induction motor (IM), with a particular focus on the Sequential Model methodology. The Sequential Model MPC algorithm employs a cost function based on the squared discrepancy between the reference and stator-measured currents of the IM in the d-q reference frame. The method, implemented and tested in both MATLAB/Simulink and Python environments, uses a minimization principle to select the switching states of the inverter, thereby ensuring accurate voltage signals for the induction motor. The study further includes a comparative analysis of the electromagnetic torque, load currents, rotor speed, and angle deviations obtained with the Sequential Model against those obtained with the Ant Colony Optimization (ACO) and Nelder-Mead methods. The results illustrate the robust adaptability of the Sequential Model methodology, which outperforms the ACO and Nelder-Mead techniques in aspects such as lower current errors, better speed regulation, and smoother rotor angle trajectories.

    Citation: Shaswat Chirantan, Bibhuti Bhusan Pati. Torque and d-q axis current dynamics of an inverter fed induction motor drive that leverages computational intelligent techniques[J]. AIMS Electronics and Electrical Engineering, 2024, 8(1): 28-52. doi: 10.3934/electreng.2024002




    Brain tumors are abnormal cell growths located in or near brain tissue that damage the nervous system, causing symptoms such as headaches, dizziness, dementia, seizures, and other neurological signs [1]. Magnetic resonance imaging (MRI)—including T1-weighted (T1), post-contrast T1-weighted (T1CE), T2-weighted (T2), and fluid-attenuated inversion recovery (FLAIR) sequences—is a prevalent diagnostic tool for brain tumors due to its sensitivity to soft tissue and high image contrast, as shown in Figure 1. Physicians utilize MRI for lesion diagnosis, but accuracy can be hindered by factors such as fatigue and emotional state. Automated methods have garnered extensive attention in the medical field due to their capability to objectively and accurately analyze imaging information.

    Figure 1.  Samples of the four MRI modalities and the ground truth of a brain tumor image. FLAIR, T1, T1CE, and T2 denote the four input samples, and GT denotes the ground truth.

    Most multimodal approaches assume complete data availability; however, in reality, missing modalities are common. As illustrated in Figure 2, various missing scenarios can occur during both training and inference stages. The absence of certain MRI sequences may fail to capture tumor characteristics, thereby limiting a comprehensive understanding of the tumor [2]. Therefore, it is crucial for multimodal learning methods to maintain robustness in the presence of missing modalities during inference.

    Figure 2.  We compared our method with others in terms of incomplete modality scenarios encountered during training and testing. While other methods utilize a complete dataset for training and a dataset with missing modalities for testing, our method employs datasets with missing modalities for both training and testing.

    Currently, a prevalent approach to segmentation with missing modalities is knowledge distillation [3,4], where information is transferred from a teacher network to a student network to recover the missing data; however, this can be computationally intensive. Another approach is image synthesis [5], which leverages generative models to reconstruct the missing data, although synthetic images may introduce noise into the task. Additionally, mapping the available modalities into a common latent subspace aims to compensate for or recover the missing information [6,7,8]. However, existing approaches often require training multiple sets of parameters to address the various missing-modality scenarios, escalating the model's complexity and computational overhead.

    With the expansion of data scale and the enhancement of computational resources, researchers favor general-purpose neural networks for diverse tasks, minimizing the need for task-specific model design and training. Recently, the transformer [9] has shown great potential in natural language processing, visual recognition, and dense prediction. However, its complex architecture and high computational demands limit comprehensive fine-tuning for downstream tasks, especially accurate segmentation, potentially leading to overfitting and reduced generalization ability.

    Inspired by recent advancements in prompt learning [10,11,12] and efficient fine-tuning techniques [13,14,15], we introduce a novel brain tumor segmentation framework, called DPONet. This framework employs an encoder-decoder structure for the segmentation network, enhancing performance in both incomplete and complete modality scenarios. Specifically, we leverage image frequency information as frequency filtering prompt (FFP) to facilitate the pre-trained model in extracting discriminative features. Furthermore, by learning a series of spatial perturbation prompt (SPP), we map these discriminative features into a common latent space, mitigating the challenges of modality fusion in the decoder. Finally, we validate the robustness of our approach on two commonly used public datasets. To sum up, our main contributions are as follows:

    ● We propose a new framework for incomplete-modal image segmentation that effectively handles common cases of missing modalities. This approach requires only 7% of the trainable parameters to adjust the pre-trained model, thereby avoiding the heavy fine-tuning typically necessary for transformers.

    ● We introduce a frequency filtering prompt to extract spatial frequency components from images. This method addresses the model's oversight of target domain features and enhances its adaptation to brain tumor datasets.

    ● We propose a spatial perturbation prompt that incorporates learnable parameters into a spatial modulation model. This aims to achieve consistent multimodal feature embeddings even in the presence of missing partial modalities.

    Incomplete multimodal learning refers to scenarios in multimodal learning tasks where partial modality information is missing or incomplete. This issue is particularly prominent in brain tumor segmentation, where the imaging data typically comprise multiple MRI sequences, and the absence of any one sequence leads to learning from incomplete modality information. Many studies [16,17,18] are devoted to this problem and demonstrate impressive performance on various incomplete multimodal learning tasks. Zhou et al. [16] showed that there exist correlations among the latent representations of modalities, which can be exploited to describe missing modalities by computing inter-modality correlations in a latent space. Ting et al. [17] combine available modality information to estimate the latent features of missing modalities. Liu et al. [18] explicitly consider the relationship between modalities and regions, giving different attention to different modalities for each region. However, these models require full fine-tuning of the pre-trained model, which increases the computational burden and reduces generalization ability.

    Most neural networks aim to learn an approximation of an objective function. The Fourier transform establishes the relationship between a function in the spatial domain and its representation in the frequency domain, allowing a function to be analyzed through its frequency components and the objective function to be approximated more effectively [19]. The frequency content of an image represents the intensity of gray-level variation, and the Fourier transform analyzes image features through the coefficients of the individual frequency components [20]. The performance of computer vision models is significantly affected by the Fourier statistics of their training data, and models show a certain sensitivity to Fourier basis directions; their robustness can be improved by accounting for this sensitivity [21]. For example, Fang et al. [22] and Xu et al. [23] argued that different parts of the same organ in MRI images exhibit regularity and that high-frequency structural information can more effectively capture these similarities and regularities.

    Prompt learning is an effective transfer learning approach in natural language processing [10,24,25], which adapts pre-trained models to downstream tasks by embedding contextual prompts. Recently, prompts have also been employed in computer vision tasks [26,27,28] and multimodal learning tasks [11,29,30], introducing self-adaptation in the input space to optimize the target task. For instance, Jia et al. [26] proposed Visual Prompt Tuning (VPT), achieving downstream performance comparable to full fine-tuning by adding a small number of learnable prompt embeddings to the patch embeddings. Different from VPT, Bahng et al. [27] proposed learning a single perturbation to adjust the pixel space and affect the model output. These studies suggest that continuously adjusting and optimizing prompts can enhance a model's adaptability. Lee et al. [29] treat different missing-modality scenarios as different types of inputs and employ learnable prompts to guide the model's predictions under the various missing conditions. Qiu et al. [30] utilize an intermediate classifier to generate a prompt for each missing scenario based on intermediate features for segmentation prediction. In contrast, our work does not require learning a separate set of prompts for each missing scenario; instead, it learns generic visual prompts that generalize to modulate the feature space in missing-modality scenes.

    In this paper, we focus on brain tumor segmentation under common missing modality scenarios. We simulate real-world data incompleteness by assuming absences of one or multiple modalities (Figure 2). Additionally, due to the difficulty of fully training a pre-trained transformer with limited computational resources, we design a discriminative prompt optimization network that avoids fine-tuning the entire pre-trained model. In this section, we will elaborate on the framework and its components.

    The pyramid vision transformer (PVT) [31] introduces a progressive shrinking strategy within the transformer block to control the scale of feature maps for dense prediction tasks. We chose PVT as the backbone, initialized with weights pre-trained on ImageNet. PVT comprises four stages, each consisting of a patch embedding layer and $l$ transformer encoder layers, which generate feature maps of different scales. Given an input image $X \in \mathbb{R}^{H \times W \times C}$, the patch embedding layer divides $X$ into $\frac{HW}{p_i^2}$ non-overlapping patches, where $p_i$ represents the patch size of the $i$-th stage. As the stage progresses, the patch size decreases accordingly. The flattened patches are then fed into a linear projection to obtain embedded patches. The embedded patches, along with positional embedding information, are subsequently input into the transformer encoder to produce a feature map $x$ of size $\frac{H}{p_i} \times \frac{W}{p_i} \times C$. This process can be described as follows:

    $$x_l = \mathrm{MLP}\left(\mathrm{LN}\left(\mathrm{SRA}(x_{l-1})\right)\right), \tag{3.1}$$

    where $x_{l-1}$ represents the feature map output by the previous layer, $\mathrm{SRA}(\cdot)$ denotes the spatial-reduction attention proposed in PVT, and $\mathrm{LN}(\cdot)$ and $\mathrm{MLP}(\cdot)$ refer to layer normalization and multi-layer perceptron operations, respectively. SRA is similar to multi-head attention, except that it reduces the spatial scale of the keys and values before the attention operation. The formula is as follows:

    $$\mathrm{SRA}(Q,K,V) = \mathrm{Attention}\left(QW^{Q},\ \mathrm{SR}(K)W^{K},\ \mathrm{SR}(V)W^{V}\right), \tag{3.2}$$

    where $W^{Q}$, $W^{K}$, and $W^{V}$ are the parameters of the linear projections, and $\mathrm{SR}(\cdot)$ reduces the spatial dimension of the input. This can be expressed as:

    $$\mathrm{SR}(x) = \mathrm{LN}\left(\mathrm{Reshape}(x_i, r_i)W^{S}\right), \tag{3.3}$$

    where $r_i$ represents the feature-map reduction rate of stage $i$. The $\mathrm{Reshape}(\cdot)$ operation reshapes the input $x \in \mathbb{R}^{h_i \times w_i \times c_i}$ to $\frac{h_i w_i}{r_i^{2}} \times (r_i^{2} c_i)$, and $W^{S}$ is a linear projection that reduces the dimensionality of the input. The attention calculation is as follows:

    $$\mathrm{Attention}(q,k,v) = \mathrm{Softmax}\left(\frac{qk^{T}}{\sqrt{d}}\right)v, \tag{3.4}$$

    where $q$, $k$, and $v$ are the query, key, and value matrices, and $d$ is the embedding dimension.
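    For concreteness, the following is a minimal, single-head PyTorch sketch of spatial-reduction attention as described by Eqs. (3.2)–(3.4). It implements $\mathrm{SR}(\cdot)$ with a reshape plus linear projection per Eq. (3.3); class and variable names are illustrative assumptions, not the authors' code.

```python
import math
import torch
import torch.nn as nn

class SpatialReductionAttention(nn.Module):
    """Single-head SRA sketch: keys and values are spatially reduced r x r."""
    def __init__(self, dim: int, r: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)            # W^Q
        self.k = nn.Linear(dim, dim)            # W^K
        self.v = nn.Linear(dim, dim)            # W^V
        self.sr = nn.Linear(dim * r * r, dim)   # W^S of Eq. (3.3)
        self.norm = nn.LayerNorm(dim)
        self.r = r

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, N, C) token sequence with N = h * w
        b, n, c = x.shape
        q = self.q(x)
        # Reshape the (h, w) grid into h*w/r^2 groups of r^2 tokens, Eq. (3.3)
        kv = x.reshape(b, h // self.r, self.r, w // self.r, self.r, c)
        kv = kv.permute(0, 1, 3, 2, 4, 5).reshape(b, n // self.r ** 2, self.r ** 2 * c)
        kv = self.norm(self.sr(kv))             # SR(x): (B, N/r^2, C)
        k, v = self.k(kv), self.v(kv)
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(c), dim=-1)
        return attn @ v                         # Eq. (3.4): (B, N, C)
```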

    We consider a multimodal dataset consisting of $N$ ($N=4$) modalities, $M = \{\mathrm{FLAIR}, \mathrm{T1CE}, \mathrm{T1}, \mathrm{T2}\}$. The dataset is denoted as $D = \{D_{14}, D_{13}, \dots, D_i, \dots, D_0\}$, where $D_{14}$ represents the complete set of modalities and the other sets represent missing-modality subsets; for example, $D_0 = \{X_F^0, X_{T1c}^0, X_{T1}^0, X_{T2}^1\}$ indicates that only the T2 modality is available. $X_m^k$ represents an input sample, where $m$ denotes the modality type and $k$ the modality state ($k=1$ available, $k=0$ missing). The model is unaware of which specific modality is missing; therefore, we introduce placeholder values (set to 0) for the missing modality data to preserve the format of the multimodal input.
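    The zero-placeholder convention can be illustrated with a short sketch (tensor shapes and helper names are assumptions for illustration, not the authors' code):

```python
import torch

MODALITIES = ["flair", "t1ce", "t1", "t2"]

def build_input(sample: dict, h: int = 224, w: int = 224) -> torch.Tensor:
    """Stack the four MRI modalities, substituting zeros for absent ones so
    the multimodal input always has the fixed (4, H, W) layout."""
    slices = []
    for m in MODALITIES:
        x = sample.get(m)                        # None when the modality is missing
        slices.append(x if x is not None else torch.zeros(h, w))
    return torch.stack(slices, dim=0)            # (4, H, W)

# Example: only T2 is available (the D_0 case above).
x = build_input({"t2": torch.randn(224, 224)})
print(x.shape)                                   # torch.Size([4, 224, 224])
```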

    We propose a novel discriminative prompt optimization network, as shown in Figure 3, which provides natural insertion points for the network's intermediate features while preserving the integrity of the pre-trained model and enabling fine-tuning for downstream tasks. We adopt a pre-trained transformer as the feature extractor and keep it frozen during training. Multimodal images $D = \{X_m^k\},\ k \in \{0,1\}$, are fed into four extractors, and task-relevant information is aggregated through discriminative prompts to fully exploit the discriminative features. Next, a spatial perturbation prompt module is introduced, which hierarchically fuses the discriminative features of the available modalities and maps them into a shared feature representation space to learn cross-modal shared information. The fused features are then mapped back to the original input size through up-sampling in the decoder, and the segmentation masks are obtained from these feature maps. Notably, during training, the trainable parameters are confined to the prompt components and the decoder.

    Figure 3.  The proposed DPONet framework. It takes MRI images as input. Each image is combined with frequency filtering prompts and fed into a pre-trained transformer network to extract discriminative features. Subsequently, the intermediate features extracted by four encoders are integrated with spatial perturbation prompts to learn consistent features within a shared latent space. Finally, the fused discriminant features and consistent features are input into the decoder to get the segmentation map.

    The frequency filtering prompt method, as illustrated in Figure 4, utilizes Fourier transform to extract frequency features and jointly modulates the intermediate features with image embeddings. The frequency processing method decomposes images into different frequency components, which are distributed across different spatial locations of the image, encouraging the model to focus on critical information of the image [21]. The core idea is to remodulate the intermediate features using frequency domain information, shifting the distribution from the pre-trained dataset to the target dataset. Furthermore, since there may be commonalities between features of different modalities, even if image data from a particular modality is missing, the remaining modalities still contain corresponding frequency information, which enhances the robustness of the information to a certain extent. Taking a single branch as an example, for a given image, we apply the fast Fourier transform (FFT) along the spatial dimension to obtain frequency components corresponding to different spatial locations. FFT is applied to each channel to convert the spatial domain representation into a frequency representation in the frequency domain, and filtering operations are performed in the frequency domain. Then, an attention mask is learned in the frequency domain to analyze the dominant frequency components in the feature map. Finally, the feature representation is transformed back to the spatial domain using inverse FFT (iFFT). The transformation from the spatial domain to the frequency domain is expressed as follows:

    $$F(x)(\mu,\nu) = \sum_{h=0}^{H-1}\sum_{w=0}^{W-1} x(h,w)\, e^{-i2\pi\left(\frac{h\mu}{H} + \frac{w\nu}{W}\right)}, \tag{3.5}$$
    Figure 4.  The architecture of the proposed frequency filtering prompt (FFP). The image is mapped into patch embeddings through a linear layer. The frequency filtering prompt method splits these embeddings into two branches for processing. One branch undergoes frequency filtering operations to obtain high-frequency features, while the other branch remains unprocessed. The combination of these two branches will generate prompts through an adaptor. The frequency filtering prompt and the image embeddings go through transformer blocks to extract discriminative features.

    After obtaining the frequency representation, different frequency components are modulated by filtering through the attention mechanism. Specifically, the attention mechanism compresses information across channels through convolution and a sigmoid function. The expression of the frequency filtering mechanism is as follows:

    $$F(x)' = F(x) \odot \sigma\left(\mathrm{conv}\left(\left[\mathrm{AvgPool}(F(x)),\, \mathrm{MaxPool}(F(x))\right]\right)\right), \tag{3.6}$$

    where $\sigma$ denotes the sigmoid function, and $\mathrm{AvgPool}(\cdot)$ and $\mathrm{MaxPool}(\cdot)$ represent the average pooling and max pooling operations, respectively.

    Finally, the inverse FFT is used to transform back to the spatial domain features:

    $$x'(h,w) = \frac{1}{HW}\sum_{\mu=0}^{H-1}\sum_{\nu=0}^{W-1} F(x)'(\mu,\nu)\, e^{\,i2\pi\left(\frac{h\mu}{H} + \frac{w\nu}{W}\right)}, \tag{3.7}$$
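    A minimal PyTorch sketch of this filter (Eqs. (3.5)–(3.7)) is given below, assuming channel-first feature maps; the 7 × 7 convolution and the use of the magnitude spectrum for the mask are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FrequencyFilter(nn.Module):
    """FFT -> learned attention mask over frequency components -> iFFT."""
    def __init__(self):
        super().__init__()
        # Two channels: [avg-pooled, max-pooled] maps across channels, Eq. (3.6)
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) spatial features
        f = torch.fft.fft2(x, dim=(-2, -1))               # Eq. (3.5), per channel
        mag = f.abs()
        pooled = torch.cat([mag.mean(1, keepdim=True),
                            mag.amax(1, keepdim=True)], dim=1)
        mask = torch.sigmoid(self.conv(pooled))           # attention mask, Eq. (3.6)
        return torch.fft.ifft2(f * mask, dim=(-2, -1)).real  # Eq. (3.7)
```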

    Inspired by AdaptFormer [32], we employ a frequency enhancement adaptor, a bottleneck structure that limits the number of parameters. It takes the combination of filtered frequency features and image features as input and generates relevant frequency prompts through a down-projection layer, a lightweight multi-layer perceptron, and an up-projection layer. Formally, this process can be expressed as:

    $$p_i^f = \mathrm{MLP}_{\mathrm{up}}\left(\mathrm{GELU}\left(\mathrm{MLP}_{\mathrm{down}}^{i}(x + x')\right)\right), \tag{3.8}$$

    Finally, the generated prompts are appended to the transformer layers to help the model learn more representative and discriminative image features.
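    A sketch of the adaptor of Eq. (3.8) follows; the bottleneck width is an assumption:

```python
import torch.nn as nn

class FrequencyAdaptor(nn.Module):
    """Bottleneck adaptor: down-projection, GELU nonlinearity, up-projection,
    as in Eq. (3.8)."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, hidden)   # MLP_down
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)     # MLP_up

    def forward(self, x, x_filtered):
        # x: token embeddings; x_filtered: their frequency-filtered counterpart x'
        return self.up(self.act(self.down(x + x_filtered)))   # prompt p_i^f
```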

    To enable the model to handle missing modalities, we fill them with null values; however, such null values are likely to disturb the feature space and cause modal feature fusion to fail. Therefore, we propose learnable spatial perturbation prompts, as shown in Figure 5, aiming to learn a task-specific visual prompt ($P$) within a latent space that encourages the sharing of cross-modal information. The prompts interact dynamically with the input features, facilitating adaptive modal fusion rather than simply injecting fixed information.

    Figure 5.  The architecture of the proposed spatial perturbation prompt (SPP). Intermediate features and prompt embeddings are combined and fed into the transformer block, and a consistency loss is used to facilitate the learning of prompts.

    First, the extracted discriminative features are combined through element-wise addition, $f_i^c = [f_i^f, f_i^{t1c}, f_i^{t1}, f_i^{t2}]$, and then passed through a 3 × 3 convolutional layer followed by a sigmoid activation function to generate prompt weights $\omega_i \in [0,1]$. These weights describe the importance of each spatial location in the input. Inspired by EVP [27], we add random visual embeddings of the same size as the transformer tokens, train only these random embeddings during the training phase, and use the trained visual prompts as guidance for the model, denoted as $F_i = (F_i^{token}, p_i^m)$. The process can be described as:

    $$\omega_i = \sigma\left(\mathrm{conv}\left(\left[f_i^f, f_i^{t1c}, f_i^{t1}, f_i^{t2}\right]\right)\right), \tag{3.9}$$
    $$p_i^m = \mathrm{conv}\left(\sum_{c=1}^{N} \omega_i\, p_i^c\right), \tag{3.10}$$
    $$F_i = \mathrm{transformer}\left(f_i^c + p_i^m\right), \tag{3.11}$$

    where $\sigma$ is the sigmoid function. Finally, the cross-modal information features ($F$) are fed into the transformer encoder block to establish cross-modal long-range dependencies.
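    The following sketch condenses Eqs. (3.9)–(3.11); it fuses the summed modality features with a single learnable prompt tensor rather than one prompt per modality, which is a simplifying assumption:

```python
import torch
import torch.nn as nn

class SpatialPerturbationPrompt(nn.Module):
    def __init__(self, channels: int, h: int, w: int):
        super().__init__()
        self.weight_conv = nn.Conv2d(channels, channels, 3, padding=1)  # Eq. (3.9)
        self.fuse_conv = nn.Conv2d(channels, channels, 3, padding=1)    # Eq. (3.10)
        # Random visual embedding, trained while the backbone stays frozen
        self.prompt = nn.Parameter(torch.randn(1, channels, h, w) * 0.02)

    def forward(self, f_c: torch.Tensor) -> torch.Tensor:
        # f_c: (B, C, H, W), element-wise sum of the four modality features
        omega = torch.sigmoid(self.weight_conv(f_c))   # prompt weights in [0, 1]
        p_m = self.fuse_conv(omega * self.prompt)      # perturbation prompt p_i^m
        return f_c + p_m                               # input to the transformer block
```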

    We introduce a consistency loss to optimize the prompts so that they capture task-shared knowledge and transform it into representations beneficial to the task. Specifically, we map the feature maps obtained from the transformer encoder stages to the same size as the input image and use a mean squared error to ensure that the model learns coherent and consistent information at each stage. Note that, since shallower layers may lack sufficient semantic information, we apply the consistency loss only in the last two stages of the transformer encoder.

    $$L_m = \frac{1}{N}\sum_{i=1}^{N}\sum_{m=1}^{M}\left(\hat{f}_i - f_i^m\right)^2, \tag{3.12}$$

    where $N$ is the number of samples, $M$ is the number of decoder layers, $f_i^m$ denotes the rescaled features of image $i$ in transformer layer $m$, and their average is denoted as $\hat{f}_i = \frac{1}{M}\sum_{k=1}^{M} f_i^k$.
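    One possible implementation of Eq. (3.12), assuming the stage features have already been rescaled to a common size, is sketched below:

```python
import torch

def feature_consistency_loss(stage_feats: list) -> torch.Tensor:
    """stage_feats: list of M feature maps, each of shape (B, C, H, W),
    taken from the last two encoder stages and rescaled to a common size."""
    f = torch.stack(stage_feats, dim=0)      # (M, B, C, H, W)
    f_hat = f.mean(dim=0, keepdim=True)      # stage average \hat{f}
    return ((f_hat - f) ** 2).mean()         # MSE over stages and samples
```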

    In addition, we map the feature maps into segmentation maps and calculate a Dice loss against the ground truth to encourage the model to capture consistent feature representations.

    $$L_d = \frac{1}{N}\sum_{i=1}^{N}\sum_{m=1}^{M}\mathrm{Dice}\left(y_i, f(x_i^m)\right), \tag{3.13}$$

    where $y_i$ denotes the ground-truth label of image $x_i$, and $f(x_i^m)$ denotes the prediction corresponding to the $m$-th layer features of the image.

    The feature consistency loss and prediction consistency loss are combined to supervise prompt generation.

    $$L_c = \gamma L_m + (1-\gamma) L_d, \tag{3.14}$$

    where $\gamma$ is a weight parameter that balances the two losses. We experimented with different values of $\gamma$ and found that $\gamma = 0.3$ gives the best result.
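    Putting Eqs. (3.12)–(3.14) together, a sketch of the prompt-supervision objective could look as follows, reusing the feature_consistency_loss sketch above and assuming a soft-Dice helper dice_loss (one possible form is given with Eq. (3.16) below):

```python
def prompt_consistency_loss(stage_feats, stage_preds, target, gamma: float = 0.3):
    """Eq. (3.14): weighted sum of the feature consistency loss (Eq. 3.12)
    and the Dice-based prediction consistency loss (Eq. 3.13)."""
    l_m = feature_consistency_loss(stage_feats)
    l_d = sum(dice_loss(p, target) for p in stage_preds) / len(stage_preds)
    return gamma * l_m + (1 - gamma) * l_d
```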

    The convolutional decoder gradually restores the spatial resolution of the fused features to the original segmentation space. It employs skip connections to merge features of the different modalities from specific hierarchical levels of the encoder, preserving more low-level details. The overall processing step is as follows:

    $$D_i = \mathrm{conv}\left(\mathrm{upsample}\left(\mathrm{conv}(f_i^c, D_{i-1})\right)\right), \tag{3.15}$$

    where $D_i$ is the feature map from the $i$-th layer of the convolutional decoder, and $f_i^c$ is the combined feature from the corresponding encoder layers.
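    A single decoder step per Eq. (3.15) might be sketched as below; the channel sizes and the 2x bilinear upsampling are assumptions:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.fuse = nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1)  # conv(f_i^c, D_{i-1})
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.refine = nn.Conv2d(out_ch, out_ch, 3, padding=1)

    def forward(self, d_prev: torch.Tensor, f_skip: torch.Tensor) -> torch.Tensor:
        x = self.fuse(torch.cat([d_prev, f_skip], dim=1))  # merge skip feature
        return self.refine(self.up(x))                     # D_i, Eq. (3.15)
```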

    We employ a hybrid loss to measure the difference between the predictions and the ground truth. The Dice loss calculates the similarity between the predicted and true segmentations, while the cross-entropy loss measures prediction quality by quantifying the difference between the predicted and true probability distributions. Gradients are computed from the sum of the two losses to update the parameters. The definitions are as follows:

    $$L_{\mathrm{Dice}} = 1 - \frac{2\sum_{i}^{N} y_i f(x_i)}{\sum_{i}^{N} y_i + \sum_{i}^{N} f(x_i)}, \tag{3.16}$$
    $$L_{\mathrm{CE}} = -\sum_{i}^{N} y_i \log p\left(f(x_i)\right), \tag{3.17}$$

    where $f(x_i)$ and $y_i$ represent the prediction and the ground-truth label, respectively, $N$ is the number of pixels, and $p(\cdot)$ is the softmax of the prediction. Finally, our hybrid loss function $L_{\mathrm{seg}}$ is given by

    $$L_{\mathrm{seg}} = L_c + L_{\mathrm{Dice}} + L_{\mathrm{CE}}, \tag{3.18}$$
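    A binary, per-region sketch of Eqs. (3.16)–(3.18) is shown below; the smoothing term eps and the sigmoid/binary cross-entropy formulation (rather than a multi-class softmax) are assumptions:

```python
import torch
import torch.nn.functional as F

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """Soft Dice loss, Eq. (3.16); pred holds probabilities, target is binary."""
    inter = (pred * target).sum()
    return 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def seg_loss(logits: torch.Tensor, target: torch.Tensor, l_c: torch.Tensor):
    """Hybrid objective of Eq. (3.18): consistency + Dice + cross-entropy."""
    probs = torch.sigmoid(logits)
    l_dice = dice_loss(probs, target.float())                          # Eq. (3.16)
    l_ce = F.binary_cross_entropy_with_logits(logits, target.float())  # Eq. (3.17)
    return l_c + l_dice + l_ce
```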

    We use two public datasets from the Multimodal Brain Tumor Segmentation Challenge (BraTS), BraTS 2018 and BraTS 2020 [33,34,35], to demonstrate the effectiveness of the proposed method. BraTS 2018 contains 285 patient cases for training, while BraTS 2020 includes 369 cases for training and 125 for validation. In these datasets, each case comprises four MRI modalities: FLAIR, T1CE, T1, and T2. Each modality volume is 240 × 240 × 155, aligned within the same spatial space. Medical experts provide manual pixel-level annotations of three mutually inclusive tumor regions in each image: whole tumor (WT), tumor core (TC), and enhancing tumor (ET). WT encompasses all tumor tissues, while TC comprises ET, necrosis, and the non-enhancing tumor core.

    Data preprocessing is performed on both datasets before training. For each dataset, we slice along the axial plane of the 3D medical images. To eliminate non-informative slices and irrelevant background regions, thereby improving training efficiency and saving time, we use the central slices as training data and reshape each 2D slice to 224 × 224. We also design a simulation method for missing modalities: MRI modalities are randomly removed from the input, the missing modalities can be any one or several of the four, and the missing rate for each modality is random (see the sketch after this paragraph). The purpose is to simulate the missing-modality scenarios that may occur in real-world situations.
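    The random modality-dropping scheme can be sketched as follows; the per-modality drop probability and the keep-at-least-one rule are assumptions consistent with the description above:

```python
import random
import torch

def drop_modalities(x: torch.Tensor, p: float = 0.7) -> torch.Tensor:
    """x: (4, H, W) stacked modalities. Zero out each modality with
    probability p, always keeping at least one modality available."""
    keep = [random.random() >= p for _ in range(x.shape[0])]
    if not any(keep):
        keep[random.randrange(x.shape[0])] = True   # guarantee one modality
    mask = torch.tensor(keep, dtype=x.dtype).view(-1, 1, 1)
    return x * mask
```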

    In this study, our method is implemented in PyTorch using a single NVIDIA Tesla V100 32 GB GPU. We adopt a U-Net architecture composed of transformer blocks as the benchmark, with the transformer pre-trained on ImageNet-1K. After extensive experiments and parameter tuning, we train the model for 100 epochs using the SGD optimizer with an initial learning rate of $10^{-2}$ and a batch size of 12. For the segmentation task, we use the Dice coefficient (which computes the similarity of two sets), the Hausdorff distance (HD95, which measures the distance between two sets), and the sensitivity (the ratio of correctly identified positive samples to all true positive samples) as performance metrics to evaluate the various methods.
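    For reference, two of the reported metrics can be computed as below for a binary region mask (a minimal sketch; HD95 is typically taken from a library such as MedPy rather than re-implemented):

```python
import torch

def dice_coefficient(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """Overlap between prediction and ground truth: 2|A n B| / (|A| + |B|)."""
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def sensitivity(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """TP / (TP + FN): the fraction of true positives that are recovered."""
    tp = (pred * target).sum()
    return (tp + eps) / (target.sum() + eps)
```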

    We focus on exploring the robustness of the discriminative prompt optimization network to general incompleteness in multimodal images without fine-tuning the entire pre-trained model. In this section, we first present the main results of our method and then report a series of ablation experiments on the proposed components. Considering that the BraTS 2020 dataset contains many patient cases and is representative, we use it for the ablation study.

    As shown in Table 1, our method achieves remarkable Dice scores in both the modality-complete and modality-missing scenarios. For example, our proposed approach attains significantly better mean Dice scores for whole tumor, tumor core, and enhancing tumor than the second-best approaches. From the experimental results in Table 2, we observed that the baseline models generally performed unsatisfactorily on the T1 modality; our model achieved significant improvements in this respect, effectively enhancing performance under the T1 modality. Figures 6 and 7 present visualizations of the segmentation results. Furthermore, Table 3 clearly shows that our method outperforms the other approaches in terms of HD95 and sensitivity under complete-modality testing, further validating its superior performance.

    Table 1.  Quantitative results of state-of-the-art unified models (Ding [36], Zhang [37], Ting [17], Qiu [30]) and our DPONet on the BraTS2020 dataset. The F/T1/T1c/T2 columns indicate the available modalities. Bold indicates optimal; underline indicates sub-optimal.
    Modalities Dice (%)
    Complete Core Enhancing
    F T1 T1c T2 D Z T Q Our D Z T Q Our D Z T Q Our
    86.1 86.1 86.5 86.7 93.9 71.0 70.9 71.5 71.0 93.3 46.3 46.3 45.6 47.2 76.1
    76.8 78.5 77.4 79.5 91.6 81.5 84.0 83.4 84.3 95.3 74.9 80.1 78.9 81.4 88.4
    77.2 78.0 78.1 79.5 89.1 66.0 65.9 66.8 67.7 91.9 37.3 38.0 41.3 39.1 71.6
    87.3 87.4 89.1 86.9 95.2 69.2 68.8 69.3 69.9 93.5 38.2 42.4 43.6 42.8 74.6
    87.7 87.8 88.4 88.4 94.5 83.5 84.8 86.4 86.3 95.8 75.9 79.4 81.7 80.1 88.9
    81.1 81.8 81.2 83.1 92.1 83.4 83.6 85.2 85.8 95.4 78.0 80.1 79.2 81.7 88.3
    89.7 89.8 89.9 89.8 95.5 73.1 73.8 73.9 74.4 94.3 41.0 45.9 48.2 46.8 77.3
    87.7 87.8 88.0 87.9 94.4 73.1 73.4 73.3 72.9 94.1 45.7 46.8 50.1 47.3 77.5
    89.9 89.9 90.5 90.1 95.5 74.1 74.6 75.5 74.5 94.1 49.3 48.6 48.6 49.5 76.6
    89.9 89.3 90.0 90.0 95.6 84.7 84.8 85.5 86.6 95.9 76.7 81.9 81.8 81.2 88.9
    90.7 90.1 90.7 90.6 95.6 85.1 85.2 86.5 86.7 95.8 76.8 82.1 81.8 81.8 88.8
    90.6 90.6 90.3 90.6 95.7 75.2 75.6 75.9 75.8 94.7 49.9 50.3 52.5 51.1 78.0
    90.7 90.4 90.6 90.8 95.8 85.0 85.3 86.4 86.4 96.0 77.1 78.7 81.0 80.0 88.9
    88.3 88.2 88.7 88.9 94.6 83.5 84.2 86.5 86.5 95.8 77.0 79.3 78.5 82.1 88.9
    91.1 90.6 90.6 91.0 95.9 85.2 84.6 87.4 86.4 95.9 78.0 79.9 81.6 81.0 88.9
    Average 87.0 87.1 87.3 87.6 94.3 78.2 78.6 79.6 79.7 94.8 61.5 64.0 64.9 64.9 82.8

    Table 2.  Quantitative results of state-of-the-art unified models (Zhang [37], Yang [38], Ting [17], Liu [18]) and our DPONet on the BraTS2018 dataset. The F/T1/T1c/T2 columns indicate the available modalities. Bold indicates optimal; underline indicates sub-optimal.
    Modalities Dice (%)
    Complete Core Enhancing
    F T1 T1c T2 Z Y T L Our Z Y T L Our Z Y T L Our
    81.2 76.3 86.6 84.8 94.3 64.2 56.7 68.8 69.4 94.4 43.1 16.0 41.4 47.6 76.2
    72.2 42.8 77.8 75.8 92.6 75.4 65.1 81.5 82.9 95.4 72.6 66.3 75.7 73.7 89.2
    67.5 15.5 78.7 74.4 90.9 56.6 16.8 65.6 66.1 93.2 32.5 8.1 44.5 37.1 74.7
    86.1 84.2 88.4 88.7 95.2 61.2 47.3 66.7 66.4 94.2 39.3 8.1 40.5 35.6 74.8
    83.0 84.1 88.2 86.3 95.0 78.6 80.3 84.8 84.2 96.1 74.5 68.7 77.7 75.3 90.0
    74.4 62.1 81.8 77.2 93.1 78.6 78.2 83.5 83.4 95.7 74.0 70.7 77.1 74.7 89.5
    87.1 87.3 89.7 89.0 95.6 65.9 61.6 72.0 70.8 95.2 43.0 9.5 44.4 41.2 77.9
    82.2 84.2 88.4 88.7 94.9 61.2 47.3 66.7 66.4 95.1 45.0 16.5 47.7 48.7 77.7
    87.6 87.9 90.3 89.9 95.9 69.8 62.6 71.8 70.9 95.1 47.5 17.4 48.3 45.4 78.1
    87.1 87.5 89.5 89.7 95.6 77.9 80.8 84.8 84.4 96.1 75.1 64.8 76.8 75.0 90.0
    87.3 87.7 90.4 88.9 95.7 79.8 80.9 85.2 84.1 96.2 75.5 65.7 77.4 74.0 90.0
    87.8 88.4 89.7 89.9 96.0 71.5 63.7 74.1 72.7 95.5 47.7 19.4 50.0 44.8 78.7
    88.1 88.8 90.6 90.4 96.0 79.6 80.7 85.8 84.6 96.3 75.7 66.4 76.6 73.8 90.1
    82.7 80.9 88.4 86.1 95.1 80.4 79.0 85.8 84.4 96.2 74.8 68.3 78.5 75.4 90.1
    89.6 88.8 90.6 90.1 96.1 85.8 80.1 85.9 84.5 96.3 77.6 68.4 80.4 75.5 90.0
    Average 82.9 76.4 87.3 86.0 94.8 72.4 65.4 77.5 77.0 95.4 59.9 42.3 62.5 59.9 83.8

    Figure 6.  Visual comparison results of state-of-the-art unified models and our proposed DPONet on the BraTS2020 dataset.
    Figure 7.  Visual comparison results of state-of-the-art models and our proposed DPONet on the BraTS2020 dataset.
    Table 3.  Quantitative results of the state-of-the-art unified models and our proposed DPONet on the BraTS2020 dataset. The models are evaluated using Dice, HD95, and sensitivity scores. Baseline (fine-tune) means that the pre-trained transformer feature extractor is fully fine-tuned on the target dataset. Baseline (frozen) indicates that the pre-trained transformer feature extractor is frozen.
    Method Dice HD95 Sensitivity
    WT TC ET Avg WT TC ET Avg WT TC ET Avg
    Ding et al. 86.13 71.93 58.98 72.35 - - - - - - - -
    Zhang et al. 87.08 78.69 64.08 76.62 2.90 6.21 44.64 17.92 99.60 99.81 99.82 99.74
    Ting et al. 90.71 84.60 79.07 84.79 4.05 5.78 33.77 14.53 90.98 83.90 77.68 84.18
    Qiu et al. 87.58 79.67 64.87 77.37 2.82 5.71 43.92 17.48 99.66 99.83 99.81 99.77
    baseline(fine-tune) 77.63 78.94 70.85 75.81 2.61 2.09 2.39 2.36 86.28 86.50 82.74 85.17
    baseline(frozen) 58.11 61.09 40.88 53.36 2.83 2.29 2.97 2.70 81.41 84.68 85.90 84.00
    our 94.96 94.12 89.98 93.02 2.58 2.09 2.21 2.29 96.81 96.32 93.01 95.38


    We further conducted experiments to analyze the robustness of our proposed method to varying missing-modality rates between the training and testing phases. As shown in Figure 8(a), we trained the model with a 70% missing rate and randomly removed multiple modalities to simulate missing-modality scenarios for testing. We found that, compared to the baseline, our DPONet method was robust to different missing rates during testing. Moreover, in Figure 8(b), we used 10%, 70%, and 90% missing rates during training (through many experiments, we found these rates to be representative) and observed that training with more complete modality data yielded significantly higher performance when testing at low missing rates. The experiments in this paper are based on the general reality that collecting complete modality data cannot be guaranteed. However, some publicly available datasets do have complete modalities. Therefore, we also trained the models using complete data, as shown in Figure 8(c); where the baseline model could not handle missing data, our method consistently improved upon the baseline.

    Figure 8.  Study on the robustness of DPONet to testing missing rates under different scenarios (where the absence of one, two, or three modalities is random, to account for the possible missing modalities during testing). (a) All models are trained under a 70% missing rate and evaluated under varying missing rates. (b) Training with different missing rates scenarios with 10%, 70%, and 90% missing rates (through many experiments, we found that these missing rates are representative), representing data with higher modality completeness, balanced data, and data with lower modality completeness, respectively. (c) All models are trained with modality-complete data.

    We explored the effects of the frequency filtering prompts and spatial perturbation prompts, with the results shown in Table 4; our method achieved a Dice score of 93.02. The term baseline (fine-tune) refers to a pre-trained transformer that is fully fine-tuned on the BraTS dataset. The term baseline (frozen) refers to a baseline model whose pre-trained backbone parameters are frozen.

    Table 4.  Ablation study of our proposed DPONet on the BraTS2020 dataset. The models are evaluated using Dice, HD95, and sensitivity scores. Baseline (fine-tune) means that the pre-trained transformer feature extractor is fully fine-tuned on the target dataset. Baseline (frozen) indicates that the pre-trained transformer feature extractor is frozen.
    Method Dice HD95 Sensitivity
    WT TC ET Avg WT TC ET Avg WT TC ET Avg
    baseline (fine-tune) 77.63 78.94 70.85 75.81 2.61 2.09 2.39 2.36 86.28 86.50 82.74 85.17
    baseline (frozen) 58.11 61.09 40.88 53.36 2.83 2.29 2.97 2.70 81.41 84.68 85.90 84.00
    baseline + FFP 93.65 92.40 85.08 90.38 2.45 2.04 2.16 2.22 96.54 96.11 91.26 94.64
    baseline + SPP 94.56 94.40 87.37 92.11 2.47 2.05 2.22 2.25 96.59 96.07 90.53 94.40
    baseline + FFP + SPP 94.96 94.12 89.98 93.02 2.58 2.09 2.21 2.29 96.81 96.32 93.01 95.38


    When we introduced frequency filtering prompts into the baseline model, it achieved performance comparable to the fine-tuned model, demonstrating the efficiency of the proposed component. Furthermore, as shown in Figure 9, when training used complete modalities but a significant portion of modalities was absent during inference (i.e., only one modality retained), the baseline model suffered severe performance degradation. Excitingly, when prompts were introduced, the model was able to perform image segmentation normally even with a single-modality input, indicating that the proposed visual prompts help the encoder learn discriminative features across modalities.

    Figure 9.  Qualitative results from state-of-the-art models and our DPONet, which was trained using the complete modal dataset of BraTS2020 and randomly missing three modalities with a 70% miss rate during the test phase.

    When we introduced the spatial perturbation prompt module into the baseline, the overall robustness of the model improved. As shown in Table 4, our method achieved a Dice score of 93.02, exceeding the baseline model by 17.21. Furthermore, the Dice score for the ET region saw a significant increase, indicating that the spatial perturbation prompt facilitates the fusion of inter-modal information and preserves more edge details and small-scale information. Figure 10 visualizes the segmentation results before and after using the spatial perturbation prompt, clearly demonstrating that more small-scale lesion areas are preserved.

    Figure 10.  Qualitative results from DPONet, which was trained using the complete dataset of BraTS2020 and randomly missing three modalities with a 70% miss rate during the test phase. The red box indicates the progress of DPONet.

    Additionally, Table 5 reports the parameter counts before and after adding each module. Our method introduces only about 7% of the total parameters as trainable yet achieves excellent segmentation performance. Extended to large models with billions of parameters, the proposed method would be even more favorable for multimodal downstream tasks with missing modalities, achieving a good trade-off between computational cost and performance.

    Table 5.  The number of model parameters (×10^6) before and after adding the learnable prompt components.
    Method Param (M) Tunable Param (M)
    baseline (fine-tune) 194.82 194.82
    baseline (frozen) 194.82 49.30
    baseline + FFP 160.42 58.97
    baseline + SPP 173.93 48.69
    baseline + FFP + SPP 153.43 10.58


    In this paper, we introduce a parameter-efficient and discriminatively optimized segmentation network that exhibits robust adaptability to generalized missing modality inputs. Our model filters frequency features to generate discriminative visual cues and introduces learnable spatial perturbation prompts into shared feature representations, effectively addressing the challenge of incomplete multimodal brain tumor segmentation. Compared to fine-tuning the entire transformer model, our approach requires only 7% of the trainable parameters while demonstrating superior performance in handling real-world scenarios with missing modality data. Extensive experiments and ablation studies on the publicly available BraTS2018 and BraTS2020 datasets validate the effectiveness of our proposed method.

    In this work, we investigated a parameter-efficient incomplete-modality image segmentation method for brain tumors. Although our model successfully captures consistent features by mapping robust multimodal features into the same latent space, we must point out that it cannot recover information about missing modalities from the available multimodal inputs. Therefore, our next step is to study how to use the available multimodal images to estimate the missing modality information and thus obtain richer image information.

    This work was supported by the National Natural Science Foundation of China (No. U24A20231, No. 62272283) and New Twentieth Items of Universities in Jinan (No. 2021GXRC049).

    All authors declare no conflicts of interest in this paper.



© 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
