
Dynamic modeling and control of DC–DC power converters require formulations capable of capturing nonlinearity, harmonic interaction, and control sensitivity under fast-switching and bidirectional power flow. The two-level dual-active-bridge (2L-DAB) converter exemplifies such challenges, especially under phase-shift modulation schemes used for galvanically isolated energy transfer. To address these issues, three complementary modeling frameworks are developed: a switching model in the time domain, a rotating-frame formulation based on the DQ transformation, and a generalized state-space averaging (GSSA) model that incorporates fundamental and harmonic components through frequency-domain decomposition. Each formulation enables a different perspective—ranging from intuitive time-domain dynamics to harmonic coupling behavior—while providing a foundation for control system design. Small-signal linearizations yield control-to-output transfer functions used for loop-shaping via proportional-integral (PI) compensators. A modified phase margin criterion is employed to guarantee dynamic stability and robustness. Comparative simulation results under reference and load disturbances demonstrate the distinct advantages of each model, with the GSSA approach excelling in harmonic accuracy and the DQ-based model offering streamlined controller implementation. These tools offer a robust methodology for high-fidelity analysis and design of high-performance DAB-based systems.
Citation: José M. Campos-Salazar, Roya Rafiezadeh, Felipe Santander, Juan L. Aguayo-Lazcano, Nicolás Kunakov. Comprehensive GSSA and D-Q frame dynamic modeling of dual-active-bridge DC-DC converters[J]. AIMS Electronics and Electrical Engineering, 2025, 9(3): 288-313. doi: 10.3934/electreng.2025014
Influenza is the major cause of medically-attended acute respiratory diseases [1,2], and as an infectious disease it is caused by the influenza virus [3], which has three types: A, B, and C. The most significant human influenza pathogens are Alphainfluenza viruses (IAV), which can be further classified into subtypes by combining one of the 16 hemagglutinin (HA: H1–H16) with one of the 9 neuraminidase (NA: N1–N9) surface antigens [4]. Generally, most influenza viruses (e.g., subtypes H5N1, H9N2, H7N7, and H7N3) are avian and have low pathogenicity because they bind inefficiently to the sialic acid receptors of the human upper airways [4]. However, some, such as H7N9, which broke out in China and caused 44 deaths [5], have crossed species from poultry [6] owing to mutations in their HA proteins that enable them to bind to human-like receptors.
Influenza A can cause major outbreaks and pandemics [7,8]. An estimated three to five million cases of severe illness and about 250,000 to 500,000 deaths are reported annually. A pandemic occurs when the human population has low immunity against newly emerged influenza sequences [9,10].
Three influenza pandemics arose in the 20th century: Spanish influenza in 1918, Asian influenza in 1958, and Hong Kong influenza in 1968, each of which caused more than a million deaths [11]. All the seasonal influenza A epidemics from 1968 to 2009 were dominated by A/H3N2 virus variants produced by antigenic drift [12,13], except for the A/H1N1 viruses that reappeared in 1977. In fact, the pandemic of 1918 was caused by an H1N1 IAV as well. In March and early April 2009, a new swine-origin influenza A (H1N1) virus (S-OIV) emerged in Mexico and California [14]. It caused considerable fear and several deaths worldwide. This virus was antigenically distinct from human seasonal influenza viruses but genetically related to viruses known to circulate in pigs. Because of its swine origin, it is often known as ‘swine-origin influenza virus’ (S-OIV) A/H1N1, or pandemic influenza A (H1N1) 2009 virus [15].
Viruses are parasites that need the host cellular machinery to replicate their genome. To exploit the host, viral proteins need to interact with host proteins. Therefore, identifying the host-virus protein-protein interaction network (HVIN) can help to predict the behavior of a virus and guide the design of antiviral drugs.
There are many experimental methods for detecting host-virus protein-protein interaction (HV-PPI) such as co-immunoprecipitation [16], bimolecular fluorescence complementation [17], label transfer and yeast two-hybrid. All of these methods are expensive, time-consuming and laborious. So, a series of computational methods have been proposed in recent years to predict HV-PPIs.
While Sprinzak [18] applied sequence-signature pairs, Kim [19] and Ng [20] used protein domain profiles and Yu [21] used sequence homology to predict HV-PPIs. Zhang [22] took advantage of decision trees in predicting co-complexed protein pairs using genomic and proteomic data integration. For predicting HV-PPIs via genomic data, Jansen [23] utilized Bayesian networks. Qi [24] used support vector machines and random forest to predict HV-PPIs. Dyer [25] obtained 516 new HV-PPIs by applying Bayesian statistics to every pair of functional domains of human and Plasmodium falciparum proteins. Zahiri [26] employed four well-established, diverse learners as base classifiers of an ensemble learning model, together with a variety of features including pseudo amino acid composition and post-translational modification, to predict HV-PPIs between Homo sapiens and HCV (hepatitis C virus) proteins. Tastan [27] applied random forest as a classifier accompanied by a variety of features, including co-occurrence of functional motifs and their interaction domains, tissue distributions and gene expression profiles, to predict PPIs between HIV (human immunodeficiency virus) and human proteins. Qi [28] identified novel PPIs among HIV and human proteins by taking advantage of semi-supervised multi-task learning, while Barnes [29] constructed a protein-protein interaction prediction engine (PIPE) to identify new PPIs between HIV and Homo sapiens proteins. Alguwaizani [30] used repeated patterns of amino acids and amino acid composition to predict PPIs between human proteins and those of HIV, H1N1, SARS (severe acute respiratory syndrome), HCV and HPV (human papillomavirus).
Zhang [31] constructed a graph from human proteins that share gene ontology terms [32] with H7N9 proteins, then calculated the shortest paths of the constructed graph and ranked its proteins by betweenness score. The 20 proteins with the highest betweenness scores were reported as potential interaction partners of H7N9. Eng [33] extracted the physicochemical properties of amino acids of IAV and human proteins and used them as input features of a random forest to predict PPIs between IAV and human proteins.
Nanni [34] used the position-specific scoring matrix (PSSM) of the proteins, substitution matrix representation, wavelet image, physicochemical property response matrix, amino acid composition, pseudo amino acid composition, and dipeptide, tripeptide and tetrapeptide composition to improve prediction performance by up to two percent on 25 different datasets. Zacharaki [35] extracted the density of torsion angles and the density of amino acid distances and trained a deep convolutional neural network on them to achieve 90% accuracy in predicting structure-based protein function.
In prediction problems, some papers have combined different classifiers into an ensemble learning model to improve accuracy. Saha [36] used support vector machine (SVM), random forest (RF), Naïve Bayes (NB) and decision tree to build an ensemble learning method based on majority voting, improving prediction accuracy to 90%. Emamjomeh [26] used SVM, RF, NB and multilayer perceptron (MLP) to build an ensemble learning method based on a meta-learner combiner, improving prediction accuracy to 84%. Nanni [37] used SVM, random subspace of AdaBoost, Gaussian process classifier, deep learning and random subspace of rotation boosting to build an ensemble learning model based on the normalized summation score of its classifiers, which outperformed the other methods.
In the present study, 1800 different features were extracted from physicochemical properties of amino acids, different centralities of HPPIN, human and virus proteins’ sequences and gene ontology to predict HI-PPIs between human and AlphaInfluenzavirus. We used KNN, CART decision tree, NB and SVM as the base learners and RF as a meta-classifier to build an ensemble learning method for predicting HI-PPIs using the extracted features. All these processes are depicted in Figure 1. Our ensemble learning method reached an accuracy of 93% in detecting HI-PPIs according to the experimental data.
Moreover, by running the trained model on 694522 possible HI-PPIs, a database was created which is publicly accessible at http://bioinf.modares.ac.ir/software/complexnet/Influenza.
Finally, feature importance analysis revealed that human PPI network centralities, gene ontology semantic similarity and codon usage are the most informative descriptors for HI-PPIs prediction.
We have constructed two datasets for evaluating the proposed method: A positive dataset and a negative one.
In order to construct positive HI-PPIs, all IAV interactions were extracted from the IntAct [38], VirusMINT [39], DIP [40], STRING [41] and BioGRID [42] databases. Then, interactions between IAV proteins and proteins of organisms other than human were removed. Finally, 10775 interactions that were annotated as ‘physical association’ or ‘direct interaction’ were considered as the positive interaction set (PS). The constructed PPI network consists of 125 IAV proteins from 10 IAV genes and 2794 HPs from 2498 genes. As shown in Figure 2, ten distinct influenza genes interact with 2498 distinct human genes, of which the non-structural gene (NS) has the most interactions.
As databases contain no negative data, selecting appropriate negative PPIs is one of the most challenging steps in PPI prediction problems [43]. HPPIN has 250038 interactions among 20050 HPs. Using CD-HIT [44], sequence similarity was calculated between the 2794 HPs of HI-PPIs and the 17256 other HPs of HPPIN. HPs with sequence similarity of less than 20% were used as negative HPs, and the interactions between each negative HP and all IAV proteins were taken as the negative dataset. The final negative dataset consists of 236875 interactions.
As the numbers of positive and negative interactions used for training the model need to be equal to prevent training biased classifiers, inverse random under sampling (IRUS) [45] was used to balance the benchmark dataset.
As shown in Figure 3, we used five different schemes to encode features for human and Influenza A proteins: amino acid sequence-based features, nucleotide sequence-based features, physicochemical properties, gene ontology semantic similarities and network-based features.
a. Amino acid composition (AAC): Eight categories [26] were defined by clustering the twenty native amino acids using the k-means algorithm, according to the 514 physicochemical indices of amino acids in the AAindex database [46]. The frequency distribution of each group in the desired sequence is considered as AAC.
b. Dipeptide Composition (DC): Dipeptide composition is defined as the frequency of each pair of consecutive amino acids, which gives a feature vector of length 20 × 20 × 2 = 800. To avoid the curse of dimensionality [47], we clustered the 20 amino acids into eight groups [26], which reduced the size of the feature vector to 8 × 8 × 2 = 128. DC is calculated as below:
$$DC(A_iA_j) = \frac{N(A_iA_j)}{L-1}$$
where $N(A_iA_j)$ is the number of occurrences of the ith amino acid group immediately followed by the jth amino acid group in the sequence and L is the length of the sequence.
c. Conjoint Triad (CT): The frequency of each triple of consecutive amino acids, which gives a feature vector of length 20 × 20 × 20 × 2 = 16000. Again, we clustered the 20 amino acids into eight groups [26], which reduced the size of the feature vector to 8 × 8 × 8 × 2 = 1024. CT is defined as:
$$CT(G_iG_jG_k) = \frac{N(G_iG_jG_k)}{L-2}$$
where $N(G_iG_jG_k)$ is the number of occurrences of the ith group of amino acids immediately followed by the jth group and then by the kth group, and L is the length of the sequence.
d. Biosynthesis energy: Amino acids are synthesized from pyruvate, 3-phosphoglycerate and several other metabolic precursors. The total cost of this process is called the biosynthesis energy and is calculated by the Wagner method [48]. We used it as a feature calculated by:
$$BE = \frac{1}{n}\sum_{i=1}^{20} f_i \, e_i$$
where n is the length of protein, fi is the frequency of ith amino acid and ei is the biosynthesis energy of ith amino acid.
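The dipeptide and conjoint triad definitions above can be sketched as one k-mer count over a reduced alphabet. This is a minimal illustration, not the authors' code: the eight groups in `GROUPS` are a hypothetical partition (the study derives its groups by k-means clustering of AAindex properties [26]), and `kmer_composition` is an illustrative name.

```python
from collections import Counter
from itertools import product

# Hypothetical eight-group partition of the 20 amino acids (assumption, for
# illustration only; the study's groups come from k-means clustering [26]).
GROUPS = ["AGV", "ILFP", "ST", "MY", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: g for g, members in enumerate(GROUPS) for aa in members}


def kmer_composition(sequence, k):
    """Frequency of every group k-mer: N(k-mer) / (L - k + 1).

    k = 2 gives the dipeptide composition (denominator L - 1) and
    k = 3 gives the conjoint triad (denominator L - 2).
    """
    groups = [AA_TO_GROUP[aa] for aa in sequence]
    windows = len(groups) - k + 1
    if windows <= 0:
        raise ValueError("sequence is shorter than k")
    counts = Counter(tuple(groups[i:i + k]) for i in range(windows))
    # Fixed ordering of all 8**k possible k-mers so vectors are comparable.
    return [counts[kmer] / windows
            for kmer in product(range(len(GROUPS)), repeat=k)]


dc = kmer_composition("AGDEKR", 2)   # 8 * 8 = 64 values for one protein
ct = kmer_composition("AGDEKR", 3)   # 8 * 8 * 8 = 512 values for one protein
```

Each call returns the vector for a single protein. The factor of 2 in the feature lengths quoted above would be consistent with concatenating the human and viral protein vectors, but that reading is an assumption.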
a. GC content: GC content stands for guanine-cytosine content and represents the percentage of nitrogenous bases in a DNA molecule that are either guanine or cytosine. As guanine and cytosine pair through three hydrogen bonds, compared with two between adenine and thymine, sequences with higher GC content are more stable.
b. Codon usage: Codon usage represents the frequency of occurrence of synonymous codons in coding DNA. By considering fi as frequency of ith codon of jth amino acid and nj as the sum of the occurrence of that amino acid in the desired sequence, codon usage of ith codon is calculated by:
$$CU_i = \frac{f_i}{n_j}$$
c. Relative synonymous codon usage (RSCU): The observed frequency of a codon divided by the frequency expected if all synonymous codons of the corresponding amino acid were used equally [49]. It is calculated by:
$$RSCU_{ij} = \frac{f_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} f_{ij}}$$
where fij is the frequency of jth codon of ith amino acid in the protein sequence, ni is the number of codons of ith amino acid.
d. Codon adaptation index (CAI): An effective, simple measure of synonymous codon usage bias [50], which is calculated by:
$$CAI = \frac{\left(\prod_{i=1}^{n} RSCU_i\right)^{1/n}}{\left(\prod_{i=1}^{n} RSCU_{i\max}\right)^{1/n}}$$
where n is the length of protein, RSCUi is the RSCU value of the ith codon and the RSCUimax is the maximum RSCU value among codons of amino acid related to ith codon.
e. Stacking energy: The nearest-neighbor (NN) model of nucleic acids assumes that the identity and orientation of neighboring base pairs of a particular base pair affect the stability of the base pair [51].
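Stacking energy is calculated by:
$$\Delta G_{total} = \sum_i n_i \, \Delta G_i + \Delta G_{init} + \Delta G_{end} + \Delta G_{sym}$$
where $n_i$ is the number of occurrences of the ith nearest-neighbor pair and the $\Delta G$ values for init, i and end are taken from the unified nearest-neighbor (NN) free energy parameters. If the duplex is self-complementary, its symmetry is accounted for by setting $\Delta G_{sym}$ to +0.43 kcal/mol; for a non-self-complementary duplex it is zero.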
f. Interaction energy: The dispersion and repulsion energies between a codon and its complement are called the interaction energy [52], which is calculated by:
$$IE = \frac{1}{n}\sum_{i=1}^{20} e_i \, n_i$$
where n is the length of protein, ni is the frequency of ith amino acid and ei is the interaction energy of ith amino acid.
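As a rough sketch of the nucleotide-based definitions above, the following computes GC content, codon usage and RSCU for a coding sequence. It assumes the standard genetic code and an in-frame DNA string containing only A, C, G and T; the function names are illustrative and not taken from the study.

```python
from collections import Counter

# Standard genetic code, with bases ordered T, C, A, G ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def gc_content(dna):
    """Fraction of bases that are guanine or cytosine."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


def codon_counts(dna):
    """Count in-frame codons, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return Counter(dna[i:i + 3] for i in range(0, usable, 3))


def codon_usage(dna):
    """CU_i = f_i / n_j: codon count over the count of its amino acid."""
    counts = codon_counts(dna)
    aa_totals = Counter()
    for codon, n in counts.items():
        aa_totals[CODON_TABLE[codon]] += n
    return {codon: n / aa_totals[CODON_TABLE[codon]]
            for codon, n in counts.items()}


def rscu(dna):
    """RSCU_ij = f_ij / ((1 / n_i) * sum_j f_ij) for every observed amino acid."""
    counts = codon_counts(dna)
    synonymous = {}
    for codon, aa in CODON_TABLE.items():
        synonymous.setdefault(aa, []).append(codon)
    values = {}
    for codons in synonymous.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the sequence
        expected = total / len(codons)  # count under equal codon usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


example = "GCTGCTGCCAAA"  # codons: GCT, GCT, GCC (Ala) and AAA (Lys)
```

For the example, alanine has four synonymous codons and three occurrences, so the expected count per codon is 0.75 and the RSCU of GCT is 2 / 0.75.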
a. Hydrophobicity: The tendency of an amino acid to be repelled from a mass of water.
b. Hydrophilicity: The tendency of an amino acid to be attracted to a mass of water.
c. Polarity: The degree to which a molecule has a dipole moment.
e. Polarizability: The extent to which an external electric field distorts the electron cloud of a molecule.
f. Side chain volume: Sum of volume of side chain atoms of an amino acid.
g. Solvent-accessible surface area: The surface area of a biomolecule that is accessible to a solvent [53].
h. Net charge index of residue side chains [54]
To capture the effect of neighbors at a given distance from each amino acid, auto covariance (AC) is used [55], with the physicochemical properties above taken as the interaction modes. AC is calculated by the following equation:
$$AC(d,k) = \frac{1}{L-d}\sum_{i=1}^{L-d}\left(P_{i,k} - \frac{1}{L}\sum_{j=1}^{L}P_{j,k}\right)\left(P_{i+d,k} - \frac{1}{L}\sum_{j=1}^{L}P_{j,k}\right)$$
where i and j index the residues and k is the index of the physicochemical property. $P_{i,k}$ is the kth physicochemical property of the ith amino acid, L is the length of the sequence and d is the distance between the current residue and its neighbor. For example, d = 1 refers to the first neighbor, i.e., the next residue, d = 2 refers to the second neighbor, and so on.
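The auto covariance formula can be written directly as a short function. This is a sketch for a single property k; the `scale` argument (a mapping from residue to property value) and the toy two-letter scale in the usage line are assumptions for illustration.

```python
def auto_covariance(values, d):
    """AC(d) for one property; values[i] is the property of residue i."""
    length = len(values)
    if not 0 < d < length:
        raise ValueError("d must satisfy 0 < d < L")
    mean = sum(values) / length  # (1 / L) * sum_j P_j
    total = sum((values[i] - mean) * (values[i + d] - mean)
                for i in range(length - d))
    return total / (length - d)


def ac_features(sequence, scale, max_d):
    """AC(1), ..., AC(max_d) of a sequence under one property scale."""
    values = [scale[aa] for aa in sequence]
    return [auto_covariance(values, d) for d in range(1, max_d + 1)]


# Toy alternating sequence with a made-up scale: neighbors at d = 1 are
# anti-correlated and neighbors at d = 2 are correlated.
toy = ac_features("ABAB", {"A": 1.0, "B": -1.0}, max_d=2)
```

Repeating `ac_features` for each physicochemical property and concatenating the results would give one AC block per property.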
Gene ontology (GO) [32] is a comprehensive set of ontologies for molecular biology domains developed for gene annotations of all organisms as a hierarchy. It uses a shared language to achieve a mutual understanding of the definition and meaning of any word used. There are three classes in GO:
a. Cellular component (CC): Where a gene product is located, such as the inner or outer membrane.
b. Molecular function (MF): An element activity, task or job such as protein kinase activity or insulin receptor activity.
c. Biological process (BP): A commonly recognized series of events such as cell division or transcription.
The similarity between two GO terms is derived from the frequencies of the two terms involved and of their closest common ancestor term in a specific corpus of GO annotations [56].
GO terms have a hierarchical structure. Each of them is a node in a tree and may have parents and children. The frequency of a GO term, denoted Fc for the cth term, is calculated by dividing the total number of its children by the total number of GO terms. The information content (IC) of a GO term is the negative logarithm of its frequency, so a rarely used term carries a greater amount of information [57]. The IC of a concept is given by the following formula:
$$IC(c) = -\log(F_c)$$
The most informative common ancestor (MICA) value is the largest IC among all common ancestors of two concepts and is calculated by the following formula:
$$MICA(c_1, c_2) = \max\{\, IC(a) \mid a \in CommonAncestors(c_1, c_2) \,\}$$
Resnik [58] defined the largest information content of all common ancestors as the semantic similarity between the concepts.
$$Sim_{Res}(c_1, c_2) = MICA(c_1, c_2)$$
Jiang [59] defined the semantic similarity as the inverse of the difference between the information content of the two concepts and the largest information content of all their common ancestors.
$$Sim_{Jia}(c_1, c_2) = \frac{1}{IC(c_1) + IC(c_2) - 2\,MICA(c_1, c_2) + 1}$$
Lin [60] considered the ratio of twice the MICA to the sum of the information content of the two concepts as the semantic similarity.
$$Sim_{Lin}(c_1, c_2) = \frac{2\,MICA(c_1, c_2)}{IC(c_1) + IC(c_2)}$$
We used all of the mentioned methods for calculating semantic similarities for each pair of HVIN separately for each class. So we gained nine GO features: MFSimRes, MFSimJia, MFSimLin, BPSimRes, BPSimJia, BPSimLin, CCSimRes, CCSimJia, CCSimLin. By evaluating the models which are trained by these features, three features gained by Jiang similarity are chosen as the final GO semantic similarity features.
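The three similarity measures can be illustrated on a toy ontology. Everything in this sketch is hypothetical: the terms in `PARENTS`, the frequencies in `FREQ`, and the convention that a term counts as its own ancestor are assumptions, not data from the study.

```python
import math

# Toy ontology (child -> parents) and made-up term frequencies F_c.
PARENTS = {"root": [], "a": ["root"], "b": ["root"], "c": ["a"], "d": ["a"]}
FREQ = {"root": 1.0, "a": 0.5, "b": 0.5, "c": 0.25, "d": 0.125}


def ic(term):
    """Information content: IC(c) = -log(F_c)."""
    return -math.log(FREQ[term])


def ancestors(term):
    """All ancestors of a term, including the term itself (assumed convention)."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def mica(c1, c2):
    """Largest IC among the common ancestors of c1 and c2."""
    return max(ic(a) for a in ancestors(c1) & ancestors(c2))


def sim_resnik(c1, c2):
    return mica(c1, c2)


def sim_jiang(c1, c2):
    return 1.0 / (ic(c1) + ic(c2) - 2.0 * mica(c1, c2) + 1.0)


def sim_lin(c1, c2):
    # Undefined (division by zero) when both terms have IC = 0, e.g. two roots.
    return 2.0 * mica(c1, c2) / (ic(c1) + ic(c2))
```

In this toy ontology, `c` and `d` share the ancestor `a`, so their MICA value is IC(a) = log 2, while `c` and `b` share only the root and have a Resnik similarity of zero.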
a. Degree (connectivity): Is defined as the number of partners that are interacting with a protein p.
b. Neighborhood connectivity: Neighborhood connectivity is based on degree (connectivity) measure. In fact, the average connectivity of all neighbors of p represents the neighborhood connectivity of p.
c. Shortest paths: The length of a path is the number of edges forming it. The path with minimum length between two proteins i and j is considered the shortest path. For each protein p, the shortest path centrality is the sum of the shortest path lengths between that protein and all the other proteins, divided by the number of proteins n: $\left(\sum_{m=1}^{n} S_{p,m}\right)/n$
d. Shared neighbors: This topological measure represents the number of interacting partners shared between proteins i and j, i.e., proteins which are neighbors of both i and j.
e. Stress centrality: The number of the shortest paths between all protein pairs in the HPPIN passing through a given protein p stands for the stress centrality of p. This centrality is representative of the workload the protein carries in a network. If a protein is traversed by a high number of shortest paths, then it has a high stress.
f. Topological coefficients: A relative measure of the extent to which a protein shares neighbors with others. Proteins that have no or one neighbor are assigned a topological coefficient of 0 (zero). The chart of the topological coefficients can be used to estimate the tendency of the proteins in the HPPIN to have shared neighbors. The topological coefficient is defined as $T_p = \mathrm{avg}(J_{p,m})/k_p$, where $J_{p,m}$ is defined for all proteins m that share at least one neighbor with protein p and is the number of neighbors shared between proteins p and m, plus one if there is a direct link between p and m, and $k_p$ is the number of neighbors of protein p.
g. Closeness centrality: The closeness centrality $C_c(p)$ of a protein p is defined as the reciprocal of the average shortest path length. It is a number between 0 and 1 which is computed as:
$$C_c(p) = \frac{1}{\mathrm{avg}(L_{p,m})}$$
where $L_{p,m}$ is the length of the shortest path between two proteins p and m.
The closeness centrality of isolated proteins is equal to 0. This measure shows how fast information spreads from a given protein to other reachable ones in the HPPIN.
h. Clustering coefficients: The clustering coefficient for a protein p is the number of triangles (3-loops) that pass through p, relative to the maximum number of triangles that could pass through p.
$$C_p = \frac{2e_p}{k_p(k_p - 1)}$$
where kp is the number of neighbors of p and ep is the number of connected pairs between all neighbors of p.
i. Betweenness centrality: The betweenness centrality of a protein p represents the amount of control that p exerts over the interactions of others in the HPPIN and it is defined as follows: $C_b(p) = \sum_{s \neq p \neq t} \sigma_{st}(p)/\sigma_{st}$, where s and t are proteins in the HPPIN different from p, $\sigma_{st}$ is the number of the shortest paths from s to t, and $\sigma_{st}(p)$ is the number of the shortest paths from s to t that p lies on.
j. Radiality: The radiality of a protein is calculated by subtracting the average shortest path between that protein and all other proteins in the HPPIN from the value of the diameter. Hence, proteins with higher radiality are usually closer to the other nodes, whereas, proteins with lower radiality are peripheral.
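Several of the measures above can be computed with a breadth-first search over an adjacency map. The sketch below covers closeness, clustering coefficient, neighborhood connectivity and shared neighbors on a made-up four-protein network; the graph and function names are illustrative, and closeness is averaged over reachable proteins only.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, p):
    """C_c(p) = 1 / avg(L_pm) over reachable m; 0 for an isolated protein."""
    lengths = [d for node, d in shortest_path_lengths(adj, p).items() if node != p]
    return len(lengths) / sum(lengths) if lengths else 0.0


def clustering_coefficient(adj, p):
    """C_p = 2 * e_p / (k_p * (k_p - 1))."""
    neighbors = list(adj[p])
    k = len(neighbors)
    if k < 2:
        return 0.0
    # e_p: number of connected pairs among the neighbors of p.
    e = sum(1 for i, u in enumerate(neighbors)
            for v in neighbors[i + 1:] if v in adj[u])
    return 2 * e / (k * (k - 1))


def neighborhood_connectivity(adj, p):
    """Average degree of the neighbors of p."""
    return sum(len(adj[n]) for n in adj[p]) / len(adj[p]) if adj[p] else 0.0


def shared_neighbors(adj, i, j):
    """Number of proteins adjacent to both i and j."""
    return len(adj[i] & adj[j])


# Toy undirected network with edges A-B, A-C, B-C and C-D.
ADJ = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
```

Here `C` is adjacent to every other protein, so its closeness is 1, while only one of the three possible pairs among its neighbors is connected, giving a clustering coefficient of 1/3.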
Each of the five feature categories was used in a separate model. Combinations of these features were formed by choosing random features among all existing features to train 10 other models. All 15 models were constructed with different classifiers to obtain diverse base classifiers.
The results of predictions of 10 most popular classifiers were combined by stacked generalization [61]. In stacked generalization the outputs of the base classifiers were given to a meta-learner which combines the outputs to get the final output.
We used two different models as meta learners including random forest and majority voting. In majority voting, we used three different thresholds for accepting the votes: 30%, 40% and 50%. By this definition we get more sensitivity through sacrificing the specificity in 30% model while in 50% model, we get more specificity through sacrificing sensitivity.
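The threshold-based voting rule can be sketched as follows. The source does not say whether a pair is accepted at exactly the threshold, so the `>=` comparison is an assumption, as is the function name.

```python
def majority_vote(votes, threshold):
    """Combine 0/1 base-classifier votes.

    Returns 1 (interaction) when the fraction of positive votes reaches the
    threshold, otherwise 0. Using >= at the boundary is an assumption.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    return int(sum(votes) / len(votes) >= threshold)


# Four of ten base classifiers predict an interaction.
votes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
lenient = majority_vote(votes, 0.3)   # favors sensitivity
strict = majority_vote(votes, 0.5)    # favors specificity
```

With four positive votes out of ten, the pair is accepted under the 30% rule but rejected under the 50% rule, which is the sensitivity and specificity trade-off described above.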
The prediction performance of the proposed method was evaluated by four major measures from the evaluation measure package [62], calculated based on the number of interactions predicted correctly (TP), the number of non-interactions predicted correctly (TN), the number of non-interactions predicted as interactions (FP) and the number of interactions predicted as non-interactions (FN). The formulas of these measures are listed below:
$$\text{Specificity} = \frac{TN}{TN+FP}, \qquad \text{Sensitivity} = \frac{TP}{TP+FN}$$
$$\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}, \qquad \text{F-Measure} = \frac{2TP}{2TP+FP+FN}$$
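The four measures follow directly from the confusion counts. The counts in the usage line are made up for illustration and are not results from the study.

```python
def evaluation_measures(tp, tn, fp, fn):
    """Specificity, sensitivity, accuracy and F-measure from confusion counts."""
    return {
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f_measure": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts: 100 true interactions and 100 true non-interactions.
scores = evaluation_measures(tp=90, tn=80, fp=20, fn=10)
```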
To predict new HV-PPI, five different categories of features were extracted from physicochemical properties of amino acids, network topology of HPPIN, protein sequences, subcellular localization of human proteins and GO semantic similarities (all human and virus proteins’ extracted features are available at http://bioinf.modares.ac.ir/software/complexnet/Influenza/HumanFeatures.rar and http://bioinf.modares.ac.ir/software/complexnet/Influenza/VirusFeatures.rar, respectively).
Several models were constructed by choosing different features from these categories. Five models were made by choosing all the features of one category. Moreover, 10 more models were made by choosing random features from all the existing features. Among these models, the best results belonged to the models which had more diverse features.
To estimate the proposed model’s performance, a 10-fold cross validation procedure is used. The dataset is partitioned into 10 equal parts (all 10 partitioned train and test datasets are available at http://bioinf.modares.ac.ir/software/complexnet/Influenza/10FoldCrossValidation.rar). Each time nine partitions are used for training and one remaining partition is used for testing the model. Average of the obtained performance in each evaluation measure of the ten testing sets is reported as the final performance in that evaluation measure which is shown in Figure 4 for each of the classifiers.
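The partitioning step of 10-fold cross validation can be sketched with the standard library alone. The shuffling seed and the interleaved assignment of samples to folds are assumptions; the study's actual partitions are the ones distributed at the URL above.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Interleaved slicing gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def mean_score(fold_scores):
    """Average one evaluation measure over the test folds."""
    return sum(fold_scores) / len(fold_scores)


splits = list(k_fold_indices(100, k=10))
```

Each sample appears in exactly one test fold, and the reported performance is the mean of the per-fold scores.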
Finally, we sent the results to a meta learner. If either the sensitivity or specificity measures are more important for the researcher, majority voting with 30% or 50% positive voter is used. Otherwise, random forest is used as a meta learner. The results are shown in Figure 5.
A database containing the predicted interactome of HVPPI has been constructed, which is publicly accessible at http://bioinf.modares.ac.ir/software/complexnet/Influenza as shown in Figure 6. To this end, all possible interactions (812,625 interactions) between each of the 6501 HPs of HVIN and each of the 125 IAV proteins were examined by the trained model. The mean of the prediction probabilities of all the models is reported as the final interaction probability (InPr) of each pair. The results are shown in Figure 7 as a heat map in which colors show the probability of interaction between the pairs. Columns and rows represent human and virus proteins respectively. Pairs with a score of one could be good candidates for experimental testing.
Among all the 812,625 pairs, 6919 pairs have a score of 1 (available at http://bioinf.modares.ac.ir/software/complexnet/Influenza/Novel6919Interactions.rar). By investigating the human partners of these 6919 pairs, 76 human proteins with a degree larger than 5 were selected (human proteins targeted by more than five virus proteins) and their interaction network was obtained from STRING [41], as depicted in Figure 7. The constructed network has 256 edges with an average node degree of 6.75 in the HPPIN and an average local clustering coefficient of 0.47. The color of an edge indicates the type of interaction between the nodes. Cyan edges are interactions extracted from curated databases, while pink ones are experimentally determined. Blue, green and red edges are predicted interactions obtained from gene neighborhood, gene fusion and gene co-occurrence respectively. Light green edges are extracted by text mining, violet edges are obtained from protein homology, and nodes connected by black edges are co-expressed.
By using DAVID [63] tools, gene ontology enrichment analysis was done on these 76 proteins (results are available at enrichment tab of http://bioinf.modares.ac.ir/software/complexnet/Influenza). Furthermore, by using REVIGO [64], the whole enriched biological process (BP), cellular component (CC) and molecular function (MF) terms are depicted and available at enrichment tab of http://bioinf.modares.ac.ir/software/complexnet/Influenza.
In this study, heterogeneous descriptors were used to predict HV-PPI. The contribution of the different descriptors was measured by removing each feature type in turn and recalculating the evaluation measures of the proposed prediction model; the higher the loss in a measure, the more important the feature type. As shown in Table 1, HPPIN topology is the most important feature type in predicting HI-PPIs. GO semantic similarity is another important feature type when the number of features in each type is taken into account (the three GO semantic similarity features cause a 0.022 loss of sensitivity, whereas the 1414 features of the sequence-based feature type cause only 0.005 more).
| Feature type | Loss of Sensitivity | Loss of Specificity | Loss of Accuracy |
| --- | --- | --- | --- |
| Nucleotide sequence-based | 0.027 | 0.012 | 0.019 |
| GO semantic similarity | 0.022 | 0.011 | 0.018 |
| Amino acid sequence-based | 0.008 | 0.003 | 0.005 |
| Physicochemical properties | 0.025 | 0.013 | 0.018 |
| HPPIN topology | 0.035 | 0.023 | 0.029 |
We also extracted the most important features in four ways:
a. Tree models: In tree models, the percentage of training samples that fall into all terminal nodes after a split determines the feature importance. Because all samples are affected by the predictor used in the first split, that predictor has an importance of 1. The other predictors are scored in the range from zero to one.
b. Rule-based models: The number of rules involving a predictor determines its importance.
c. PCA: The sum of the loading coefficients of the first 10 principal components (PCs) [65] is used as the feature importance score.
d. GA-PLS: Selecting the best subset of variables is one of the most popular uses of genetic algorithms (GA), especially for variable selection in partial least squares (PLS) models [66]. For this purpose, we built an initial population by randomly selecting subsets of the variables and fitting a PLS model on each. In this method, each variable is considered a gene and each variable set a chromosome. Every chromosome consists of 1800 genes, each of which is on with a probability of 0.2, so a chromosome contains approximately 360 active variables. The initial population was created by generating 100 chromosomes. The ROC value divided by the number of variables is used as the fitness value. This strategy was performed hundreds of times, and the top 30 percent of variable sets with the highest fitness values were passed to the next generation. In the new generation, we applied mutation by changing each variable's value with a probability of 0.05, performed crossover between the variable sets, and repeated the previous steps. Finally, the variables with the lowest prediction error were reported as the most important features. Because the result of the genetic algorithm changes with each run, we repeated the procedure 100 times and reported the variables selected most frequently across these 100 runs as the most important variables.
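The genetic-algorithm loop in item (d) can be sketched in plain Python. This is a simplified illustration, not the authors' implementation: `toy_score` is a hypothetical stand-in for the ROC value of a fitted PLS model, single-point crossover and per-gene bit flips are assumed operator choices, and the demo uses far smaller sizes than the 1800 genes and 100 chromosomes described above.

```python
import random
from collections import Counter


def random_chromosome(n_genes, p_on, rng):
    """Each gene (variable) is switched on independently with probability p_on."""
    return [1 if rng.random() < p_on else 0 for _ in range(n_genes)]


def fitness(chrom, score_fn):
    """Score divided by the number of active variables, as described in the text."""
    n_on = sum(chrom)
    return score_fn(chrom) / n_on if n_on else 0.0


def crossover(a, b, rng):
    """Single-point crossover (an assumed operator; the text does not specify one)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(chrom, p_mut, rng):
    """Flip each gene with probability p_mut."""
    return [1 - g if rng.random() < p_mut else g for g in chrom]


def run_ga(score_fn, n_genes=1800, pop_size=100, p_on=0.2,
           keep_frac=0.3, p_mut=0.05, generations=20, seed=0):
    rng = random.Random(seed)
    population = [random_chromosome(n_genes, p_on, rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda c: fitness(c, score_fn), reverse=True)
        # The top keep_frac of variable sets pass unchanged to the next generation.
        survivors = ranked[:max(2, int(keep_frac * pop_size))]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), p_mut, rng))
        population = survivors + children
    return max(population, key=lambda c: fitness(c, score_fn))


def selection_frequency(score_fn, n_runs, **ga_kwargs):
    """Repeat the GA with different seeds and count how often each variable is kept."""
    counts = Counter()
    for run in range(n_runs):
        best = run_ga(score_fn, seed=run, **ga_kwargs)
        counts.update(i for i, g in enumerate(best) if g)
    return counts


# Hypothetical score: variables 0-2 are "informative"; the value mimics an AUC in [0.5, 1].
INFORMATIVE = (0, 1, 2)


def toy_score(chrom):
    hits = sum(chrom[i] for i in INFORMATIVE)
    return 0.5 + 0.5 * hits / len(INFORMATIVE)


freq = selection_frequency(toy_score, n_runs=5, n_genes=30, pop_size=20, generations=10)
print(freq.most_common(3))
```

Because the fitness divides the score by the number of active variables, it strongly rewards small variable sets; counting selections over repeated runs, as in `selection_frequency`, is what smooths out the run-to-run variation mentioned in the text.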
Feature importance was calculated with all of the models mentioned above. The sum of the scores of the top 10 percent of features is depicted in Figure 9. Panel (a) shows the score distribution of each feature type, while Panel (b) shows the distribution of features across types. For the top one percent of features with the highest scores, Panel (c) shows the feature scores through the size and type of the circles, with features of the same category grouped in one circle, whereas Panel (d) compares the feature means for positive and negative samples.
Figure 9(d) reveals that the GO semantic similarity-based features of positive samples have clearly higher values than those of negative samples, whereas features extracted from the physicochemical properties of amino acids have higher values in negative samples than in positive samples.
The topology of the HPPI network appears to play the most important role among the important features. The gene ontology semantic similarity-based features play the second most important role. Furthermore, conjoint triad features of virus proteins are more likely to be among the important features.
In this study, we proposed a computational method for predicting HI-PPIs. Five categories of descriptors, namely physicochemical properties of amino acids, nucleotide sequence-based descriptors, gene ontology similarities, protein sequence-based features, and network centrality measures, were used to encode protein pairs. Several classifiers, including C5, RF, SVM, NB, and KNN, were used as base classifiers, and ensemble learning was used to combine them. The final model achieved an accuracy of 0.93, a specificity of 0.95, and a sensitivity of 0.91 in a 10-fold cross-validation analysis on our benchmark dataset.
In addition, all possible pairs between human proteins and IAV proteins were given as input to the constructed model to build a new database, which is available at http://bioinf.modares.ac.ir/software/complexnet/Influenza. Among the predicted pairs, 6919 have a score of 1 and could be good candidates for experimental research or drug targeting.
Moreover, an enrichment analysis is reported for the 76 human proteins that are targeted by more than five virus proteins among these 6919 pairs.
According to our analysis, the topology of the HPPI network, gene ontology semantic similarity, and the conjoint triad of virus proteins contribute most to predicting HI-PPIs.
The proposed method can be extended to predict other HV-PPIs.
The authors declare that they have no conflicts of interest in this work.
[1] |
De Doncker, RWAA, Divan DM, Kheraluwala MH (1991) A Three-Phase Soft-Switched High-Power-Density DC/DC Converter for High-Power Applications. IEEE T Ind Appl 27: 63–73. https://doi.org/10.1109/28.67533 doi: 10.1109/28.67533
![]() |
[2] | Demetriades GD (2005) On Small-Signal Analysis and Control of the Single- and the Dual-Active Bridge Topologies. |
[3] |
Kheraluwala MN, Gascoigne RW, Divan DM, Baumann ED (1992) Performance Characterization of a High-Power Dual Active Bridge DC-to-DC Converter. IEEE T Ind Appl 28: 1294–1301. https://doi.org/10.1109/28.175280 doi: 10.1109/28.175280
![]() |
[4] | Rodríguez Alonso AR, Sebastian J, Lamar DG, Hernando MM, Vazquez A (2010) An Overall Study of a Dual Active Bridge for Bidirectional DC/DC Conversion. In Proceedings of the 2010 IEEE Energy Conversion Congress and Exposition, 1129–1135. https://doi.org/10.1109/ECCE.2010.5617847 |
[5] |
Krismer F, Kolar JW (2010) Accurate Power Loss Model Derivation of a High-Current Dual Active Bridge Converter for an Automotive Application. IEEE T Ind Electron 57: 881–891. https://doi.org/10.1109/TIE.2009.2025284 doi: 10.1109/TIE.2009.2025284
![]() |
[6] |
Rodríguez A, Vázquez A, Lamar DG, Hernando MM, Sebastián J (2015) Different Purpose Design Strategies and Techniques to Improve the Performance of a Dual Active Bridge with Phase-Shift Control. IEEE T Power Electr 30: 790–804. https://doi.org/10.1109/TPEL.2014.2309853 doi: 10.1109/TPEL.2014.2309853
![]() |
[7] |
Rolak M, Twardy M, Soból C (2022) Generalized Average Modeling of a Dual Active Bridge DC-DC Converter with Triple-Phase-Shift Modulation. Energies 15: 6092. https://doi.org/10.3390/en15166092 doi: 10.3390/en15166092
![]() |
[8] |
Qin H, Kimball JW (2012) Generalized Average Modeling of Dual Active Bridge DC–DC Converter. IEEE T Power Electr 27: 2078–2084. https://doi.org/10.1109/TPEL.2011.2165734 doi: 10.1109/TPEL.2011.2165734
![]() |
[9] | George K (2015) Design and Control of a Bidirectional Dual Active Bridge DC-DC Converter to Interface Solar, Battery Storage, and Grid-Tied Inverters. Electrical Engineering Undergraduate Honors Theses. |
[10] |
Rodriguez-Rodriguez JR, Salgado-Herrera NM, Torres-Jimenez J, Gonzalez-Cabrera N, Granados-Lieberman D, Valtierra-Rodriguez M (2021) Small-Signal Model for Dual-Active-Bridge Converter Considering Total Elimination of Reactive Current. J Mod Power Syst Clean Energy 9: 450–458. https://doi.org/10.35833/MPCE.2018.000911 doi: 10.35833/MPCE.2018.000911
![]() |
[11] | Liu B, Davari P, Blaabjerg F (2020) An Enhanced Generalized Average Modeling of Dual Active Bridge Converters. In Proceedings of the 2020 IEEE Applied Power Electronics Conference and Exposition (APEC), 85–90. https://doi.org/10.1109/APEC39645.2020.9124001 |
[12] |
He J, Chen Y, Lin J, Chen J, Cheng L, Wang Y (2023) Review of Modeling, Modulation, and Control Strategies for the Dual-Active-Bridge DC/DC Converter. Energies 16: 6646. https://doi.org/10.3390/en16186646 doi: 10.3390/en16186646
![]() |
[13] |
Shao S, Chen L, Shan Z, Gao F, Chen H, Sha D, Dragičević T (2022) Modeling and Advanced Control of Dual-Active-Bridge DC–DC Converters: A Review. IEEE T Power Electr 37: 1524–1547. https://doi.org/10.1109/TPEL.2021.3108157 doi: 10.1109/TPEL.2021.3108157
![]() |
[14] | Shah SS, Bhattacharya S (2017) Large & Small Signal Modeling of Dual Active Bridge Converter Using Improved First Harmonic Approximation. In Proceedings of the 2017 IEEE Applied Power Electronics Conference and Exposition (APEC), 1175–1182. https://doi.org/10.1109/APEC.2017.7930844 |
[15] |
Ghazal OM, Marei MI, Mohamad AMI (2024) Small-Signal Modeling Comparison of Dual Active Bridge Converter. e-Prime - Advances in Electrical Engineering, Electronics and Energy 8: 100570. https://doi.org/10.1016/j.prime.2024.100570 doi: 10.1016/j.prime.2024.100570
![]() |
[16] |
Shao S, Chen H, Wu X, Zhang J, Sheng K (2019) Circulating Current and ZVS-on of a Dual Active Bridge DC-DC Converter: A Review. IEEE Access 7: 50561–50572. https://doi.org/10.1109/ACCESS.2019.2911009 doi: 10.1109/ACCESS.2019.2911009
![]() |
[17] |
Wang P, Chen X, Tong C, Jia P, Wen C (2021) Large- and Small-Signal Average-Value Modeling of Dual-Active-Bridge DC–DC Converter with Triple-Phase-Shift Control. IEEE T Power Electr 36: 9237–9250. https://doi.org/10.1109/TPEL.2021.3052459 doi: 10.1109/TPEL.2021.3052459
![]() |
[18] | Hebala OM, Aboushady AA, Ahmed KH, Burgess S, Prabhu R (2018) Generalized Small-Signal Modelling of Dual Active Bridge DC/DC Converter. In Proceedings of the 2018 7th International Conference on Renewable Energy Research and Applications (ICRERA), 914–919. https://doi.org/10.1109/ICRERA.2018.8567014 |
[19] | Gonzalez M, Cardenas V, Pazos F (2004) DQ Transformation Development for Single-Phase Systems to Compensate Harmonic Distortion and Reactive Power. In Proceedings of the 9th IEEE International Power Electronics Congress, 177–182. https://doi.org/10.1109/CIEP.2004.1437575 |
[20] | Bacha S, Munteanu I, Bratcu AI (2014) Power Electronic Converters Modeling and Control: With Case Studies, Advanced Textbooks in Control and Signal Processing, Springer-Verlag: London. https://doi.org/10.1007/978-1-4471-5478-5_1 |
[21] |
Li L, Xu G, Sha D, Liu Y, Sun Y, Su M (2023) Review of Dual-Active-Bridge Converters with Topological Modifications. IEEE T Power Electr 38: 9046–9076. https://doi.org/10.1109/TPEL.2023.3258418 doi: 10.1109/TPEL.2023.3258418
![]() |
[22] | Campos-Salazar JM (2024) Design and Analysis of Battery Chargers for Electric Vehicles Based on Multilevel Neutral-Point- Lamped Technology. Doctoral thesis, Universitat Politècnica de Catalunya: Universitat Politècnica de Catalunya, 2024. |
[23] |
Campos Salazar J, Viani-Abad A, Sandoval-García R (2024) Modeling and Simulation of a Single-Phase Linear Multi-Winding Transformer in the d-q Frame. Journal of Electronics and Electrical Engineering 3: 206–235. https://doi.org/10.37256/jeee.3120244530 doi: 10.37256/jeee.3120244530
![]() |
[24] | Trento BC (2012) Modeling and Control of Single-Phase Grid-Tie Converters. Master's Theses. |
[25] | Erickson RW, Maksimovic D (2013) Fundamentals of Power Electronics, Springer Science & Business Media. |
[26] | Mohan N (2011) Electric Machines and Drives: A First Course, Hoboken, NJ. |
[27] | Katsuhiko O (2009) Modern Control Engineering, Boston. |
[28] | Kuo BC (1991) Automatic Control Systems, 6th edition., Prentice Hall: Englewood Cliffs, N.J. |
[29] | Khalil H (2014) Nonlinear Control, 1st edition., Pearson: Boston. |