Citation: Hyeck Go, Eun-Mi Han, Moon Hee Kang, Yong Hyun Kim, Changhun Yun. The coated porous polyimide layers for optical scattering films. AIMS Materials Science, 2018, 5(6): 1102-1111. doi: 10.3934/matersci.2018.6.1102
Abstract
We have investigated the optical scattering characteristics of the air-void microstructure of a porous polyimide (PI) layer prepared by a simple polyimide-precursor coating and the immersion precipitation method. To carefully control the generated pore structure, the content of polar aprotic solvent in the polar protic non-solvent bath was adjusted during the pore-generation process. To assess the correlation between the generated microstructure of a porous polymer layer and its optical scattering properties, optical haze values were calculated and compared with the measured total transmittance and diffuse transmittance values. The calculated average optical haze value decreased from 0.88 to 0.53 as the content of polar aprotic solvent in the coagulation bath increased. In addition, a light-scattering mechanism was proposed for the prepared porous polymer film, consisting of air-voids inside the polymer medium and a rough surface at the ambient interface. Finally, for an analytical explanation, we introduced Mie scattering and scalar surface scattering, which describe the light scattering inside the pore structure and at the rough surface, respectively. Based on our systematic approach, the net power of light scattering can be regarded as the sum of the Mie scattering and the surface scattering contributions.
1. Introduction
Enhancers are non-coding DNA fragments that regulate gene expression at both the transcriptional and translational levels and thereby the production of RNA and proteins [1]. Unlike promoters, which are proximal elements of a gene, enhancers are distal elements that can be located up to 20 kb upstream or downstream of a gene, or even on a different chromosome [2]. Such locational variation makes the identification of enhancers challenging. Moreover, genetic variation in enhancers has been shown to be related to many human illnesses, such as cancer [3,4], disorders [4,5] and inflammatory bowel disease [6]. Genome-wide studies of histone modifications have shown that enhancers form a large group of functional elements with many different subgroups, such as strong and weak enhancers, poised enhancers and inactive enhancers [7]. Because enhancers of different subgroups have different biological activities, understanding enhancers and their subgroups is an important task, especially the identification of enhancers and their strength.
Due to the importance of enhancers in genomics and disease, the identification of enhancers and their strength has become a popular topic in biological research. Pioneering work carried out purely with experimental techniques includes chromatin immunoprecipitation followed by deep sequencing [8,9,10], DNase I hypersensitivity assays [11] and genome-wide mapping of histone modifications [12,13,14,15,16]. However, the experimental methods are expensive, time-consuming and of limited accuracy. Therefore, several computational methods were developed to rapidly identify enhancers and their strength in genomes. In 2016, Liu et al. [2] developed the two-layer predictor iEnhancer-2L, the first computational model for identifying not only enhancers but also their strength, based on pseudo k-tuple nucleotide composition. In the same year, Jia et al. [17] proposed the EnhancerPred model, which fuses bi-profile Bayes and pseudo-nucleotide composition as multiple features and applies a two-step wrapper for feature selection to distinguish enhancers from non-enhancers and to determine enhancer strength. In 2018, Liu et al. [18] established the iEnhancer-EL model for identifying enhancers and their strength with an ensemble learning approach. In 2019, Nguyen et al. [19] put forward the iEnhancer-ECNN model to identify enhancers and their strength using ensembles of convolutional neural networks. In the same year, Tan et al. [20] used an ensemble of deep recurrent neural networks to identify enhancers via dinucleotide physicochemical properties, and Le et al. [21] developed the iEnhancer-5Step model to identify enhancers and their strength using hidden information of DNA sequences via Chou's 5-step rule and word embedding. In 2021, Basith et al. [22] proposed the Enhancer-IF model, an integrative machine learning (ML)-based framework for identifying cell-specific enhancers. In the same year, Cai et al. [23] established the iEnhancer-XG model, which uses XGBoost as the base classifier together with k-spectrum profile, mismatch k-tuple, subsequence profile, position-specific scoring matrix and pseudo dinucleotide composition for feature extraction. Le et al. [24] used a transformer architecture based on BERT and a 2D convolutional neural network to identify DNA enhancers. Lim et al. [25] proposed the iEnhancer-RF model to identify enhancers and their strength through enhanced feature representation with a random forest. However, the stability of these models still needs to be improved, especially for distinguishing strong enhancers from weak enhancers.
In this study, we focus on developing a novel model named iEnhancer-MFGBDT to identify enhancers and their strength. Its first layer identifies whether a DNA sequence is an enhancer or not, while its second layer classifies an identified enhancer as strong or weak. We fuse k-mer and reverse complement k-mer nucleotide compositions based on the DNA sequence with second-order moving average, normalized Moreau-Broto auto-cross correlation and Moran auto-cross correlation features based on the dinucleotide physical structural property matrix, so that a 902-dimensional feature vector is obtained for each enhancer sequence. The gradient boosting decision tree (GBDT) algorithm is then adopted both as the feature selection strategy and as the classifier. The accuracies for identifying enhancers and their strength on the benchmark dataset with 10-fold cross-validation are 78.67% and 66.04%, respectively, and the corresponding accuracies on the independent dataset are 77.50% and 68.50%. The experimental results indicate that our model improves the accuracy of identifying enhancers and their strength and is a useful supplementary tool.
2. Materials and methods
2.1. Datasets
To facilitate comparison, in this study we adopt the benchmark dataset S constructed by Liu et al. [2], which consists of 2968 sequences of 200 bp each and can be formulated as
$$S = S^{+} \cup S^{-}, \qquad S^{+} = S^{+}_{strong} \cup S^{+}_{weak}, \tag{1}$$
where $S^{+}$ contains 1484 enhancer sequences, $S^{-}$ contains 1484 non-enhancer sequences, $S^{+}_{strong}$ contains 742 strong enhancer sequences and $S^{+}_{weak}$ contains 742 weak enhancer sequences, and no pair of enhancer DNA sequences has a pairwise sequence similarity of more than 80%.
2.2. Feature extraction
Suppose that a DNA enhancer sequence $D$ with $L$ nucleic acid residues is expressed by

$$D = B_1 B_2 B_3 \cdots B_L, \tag{2}$$

where $B_i$ denotes the nucleic acid residue at sequence position $i$. In this study, 902 features are extracted by fusing k-mer nucleotide composition, reverse complementary k-mer, second-order moving average, normalized Moreau-Broto auto-cross correlation, and Moran auto-cross correlation based on the dinucleotide property matrix.
2.2.1. K-mer nucleotide composition
K-mer nucleotide composition is a basic feature extraction approach widely used in different fields of bioinformatics [26,27,28,29]. For an enhancer sequence with $L$ nucleotides, the k-mer nucleotide compositions cover all possible subsequences of length $k$ in the sequence. We slide along the enhancer sequence with a step size of one nucleotide using a sliding window of length $k$. When a subsequence of the enhancer sequence matches the $i$-th k-mer, the occurrence number of that k-mer, denoted by $n_i$, is incremented. The occurrence frequency $f_i$ of the $i$-th k-mer can then be expressed by
$$f_i = \frac{n_i}{L - k + 1}. \tag{3}$$
For each $k$ we obtain $4^k$ k-mer features; here we let $k = 1, 2, 3$, so each enhancer sequence yields a $4^1 + 4^2 + 4^3 = 84$-dimensional k-mer feature vector.
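To make the computation of Eq (3) concrete, a minimal Python sketch is given below; the alphabetical ordering of k-mers and the skipping of windows containing ambiguous bases are our own illustrative choices, not specifications from the paper.

```python
from itertools import product

def kmer_composition(seq, ks=(1, 2, 3)):
    """k-mer occurrence frequencies f_i = n_i / (L - k + 1), Eq (3),
    for k = 1, 2, 3, giving a 4 + 16 + 64 = 84-dimensional vector."""
    seq = seq.upper()
    features = []
    for k in ks:
        kmers = ["".join(p) for p in product("ACGT", repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        n_windows = len(seq) - k + 1
        for i in range(n_windows):
            sub = seq[i:i + k]
            if sub in counts:          # skip windows with ambiguous bases
                counts[sub] += 1
        features.extend(counts[km] / n_windows for km in kmers)
    return features

# A 200 bp enhancer sequence yields an 84-dimensional k-mer vector.
print(len(kmer_composition("ACGT" * 50)))   # 84
```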
2.2.2. Reverse complementary k-mer
The reverse complementary k-mer, abbreviated RevKmer, is a variant of the basic k-mer in which the k-mers are not expected to be strand-specific, so each k-mer and its reverse complement are collapsed into a single feature. For example, when k = 2 there are 16 basic k-mers, but after collapsing reverse complements only 10 different dinucleotides (AA, AC, AG, AT, CA, CC, CG, GA, GC and TA) are retained; in other words, we obtain 10 reverse complementary 2-mer features. Letting $k = 1, 2, 3$, a total of $2 + 10 + 32 = 44$ RevKmer features are extracted, which can be calculated with the web server Pse-in-One 2.0 [30].
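Conceptually, the collapsing can be sketched as follows, building on the k-mer sketch above; since the paper computes these features with Pse-in-One 2.0, the canonical-representative convention used here (the lexicographically smaller of a k-mer and its reverse complement) is only an assumption for illustration.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def canonical(kmer):
    """Collapse a k-mer and its reverse complement onto one representative."""
    rc = "".join(COMPLEMENT[b] for b in reversed(kmer))
    return min(kmer, rc)

def revkmer_composition(seq, ks=(1, 2, 3)):
    """Reverse-complement-collapsed k-mer frequencies: 2 + 10 + 32 = 44 features."""
    seq = seq.upper()
    features = []
    for k in ks:
        keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
        counts = dict.fromkeys(keys, 0)
        n_windows = len(seq) - k + 1
        for i in range(n_windows):
            sub = seq[i:i + k]
            if all(b in COMPLEMENT for b in sub):
                counts[canonical(sub)] += 1
        features.extend(counts[key] / n_windows for key in keys)
    return features

print(len(revkmer_composition("ACGT" * 50)))   # 44
```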
2.2.3. Second-order moving average based on dinucleotide property matrix
As has been reported, DNA physicochemical properties play a crucial role in gene expression regulation and genome analysis, and are also closely correlated with functional non-coding elements [31,32,33]. In this study, six dinucleotide physical structural properties are adopted, including three local translational parameters (shift, slide and rise) and three local angular parameters (twist, tilt and roll) [34]. The values of the six DNA dinucleotide physical structural properties are shown in Table 1. Each physical structural property is normalized to reduce bias and noise by the following formula
$$\frac{P - P_{\min}}{P_{\max} - P_{\min}}, \tag{4}$$
Table 1.
The original values of the six physical structural properties for the 16 dinucleotides in DNA.
where $P$ is the original value of the property, and $P_{\min}$ and $P_{\max}$ are the minimum and maximum property values, respectively.
A DNA sequence is a polymer of the four nucleotides A, C, G and T, and any combination of two adjacent nucleotides is called a dinucleotide; hence there are $4 \times 4 = 16$ basic dinucleotides. First, each dinucleotide in a DNA sequence is replaced by the values of the physical structural properties. Each DNA sequence in the datasets can thus be converted into a matrix $P = (p_{i,j})_{(L-1) \times 6}$, named the dinucleotide property matrix (DPM), where $L$ is the number of nucleic acid residues in the DNA sequence and $p_{i,j}$ is the value of the $i$-th dinucleotide for the $j$-th physical structural property.
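A minimal sketch of the normalization in Eq (4) and the DPM construction is shown below; `raw_table` is a hypothetical dictionary mapping each of the 16 dinucleotides to its six raw property values from Table 1, which are not reproduced here.

```python
import numpy as np

def normalize_property_table(raw_table):
    """Min-max normalize each of the six property columns over the 16
    dinucleotides, following Eq (4): (P - Pmin) / (Pmax - Pmin)."""
    dinucs = sorted(raw_table)
    raw = np.array([raw_table[d] for d in dinucs], dtype=float)   # shape (16, 6)
    normed = (raw - raw.min(axis=0)) / (raw.max(axis=0) - raw.min(axis=0))
    return {d: normed[i] for i, d in enumerate(dinucs)}

def dinucleotide_property_matrix(seq, prop_table):
    """Build the (L-1) x 6 dinucleotide property matrix (DPM) by replacing
    each overlapping dinucleotide with its six normalized property values."""
    return np.array([prop_table[seq[i:i + 2]] for i in range(len(seq) - 1)])
```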
The second-order moving average (SOMA) algorithm was proposed by Alessio et al. [35] and combines the ideas of the moving average and the second-order difference. SOMA mainly investigates the long-range correlation properties of a stochastic time series.
Let $y(i), i = 1, 2, \cdots, L$, be a discrete stochastic time series, where $L$ is the length of the series. The SOMA algorithm is described as follows.
Step 1. Calculate the moving average $\tilde{y}_n(i)$ of the time series $y(i)$ as

$$\tilde{y}_n(i) = \frac{1}{n}\sum_{k=0}^{n-1} y(i-k), \tag{5}$$

where $n$ is the moving average window. When $n = 1$, $\tilde{y}_n(i)$ coincides with $y(i)$.
Step 2. For a given moving average window $n$, $2 \leq n < L$, the second-order difference between $y(i)$ and $\tilde{y}_n(i)$ is defined by

$$\sigma^2_{MA} = \frac{1}{L-n}\sum_{i=n}^{L}\left[y(i) - \tilde{y}_n(i)\right]^2, \tag{6}$$

where $\sigma^2_{MA}$ quantifies how $y(i)$ deviates from its moving average $\tilde{y}_n(i)$; $\sigma^2_{MA}$ is called the second-order moving average.
A dinucleotide property matrix contains six columns, and each column is treated as a time series; in other words, a dinucleotide property matrix contains six time series. Hence, each enhancer DNA sequence is represented by six SOMA features for a given moving average window $n$. Here we let $n = 2, 3, \cdots, 10$, so we construct a $6 \times 9 = 54$-dimensional SOMA-DPM feature vector for each enhancer sequence.
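The SOMA-DPM features can be sketched as follows, assuming the trailing-window reading of Eq (5); the ordering of the 54 features (window-major, then column) is our own choice.

```python
import numpy as np

def soma_features(dpm, windows=range(2, 11)):
    """SOMA features of the DPM: one sigma^2_MA value per property column and
    per moving-average window n (Eqs (5)-(6)), i.e. 6 x 9 = 54 features
    for n = 2..10."""
    L = dpm.shape[0]
    feats = []
    for n in windows:
        for col in range(dpm.shape[1]):
            y = dpm[:, col]
            ma = np.convolve(y, np.ones(n) / n, mode="valid")  # trailing mean, length L-n+1
            diff = y[n - 1:] - ma                               # y(i) - y~_n(i) for i = n..L
            feats.append(np.sum(diff ** 2) / (L - n))
    return np.array(feats)
```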
2.2.4. Moreau-Broto auto-cross correlation based on dinucleotide property matrix
Normalized Moreau-Broto auto-cross correlation (NMBACC) [36] based on the dinucleotide property matrix is used to extract global sequence information. In its formulation, $\lambda$ is the lag of the auto-cross correlation along the columns of the dinucleotide property matrix, $P_{i,s}$ is the value at the $i$-th row of the $s$-th column (the $s$-th property index), and $P_{i+\lambda,t}$ is the value at the $(i+\lambda)$-th row of the $t$-th column (the $t$-th property index). When $s = t$, $NMBACC(s,s,\lambda)$ represents the auto-correlation of a single property; when $s \neq t$, $NMBACC(s,t,\lambda)$ represents the cross-correlation between two different properties. Here we let $\lambda = 1, 2, 3, \cdots, 10$, so each enhancer sequence yields a $6 \times 6 \times 10 = 360$-dimensional NMBACC-DPM feature vector.
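Since the NMBACC equation itself is elided above, the sketch below uses the common normalized Moreau-Broto form, namely the average of the lagged products $P_{i,s} P_{i+\lambda,t}$; the exact normalization in the paper may differ.

```python
import numpy as np

def nmbacc_features(dpm, lags=range(1, 11)):
    """Normalized Moreau-Broto auto-cross correlations of the DPM columns.
    Assumed form: AC(s, t, lag) = mean_i P[i, s] * P[i + lag, t], which
    yields 6 * 6 * 10 = 360 features for lag = 1..10."""
    n_rows, n_cols = dpm.shape
    feats = []
    for lag in lags:
        for s in range(n_cols):
            for t in range(n_cols):
                feats.append(float(np.mean(dpm[:n_rows - lag, s] * dpm[lag:, t])))
    return np.array(feats)
```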
2.2.5. Moran auto-cross correlation based on dinucleotide property matrix
Moran auto-cross correlation (MACC) [37] based on the dinucleotide property matrix is likewise used to extract global sequence information. In its formulation, $\lambda$ is the lag along the columns of the dinucleotide property matrix, $p_{i,s}$ and $p_{i,t}$ are the values at the $i$-th row of the $s$-th and $t$-th columns, respectively, $p_{i+\lambda,t}$ is the value at the $(i+\lambda)$-th row of the $t$-th column, and $\bar{p}_s$ and $\bar{p}_t$ are the average values of the $s$-th and $t$-th columns, respectively. When $s = t$, $MACC(s,s,\lambda)$ represents the auto-correlation of a single property; when $s \neq t$, $MACC(s,t,\lambda)$ represents the cross-correlation between two different properties. Here we let $\lambda = 1, 2, 3, \cdots, 10$, so each enhancer sequence yields a $6 \times 6 \times 10 = 360$-dimensional MACC-DPM feature vector.
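Likewise, the MACC equation is elided above; the sketch below follows a common Moran-type form in which the centred lagged products are normalized by the two column standard deviations, which is an assumption rather than the paper's exact definition.

```python
import numpy as np

def macc_features(dpm, lags=range(1, 11)):
    """Moran auto-cross correlations of the DPM columns (assumed form):
    the mean of the centred lagged products, divided by the product of the
    two column standard deviations, giving 6 * 6 * 10 = 360 features."""
    n_rows, n_cols = dpm.shape
    means, stds = dpm.mean(axis=0), dpm.std(axis=0)
    feats = []
    for lag in lags:
        for s in range(n_cols):
            for t in range(n_cols):
                num = np.mean((dpm[:n_rows - lag, s] - means[s]) *
                              (dpm[lag:, t] - means[t]))
                feats.append(num / (stds[s] * stds[t] + 1e-12))
    return np.array(feats)
```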
2.3. Gradient boosting decision tree
Gradient boosting decision tree (GBDT) is a boosting algorithm that uses decision trees as base learners and was proposed by Friedman in 2001 [38,39]. In each iteration it builds a decision tree that reduces the residual of the current model in the gradient direction and then linearly combines this tree with the current model to obtain a new model. GBDT repeats this iteration until the number of decision trees reaches the specified value, yielding the final strong learner. GBDT is commonly used for regression, classification and feature selection. Its advantages include: (a) it flexibly handles various types of data, including both continuous and discrete features; (b) it has strong predictive and generalization ability; (c) it has good interpretability and robustness, can automatically discover high-order relationships between features, and does not require data normalization or other preprocessing.
The GBDT classification algorithm process is as follows
Input: training dataset $D = \{(x_1, y_1), (x_2, y_2), \cdots, (x_m, y_m)\}$. Suppose that the maximum iteration number is $T$, the loss function is $L(y, f(x))$, and $m$ is the number of samples.
(1) Initialize the weak classifier as follows
$$f_0(x) = \arg\min_{c}\sum_{i=1}^{m} L(y_i, c), \tag{9}$$

where $c$ is the constant value that minimizes the loss function; that is, $f_0(x)$ is a tree with only a root node.
(2) For t=1 to T
a. For $i = 1$ to $m$, calculate the negative gradient

$$r_{ti} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x) = f_{t-1}(x)} = \frac{y_i}{1 + \exp\left(y_i f_{t-1}(x_i)\right)}, \tag{10}$$

where the loss function is $L(y, f(x)) = \log\left(1 + \exp(-y f(x))\right)$, $y \in \{-1, 1\}$.
b. Use the pairs $(x_i, r_{ti}), i = 1, 2, \cdots, m$, to fit a CART regression tree, obtaining the $t$-th regression tree with leaf node regions $R_{tj}$, $j = 1, 2, \cdots, J$, where $J$ is the number of leaf nodes of the $t$-th regression tree.
c. For each leaf node region $j = 1, 2, \cdots, J$, calculate the best residual fitting value

$$c_{tj} = \arg\min_{c}\sum_{x_i \in R_{tj}} \log\left[1 + \exp\left(-y_i\left(f_{t-1}(x_i) + c\right)\right)\right]. \tag{11}$$
Since the above equation is difficult to optimize exactly, $c_{tj}$ is generally replaced by the approximate value

$$c_{tj} = \frac{\sum_{x_i \in R_{tj}} r_{ti}}{\sum_{x_i \in R_{tj}} |r_{ti}|\left(1 - |r_{ti}|\right)}. \tag{12}$$
d. Update the strong classifier by
$$f_t(x) = f_{t-1}(x) + \sum_{j=1}^{J} c_{tj}\, I\left(x \in R_{tj}\right). \tag{13}$$
(3) Get the final strong classifier f(x) by
$$f(x) = f_T(x) = \sum_{t=1}^{T}\sum_{j=1}^{J} c_{tj}\, I\left(x \in R_{tj}\right). \tag{14}$$
Output: fT(x).
GBDT can be used not only for classification but also for feature selection by computing the Gini importance of each feature. Features are ranked in descending order of Gini importance, and the top $k$ features can then be selected as needed. In this study, we adopt GBDT both for feature selection and for classification.
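As an illustration of this dual use, the scikit-learn sketch below selects features by impurity-based (Gini) importance with a "mean" threshold and then trains a GBDT classifier; the data shapes, random seeds and default hyperparameters are stand-ins, since the paper does not specify its implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel

# Hypothetical stand-ins for the 902-dimensional fused feature matrix and the
# enhancer / non-enhancer labels (small sample count used only for illustration).
rng = np.random.default_rng(0)
X = rng.random((300, 902))
y = rng.integers(0, 2, size=300)

# GBDT as feature selector: features whose Gini importance exceeds the mean
# importance are retained ("mean" threshold).
selector = SelectFromModel(GradientBoostingClassifier(random_state=0),
                           threshold="mean").fit(X, y)
X_sel = selector.transform(X)

# GBDT as the final classifier on the selected features.
clf = GradientBoostingClassifier(random_state=0).fit(X_sel, y)
print(X_sel.shape, clf.score(X_sel, y))
```

The "mean" threshold here mirrors the threshold mentioned later in Section 3.3, although the exact selection settings used in the paper are not stated.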
2.4. Cross-validation and performance assessment
To save computational time, 10-fold cross-validation is carried out for each feature set to evaluate identification performance in this study. The dataset is randomly divided into ten subsets of approximately equal size, so that the ratio of the testing set to the training set is 1:9. Each subset is used in turn as the test set while the remaining nine subsets train the GBDT classifier, and the performance measures averaged over the ten validation results are used for evaluation. The k-fold cross-validation approach improves the reliability of the evaluation, because all of the original data are used and each subset is tested exactly once.
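A minimal sketch of this protocol is given below, reusing X_sel and y from the previous sketch; the stratified splitting is our own assumption, as the paper only states that the data are randomly partitioned into ten roughly equal subsets.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Ten folds: each fold serves once as the test set while the other nine train
# the GBDT classifier; the fold-wise accuracies are then averaged.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(GradientBoostingClassifier(random_state=0),
                         X_sel, y, cv=cv, scoring="accuracy")
print(scores.mean(), scores.std())
```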
To make an objective and comprehensive evaluation, we employ four widely used performance measures [40,41,42,43]: sensitivity (Sn), specificity (Sp), accuracy (Acc) and the Matthews correlation coefficient (MCC). The MCC value ranges from -1 to 1, while the other three measures range from 0 to 1. These measures are expressed in terms of the following quantities: $N^{+}$ denotes the total number of true enhancer sequences investigated, $N^{+}_{-}$ the number of true enhancer sequences incorrectly identified as non-enhancers, $N^{-}$ the total number of non-enhancer sequences investigated, and $N^{-}_{+}$ the number of non-enhancer sequences incorrectly identified as enhancers.
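The paper's own formulas for these measures are elided above; assuming the Chou-style formulation commonly used by the cited enhancer predictors, they can be written as

$$
\begin{aligned}
Sn &= 1-\frac{N^{+}_{-}}{N^{+}}, \qquad
Sp = 1-\frac{N^{-}_{+}}{N^{-}}, \qquad
Acc = 1-\frac{N^{+}_{-}+N^{-}_{+}}{N^{+}+N^{-}},\\[6pt]
MCC &= \frac{1-\left(\dfrac{N^{+}_{-}}{N^{+}}+\dfrac{N^{-}_{+}}{N^{-}}\right)}
{\sqrt{\left(1+\dfrac{N^{-}_{+}-N^{+}_{-}}{N^{+}}\right)\left(1+\dfrac{N^{+}_{-}-N^{-}_{+}}{N^{-}}\right)}}.
\end{aligned}
$$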
We also employ the receiver operating characteristic (ROC) curve [44] and the area under the ROC curve (AUC) [45] to evaluate our model. The ROC curve plots the true positive rate (sensitivity) as a function of the false positive rate (1 - specificity) over all possible thresholds. The closer the ROC curve is to the upper left corner, the better the identification performance; in other words, the closer the AUC is to 1, the better the identification system.
3. Results and discussion
3.1. Identification performance on the benchmark dataset
Identifying enhancers is a binary classification problem that can be divided into two layers: the first layer identifies whether a DNA sequence is an enhancer or not, while the second layer classifies an identified enhancer sequence as strong or weak. In this study, a novel model, iEnhancer-MFGBDT, is proposed using multiple features and a gradient boosting decision tree. First, 902 features are extracted for each enhancer sequence in both layers, comprising 84 k-mer features, 44 RevKmer features, 54 SOMA-DPM features, 360 NMBACC-DPM features and 360 MACC-DPM features. Next, 156 features for the first layer and 263 features for the second layer are selected from the 902 features with the GBDT algorithm according to the Gini index. Finally, the GBDT classifier performs the classification under 10-fold cross-validation. Figure 1 shows the workflow of the iEnhancer-MFGBDT model.
Figure 1.
The flowchart of the iEnhancer-MFGBDT model.
The identification results of our iEnhancer-MFGBDT model with 10-fold cross-validation on the benchmark dataset are shown in Table 2. From Table 2, the accuracy reaches 78.67% and 66.04% for the first and second layers, respectively. Meanwhile, Sn, Sp and MCC reach 77.54%, 79.78% and 0.5735 for the first layer, and 70.56%, 61.63% and 0.3232 for the second layer. The AUC is the probability that the model ranks a randomly selected positive sample higher than a randomly selected negative sample, so it measures the overall performance of an identification system. The ROC curves for both layers are plotted in Figure 2. The AUC values on the benchmark dataset are 0.8615 and 0.7187 for the first and second layers, respectively. Clearly, the second layer is more difficult than the first, owing to the positional variation and scattered distribution of strong and weak enhancers.
Table 2.
The identification performance of iEnhancer-MFGBDT with 10-fold cross validation on the benchmark dataset.
3.2. Feature group analysis
In this study, we adopt five different approaches to extract features from the benchmark dataset, denoted the K-mer, RevKmer, SOMA-DPM, NMBACC-DPM and MACC-DPM feature groups. To assess the importance of each single feature group, we calculate its performance separately, as shown in Table 3. The accuracy of every single feature group is lower than that of the multiple features after GBDT feature selection (MGBDT) for both layers; therefore, fusing multiple features is necessary. From Table 3, for the first layer the best-performing single group is K-mer, followed by RevKmer, NMBACC-DPM and SOMA-DPM, with MACC-DPM the lowest. For the second layer, the best-performing group is RevKmer, followed by K-mer, SOMA-DPM and MACC-DPM, with NMBACC-DPM the lowest. Among the five groups, K-mer and RevKmer are sequence-based feature extraction methods, whereas SOMA-DPM, NMBACC-DPM and MACC-DPM are based on the physical structural properties of DNA dinucleotides. Evidently, the sequence-based feature groups outperform the physical structural property-based groups.
Table 3.
Feature group analysis of iEnhancer-MFGBDT with 10-fold cross validation on the benchmark dataset.
3.3. Comparison with feature selection and without feature selection
We construct 902 features by fusing multiple feature groups, but such a large dimension can degrade predictive performance, increase the computational burden and introduce information redundancy. Feature selection helps the classification system achieve better predictive performance at lower computational cost by removing redundant features, so finding a suitable dimension reduction method is important. For GBDT, features are ranked in descending order of Gini importance; here we use "mean" as the threshold and "gini" as the criterion for feature selection. Figure 3 compares the accuracy of our model with and without feature selection. The accuracies are improved for both layers on the benchmark dataset, showing that GBDT feature selection has a clear effect on accuracy: the accuracy is improved by 1.35% and 5.87% for the first and second layers, respectively. These experimental results show that GBDT feature selection is very effective for the benchmark dataset.
Figure 3.
Identification accuracy comparison between with feature selection and without feature selection on the benchmark dataset.
To demonstrate the superiority of the GBDT classifier, support vector machine (SVM), extra trees (ET), random forest (RF) and Bagging classifiers are tested on the features selected by GBDT with 10-fold cross-validation. As shown in Figure 4, the identification accuracy of SVM, ET, RF and Bagging reaches 75.64%, 77.02%, 77.15% and 76.75% for the first layer, and 60.04%, 62.47%, 65.02% and 64.75% for the second layer, respectively, whereas GBDT reaches 78.67% and 66.04% for the first and second layers. The accuracies of SVM, ET, RF and Bagging are therefore all lower than those obtained by GBDT for both layers, showing that GBDT is more powerful on our benchmark dataset than the other classifiers.
Figure 4.
Identification accuracy comparison with different classifiers.
To avoid experimental bias, it is more convincing to evaluate our model objectively on an independent dataset. We adopt the independent dataset also constructed by Liu et al. [2], which contains 400 sequences of 200 bp: 100 strong enhancer sequences, 100 weak enhancer sequences and 200 non-enhancer sequences, with pairwise sequence similarity no greater than 80%. The results obtained by the proposed model with 10-fold cross-validation on the independent dataset are given in Table 4. For the first layer, the Acc, Sn, Sp, MCC and AUC reach 77.50%, 76.79%, 79.55%, 0.5607 and 0.8589, respectively; for the second layer, they reach 68.50%, 72.55%, 66.81%, 0.3862 and 0.7524, respectively. These values further illustrate the effectiveness of our model.
Table 4.
The identification performance of iEnhancer-MFGBDT with 10-fold cross validation on the independent dataset.
The proposed iEnhancer-MFGBDT model is compared with eight state-of-the-art models: iEnhancer-2L [2], EnhancerPred [17], iEnhancer-EL [18], iEnhancer-ECNN [19], Tan et al. [20], iEnhancer-XG [23], BERT-2D CNNs [24] and iEnhancer-RF [25]. The values of Acc, Sn, Sp and MCC are listed in Tables 5 and 6.
Table 5.
The comparison with other methods in identifying enhancers and their strength on the benchmark dataset.
For the benchmark dataset, the iEnhancer-2L, EnhancerPred, iEnhancer-EL, Tan et al., iEnhancer-XG and iEnhancer-RF models are compared with ours for both layers, and the corresponding values of Acc, Sn, Sp and MCC are listed in Table 5. Among these models, the accuracy of our model is lower than that of iEnhancer-XG for both layers, but the stability of our model is higher. The accuracy of our model is 1.78%, 5.49%, 0.64%, 3.84% and 2.49% higher than iEnhancer-2L, EnhancerPred, iEnhancer-EL, Tan et al. and iEnhancer-RF for the first layer, respectively, and 4.11%, 3.98%, 1.01%, 7.08% and 3.51% higher than these models for the second layer, respectively. As shown by the Sn, Sp and MCC values in Table 5, our model offers the best overall performance and the greatest stability.
For the independent dataset, the iEnhancer-2L, EnhancerPred, iEnhancer-EL, iEnhancer-ECNN, Tan et al., iEnhancer-XG and BERT-2D CNNs models are compared with ours for the first layer, and the corresponding values of Acc, Sn, Sp and MCC are listed in Table 6; the accuracy is improved by 0.6%–4.5% for the first layer. For the second layer, the iEnhancer-2L, EnhancerPred, iEnhancer-EL, iEnhancer-ECNN, Tan et al. and iEnhancer-XG models are compared, and the accuracy is improved by 0.01%–13.5%. These test results show that iEnhancer-MFGBDT performs best on the independent dataset; our model achieves markedly better results than the existing models and makes a considerable improvement in performance.
4. Conclusions
In this study, an effective computational tool called iEnhancer-MFGBDT has been developed for the identification of DNA enhancers and their strength. The iEnhancer-MFGBDT model is established by fusing multiple features with GBDT and evaluated by 10-fold cross-validation. Compared with existing models, our model obtains satisfactory accuracies for the first and second layers on both the benchmark and independent datasets. It is anticipated that iEnhancer-MFGBDT will become a useful high-throughput tool for enhancer research or, at the least, play an important complementary role to existing models. As pointed out by Chou and Shen [46], user-friendly and publicly accessible web servers represent the future direction for developing practically useful computational tools and have an increasing impact on medical science [47]. In the future, we will work to establish a web server for the iEnhancer-MFGBDT model to facilitate communication among colleagues in bioinformatics.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 12101480), the Natural Science Basic Research Program of Shaanxi (Nos. 2021JM-115 and 2021JM-444), and the Fundamental Research Funds for the Central Universities (No. JB210715).
Conflict of interest
The authors declare no conflict of interest.
Figure 1. The chemical structures of (a) 4, 4'-oxydiphthalic anhydride; (b) 2, 2-bis[4-(4-aminophenoxy)phenyl] hexafluoropropane as the starting materials; (c) the synthesized poly amic acid precursor; and (d) the colorless polyimide from the thermal imidization process of c
Figure 2. Cross sectional SEM images of porous polyimide films prepared through the immersion precipitation method using water bath with different contents of N, N-dimethylacetamide (DMAc) of (a) 0 vol%, (b) 16 vol%, (c) 32 vol%, and (d) 48 vol%. Insets are the corresponding top-down SEM images
Figure 3. Measured AFM images of porous polyimide films prepared through the immersion precipitation method using water bath with different contents of N, N-dimethylacetamide (DMAc) of (a) 0 vol%, (b) 16 vol%, (c) 32 vol%, and (d) 48 vol%, where the scan size was fixed at 5 µm × 5 µm
Figure 4. (a) measured total transmittance, (b) measured diffuse transmission, and (c) calculated haze spectra for the prepared porous polyimide scattering films on glass substrates using water bath with different contents of N, N-dimethylacetamide (DMAc) of 0 vol% (rectangular), 16 vol% (rhombus), 32 vol% (triangle), and 48 vol% (circle), (d) pictures of prepared porous scattering films on glass
Figure 5. Schematic images of proposed optical scattering mechanisms in a porous polymer layer: (a) light scattering happens only inside pore-structure, (b) light scattering happens both inside pore-structure and at the rough surface, and (c) light scattering happens only at the rough surface