
More than 170 types of RNA modifications have been reported across a diverse set of RNAs [1], including m6A, adenosine to inosine (A-to-I) deamination, cytosine to uracil (C-to-U) deamination, N1-methyladenosine (m1A), 5-methylcytosine (m5C), pseudouridylation (Ψ), and ribose 2'O-methylation [2], among others. There is a growing list of RNA modifications found in both coding and non-coding RNAs, significantly influencing their biological functions [3]. These modifications frequently result in changes to RNA stability, folding, interactions, translation, localization, and subsequent processing, thereby impacting biological function [4,5]. However, insights into the molecular machineries responsible for the deposition and removal, as well as recognition and interpretation, of these modifications within the cell are available for only a few modifications [6]. Even for those modifications for which writers, erasers, and readers have been identified [7], such as m6A [8], we have limited knowledge about their regulation, their cooperation or competition with other RNA modification and processing events, and how they become deregulated in disease [9]. Therefore, the accurate identification of m6A sites is a crucial step in understanding the mechanisms underlying these biological phenomena [9].
To date, several experimental methods have been developed to localize m6A sites. High-throughput sequencing technologies have been successfully applied to detect m6A sites in various species, including Saccharomyces cerevisiae [10], Homo sapiens [11,12], Arabidopsis thaliana [13], and mouse [14]. However, most high-throughput sequencing techniques cannot quickly and precisely pinpoint the exact location of the m6A site [15]. The m6A motif 'RRACH' is often used to further narrow the location of the m6A site to base resolution within the peak detected by the m6A signal. Other experimental techniques, such as miCLIP-seq [16], can identify m6A sites at single-nucleotide resolution. However, these methods rely on m6A-specific antibodies, exhibit poor reproducibility, and involve lengthy and complex procedures [17], making them unsuitable for large-scale genomic data analysis. Hence, there is a strong motivation to explore computational methods that can accurately and efficiently identify methylation sites.
Researchers have developed a variety of computational methods to predict RNA modification sites, which serve as invaluable complements to experimental approaches [18]. These methods treat RNA methylation identification as a binary prediction task [19], training machine learning models to differentiate between truly methylated and unmethylated sites. By leveraging these computational methods, we can quickly predict whether a given sequence contains RNA methylation sites. The traditional computational approach involves extracting a comprehensive set of hand-designed features from biological sequences [20]. These features are then fed into classical shallow classification algorithms [21], such as a support vector machine [22], often with a linear kernel. However, the selection of these features is typically an empirical process relying on trial and error [23]. Moreover, feature selection itself is highly task-dependent, necessitating additional research for each new predictive task [24].
Analyzing biological sequences and interpreting the underlying biological information pose significant challenges in the realization of biological breakthrough discoveries. Recently, the application of natural language processing (NLP) in sequence analysis has garnered considerable attention within the realm of biological sequence processing [25]. This approach treats biological sequences as sentences [21,26] and k-mer subsequences as words [25,27]; NLP has thus emerged as a valuable approach for unraveling the structure and function encoded within these sequences [28,29]. In contrast to traditional machine learning methods, deep learning (DL) techniques offer an end-to-end design [30]. The input sentence passes through a series of feature extraction layers, with the deep layers of the network automatically learning task-relevant features through backpropagation [31]. In other words, the features used for the final identification or prediction task are extracted directly from the input data, and these learned representations allow the model to discern the patterns and relationships needed for accurate and insightful conclusions [32]. For example, EDLm6Apred [32] applies bidirectional long short-term memory (BiLSTM) to predict m6A sites through the use of word2vec [33], RNA word embedding [34], and one-hot [35] encoding. However, long short-term memory (LSTM), BiLSTM, and recurrent neural networks do not allow parallel computation [36], which results in long training times [37]. Convolutional neural networks (CNNs) can achieve parallel computation [38] and learn local dependencies [39]. For example, m6A-word2vec [40] employs a CNN to identify m6A sites by extracting features based on word2vec. Similarly, Deeppromise [41] utilizes a CNN to identify m1A and m6A sites, extracting features through integrated enhanced nucleic acid composition [42,43], one-hot encoding, and RNA word embedding. However, these CNN structures primarily focus on contextual relationships among neighboring bases [44], without taking into account long-distance dependencies within the sequence [45]. To address this limitation, DeepM6ASeq [46] combines the strengths of CNNs and BiLSTM by incorporating two CNN layers and one BiLSTM layer to predict m6A sites. While this approach can be effective, it may extract redundant features that interfere with prediction performance [47]. To quantify the degree of word-to-word dependency, the attention mechanism comes into play; by applying attention, it becomes possible to capture the specific words that significantly impact the classification results. MultiRM [48], on the other hand, employs a BiLSTM layer and a Bahdanau attention [49] layer to identify various types of RNA modification sites, extracting features based on word2vec encoding. In this case, Bahdanau attention calculates the attention weights between two words in different sentences.
However, since Google introduced the transformer model in 2017, self-attention has been recognized as a special case of the attention mechanism [50,51]. The transformer model, based entirely on self-attention mechanisms, has become the most widely used architecture in NLP representation learning, as demonstrated by its adoption in various applications [52]. Among them, Plant6mA [53], a transformer encoder, can be employed to identify whether an input sequence contains an m6A site. In the transformer model, positional encoding plays a vital role, as the other key components of the model are completely invariant to the order of the sequence. The original transformer employs absolute positional encoding, assigning each position a unique embedding vector [50]. By adding the positional embedding to the word embedding, the model obtains information about the positions of words and their contextual representations. In addition to absolute positional encoding, Shaw et al. [54] and Raffel et al. [55] have introduced relative positional encoding, which incorporates carefully designed bias terms within the self-attention module, enabling the encoding of the distance between any two positions. However, Ke et al. [56] demonstrated that the addition operation applied to positional encodings and word embeddings in absolute positional encoding can introduce mixed correlations and unnecessary randomness into the attention mechanism, which may limit the expressive power of the model and impact its performance. Furthermore, the feed-forward networks within the transformer structure struggle to capture contextual information effectively: because position-wise feed-forward networks process each position independently, they lack the capacity to adequately capture global contextual information. Consequently, the model may face difficulties in accurately comprehending long-term dependencies or recognizing global patterns within the sequence.
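For context, the absolute positional encoding used by the original transformer is a fixed sinusoidal matrix that is simply added to the word embeddings. The sketch below is the standard formulation from the literature, shown only to illustrate the operation that MTTLm6A later omits; the variable names are ours.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard absolute positional encoding: PE[pos, 2i] = sin(...), PE[pos, 2i+1] = cos(...)."""
    pos = np.arange(seq_len)[:, None]                      # (seq_len, 1)
    i = np.arange(d_model)[None, :]                        # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])
    pe[:, 1::2] = np.cos(angle[:, 1::2])
    return pe

# The positional embedding is added element-wise to the token embeddings.
token_embeddings = np.random.rand(601, 16)                 # toy embeddings
inputs_with_position = token_embeddings + sinusoidal_positional_encoding(601, 16)
```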
As m6A is the most prevalent modification observed in mammals, numerous methods have been developed to predict m6A sites in Saccharomyces cerevisiae. However, these methods [57,58,59,60,61] have primarily relied on a small dataset consisting of only 1307 m6A sites derived from base-resolution sequencing. Unfortunately, the limited size of this dataset has hindered the full utilization of the advantages offered by DL methods [62]. However, RMBase [63,64] and m6A-Atlas [65] respectively document over 60,000 low-resolution and 10,000 high-resolution m6A sites in Saccharomyces cerevisiae. The relatively novel task of m1A site prediction encounters similar problems: many methods for predicting m1A are based on a smaller dataset containing only 707 human m1A sites with base-resolution sequencing, whereas RMBase records more than 2000 low-resolution human m1A sites. Huang et al. [66] proposed WeakRM, the first weakly supervised learning framework for predicting RNA modifications from low-resolution epitranscriptome datasets, such as those generated from acRIP-seq and hMeRIP-seq. Astonishingly, these extensive datasets have not been fully leveraged for the development of computational methods in this context. In most scenarios, our primary concern is achieving optimal performance on one task, which requires training a single model or an ensemble of models, followed by fine-tuning and optimization; we iterate and refine these models until their performance plateaus. While this approach often yields acceptable results, by focusing on a single task it tends to overlook valuable information that could enhance the desired metrics, namely the training signals derived from related tasks. By leveraging shared representations among these related tasks, we can empower our model to generalize better and improve its performance on the original task.
The number of supporting experiments or studies (NSES) [63] recorded for each methylation site in RMBase may be exactly the kind of information mentioned above that is usually ignored. Intuitively, the larger the number of experimental identifications of a methylation site, the greater our confidence in considering it a genuine methylation site. Currently, the exploration of multi-task prediction for methylation sites incorporating NSES information is still in its early stages. An example of such an algorithm is MTDeepM6A-2S [67], which uses NSES information to construct a multi-task model based on a combined CNN and BiLSTM deep framework; this model was designed for the prediction of base-resolution m6A sites. However, one limitation of the BiLSTM component lies in its sequential nature, where computations are executed step-by-step. As a result, the computational speed tends to be slower, and it becomes challenging to capture distant dependencies and global contextual information within the sequence. These factors can limit the model's ability to understand long-range relationships and extract comprehensive contextual features; it is therefore essential to assign relatively large attention weights to the vital information. While MTDeepM6A-2S represents a significant advancement in incorporating NSES information into the multi-task prediction of methylation sites, there is still room for further improvement. Addressing the limitations associated with the sequential computations of BiLSTM and enhancing the capture of remote dependencies and global contextual information remain important areas for future research in this field.
To address the limitations of existing models, we drew inspiration from the multi-stage post-calibration determinations used for high-resolution m6A site identification and from the concept of multi-task learning. As a result, we propose MTTLm6A, a multi-task transfer learning approach for base-resolution mRNA m6A site prediction based on an improved transformer. In the initial stage, known as the source domain-stage, we use the NSES information for multi-task learning and improve the model's ability to detect low-resolution m6A sites by optimizing the transformer model structure; specifically, the structure applies a double multi-head-attention (multi-head-attention + multi-head-attention) mechanism, which assigns relatively large attention weights to the critical information to intensify it. In the target domain-stage, considering the similarity between the classification tasks in both stages, we transfer the weights of specific layers and deep networks from the model trained in the source domain-stage to the model in the target domain-stage to predict m6A sites at base resolution. Experimental results on Saccharomyces cerevisiae m6A and Homo sapiens m1A data demonstrate that MTTLm6A achieves area under the receiver operating characteristic curve (AUROC) values of 77.13% and 92.9%, respectively, outperforming state-of-the-art models; this also shows that the model has strong generalization ability. To enhance user convenience, we have made a user-friendly web server for MTTLm6A publicly available at http://47.242.23.141/MTTLm6A/index.php.
We extracted datasets of two major types of RNA modification sites, m1A and m6A, from RMBase v2.0. For the m1A sites, we collected low-resolution m1A sites of Homo sapiens from the extensive database RMBase v2.0, in which 2574 m1A sites have been recorded. The RNA segments with upstream and downstream nucleotides were obtained from the genome. Negative sites (non-modified nucleotides) were randomly selected from the unmodified bases of the same transcripts containing the positive sites. The negative samples were down-sampled and trimmed to match the number and size of the positive samples. To avoid overfitting, CD-HIT [68] was used with a threshold of 0.7 to remove redundant segments from both the positive and negative samples. This yielded 1987 positive samples and 2249 negative samples; to obtain a balanced dataset, 1987 negative samples were randomly selected to build the final dataset. For the second-stage model, we collected base-resolution m1A sites of Homo sapiens from DeepPromise [69] as positive samples, giving 593 training samples and 114 test samples. Because the second-stage model is used to identify base-resolution m1A sites among low-resolution m1A sites, we used the low-resolution m1A sites recorded in RMBase v2.0 as negative samples in the current study. Therefore, 707 (593 + 114) of the above 1987 positive samples were randomly assigned to the second stage as negative samples, and the remaining samples were divided into training and independent test sets at a ratio of 4:1 for the first-stage model.
For the m6A sites, the dataset was derived from the low-resolution m6A sites previously described by Wang et al. [67]. This dataset contains a total of 24,669 m6A sites. Within these segments, two distinct central motif patterns exist, i.e., AAC and GAC. Notably, the existing methods for predicting m6A sites in Saccharomyces cerevisiae were developed by using the Met2614 dataset [57], which only includes the GAC central motif. To ensure a comprehensive analysis, we divided the original RNA segments into two parts: one containing segments with the GAC central motif and the other containing segments with the AAC central motif. The number of segments with the AAC central motif was 13,732, while the number of segments with the GAC central motif was 10,937. The ratio of positive to negative samples in both datasets was 1:1. Subsequently, the datasets were randomly split into benchmark and independent test datasets at a 4:1 ratio. This resulted in the AAC benchmark dataset containing 10,985 positive and negative samples and the corresponding independent test dataset containing 2747 positive and negative samples. Similarly, the GAC benchmark dataset consisted of 8749 positive and negative samples, and the corresponding independent test dataset contained 2188 positive and negative samples. Referring to the experimental results in the papers by Chen et al. [69] and Wang et al. [67], the optimal window sizes were set to 101 nt and 601 nt for the m1A and m6A sites, respectively.
Furthermore, in RMBase v2.0, the NSES value associated with each m6A site recorded in the database was used as the target for a regression task. The NSES indicates the number of experimental confirmations for the corresponding adenine being modified [63]. In simpler terms, a higher NSES value suggests a higher level of certainty regarding the authenticity of the m6A modification at that specific site. The distribution of NSES within the m6A dataset is depicted in Figure 1.
For the target domain-stage model, the positive samples were obtained from the base-resolution m6A sites of Saccharomyces cerevisiae in m6A-Atlas [65]; 4689 m6A sites were obtained, all of which have GAC as the central motif. The negative samples were selected from the low-resolution m6A sites with GAC as the central motif, as recorded in RMBase v2.0. The ratio of positive to negative samples was set at 1:1 to ensure a balanced training environment. Similar to the source domain-stage model, the datasets were randomly divided into benchmark and independent test datasets at a 4:1 ratio. The statistics of the datasets are shown in Table 1.
Site | Species | Stage | Dataset | Window size (nt) | Number of positive | Number of negative
m6A | Saccharomyces cerevisiae | source domain-stage | AAC_BM | 601 | 10,985 | 10,985
 | | | AAC_IND | 601 | 2747 | 2747
 | | | GAC_BM | 601 | 8749 | 8749
 | | | GAC_IND | 601 | 2188 | 2188
 | | target domain-stage | GAC_hr_BM | 601 | 3751 | 3751
 | | | GAC_hr_IND | 601 | 938 | 938
m1A | Homo sapiens | source domain-stage | BM | 101 | 1024 | 1024
 | | | IND | 101 | 256 | 256
 | | target domain-stage | hr_BM | 101 | 593 | 593
 | | | hr_IND | 101 | 114 | 114
BM: benchmark; IND: independent test.
In the development of highly accurate computational methods, the features of the sequence data play a crucial role. Suppose that we have raw data $R_0=\{x^m\}_{m=1}^{M}$, where $M$ is the number of sequences and each $x^m\in\mathbb{R}^{l_0}$ is an RNA sequence of constant length $l_0$. Each entry $x^m_i$, $i=1,2,\dots,l_0$, at position $i$ takes its value from the alphabet $\Sigma=\{A,U,C,G,N\}$.
One widely used and effective encoding method is one-hot encoding, which provides a simple yet powerful approach to representing RNA sequences. In this method, the four nucleotides (A, U, C, G) are encoded as binary vectors: A = (1, 0, 0, 0), U = (0, 1, 0, 0), C = (0, 0, 1, 0), and G = (0, 0, 0, 1), with N = (0, 0, 0, 0) representing unknown or ambiguous positions. After this step, $R_0=\{x^m\}_{m=1}^{M}$, where each $x^m\in\mathbb{R}^{l_0\times 4}$ is an encoded RNA sequence. By applying this encoding scheme, a sequence of 601 nucleotides is transformed into a 601 × 4 matrix.
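As an illustration, a minimal NumPy sketch of this one-hot encoding is shown below; the function and variable names are ours and are not taken from the published implementation.

```python
import numpy as np

# Nucleotide-to-vector mapping; 'N' (unknown/ambiguous) maps to the all-zero vector.
NUC_TO_VEC = {
    "A": [1, 0, 0, 0],
    "U": [0, 1, 0, 0],
    "C": [0, 0, 1, 0],
    "G": [0, 0, 0, 1],
    "N": [0, 0, 0, 0],
}

def one_hot_encode(seq: str) -> np.ndarray:
    """Encode an RNA sequence of length l0 into an l0 x 4 matrix."""
    return np.array([NUC_TO_VEC[base] for base in seq.upper()], dtype=np.float32)

example = one_hot_encode("GAC" * 200 + "A")   # a toy 601-nt segment
print(example.shape)                          # (601, 4)
```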
We have devised a multi-task transfer learning approach for base-resolution mRNA m6A site prediction based on an improved transformer. The structure of the model is shown in Figure 2; it is divided into the source domain-stage and the target domain-stage. The details are as follows.
The conventional approaches employed for the prediction of RNA m6A sites have primarily relied on single-task learning for classification. In contrast, we adopted a novel multi-task architecture in the construction of the source domain model, with the aim of enhancing the classification results and providing a reasonable confidence value. To achieve this, we constructed a regression task based on the NSES information retrieved from RMBase v2.0; this regression task assigns a confidence score to the classification results, thereby enhancing their interpretability and overall reliability. The encoded sequences are fed into a CNN layer to capture sequence patterns or motifs; the mathematical formulation of the CNN model is given below:
$$\mathrm{Conv}(R)^{f}_{j}=\mathrm{ReLU}\left(\sum_{d=0}^{D-1}\sum_{n=0}^{N-1}W^{f}_{dn}\,R_{j+d,\,n}\right)\tag{1}$$
where $R$ denotes the input matrix, $f$ represents the index of the kernel, and $j$ represents the index of the output position; each filter $W^f$ is a $D\times N$ weight matrix, where $D$ is the filter size and $N$ is the number of input channels. The layer maps $\mathbb{R}^{l_0\times 4}\mapsto C^{l\times d}$, with $l=l_0-f+1$.
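To make the indexing in Eq (1) explicit, a direct NumPy transcription is given below; it is written for illustration only (the actual model uses a standard 1D convolution layer), and D here denotes the filter size.

```python
import numpy as np

def conv1d_relu(R: np.ndarray, W: np.ndarray) -> np.ndarray:
    """R: (l0, N) one-hot input; W: (F, D, N) filter bank.
    Returns (l0 - D + 1, F) feature maps, following Eq (1)."""
    F, D, N = W.shape
    l_out = R.shape[0] - D + 1
    out = np.zeros((l_out, F))
    for f in range(F):                      # loop over kernels
        for j in range(l_out):              # loop over output positions
            out[j, f] = np.sum(W[f] * R[j:j + D, :])   # sum over d and n
    return np.maximum(out, 0.0)             # ReLU

feats = conv1d_relu(np.random.rand(601, 4), np.random.randn(16, 10, 4))
print(feats.shape)                          # (592, 16)
```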
Since the convolution already encodes the order of the sequence, and to avoid the mixed correlations that may arise if the model were connected to a positional encoding layer as in the transformer, we intentionally omit the positional encoding layer and connect the CNN output directly to the multi-head-attention mechanism. The attention calculation in this module can be divided into three steps.
In the first step, the output matrix $X$ of the CNN layer is linearly transformed and divided into three matrices, as follows:
$$Q=XW_q,\quad K=XW_k,\quad V=XW_v\tag{2}$$
where $X\in C^{l\times d}$ and $l=l_0-f+1$; the three learnable matrices $W_q$, $W_k$, and $W_v$ are used to project $X$ into different spaces. Usually, each of the three matrices is of size $C^{d\times d_k}$, where $d_k$ is a hyperparameter.
In the second step, the scaled dot product attention can be calculated by using the following equations:
$$A_{m,n}=Q_m K_n^{T}\tag{3}$$
$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{A}{\sqrt{d_k}}\right)V\tag{4}$$
where $Q_m$ is the query vector of the $m$-th token and $K_n$ is the key vector of the $n$-th token. The softmax function is applied along the last dimension. Instead of using one group of $W_q$, $W_k$, $W_v$, using several groups enhances the ability of self-attention.
In the third step, when several groups are used, it is called multi-head self-attention; the calculation can be formulated as follows:
$$Q^{(h)}=XW_q^{(h)},\quad K^{(h)}=XW_k^{(h)},\quad V^{(h)}=XW_v^{(h)}\tag{5}$$
$$\mathrm{head}^{(h)}=\mathrm{Attention}\left(Q^{(h)},K^{(h)},V^{(h)}\right)\tag{6}$$
$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}\left(\mathrm{head}^{(1)},\mathrm{head}^{(2)},\dots,\mathrm{head}^{(i)}\right)W_o\tag{7}$$
where $i$ is the number of heads and the superscript $h$ represents the head index. Usually $d_k\times i=d$, which means that the output of $\left[\mathrm{head}^{(1)},\mathrm{head}^{(2)},\dots,\mathrm{head}^{(i)}\right]$ is of size $C^{l\times d}$. Also note that $W_o\in C^{d\times d}$ is a learnable parameter.
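The following NumPy sketch walks through Eqs (2)–(7) for a single sequence. It is meant to illustrate the mechanism; the shapes and weight initializations are arbitrary and do not correspond to the trained model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo):
    """X: (l, d); Wq/Wk/Wv: (heads, d, d_k); Wo: (d, d). Returns (l, d)."""
    heads = []
    for h in range(Wq.shape[0]):
        Q, K, V = X @ Wq[h], X @ Wk[h], X @ Wv[h]        # Eq (5)
        A = Q @ K.T                                       # Eq (3)
        head = softmax(A / np.sqrt(Q.shape[-1])) @ V      # Eq (4), Eq (6)
        heads.append(head)
    return np.concatenate(heads, axis=-1) @ Wo            # Eq (7)

d, n_heads, d_k = 16, 4, 4                                # d_k * n_heads = d
X = np.random.rand(592, d)                                # e.g., a CNN feature map
out = multi_head_self_attention(
    X,
    np.random.randn(n_heads, d, d_k),
    np.random.randn(n_heads, d, d_k),
    np.random.randn(n_heads, d, d_k),
    np.random.randn(d, d),
)
print(out.shape)                                          # (592, 16)
```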
Furthermore, to more effectively capture contextual information, we deliberately replaced the feed-forward layer in the transformer structure with the multi-head-attention mechanism layer. This choice empowers the model to assign greater attention weight to key information, thereby reinforcing its significance. At the heart of the source domain-stage lies the primary objective of classifying low-resolution m6A sites and accurately distinguishing them from non-m6A sites. This pivotal step forms the foundation for analyses and investigations in the subsequent target domain-stage.
Building upon the multi-task model training obtained in the source domain-stage, we progress to the target domain-stage. In this stage, we employ a transfer learning strategy to train the target domain-stage model and focus on identifying both base-resolution m6A sites and low-resolution m6A sites.
During the source domain-stage, our model takes RNA segment sequences and the NSES information as input; the RNA segment sequences are then transformed into numerical matrices by the one-hot encoding process, as shown in Figure 2. These numerical matrices are then fed into the deep network, which consists of a CNN and double-multi-head-attention, referred to as CNN+MM.
The CNN uses a 1D convolutional layer to extract local features from the input matrices. To optimize the hyperparameters, we employed a grid-search strategy. In this stage, we used 16 convolutional kernels, each with a size of 10. Subsequently, the output of the CNN stage is normalized with a group normalization layer, where the number of groups was set to 4. The multi-head-attention stage consists of two multi-head-attention networks. One attention mechanism has two heads, with each head having a size of 8 (d_model = 8). The other attention mechanism has four heads, with each head having a size of 16 (d_model = 16). To promote effective information flow, we incorporated a dropout layer and a residual connection around each of the two sub-layers. The dropout rate was set at 0.1 to reduce overfitting, and layer normalization was applied subsequently. Following the multi-head-attention stage, an AveragePooling1D layer is applied to reduce the dimensionality of the extracted features. The kernel size of the 1D pooling layer was set to 15. Subsequently, the data were flattened into a 1D form by using a flattening layer. This is followed by a dropout layer and a fully connected layer. The dropout rate was set at 0.6, and the fully connected layer was set to comprise 64 neurons activated by the exponential linear unit (ELU) function.
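A minimal Keras sketch of the source domain-stage network described above is given below. It assumes TensorFlow ≥ 2.11 (for GroupNormalization); the exact head dimensions and layer arrangement are our reading of the description, not the released code.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_source_model(seq_len: int = 601):
    inp = layers.Input(shape=(seq_len, 4))                          # one-hot RNA segment

    x = layers.Conv1D(filters=16, kernel_size=10, activation="relu")(inp)
    x = layers.GroupNormalization(groups=4)(x)

    # Double multi-head-attention ("MM"): two self-attention sub-layers, each with
    # dropout, a residual connection, and layer normalization.
    for num_heads, key_dim in [(2, 8), (4, 16)]:                    # head sizes assumed
        attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(x, x)
        x = layers.LayerNormalization()(x + layers.Dropout(0.1)(attn))

    x = layers.AveragePooling1D(pool_size=15)(x)
    x = layers.Flatten()(x)
    x = layers.Dropout(0.6)(x)
    x = layers.Dense(64, activation="elu")(x)

    cls_out = layers.Dense(2, activation="softmax", name="classification")(x)
    reg_out = layers.Dense(1, activation="elu", name="regression")(x)   # NSES regression
    return Model(inp, [cls_out, reg_out])

model = build_source_model()
model.summary()
```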
The output layer of our model consists of two outputs, catering to the classification and regression tasks, respectively. The total loss is composed of the classification loss and the regression loss, and is calculated as follows:
$$\mathrm{loss}_{\mathrm{multitask}}=\mathrm{loss}_{\mathrm{classification}}+\lambda\,\mathrm{loss}_{\mathrm{regression}}\tag{8}$$
where $\mathrm{loss}_{\mathrm{classification}}$ is the classification loss, $\mathrm{loss}_{\mathrm{regression}}$ is the regression loss, and $\lambda$ is a tunable weight set according to the specific circumstances. For the classification task, the softmax activation function was employed, and categorical cross-entropy was specified as the loss function. For the regression task, the ELU activation function was used, with log-cosh employed as the loss function. Therefore, the overall loss function for the entire multi-task model can be expressed as follows:
$$\mathrm{loss}_{\mathrm{multitask}}=-\frac{1}{N}\sum_{i=1}^{N}\left(y_i^{\mathrm{class}}\log p_i^{\mathrm{class}}+\left(1-y_i^{\mathrm{class}}\right)\log\left(1-p_i^{\mathrm{class}}\right)\right)+\lambda\sum_{i=1}^{N}\log\left(\cosh\left(y_i^{\mathrm{regression}}-p_i^{\mathrm{regression}}\right)\right)\tag{9}$$
where $N$ denotes the total number of samples in the dataset, and $y_i^{\mathrm{class}}$ and $p_i^{\mathrm{class}}$ denote the true label and predicted probability of the $i$-th sample for the classification task, respectively. Similarly, $y_i^{\mathrm{regression}}$ and $p_i^{\mathrm{regression}}$ denote the true target value and prediction of the $i$-th sample for the regression task, respectively. The weight parameter $\lambda$ was set to 0.6 by grid search. This loss function leverages the label information from the regression task to potentially enhance the prediction accuracy of the classification task.
Finally, the stochastic gradient descent (SGD) optimization algorithm is used with the momentum set to 0.95 and a learning rate of 0.01. SGD is a widely adopted optimization algorithm known for its effectiveness in iteratively adjusting the model's parameters during the training process to minimize the loss function.
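Continuing the sketch above, the multi-task loss of Eqs (8)–(9) and the optimizer settings can be expressed with Keras loss weights (λ = 0.6); `model` refers to the two-output network built in the previous snippet.

```python
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.95),
    loss={
        "classification": tf.keras.losses.CategoricalCrossentropy(),  # cross-entropy term
        "regression": tf.keras.losses.LogCosh(),                      # log-cosh term
    },
    loss_weights={"classification": 1.0, "regression": 0.6},          # lambda = 0.6
    metrics={"classification": ["accuracy"]},
)
```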
To construct the target domain-stage model, we used transfer learning by transferring the feature extraction layers from the source domain-stage model. This approach was motivated by the similarity between the classification tasks in both stages.
During the transfer learning process, we initialized the parameters of the target domain-stage model by using the feature extraction layers from the source domain-stage model. This initialization includes all layers except the output layer, ensuring that the model starts with valuable learned representations. By inheriting the corresponding weights, we have provided a strong foundation for the target domain model.
Subsequently, we optimized all of the weights of the target domain-stage model during training without freezing any layers. This allowed the model to adapt and refine its parameters based on the target domain's specific characteristics and data. By performing transfer learning in this way, we aimed to capitalize on the knowledge and patterns learned in the source domain, ultimately enhancing the performance and generalization of the model on the target domain's classification tasks.
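In Keras terms, this transfer step might look like the sketch below; matching layers by position and skipping the named output heads is our assumption, and the checkpoint path is a placeholder.

```python
# Trained source domain-stage model (architecture from the earlier sketch).
source_model = build_source_model()
source_model.load_weights("source_stage.h5")        # hypothetical checkpoint path

# Fresh target domain-stage model with the same architecture.
target_model = build_source_model()

# Copy the weights of every feature extraction layer; the output heads are re-initialized.
for src_layer, tgt_layer in zip(source_model.layers, target_model.layers):
    if src_layer.name in ("classification", "regression"):
        continue
    tgt_layer.set_weights(src_layer.get_weights())

# No layers are frozen: all weights are fine-tuned on the target domain data.
for layer in target_model.layers:
    layer.trainable = True
```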
In this study, we comprehensively evaluated the performance of our prediction model by using eight commonly used classification metrics. These metrics include the accuracy (Acc), sensitivity (Sen), precision (Pre), Matthews correlation coefficient (MCC), specificity (Sp), and F1 score (F1). The formulas for these metrics are respectively as follows:
$$\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}\tag{10}$$
$$\mathrm{Sensitivity}=\frac{TP}{TP+FN}\tag{11}$$
$$\mathrm{Precision}=\frac{TP}{TP+FP}\tag{12}$$
$$\mathrm{MCC}=\frac{TP\times TN-FP\times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}\tag{13}$$
$$\mathrm{Specificity}=\frac{TN}{TN+FP}\tag{14}$$
$$F1\ \mathrm{score}=\frac{2\times \mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}\tag{15}$$
Additionally, we used the AUROC curve and the area under the precision-recall curve (AUPRC) to visually assess the overall performance of our model. These metrics provide insights into the model's ability to discriminate between different classes and the precision-recall trade-off, respectively. Time (s) indicates the training time of the model per epoch. By considering both the quantitative metrics and the visual evaluation, we gain a comprehensive understanding of the predictive capabilities of our model.
To evaluate the regression task, we used the Pearson correlation coefficient (PCC) as the index. The PCC measures the similarity between the predicted target values (X) and the actual target values (Y) of the samples. It is calculated as follows:
$$\mathrm{PCC}=\frac{\mathrm{cov}(X,Y)}{\sigma_X\,\sigma_Y}\tag{16}$$
Here, $\mathrm{cov}(X,Y)$ represents the covariance between $X$ and $Y$, and $\sigma_X$ and $\sigma_Y$ represent the standard deviations of $X$ and $Y$, respectively. The PCC ranges from –1 to 1, where a value of 0 indicates no correlation.
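For reference, the threshold-based metrics, AUROC, AUPRC, and PCC can be computed with scikit-learn and SciPy as sketched below; this is a convenience illustration rather than the authors' evaluation script.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import (
    accuracy_score, recall_score, precision_score, matthews_corrcoef,
    f1_score, roc_auc_score, average_precision_score,
)

def evaluate(y_true, y_score, y_reg_true, y_reg_pred, threshold=0.5):
    """y_true/y_score: binary labels and predicted probabilities (classification);
    y_reg_true/y_reg_pred: NSES targets and predictions (regression)."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return {
        "Acc": accuracy_score(y_true, y_pred),
        "Sen": recall_score(y_true, y_pred),                 # sensitivity = recall
        "Pre": precision_score(y_true, y_pred),
        "MCC": matthews_corrcoef(y_true, y_pred),
        "Sp": tn / (tn + fp),                                # specificity
        "F1": f1_score(y_true, y_pred),
        "AUROC": roc_auc_score(y_true, y_score),
        "AUPRC": average_precision_score(y_true, y_score),   # area under the PR curve
        "PCC": pearsonr(y_reg_true, y_reg_pred)[0],
    }
```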
By applying different values of the loss weight λ of the regression task (i.e., λ = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]), various models, including the CNN, CNN+BiLSTM (CNN+BiL), CNN+transformer (CNN+TF), CNN+multi-head-attention+feed-forward (CNN+MF), and CNN+multi-head-attention+multi-head-attention (CNN+MM) models, were evaluated by 5-fold cross-validation on the benchmark datasets. Among them, the CNN+MF model directly connects the multi-head-attention mechanism layer and the feed-forward layer without a positional encoding layer, while the CNN+MM model replaces the feed-forward layer of the CNN+MF architecture with a multi-head-attention mechanism layer. The optimal performance and the corresponding loss weight value of each model are shown in Table 2 and Figure 3. The results show that the CNN+MM model achieved the highest AUROC and AUPRC scores for both the GAC_BM and AAC_BM datasets. The specific analysis is as follows:
Datasets | Classifiers | AUROC | ACC (%) | Sen (%) | Pre (%) | MCC (%) | Spe (%) | F-1 (%) | AUPRC | PCC | Time (s)
AAC_BM | CNN (λ = 1) | 0.8507 | 77.64 | 78.58 | 77.13 | 55.40 | 76.69 | 77.85 | 0.8382 | 0.5844 | 1
 | CNN+BiL (λ = 1) | 0.8767 | 79.97 | 82.61 | 78.47 | 60.10 | 77.33 | 80.49 | 0.8607 | 0.6137 | 136
 | CNN+TF (λ = 1) | 0.8700 | 79.01 | 84.65 | 76.08 | 58.54 | 73.38 | 80.13 | 0.8535 | 0.5986 | 4
 | CNN+MF (λ = 1) | 0.8768 | 79.75 | 83.99 | 77.43 | 59.77 | 75.51 | 80.58 | 0.8627 | 0.6040 | 4
 | CNN+MM (λ = 0.6) | 0.8793 | 79.57 | 80.94 | 79.06 | 59.98 | 78.56 | 79.99 | 0.8636 | 0.6140 | 25
GAC_BM | CNN (λ = 0.4) | 0.8506 | 76.95 | 77.24 | 76.80 | 54.66 | 76.66 | 77.02 | 0.8394 | 0.5923 | 1
 | CNN+BiL (λ = 1) | 0.8742 | 79.61 | 85.46 | 76.51 | 59.71 | 73.75 | 80.74 | 0.8595 | 0.6199 | 108
 | CNN+TF (λ = 0.4) | 0.8713 | 79.46 | 80.94 | 78.61 | 59.04 | 77.97 | 79.76 | 0.8566 | 0.6005 | 4
 | CNN+MF (λ = 1) | 0.8745 | 79.61 | 81.19 | 78.70 | 59.35 | 78.03 | 79.93 | 0.8600 | 0.6112 | 4
 | CNN+MM (λ = 0.6) | 0.8772 | 79.06 | 78.69 | 79.28 | 58.48 | 79.43 | 79.80 | 0.8635 | 0.6156 | 20
Time (s): running time per epoch for model training.
First, comparing CNN+TF with the CNN, the AUROC scores of CNN+TF were 1.93% and 2.07% higher on AAC_BM and GAC_BM, respectively, and the AUPRC scores of CNN+TF were 1.53% and 1.72% higher than the values for the CNN. These findings highlight the ability of the CNN+TF model to capture deep semantics from RNA sequences, surpassing the performance of the CNN alone.
Additionally, a comparison was made between the CNN+MF and CNN+TF models. The AUROC scores of CNN+MF were 0.68% and 0.32% higher than those of CNN+TF on AAC_BM and GAC_BM, respectively, and the AUPRC scores of CNN+MF were 0.92% and 0.34% higher on AAC_BM and GAC_BM, respectively. This may be attributed to the mixed correlations between positional encoding and word embeddings in the CNN+TF model, which introduce unnecessary randomness into the attention mechanism and limit the model's expressiveness. CNN+MF therefore affords a performance improvement, since it directly connects the multi-head-attention mechanism layer without a positional encoding layer.
Third, the study compared the performance of the CNN+MM and CNN+MF models. The AUROC scores of CNN+MM were 0.25% and 0.27% higher than the CNN+MF on AAC_BM and GAC_BM, respectively, and the AUPRC scores of CNN+MM were 0.09% and 0.35% higher on AAC_BM and GAC_BM, respectively. This disparity could be attributed to the ineffective capture of contextual information by the feed-forward networks in the transformer structure, as feed-forward networks lack the ability to effectively capture global contextual information, resulting in a less accurate understanding of long-term dependencies or global patterns in the sequence. The CNN+MM model replaces the feed-forward layer with the multi-head-attention mechanism layer based on the CNN+MF architecture. This modification allows the model to capture more complex and fine-grained features within the input sequence.
Fourth, a comparison was made between the CNN+MM and CNN+BiL models. The AUROC scores of CNN+MM were 0.26% and 0.3% higher than those for CNN+BiL on AAC_BM and GAC_BM, respectively, while the AUPRC scores of CNN+MM were 0.29% and 0.39% higher on AAC_BM and GAC_BM, respectively. The improved performance of CNN+MM may be attributed to the inclusion of multi-head-attention, which excels at capturing long-range dependencies and global contextual information. This allows the model to better understand relationships and important semantic connections within the input sequence. Furthermore, the experimental results demonstrate that CNN+BiL has a significantly longer training time per epoch than CNN+MM; particularly, it is five times longer than that of CNN+MM, likely due to the parallel computation of the CNN and multi-head-attention, which enables simultaneous processing of multiple input elements or subtasks. In contrast, BiLSTM's sequential nature, where computations are performed step-by-step, may result in slower computations than parallelizable operations.
Finally, in the case of the regression task, there is little difference between CNN+MM and CNN+BiL; specifically, the PCCs of CNN+MM and CNN+BiL differ by 0.03% and –0.43% on AAC_BM and GAC_BM, respectively. Interestingly, although the loss weight ratio between the classification task and the regression task of the CNN+MM model was 1:0.6, this had little effect on the correlation coefficient of the regression. The reason may be that the CNN+MM model develops a more comprehensive and accurate understanding of the underlying data.
In summary, the CNN+MM classifier effectively captures sequence details on the AAC_BM and GAC_BM datasets, outperforming other models in terms of AUROC and AUPRC on the classification task.
Furthermore, this section presents a comparison of prediction performance between different models in single-task and multi-task settings. The experimental setup involved encoding sequences with one-hot encoding and applying the CNN, CNN+BiL, CNN+TF, CNN+MF, and CNN+MM models to predict modification sites on the independent datasets. The ratio of the classification loss to the regression loss is one of the hyperparameters and affects the performance of the models; only the training sets can be used when choosing hyperparameters. Therefore, when evaluating the various models on the independent test sets, each model used the optimal loss weight value obtained from the training set, as listed in the second column of Table 2.
As shown in Figure 4 and Table 3, except for the CNN+BiL model, the AUROC and the AUPRC scores for the multi-task model based on the two datasets were better than those for the single-task model. In addition, the multi-task model can also calculate the PCC at the same time, so the multi-task model is more efficient than the single-task model.
Datasets | Task type | Classifiers | AUROC | ACC (%) | Sen (%) | Pre (%) | MCC (%) | Spe (%) | F-1 (%) | AUPRC | PCC
AAC_IND | Single-task | CNN | 0.8579 | 78.63 | 81.03 | 77.32 | 57.32 | 76.23 | 79.13 | 0.8510 |
 | | CNN+BiL | 0.8829 | 80.04 | 79.90 | 80.13 | 60.09 | 80.19 | 80.01 | 0.8737 |
 | | CNN+TF | 0.8736 | 78.54 | 76.55 | 79.71 | 57.12 | 80.52 | 78.10 | 0.8642 |
 | | CNN+MF | 0.8763 | 78.97 | 77.72 | 79.72 | 57.96 | 80.23 | 78.71 | 0.8657 |
 | | CNN+MM | 0.8820 | 80.06 | 83.10 | 78.34 | 60.23 | 77.03 | 80.65 | 0.8724 |
 | Multi-task | CNN (λ = 1) | 0.8581 | 78.55 | 81.1 | 77.17 | 57.18 | 76.01 | 79.09 | 0.8516 | 0.5946
 | | CNN+BiL (λ = 1) | 0.8801 | 79.75 | 79.1 | 80.15 | 59.51 | 80.41 | 79.62 | 0.8682 | 0.6042
 | | CNN+TF (λ = 1) | 0.8768 | 80.44 | 83.1 | 78.91 | 60.97 | 77.79 | 80.95 | 0.8666 | 0.613
 | | CNN+MF (λ = 1) | 0.8816 | 80.12 | 89.35 | 75.42 | 61.29 | 70.88 | 81.80 | 0.8697 | 0.6053
 | | CNN+MM (λ = 0.6) | 0.8888 | 80.97 | 83.72 | 79.36 | 62.03 | 78.23 | 81.48 | 0.8775 | 0.6219
GAC_IND | Single-task | CNN | 0.8553 | 77.92 | 84.78 | 74.55 | 56.37 | 71.06 | 79.33 | 0.8500 |
 | | CNN+BiL | 0.8767 | 79.97 | 82.45 | 78.55 | 60.01 | 77.48 | 80.45 | 0.8667 |
 | | CNN+TF | 0.8699 | 77.26 | 92.94 | 70.75 | 57.41 | 61.58 | 80.34 | 0.8589 |
 | | CNN+MF | 0.8635 | 73.29 | 95.67 | 66.09 | 52.09 | 50.91 | 78.18 | 0.8509 |
 | | CNN+MM | 0.8761 | 80.20 | 84.23 | 77.94 | 60.59 | 76.16 | 80.96 | 0.8660 |
 | Multi-task | CNN (λ = 0.4) | 0.8560 | 78.28 | 83.45 | 75.63 | 56.87 | 73.11 | 79.35 | 0.8507 | 0.5935
 | | CNN+BiL (λ = 1) | 0.8729 | 78.87 | 79.63 | 78.45 | 57.75 | 78.12 | 79.03 | 0.8665 | 0.6124
 | | CNN+TF (λ = 0.4) | 0.8734 | 79.58 | 86.69 | 75.90 | 59.77 | 72.47 | 80.94 | 0.8622 | 0.6029
 | | CNN+MF (λ = 1) | 0.8795 | 80.20 | 83.59 | 78.28 | 60.53 | 76.80 | 80.85 | 0.8698 | 0.6202
 | | CNN+MM (λ = 0.6) | 0.8880 | 81.11 | 83.23 | 79.84 | 62.27 | 78.99 | 81.50 | 0.8783 | 0.6343
The comparison results demonstrate that CNN+MM, operating under the multi-task framework, outperforms the other models across various evaluation metrics, such as the AUROC, AUPRC, PCC, ACC, and MCC. Specifically, for the AAC_IND and GAC_IND sites, CNN+MM achieved AUROC values of 0.8888 and 0.8880, respectively, exhibiting better performance than the other methods. In contrast, CNN+BiL does not incorporate the multi-head-attention mechanism, potentially limiting its ability to capture global contextual information compared with CNN+MM. CNN+MF and CNN+TF contain the multi-head-attention layer; however, they both also contain the feed-forward layer, which cannot capture deeper global information as effectively as multi-head-attention layers, resulting in a less accurate understanding of long-term dependencies or global patterns in the sequence.
Considering the similarity between the two-stage classification tasks, we chose to leverage the feature extraction layer of the source domain-stage model to construct the target domain-stage model. Specifically, in the transfer learning process, we initialize the parameters of the target domain-stage model with the feature extraction layer (excluding the output layer) and the corresponding weights of the source domain-stage model. During training, we optimize all of the weights of the target domain-stage model without freezing them.
To evaluate the effectiveness of transfer learning compared to training from scratch, we also trained the same network without using the weights obtained from the source domain-stage model. Furthermore, to assess the performance of multi-task transfer learning against single-task fine-tuning, we also trained the same network indirectly through single-task transfer learning. We conducted three sets of comparative experiments on the GAC_hr_BM datasets; the results of these experiments are shown in Table 4.
As shown in Figure 5 and Table 4, the AUROC and AUPRC values for MTTLm6Asingle were 2.07% and 2.92% higher than those for MTTLm6Adirect. This improvement suggests that transfer learning enhances the model's performance relative to training without transfer learning, because transfer learning lets the model leverage knowledge gained from related tasks or domains and thereby generalize well to unseen data. Furthermore, the performance of the model trained through multi-task transfer learning was significantly better than that of the model trained through single-task transfer learning. Specifically, the AUROC and AUPRC values for MTTLm6A were 0.86% and 0.68% higher than those for MTTLm6Asingle, respectively. This result can be attributed to the complementary information provided by the combination of classification and regression tasks. While the classification task focuses on predicting discrete class labels, the regression task aims to estimate continuous values. By jointly training the model on both tasks, MTTLm6A can effectively utilize the complementary information from both tasks to improve its understanding of the data and make more accurate predictions.
Classifiers | AUROC | ACC (%) | Sen (%) | Pre (%) | MCC (%) | Spe (%) | F-1 (%) | AUPRC |
MTTLm6Adirect | 0.6750 | 62.33 | 61.69 | 62.49 | 24.81 | 62.97 | 62.09 | 0.6576
MTTLm6Asingle | 0.6957 | 63.74 | 60.28 | 64.77 | 27.69 | 67.21 | 62.44 | 0.6871
MTTLm6A | 0.7043 | 64.40 | 65.53 | 64.08 | 28.94 | 63.26 | 64.80 | 0.6938 |
MTTLm6Adirect refers to the model trained directly without transfer learning; MTTLm6Asingle indicates the model trained indirectly through single-task transfer learning. |
Finally, MTTLm6A was compared with other state-of-the-art approaches on the GAC_hr_IND datasets, including m6A-word2vec, MultiRM, and MTDeepM6A-2S. To make the comparison more convincing, we included the MTTLm6Asingle model in the evaluation.
As shown in Figure 6 and Table 5, the AUROC and AUPRC values for MTTLm6A were higher than those obtained for the other approaches. In particular, compared to MTDeepM6A-2S, the second-best performing model, MTTLm6A (which utilizes multi-task transfer learning) demonstrated an improvement of 0.37% in AUROC and 0.58% in AUPRC. This enhancement can be attributed to MTTLm6A's superior ability to capture long-range dependencies and global contextual information in the input sequence compared to MTDeepM6A-2S.
Classifiers | AUROC | ACC (%) | Sen (%) | Precision (%) | MCC (%) | Spe (%) | F-1 (%) | AUPRC |
m6A-word2vec | 0.6008 | 57.78 | 58.96 | 57.60 | 15.57 | 48.83 | 58.27 | 0.586 |
MultiRM | 0.6947 | 63.75 | 69.08 | 62.43 | 27.66 | 44.67 | 65.59 | 0.6922 |
MTTLm6Asingle | 0.7599 | 69.10 | 67.56 | 69.71 | 38.23 | 70.65 | 68.62 | 0.7709 |
MTDeepM6A-2S | 0.7676 | 70.44 | 66.81 | 72.04 | 40.98 | 74.07 | 69.32 | 0.7688 |
MTTLm6A | 0.7713 | 69.85 | 74.81 | 68.06 | 39.90 | 64.89 | 71.28 | 0.7746 |
Furthermore, the AUROC and AUPRC values for MTTLm6A were 1.14% and 0.37% higher than those for MTTLm6Asingle, respectively. The incorporation of multi-task learning in the MTTLm6A model allows for joint training on both the classification and regression tasks. This integration allows the model to learn shared representations and extract more informative features that benefit both tasks; by collectively optimizing these tasks, the model achieves improved overall performance. In contrast, MTTLm6Asingle focuses solely on the classification task, potentially limiting its ability to fully exploit the information in the data.
Additionally, MTTLm6A surpassed MultiRM by a notable margin, exhibiting AUROC and AUPRC improvements of 7.66% and 8.24%, respectively. This highlights the effectiveness of MTTLm6A in addressing the challenges posed by small-sample classification modeling problems.
In order to evaluate the reliability of the model, the m6A-word2vec, MultiRM, MTTLm6Asingle, MTDeepM6A-2S, and MTTLm6A models were each applied in 100 replicate experiments on the same independent m6A test set. After the 100 replicates, we tested the statistical significance of the AUROC differences between the tools by using Student's t-test [70]. The results are shown in Table 6.
Modification type | Classifiers | m6A-word2vec | MultiRM | MTTLm6Asingle | MTDeepM6A-2S | MTTLm6A
m6A | m6A-word2vec | | | | |
 | MultiRM | 0 | | | |
 | MTTLm6Asingle | 0 | 0 | | |
 | MTDeepM6A-2S | 0 | 0 | 0 | |
 | MTTLm6A | 0 | 0 | 0 | 0 |
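As an illustration, the significance test over the 100 replicate AUROC values can be run with SciPy; the result file names below are hypothetical placeholders for the per-replicate AUROC lists.

```python
import numpy as np
from scipy.stats import ttest_ind

# Each file holds 100 AUROC values, one per replicate run on the independent m6A test set.
auroc_mttlm6a = np.loadtxt("mttlm6a_auroc.txt")        # hypothetical result file
auroc_mtdeepm6a = np.loadtxt("mtdeepm6a2s_auroc.txt")  # hypothetical result file

t_stat, p_value = ttest_ind(auroc_mttlm6a, auroc_mtdeepm6a)
print(f"t = {t_stat:.3f}, p = {p_value:.3g}")
```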
In order to evaluate the generalization ability of the MTTLm6A model, MTTLm6A was compared against m6A-word2vec, MultiRM, MTTLm6Asingle, and MTDeepM6A-2S by using the training and independent sets of Homo sapiens m1A sites. As shown in Table 7, the AUROC and AUPRC values for MTTLm6A were higher than those obtained for the other approaches. In particular, compared to MTTLm6Asingle, the second-best performing model, MTTLm6A demonstrated an improvement of 1.47% in AUROC and 0.63% in AUPRC. This shows that multi-task learning is effective in improving model performance. In summary, the MTTLm6A model has good versatility in predicting different methylation sites in different species.
Classifiers | AUROC | ACC (%) | Sen (%) | Precision (%) | MCC (%) | Spe (%) | F-1 (%) | AUPRC |
m6A-word2vec | 0.9095 | 91.67 | 100.00 | 85.71 | 84.52 | 83.33 | 92.31 | 0.8722 |
MultiRM | 0.9126 | 91.23 | 99.12 | 85.61 | 83.50 | 83.33 | 91.87 | 0.8818 |
MTTLm6Asingle | 0.9143 | 90.79 | 98.25 | 85.50 | 82.50 | 83.33 | 91.43 | 0.8841 |
MTDeepM6A-2S | 0.894 | 91.23 | 99.12 | 85.61 | 83.50 | 83.33 | 91.87 | 0.8509 |
MTTLm6A | 0.929 | 91.23 | 99.12 | 85.61 | 83.50 | 83.33 | 91.87 | 0.8904 |
We have developed a user-friendly web server, accessible at http://47.242.23.141/MTTLm6A/index.php, to facilitate the utilization of the MTTLm6A model as a tool for predicting the base-resolution m6A sites. Simply type or paste the RNA sequence of interest into the designated input area. To receive the prediction results, kindly provide your email address in the corresponding box and click the "submit" button. After a brief calculation period, the prediction results will be presented in a clear and organized table format. This intuitive web server provides researchers with an efficient and convenient platform to leverage the MTTLm6A model to quickly predict the base-resolution m6A site.
To assess the impact of different loss weights on the source domain-stage model, we conducted an optimization process by using the grid search method with AAC_BM and GAC_BM datasets. Through this process, we identified an optimal weight ratio of 1:0.6. The evaluation metrics, as displayed in Table 8, validate the effectiveness of this weight configuration. This finding aligns with the conclusions drawn by Kendall et al. [71], who emphasized the importance of relative weighting in multi-task learning scenarios. Their research demonstrated that numerous DL applications benefit from incorporating multiple regression and classification objectives, but the performance of such systems relies heavily on the appropriate weighting assigned to each task's loss function. By determining the optimal loss weight ratio for our source domain-stage model, we aimed to enhance its predictive capabilities and ensure a balanced influence of the classification and regression tasks. This optimization process allows us to fully leverage the benefits of multi-task learning and maximize the performance of our model.
Datasets | Weight ratio | AUROC | ACC (%) | Sen (%) | Pre (%) | MCC (%) | Spe (%) | F-1 (%) | AUPRC | PCC
AAC_BM | 1:0.1 | 0.8753 | 79.3 | 80.59 | 78.56 | 59.05 | 78.01 | 79.56 | 0.8579 | 0.5894
 | 1:0.2 | 0.8745 | 77.95 | 74.85 | 79.80 | 56.9 | 81.05 | 77.25 | 0.8586 | 0.6019
 | 1:0.3 | 0.8788 | 78.74 | 77.67 | 79.36 | 58.38 | 79.81 | 78.51 | 0.8615 | 0.6093
 | 1:0.4 | 0.8734 | 76.56 | 70.76 | 80.05 | 54.77 | 82.36 | 75.12 | 0.8574 | 0.6126
 | 1:0.5 | 0.8745 | 78.16 | 77.85 | 78.34 | 57.62 | 78.48 | 78.10 | 0.8590 | 0.6078
 | 1:0.6 | 0.8793 | 79.57 | 80.94 | 79.06 | 59.98 | 78.56 | 79.99 | 0.8636 | 0.614
 | 1:0.7 | 0.8754 | 77.6 | 73.8 | 79.86 | 56.65 | 81.39 | 76.71 | 0.8587 | 0.6108
 | 1:0.8 | 0.8753 | 78.04 | 76.7 | 78.81 | 56.98 | 79.38 | 77.74 | 0.8592 | 0.6218
 | 1:0.9 | 0.8749 | 77.85 | 75.95 | 78.94 | 56.95 | 79.74 | 77.42 | 0.8584 | 0.6124
 | 1:1.0 | 0.8756 | 78.66 | 77.96 | 79.06 | 58.11 | 79.36 | 78.51 | 0.8595 | 0.5996
GAC_BM | 1:0.1 | 0.8746 | 79.62 | 81.97 | 78.30 | 59.59 | 77.28 | 80.09 | 0.8584 | 0.6011
 | 1:0.2 | 0.8771 | 79.65 | 80.76 | 78.88 | 59.54 | 77.54 | 80.29 | 0.8612 | 0.6033
 | 1:0.3 | 0.8761 | 79.58 | 80.27 | 79.18 | 59.56 | 78.9 | 79.72 | 0.8627 | 0.6095
 | 1:0.4 | 0.8764 | 79.21 | 83.78 | 76.77 | 59.31 | 74.65 | 80.12 | 0.8634 | 0.6144
 | 1:0.5 | 0.8746 | 78.62 | 82.57 | 76.52 | 58.25 | 74.67 | 79.43 | 0.8605 | 0.6136
 | 1:0.6 | 0.8772 | 79.06 | 78.69 | 79.28 | 58.48 | 79.43 | 79.80 | 0.8635 | 0.6156
 | 1:0.7 | 0.8733 | 78.8 | 77.89 | 79.33 | 57.94 | 79.71 | 78.61 | 0.8584 | 0.6187
 | 1:0.8 | 0.8753 | 79.1 | 81.77 | 77.63 | 58.93 | 76.44 | 79.65 | 0.8589 | 0.6150
 | 1:0.9 | 0.8766 | 79.60 | 80.99 | 78.8 | 59.39 | 78.21 | 79.88 | 0.8613 | 0.6218
 | 1:1.0 | 0.8756 | 78.66 | 77.96 | 79.06 | 58.11 | 79.36 | 78.51 | 0.8595 | 0.6220
To evaluate the effectiveness of our multi-task learning architecture, we conducted two separate experiments: single-task learning for the classification task and single-task learning for the regression task. Both experiments used the CNN+MM network. Table 9 presents the cross-validation results of the single-task classification model and the multi-task model based on two different benchmark datasets. Our findings reveal that, in the case of multi-task learning, the AUROC values for the classification task were 0.8794 and 0.8772 for AAC_BM and GAC_BM, respectively. In comparison, the AUROC values for the classification task in single-task learning were 0.8774 and 0.8769 for AAC_BM and GAC_BM, respectively. Therefore, the performance of the multi-task classification model surpassed that of the single-task classification model for both AAC_BM and GAC_BM. The reason may be that multiple related tasks help to regularize each other and a more robust representation can be learned; thus, multi-task learning is usually believed to improve network performance. These results align with a study by Ruder [72], which emphasizes that multi-task learning allows the model to focus its attention on relevant features, as other tasks provide additional evidence for determining the relevance or irrelevance of those features. By adopting a multi-task learning approach, our model benefits from the shared representation and complementary information across tasks, leading to improved classification performance. Specifically, the model adds NSES information, which helps identify poor-quality methylation sites. These findings underscore the effectiveness of our multi-task learning architecture in enhancing model performance and feature relevance assessment.
Datasets | Task (weight ratio) | AUROC | ACC (%) | Sen (%) | Pre (%) | MCC (%) | Spe (%) | F-1 (%) | AUPRC
AAC_BM | single-task | 0.8774 | 78 | 74.71 | 79.97 | 57.33 | 81.29 | 77.25 | 0.8613
 | multi-task (1:0.6) | 0.8794 | 79.57 | 80.94 | 79.06 | 59.98 | 78.56 | 79.99 | 0.8636
GAC_BM | single-task | 0.8769 | 79.76 | 79.61 | 79.85 | 59.7 | 79.91 | 79.73 | 0.8613
 | multi-task (1:0.6) | 0.8772 | 79.06 | 78.69 | 79.28 | 58.48 | 79.43 | 79.80 | 0.8635
Table 10 presents the PCC results for the cross-validation of the single-task regression model and the multi-task model on AAC_BM and GAC_BM. The results indicate that the multi-task model slightly outperforms the single-task regression model on both datasets. Specifically, the correlation coefficients for the regression task were 0.614 and 0.6156 for multi-task learning, while they were 0.6042 and 0.6 for single-task learning for AAC_BM and GAC_BM, respectively. Interestingly, despite the loss weight ratio between the classification and regression tasks of the CNN+MM model in the source domain-stage being 1:0.6, the impact on the correlation coefficients of the regressions is minimal. This suggests that the improved performance of our multi-task model relative to the single-task regression model can be attributed to several factors, including the utilization of complementary information, shared feature representation, regularization techniques, and knowledge transfer between tasks. By leveraging multi-task learning, our model benefits from the synergistic effects of multiple tasks, leading to enhanced PCCs. This highlights the advantages of incorporating related tasks and sharing representations, ultimately resulting in a more comprehensive and accurate understanding of the underlying data. By considering the interplay between tasks and the complementary nature of their information, we can leverage multi-task learning to further improve the performance of our model and achieve superior results across a range of metrics.
Datasets | PCC (multi-task regression model) | PCC (single-task regression model)
AAC_BM | 0.6140 | 0.6042
GAC_BM | 0.6156 | 0.6000
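For completeness, the PCC reported in Table 10 is the standard Pearson correlation between the regression head's predictions and the measured methylation values. A minimal sketch of how it can be computed (using SciPy here purely for illustration, with toy values) is shown below.

```python
import numpy as np
from scipy.stats import pearsonr

# y_true: measured methylation levels; y_pred: regression-head outputs (toy values)
y_true = np.array([0.12, 0.55, 0.80, 0.33, 0.91])
y_pred = np.array([0.20, 0.48, 0.75, 0.40, 0.85])

pcc, _ = pearsonr(y_true, y_pred)   # Pearson correlation coefficient
print(f"PCC = {pcc:.4f}")
```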
In summary, our results demonstrate that multi-task learning outperforms single-task learning for both the classification and the regression task on both the AAC_BM and GAC_BM datasets. The multi-task framework is also more efficient: a single shared model handles both tasks, so it accomplishes more with fewer resources and less computational overhead than training separate single-task models. Overall, these findings highlight the superiority of the multi-task model over the single-task model.
The contribution of this paper is the development of MTTLm6A, a novel predictor that combines multi-task learning and transfer learning on top of an improved transformer architecture to identify base-resolution mRNA m6A sites. Experimental results on Saccharomyces cerevisiae m6A and Homo sapiens m1A data show that MTTLm6A achieved AUROC values of 0.7713 and 0.929, respectively, outperforming state-of-the-art models and demonstrating strong generalization ability. One limitation remains: training in the source domain-stage requires samples annotated with NSES information, which is a prerequisite for the multi-task setup.
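The two-stage strategy summarized above (source-domain multi-task pre-training followed by target-domain fine-tuning on base-resolution data) follows the usual transfer-learning recipe of reusing the shared encoder weights and retraining the task head. The fragment below is a hypothetical sketch of that step; the checkpoint path is a placeholder and `MultiTaskM6A` refers to the illustrative model sketched earlier, not to the released MTTLm6A code.

```python
import torch

# Load weights pre-trained in the source domain-stage (hypothetical checkpoint path).
model = MultiTaskM6A()
model.load_state_dict(torch.load("source_domain_checkpoint.pt"))

# Freeze the shared encoder and fine-tune only the classification head
# on the base-resolution target-domain dataset (e.g., GAC_hr_BM).
for p in model.encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(model.cls_head.parameters(), lr=1e-4)
# ...standard training loop over the target-domain data goes here...
```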
Furthermore, since multi-task learning tends to benefit from additional tasks, we intend to investigate the characteristics of methylation-related parameters in greater depth. By incorporating more informative tasks into the learning framework, we plan to use MTTLm6A to infer additional base-resolution methylation sites from low-resolution methylation sequences of different species for further research.
In conclusion, the development of MTTLm6A, its promising performance, and the research directions outlined above contribute to the advancement of computational methods for identifying methylation sites, and demonstrate its potential for broader application and further refinement.
The authors declare that they have not used artificial intelligence tools in the creation of this article.
This work has been supported by the National Natural Science Foundation of China (31871337 and 61971422), and the "333 Project" of Jiangsu (BRA2020328).
The authors declare no conflict of interest.