
More than 170 types of RNA modifications have been reported across a diverse set of RNAs [1], including m6A, adenosine to inosine (A-to-I) deamination, cytosine to uracil (C-to-U) deamination, N1-methyladenosine (m1A), 5-methylcytosine (m5C), pseudouridylation (Ψ), and ribose 2'O-methylation [2], among others. There is a growing list of RNA modifications found in both coding and non-coding RNAs, significantly influencing their biological functions [3]. These modifications frequently result in changes to RNA stability, folding, interactions, translation, localization, and subsequent processing, thereby impacting biological function [4,5]. However, insights into the molecular machineries responsible for the deposition and removal, as well as recognition and interpretation, of these modifications within the cell are available for only a few modifications [6]. Even for those modifications for which writers, erasers, and readers have been identified [7], such as m6A [8], we have limited knowledge about their regulation, their cooperation or competition with other RNA modification and processing events, and how they become deregulated in disease [9]. Therefore, the accurate identification of m6A sites is a crucial step in understanding the mechanisms underlying these biological phenomena [9].
To date, several experimental methods have been developed to localize m6A sites. High-throughput sequencing technologies have been successfully applied to detect m6A sites in various species, including Saccharomyces cerevisiae [10], Homo sapiens [11,12], Arabidopsis thaliana [13], and mouse [14]. However, most high-throughput sequencing techniques cannot quickly and precisely pinpoint the exact location of the m6A site [15]. The m6A motif 'RRACH' is often used to further narrow the location of the m6A site to base resolution within the peak detected by the m6A signal. Other experimental techniques, such as miCLIP-seq [16], can identify m6A sites at single-nucleotide resolution. However, these methods rely on m6A-specific antibodies, exhibit poor reproducibility, and involve lengthy and complex procedures [17], making them unsuitable for large-scale genomic data analysis. Hence, there is a strong motivation to explore computational methods that can accurately and efficiently identify methylation sites.
Researchers have developed a variety of computational methods to predict RNA modification sites, which serve as invaluable complements to experimental approaches [18]. These methods treat RNA methylation identification as a binary prediction task [19], training machine learning models to differentiate between truly methylated and unmethylated sites. By leveraging these computational methods, we can quickly predict whether a given sequence contains RNA methylation sites. The traditional computational approach involves extracting a comprehensive set of hand-designed features from biological sequences [20]. These features are then fed into classical shallow classification algorithms [21], such as a support vector machine [22], often with a linear kernel. However, the selection of these features is typically an empirical process relying on trial and error [23]. Moreover, feature selection itself is highly task-dependent, necessitating additional research for each new predictive task [24].
Analyzing biological sequences and interpreting the underlying biological information pose significant challenges in the realization of biological breakthrough discoveries. Recently, the application of natural language processing (NLP) in sequence analysis has garnered considerable attention within the realm of biological sequence processing [25]. This approach treats biological sequences as sentences [21,26] and k-mer subsequences as words [25,27]; NLP has thus emerged as a valuable approach for unraveling the structure and function encoded within these sequences [28,29]. In contrast to traditional machine learning methods, deep learning (DL) techniques offer an end-to-end design [30]. The input sentence passes through a series of feature extraction layers, with the deep layers of the network automatically learning task-relevant features through backpropagation [31]. In other words, the features used for the final identification or prediction task are extracted directly from the input data, and these learned representations allow the model to discern the patterns and relationships needed for accurate and insightful conclusions [32]. For example, EDLm6Apred [32] applies bidirectional long short-term memory (BiLSTM) to predict m6A sites through the use of word2vec [33], RNA word embedding [34], and one-hot [35] encoding. However, long short-term memory (LSTM), BiLSTM, and recurrent neural networks do not allow parallel computation [36], which results in long training times [37]. Convolutional neural networks (CNNs) can achieve parallel computation [38] and learn local dependencies [39]. For example, m6A-word2vec [40] employs a CNN to identify m6A sites by extracting features based on word2vec. Similarly, Deeppromise [41] utilizes a CNN to identify m1A and m6A sites, extracting features through integrated enhanced nucleic acid composition [42,43], one-hot encoding, and RNA word embedding. However, these CNN structures primarily focus on contextual relationships among neighboring bases [44], without taking into account long-distance dependencies within the sequence [45]. To address this limitation, DeepM6ASeq [46] combines the strengths of CNNs and BiLSTM by incorporating two CNN layers and one BiLSTM layer to predict m6A sites. While this approach can be effective, it may extract redundant features that interfere with prediction performance [47]. To quantify the degree of word-to-word dependency, the attention mechanism comes into play; by applying attention, it becomes possible to capture the specific words that significantly impact the classification results. MultiRM [48], on the other hand, employs a BiLSTM layer and a Bahdanau attention [49] layer to identify various types of RNA modification sites, extracting features based on word2vec encoding. In this case, Bahdanau attention calculates the attention weights between two words in different sentences.
However, since Google introduced the transformer model in 2017, self-attention has been recognized as a special case of the attention mechanism [50,51]. The transformer model, based entirely on self-attention mechanisms, has become the most widely used architecture in NLP representation learning, as demonstrated by its adoption in various applications [52]. Among them, Plant6mA [53], a transformer encoder, can be employed to identify whether an input sequence contains an m6A site. In the transformer model, positional encoding plays a vital role, as the other key components of the model are completely invariant to the order of the sequence. The original transformer employs absolute positional encoding, assigning each position a unique embedding vector [50]. By adding the positional embedding to the word embedding, the model obtains information about the positions of words and their contextual representations. In addition to absolute positional encoding, Shaw et al. [54] and Raffel et al. [55] have introduced relative positional encoding, which incorporates carefully designed bias terms within the self-attention module, enabling the encoding of the distance between any two positions. However, Ke et al. [56] demonstrated that the addition operation applied to positional encodings and word embeddings in absolute positional encoding can introduce mixed correlations and unnecessary randomness into the attention mechanism, which may limit the expressive power of the model and impact its performance. Furthermore, the feed-forward networks within the transformer structure struggle to capture contextual information effectively: because position-wise feed-forward networks process each position independently, they lack the capacity to adequately capture global contextual information. Consequently, the model may face difficulties in accurately comprehending long-term dependencies or recognizing global patterns within the sequence.
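For context, the absolute positional encoding used by the original transformer is a fixed sinusoidal matrix that is simply added to the word embeddings. The sketch below is the standard formulation from the literature, shown only to illustrate the operation that MTTLm6A later omits; the variable names are ours.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard absolute positional encoding: PE[pos, 2i] = sin(...), PE[pos, 2i+1] = cos(...)."""
    pos = np.arange(seq_len)[:, None]                      # (seq_len, 1)
    i = np.arange(d_model)[None, :]                        # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])
    pe[:, 1::2] = np.cos(angle[:, 1::2])
    return pe

# The positional embedding is added element-wise to the token embeddings.
token_embeddings = np.random.rand(601, 16)                 # toy embeddings
inputs_with_position = token_embeddings + sinusoidal_positional_encoding(601, 16)
```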
As m6A is the most prevalent modification observed in mammals, numerous methods have been developed to predict m6A sites in Saccharomyces cerevisiae. However, these methods [57,58,59,60,61] have primarily relied on a small dataset consisting of only 1307 m6A sites derived from base-resolution sequencing. Unfortunately, the limited size of this dataset has hindered the full utilization of the advantages offered by DL methods [62]. However, RMBase [63,64] and m6A-Atlas [65] respectively document over 60,000 low-resolution and 10,000 high-resolution m6A sites in Saccharomyces cerevisiae. The relatively novel task of m1A site prediction encounters similar problems: many methods for predicting m1A are based on a smaller dataset containing only 707 human m1A sites with base-resolution sequencing, whereas RMBase records more than 2000 low-resolution human m1A sites. Huang et al. [66] proposed WeakRM, the first weakly supervised learning framework for predicting RNA modifications from low-resolution epitranscriptome datasets, such as those generated from acRIP-seq and hMeRIP-seq. Astonishingly, these extensive datasets have not been fully leveraged for the development of computational methods in this context. In most scenarios, our primary concern is achieving optimal performance on one task, which requires training a single model or an ensemble of models, followed by fine-tuning and optimization; we iterate and refine these models until their performance plateaus. While this approach often yields acceptable results, by focusing on a single task it tends to overlook valuable information that could enhance the desired metrics, namely the training signals derived from related tasks. By leveraging shared representations among these related tasks, we can empower our model to generalize better and improve its performance on the original task.
The number of supporting experiments or studies (NSES) [63] recorded for each methylation site in RMBase may be exactly the kind of information mentioned above that is usually ignored. Intuitively, the larger the number of experimental identifications of a methylation site, the greater our confidence in considering it a genuine methylation site. Currently, the exploration of multi-task prediction for methylation sites incorporating NSES information is still in its early stages. An example of such an algorithm is MTDeepM6A-2S [67], which uses NSES information to construct a multi-task model based on a combined CNN and BiLSTM deep framework; this model was designed for the prediction of base-resolution m6A sites. However, one limitation of the BiLSTM component lies in its sequential nature, where computations are executed step-by-step. As a result, the computational speed tends to be slower, and it becomes challenging to capture distant dependencies and global contextual information within the sequence. These factors can limit the model's ability to understand long-range relationships and extract comprehensive contextual features; it is therefore essential to assign relatively large attention weights to the vital information. While MTDeepM6A-2S represents a significant advancement in incorporating NSES information into the multi-task prediction of methylation sites, there is still room for further improvement. Addressing the limitations associated with the sequential computations of BiLSTM and enhancing the capture of remote dependencies and global contextual information remain important areas for future research in this field.
To address the limitations of existing models, we drew inspiration from the multi-stage post-calibration determinations used for high-resolution m6A site identification and from the concept of multi-task learning. As a result, we propose MTTLm6A, a multi-task transfer learning approach for base-resolution mRNA m6A site prediction based on an improved transformer. In the initial stage, known as the source domain-stage, we use the NSES information for multi-task learning and improve the model's ability to detect low-resolution m6A sites by optimizing the transformer model structure; specifically, the structure applies a double multi-head-attention (multi-head-attention + multi-head-attention) mechanism, which assigns relatively large attention weights to the critical information to intensify it. In the target domain-stage, considering the similarity between the classification tasks in both stages, we transfer the weights of specific layers and deep networks from the model trained in the source domain-stage to the model in the target domain-stage to predict m6A sites at base resolution. Experimental results on Saccharomyces cerevisiae m6A and Homo sapiens m1A data demonstrate that MTTLm6A achieves area under the receiver operating characteristic curve (AUROC) values of 77.13% and 92.9%, respectively, outperforming state-of-the-art models; this also shows that the model has strong generalization ability. To enhance user convenience, we have made a user-friendly web server for MTTLm6A publicly available at http://47.242.23.141/MTTLm6A/index.php.
We extracted datasets of two major types of RNA modification sites, m1A and m6A, from RMBase v2.0. For the m1A sites, we collected low-resolution m1A sites of Homo sapiens from the extensive database RMBase v2.0, in which 2574 m1A sites have been recorded. The RNA segments with upstream and downstream nucleotides were obtained from the genome. Negative sites (non-modified nucleotides) were randomly selected from the unmodified bases of the same transcripts containing the positive sites. The negative samples were down-sampled and trimmed to match the number and size of the positive samples. To avoid overfitting, CD-HIT [68] was used with a threshold of 0.7 to remove redundant segments from both the positive and negative samples. This yielded 1987 positive samples and 2249 negative samples; to obtain a balanced dataset, 1987 negative samples were randomly selected to build the final dataset. For the second-stage model, we collected base-resolution m1A sites of Homo sapiens from DeepPromise [69] as positive samples, giving 593 training samples and 114 test samples. Because the second-stage model is used to identify base-resolution m1A sites among low-resolution m1A sites, we used the low-resolution m1A sites recorded in RMBase v2.0 as negative samples in the current study. Therefore, 707 (593 + 114) of the above 1987 positive samples were randomly assigned to the second stage as negative samples, and the remaining samples were divided into training and independent test sets at a ratio of 4:1 for the first-stage model.
For the m6A sites, the dataset was derived from the low-resolution m6A sites previously described by Wang et al. [67]. This dataset contains a total of 24,669 m6A sites. Within these segments, two distinct central motif patterns exist, i.e., AAC and GAC. Notably, the existing methods for predicting m6A sites in Saccharomyces cerevisiae were developed by using the Met2614 dataset [57], which only includes the GAC central motif. To ensure a comprehensive analysis, we divided the original RNA segments into two parts: one containing segments with the GAC central motif and the other containing segments with the AAC central motif. The number of segments with the AAC central motif was 13,732, while the number of segments with the GAC central motif was 10,937. The ratio of positive to negative samples in both datasets was 1:1. Subsequently, the datasets were randomly split into benchmark and independent test datasets at a 4:1 ratio. This resulted in the AAC benchmark dataset containing 10,985 positive and negative samples and the corresponding independent test dataset containing 2747 positive and negative samples. Similarly, the GAC benchmark dataset consisted of 8749 positive and negative samples, and the corresponding independent test dataset contained 2188 positive and negative samples. Referring to the experimental results in the papers by Chen et al. [69] and Wang et al. [67], the optimal window sizes were set to 101 nt and 601 nt for the m1A and m6A sites, respectively.
Furthermore, in RMBase v2.0, the NSES value associated with each m6A site recorded in the database was used as the target for a regression task. The NSES indicates the number of experimental confirmations for the corresponding adenine being modified [63]. In simpler terms, a higher NSES value suggests a higher level of certainty regarding the authenticity of the m6A modification at that specific site. The distribution of NSES within the m6A dataset is depicted in Figure 1.
For the target domain-stage model, the positive samples were obtained from the base-resolution m6A sites of Saccharomyces cerevisiae in m6A-Atlas [65]; 4689 m6A sites were obtained, all of which have GAC as the central motif. The negative samples were selected from the low-resolution m6A sites with GAC as the central motif, as recorded in RMBase v2.0. The ratio of positive to negative samples was set at 1:1 to ensure a balanced training environment. Similar to the source domain-stage model, the datasets were randomly divided into benchmark and independent test datasets at a 4:1 ratio. The statistics of the datasets are shown in Table 1.
Site | Species | Stage | Dataset | Window size (nt) | Number of positive | Number of negative
m6A | Saccharomyces cerevisiae | source domain-stage | AAC_BM | 601 | 10,985 | 10,985
 | | | AAC_IND | 601 | 2747 | 2747
 | | | GAC_BM | 601 | 8749 | 8749
 | | | GAC_IND | 601 | 2188 | 2188
 | | target domain-stage | GAC_hr_BM | 601 | 3751 | 3751
 | | | GAC_hr_IND | 601 | 938 | 938
m1A | Homo sapiens | source domain-stage | BM | 101 | 1024 | 1024
 | | | IND | 101 | 256 | 256
 | | target domain-stage | hr_BM | 101 | 593 | 593
 | | | hr_IND | 101 | 114 | 114
BM: benchmark; IND: independent test.
In the development of highly accurate computational methods, the features of the sequence data play a crucial role. Suppose that we have raw data $R_0=\{x^m\}_{m=1}^{M}$, where $M$ is the number of sequences and each $x^m\in\mathbb{R}^{l_0}$ is an RNA sequence of constant length $l_0$. Each entry $x^m_i$, $i=1,2,\dots,l_0$, at position $i$ takes its value from the alphabet $\Sigma=\{A,U,C,G,N\}$.
One widely used and effective encoding method is one-hot encoding, which provides a simple yet powerful approach to representing RNA sequences. In this method, the four nucleotides (A, U, C, G) are encoded as binary vectors: A = (1, 0, 0, 0), U = (0, 1, 0, 0), C = (0, 0, 1, 0), and G = (0, 0, 0, 1), with N = (0, 0, 0, 0) representing unknown or ambiguous positions. After this step, $R_0=\{x^m\}_{m=1}^{M}$, where each $x^m\in\mathbb{R}^{l_0\times 4}$ is an encoded RNA sequence. By applying this encoding scheme, a sequence of 601 nucleotides is transformed into a 601 × 4 matrix.
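As an illustration, a minimal NumPy sketch of this one-hot encoding is shown below; the function and variable names are ours and are not taken from the published implementation.

```python
import numpy as np

# Nucleotide-to-vector mapping; 'N' (unknown/ambiguous) maps to the all-zero vector.
NUC_TO_VEC = {
    "A": [1, 0, 0, 0],
    "U": [0, 1, 0, 0],
    "C": [0, 0, 1, 0],
    "G": [0, 0, 0, 1],
    "N": [0, 0, 0, 0],
}

def one_hot_encode(seq: str) -> np.ndarray:
    """Encode an RNA sequence of length l0 into an l0 x 4 matrix."""
    return np.array([NUC_TO_VEC[base] for base in seq.upper()], dtype=np.float32)

example = one_hot_encode("GAC" * 200 + "A")   # a toy 601-nt segment
print(example.shape)                          # (601, 4)
```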
We have devised a multi-task transfer learning approach for base-resolution mRNA m6A site prediction based on an improved transformer. The structure of the model is shown in Figure 2; it is divided into the source domain-stage and the target domain-stage. The details are as follows.
The conventional approaches employed for the prediction of RNA m6A sites have primarily relied on single-task learning for classification. In contrast, we adopted a novel multi-task architecture in the construction of the source domain model, with the aim of enhancing the classification results and providing a reasonable confidence value. To achieve this, we constructed a regression task based on the NSES information retrieved from RMBase v2.0; this regression task assigns a confidence score to the classification results, thereby enhancing their interpretability and overall reliability. The encoded sequences are fed into a CNN layer to capture sequence patterns or motifs; the mathematical formulation of the CNN model is given below:
$$\mathrm{Conv}(R)^{f}_{j}=\mathrm{ReLU}\left(\sum_{d=0}^{D-1}\sum_{n=0}^{N-1}W^{f}_{dn}\,R_{j+d,\,n}\right)\tag{1}$$
where $R$ denotes the input matrix, $f$ represents the index of the kernel, and $j$ represents the index of the output position; each filter $W^f$ is a $D\times N$ weight matrix, where $D$ is the filter size and $N$ is the number of input channels. The layer maps $\mathbb{R}^{l_0\times 4}\mapsto C^{l\times d}$, with $l=l_0-f+1$.
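To make the indexing in Eq (1) explicit, a direct NumPy transcription is given below; it is written for illustration only (the actual model uses a standard 1D convolution layer), and D here denotes the filter size.

```python
import numpy as np

def conv1d_relu(R: np.ndarray, W: np.ndarray) -> np.ndarray:
    """R: (l0, N) one-hot input; W: (F, D, N) filter bank.
    Returns (l0 - D + 1, F) feature maps, following Eq (1)."""
    F, D, N = W.shape
    l_out = R.shape[0] - D + 1
    out = np.zeros((l_out, F))
    for f in range(F):                      # loop over kernels
        for j in range(l_out):              # loop over output positions
            out[j, f] = np.sum(W[f] * R[j:j + D, :])   # sum over d and n
    return np.maximum(out, 0.0)             # ReLU

feats = conv1d_relu(np.random.rand(601, 4), np.random.randn(16, 10, 4))
print(feats.shape)                          # (592, 16)
```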
Since the convolution already encodes the order of the sequence, and to avoid the mixed correlations that may arise if the model were connected to a positional encoding layer as in the transformer, we intentionally omit the positional encoding layer and connect the CNN output directly to the multi-head-attention mechanism. The attention calculation in this module can be divided into three steps.
In the first step, the output matrix $X$ of the CNN layer is linearly transformed and divided into three matrices, as follows:
$$Q=XW_q,\quad K=XW_k,\quad V=XW_v\tag{2}$$
where $X\in C^{l\times d}$ and $l=l_0-f+1$; the three learnable matrices $W_q$, $W_k$, and $W_v$ are used to project $X$ into different spaces. Usually, each of the three matrices is of size $C^{d\times d_k}$, where $d_k$ is a hyperparameter.
In the second step, the scaled dot product attention can be calculated by using the following equations:
$$A_{m,n}=Q_m K_n^{T}\tag{3}$$
$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{A}{\sqrt{d_k}}\right)V\tag{4}$$
where $Q_m$ is the query vector of the $m$-th token and $K_n$ is the key vector of the $n$-th token. The softmax function is applied along the last dimension. Instead of using one group of $W_q$, $W_k$, $W_v$, using several groups enhances the ability of self-attention.
In the third step, when several groups are used, it is called multi-head self-attention; the calculation can be formulated as follows:
$$Q^{(h)}=XW_q^{(h)},\quad K^{(h)}=XW_k^{(h)},\quad V^{(h)}=XW_v^{(h)}\tag{5}$$
$$\mathrm{head}^{(h)}=\mathrm{Attention}\left(Q^{(h)},K^{(h)},V^{(h)}\right)\tag{6}$$
$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}\left(\mathrm{head}^{(1)},\mathrm{head}^{(2)},\dots,\mathrm{head}^{(i)}\right)W_o\tag{7}$$
where $i$ is the number of heads and the superscript $h$ represents the head index. Usually $d_k\times i=d$, which means that the output of $\left[\mathrm{head}^{(1)},\mathrm{head}^{(2)},\dots,\mathrm{head}^{(i)}\right]$ is of size $C^{l\times d}$. Also note that $W_o\in C^{d\times d}$ is a learnable parameter.
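The following NumPy sketch walks through Eqs (2)–(7) for a single sequence. It is meant to illustrate the mechanism; the shapes and weight initializations are arbitrary and do not correspond to the trained model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo):
    """X: (l, d); Wq/Wk/Wv: (heads, d, d_k); Wo: (d, d). Returns (l, d)."""
    heads = []
    for h in range(Wq.shape[0]):
        Q, K, V = X @ Wq[h], X @ Wk[h], X @ Wv[h]        # Eq (5)
        A = Q @ K.T                                       # Eq (3)
        head = softmax(A / np.sqrt(Q.shape[-1])) @ V      # Eq (4), Eq (6)
        heads.append(head)
    return np.concatenate(heads, axis=-1) @ Wo            # Eq (7)

d, n_heads, d_k = 16, 4, 4                                # d_k * n_heads = d
X = np.random.rand(592, d)                                # e.g., a CNN feature map
out = multi_head_self_attention(
    X,
    np.random.randn(n_heads, d, d_k),
    np.random.randn(n_heads, d, d_k),
    np.random.randn(n_heads, d, d_k),
    np.random.randn(d, d),
)
print(out.shape)                                          # (592, 16)
```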
Furthermore, to more effectively capture contextual information, we deliberately replaced the feed-forward layer in the transformer structure with the multi-head-attention mechanism layer. This choice empowers the model to assign greater attention weight to key information, thereby reinforcing its significance. At the heart of the source domain-stage lies the primary objective of classifying low-resolution m6A sites and accurately distinguishing them from non-m6A sites. This pivotal step forms the foundation for analyses and investigations in the subsequent target domain-stage.
Building upon the multi-task model training obtained in the source domain-stage, we progress to the target domain-stage. In this stage, we employ a transfer learning strategy to train the target domain-stage model and focus on identifying both base-resolution m6A sites and low-resolution m6A sites.
During the source domain-stage, our model takes RNA segment sequences and the NSES information as input; the RNA segment sequences are then transformed into numerical matrices by the one-hot encoding process, as shown in Figure 2. These numerical matrices are then fed into the deep network, which consists of a CNN and double-multi-head-attention, referred to as CNN+MM.
The CNN uses a 1D convolutional layer to extract local features from the input matrices. To optimize the hyperparameters, we employed a grid-search strategy. In this stage, we used 16 convolutional kernels, each with a size of 10. Subsequently, the output of the CNN stage is normalized with a group normalization layer, where the number of groups was set to 4. The multi-head-attention stage consists of two multi-head-attention networks. One attention mechanism has two heads, with each head having a size of 8 (d_model = 8). The other attention mechanism has four heads, with each head having a size of 16 (d_model = 16). To promote effective information flow, we incorporated a dropout layer and a residual connection around each of the two sub-layers. The dropout rate was set at 0.1 to reduce overfitting, and layer normalization was applied subsequently. Following the multi-head-attention stage, an AveragePooling1D layer is applied to reduce the dimensionality of the extracted features. The kernel size of the 1D pooling layer was set to 15. Subsequently, the data were flattened into a 1D form by using a flattening layer. This is followed by a dropout layer and a fully connected layer. The dropout rate was set at 0.6, and the fully connected layer was set to comprise 64 neurons activated by the exponential linear unit (ELU) function.
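A minimal Keras sketch of the source domain-stage network described above is given below. It assumes TensorFlow ≥ 2.11 (for GroupNormalization); the exact head dimensions and layer arrangement are our reading of the description, not the released code.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_source_model(seq_len: int = 601):
    inp = layers.Input(shape=(seq_len, 4))                          # one-hot RNA segment

    x = layers.Conv1D(filters=16, kernel_size=10, activation="relu")(inp)
    x = layers.GroupNormalization(groups=4)(x)

    # Double multi-head-attention ("MM"): two self-attention sub-layers, each with
    # dropout, a residual connection, and layer normalization.
    for num_heads, key_dim in [(2, 8), (4, 16)]:                    # head sizes assumed
        attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(x, x)
        x = layers.LayerNormalization()(x + layers.Dropout(0.1)(attn))

    x = layers.AveragePooling1D(pool_size=15)(x)
    x = layers.Flatten()(x)
    x = layers.Dropout(0.6)(x)
    x = layers.Dense(64, activation="elu")(x)

    cls_out = layers.Dense(2, activation="softmax", name="classification")(x)
    reg_out = layers.Dense(1, activation="elu", name="regression")(x)   # NSES regression
    return Model(inp, [cls_out, reg_out])

model = build_source_model()
model.summary()
```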
The output layer of our model consists of two outputs, catering to the classification and regression tasks, respectively. The total loss is composed of the classification loss and the regression loss, and is calculated as follows:
$$\mathrm{loss}_{\mathrm{multitask}}=\mathrm{loss}_{\mathrm{classification}}+\lambda\,\mathrm{loss}_{\mathrm{regression}}\tag{8}$$
where $\mathrm{loss}_{\mathrm{classification}}$ is the classification loss, $\mathrm{loss}_{\mathrm{regression}}$ is the regression loss, and $\lambda$ is a tunable weight set according to the specific circumstances. For the classification task, the softmax activation function was employed, and categorical cross-entropy was specified as the loss function. For the regression task, the ELU activation function was used, with log-cosh employed as the loss function. Therefore, the overall loss function for the entire multi-task model can be expressed as follows:
$$\mathrm{loss}_{\mathrm{multitask}}=-\frac{1}{N}\sum_{i=1}^{N}\left(y_i^{\mathrm{class}}\log p_i^{\mathrm{class}}+\left(1-y_i^{\mathrm{class}}\right)\log\left(1-p_i^{\mathrm{class}}\right)\right)+\lambda\sum_{i=1}^{N}\log\left(\cosh\left(y_i^{\mathrm{regression}}-p_i^{\mathrm{regression}}\right)\right)\tag{9}$$
where $N$ denotes the total number of samples in the dataset, and $y_i^{\mathrm{class}}$ and $p_i^{\mathrm{class}}$ denote the true label and predicted probability of the $i$-th sample for the classification task, respectively. Similarly, $y_i^{\mathrm{regression}}$ and $p_i^{\mathrm{regression}}$ denote the true target value and prediction of the $i$-th sample for the regression task, respectively. The weight parameter $\lambda$ was set to 0.6 by grid search. This loss function leverages the label information from the regression task to potentially enhance the prediction accuracy of the classification task.
Finally, the stochastic gradient descent (SGD) optimization algorithm is used with the momentum set to 0.95 and a learning rate of 0.01. SGD is a widely adopted optimization algorithm known for its effectiveness in iteratively adjusting the model's parameters during the training process to minimize the loss function.
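Continuing the sketch above, the multi-task loss of Eqs (8)–(9) and the optimizer settings can be expressed with Keras loss weights (λ = 0.6); `model` refers to the two-output network built in the previous snippet.

```python
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.95),
    loss={
        "classification": tf.keras.losses.CategoricalCrossentropy(),  # cross-entropy term
        "regression": tf.keras.losses.LogCosh(),                      # log-cosh term
    },
    loss_weights={"classification": 1.0, "regression": 0.6},          # lambda = 0.6
    metrics={"classification": ["accuracy"]},
)
```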
To construct the target domain-stage model, we used transfer learning by transferring the feature extraction layers from the source domain-stage model. This approach was motivated by the similarity between the classification tasks in both stages.
During the transfer learning process, we initialized the parameters of the target domain-stage model by using the feature extraction layers from the source domain-stage model. This initialization includes all layers except the output layer, ensuring that the model starts with valuable learned representations. By inheriting the corresponding weights, we have provided a strong foundation for the target domain model.
Subsequently, we optimized all of the weights of the target domain-stage model during training without freezing any layers. This allowed the model to adapt and refine its parameters based on the target domain's specific characteristics and data. By performing transfer learning in this way, we aimed to capitalize on the knowledge and patterns learned in the source domain, ultimately enhancing the performance and generalization of the model on the target domain's classification tasks.
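In Keras terms, this transfer step might look like the sketch below; matching layers by position and skipping the named output heads is our assumption, and the checkpoint path is a placeholder.

```python
# Trained source domain-stage model (architecture from the earlier sketch).
source_model = build_source_model()
source_model.load_weights("source_stage.h5")        # hypothetical checkpoint path

# Fresh target domain-stage model with the same architecture.
target_model = build_source_model()

# Copy the weights of every feature extraction layer; the output heads are re-initialized.
for src_layer, tgt_layer in zip(source_model.layers, target_model.layers):
    if src_layer.name in ("classification", "regression"):
        continue
    tgt_layer.set_weights(src_layer.get_weights())

# No layers are frozen: all weights are fine-tuned on the target domain data.
for layer in target_model.layers:
    layer.trainable = True
```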
In this study, we comprehensively evaluated the performance of our prediction model by using eight commonly used classification metrics. These metrics include the accuracy (Acc), sensitivity (Sen), precision (Pre), Matthews correlation coefficient (MCC), specificity (Sp), and F1 score (F1). The formulas for these metrics are respectively as follows:
$$\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}\tag{10}$$
$$\mathrm{Sensitivity}=\frac{TP}{TP+FN}\tag{11}$$
$$\mathrm{Precision}=\frac{TP}{TP+FP}\tag{12}$$
$$\mathrm{MCC}=\frac{TP\times TN-FP\times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}\tag{13}$$
$$\mathrm{Specificity}=\frac{TN}{TN+FP}\tag{14}$$
$$F1\ \mathrm{score}=\frac{2\times \mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}\tag{15}$$
Additionally, we used the AUROC curve and the area under the precision-recall curve (AUPRC) to visually assess the overall performance of our model. These metrics provide insights into the model's ability to discriminate between different classes and the precision-recall trade-off, respectively. Time (s) indicates the training time of the model per epoch. By considering both the quantitative metrics and the visual evaluation, we gain a comprehensive understanding of the predictive capabilities of our model.
To evaluate the regression task, we used the Pearson correlation coefficient (PCC) as the index. The PCC measures the similarity between the predicted target values (X) and the actual target values (Y) of the samples. It is calculated as follows:
$$\mathrm{PCC}=\frac{\mathrm{cov}(X,Y)}{\sigma_X\,\sigma_Y}\tag{16}$$
Here, $\mathrm{cov}(X,Y)$ represents the covariance between $X$ and $Y$, and $\sigma_X$ and $\sigma_Y$ represent the standard deviations of $X$ and $Y$, respectively. The PCC ranges from –1 to 1, where a value of 0 indicates no correlation.
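For reference, the threshold-based metrics, AUROC, AUPRC, and PCC can be computed with scikit-learn and SciPy as sketched below; this is a convenience illustration rather than the authors' evaluation script.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import (
    accuracy_score, recall_score, precision_score, matthews_corrcoef,
    f1_score, roc_auc_score, average_precision_score,
)

def evaluate(y_true, y_score, y_reg_true, y_reg_pred, threshold=0.5):
    """y_true/y_score: binary labels and predicted probabilities (classification);
    y_reg_true/y_reg_pred: NSES targets and predictions (regression)."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return {
        "Acc": accuracy_score(y_true, y_pred),
        "Sen": recall_score(y_true, y_pred),                 # sensitivity = recall
        "Pre": precision_score(y_true, y_pred),
        "MCC": matthews_corrcoef(y_true, y_pred),
        "Sp": tn / (tn + fp),                                # specificity
        "F1": f1_score(y_true, y_pred),
        "AUROC": roc_auc_score(y_true, y_score),
        "AUPRC": average_precision_score(y_true, y_score),   # area under the PR curve
        "PCC": pearsonr(y_reg_true, y_reg_pred)[0],
    }
```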
By applying different values of the loss weight λ of the regression task (i.e., λ = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]), various models, including the CNN, CNN+BiLSTM (CNN+BiL), CNN+transformer (CNN+TF), CNN+multi-head-attention+feed-forward (CNN+MF), and CNN+multi-head-attention+multi-head-attention (CNN+MM) models, were evaluated by 5-fold cross-validation on the benchmark datasets. Among them, the CNN+MF model directly connects the multi-head-attention mechanism layer and the feed-forward layer without a positional encoding layer, while the CNN+MM model replaces the feed-forward layer of the CNN+MF architecture with a multi-head-attention mechanism layer. The optimal performance and the corresponding loss weight value of each model are shown in Table 2 and Figure 3. The results show that the CNN+MM model achieved the highest AUROC and AUPRC scores for both the GAC_BM and AAC_BM datasets. The specific analysis is as follows:
Datasets | Classifiers | AUROC | ACC (%) | Sen (%) | Pre (%) | MCC (%) | Spe (%) | F-1 (%) | AUPRC | PCC | Time (s)
AAC_BM | CNN (λ = 1) | 0.8507 | 77.64 | 78.58 | 77.13 | 55.40 | 76.69 | 77.85 | 0.8382 | 0.5844 | 1
 | CNN+BiL (λ = 1) | 0.8767 | 79.97 | 82.61 | 78.47 | 60.10 | 77.33 | 80.49 | 0.8607 | 0.6137 | 136
 | CNN+TF (λ = 1) | 0.8700 | 79.01 | 84.65 | 76.08 | 58.54 | 73.38 | 80.13 | 0.8535 | 0.5986 | 4
 | CNN+MF (λ = 1) | 0.8768 | 79.75 | 83.99 | 77.43 | 59.77 | 75.51 | 80.58 | 0.8627 | 0.6040 | 4
 | CNN+MM (λ = 0.6) | 0.8793 | 79.57 | 80.94 | 79.06 | 59.98 | 78.56 | 79.99 | 0.8636 | 0.6140 | 25
GAC_BM | CNN (λ = 0.4) | 0.8506 | 76.95 | 77.24 | 76.80 | 54.66 | 76.66 | 77.02 | 0.8394 | 0.5923 | 1
 | CNN+BiL (λ = 1) | 0.8742 | 79.61 | 85.46 | 76.51 | 59.71 | 73.75 | 80.74 | 0.8595 | 0.6199 | 108
 | CNN+TF (λ = 0.4) | 0.8713 | 79.46 | 80.94 | 78.61 | 59.04 | 77.97 | 79.76 | 0.8566 | 0.6005 | 4
 | CNN+MF (λ = 1) | 0.8745 | 79.61 | 81.19 | 78.70 | 59.35 | 78.03 | 79.93 | 0.8600 | 0.6112 | 4
 | CNN+MM (λ = 0.6) | 0.8772 | 79.06 | 78.69 | 79.28 | 58.48 | 79.43 | 79.80 | 0.8635 | 0.6156 | 20
Time (s): running time per epoch for model training.
First, comparing CNN+TF with the CNN, the AUROC scores of CNN+TF were 1.93% and 2.07% higher on AAC_BM and GAC_BM, respectively, and the AUPRC scores of CNN+TF were 1.53% and 1.72% higher than the values for the CNN. These findings highlight the ability of the CNN+TF model to capture deep semantics from RNA sequences, surpassing the performance of the CNN alone.
Additionally, a comparison was made between the CNN+MF and CNN+TF models. The AUROC scores of CNN+MF were 0.68% and 0.32% higher than those of CNN+TF on AAC_BM and GAC_BM, respectively, and the AUPRC scores of CNN+MF were 0.92% and 0.34% higher on AAC_BM and GAC_BM, respectively. This may be attributed to the mixed correlations between positional encoding and word embeddings in the CNN+TF model, which introduce unnecessary randomness into the attention mechanism and limit the model's expressiveness. CNN+MF therefore affords a performance improvement, since it directly connects the multi-head-attention mechanism layer without a positional encoding layer.
Third, the study compared the performance of the CNN+MM and CNN+MF models. The AUROC scores of CNN+MM were 0.25% and 0.27% higher than the CNN+MF on AAC_BM and GAC_BM, respectively, and the AUPRC scores of CNN+MM were 0.09% and 0.35% higher on AAC_BM and GAC_BM, respectively. This disparity could be attributed to the ineffective capture of contextual information by the feed-forward networks in the transformer structure, as feed-forward networks lack the ability to effectively capture global contextual information, resulting in a less accurate understanding of long-term dependencies or global patterns in the sequence. The CNN+MM model replaces the feed-forward layer with the multi-head-attention mechanism layer based on the CNN+MF architecture. This modification allows the model to capture more complex and fine-grained features within the input sequence.
Fourth, a comparison was made between the CNN+MM and CNN+BiL models. The AUROC scores of CNN+MM were 0.26% and 0.3% higher than those for CNN+BiL on AAC_BM and GAC_BM, respectively, while the AUPRC scores of CNN+MM were 0.29% and 0.39% higher on AAC_BM and GAC_BM, respectively. The improved performance of CNN+MM may be attributed to the inclusion of multi-head-attention, which excels at capturing long-range dependencies and global contextual information. This allows the model to better understand relationships and important semantic connections within the input sequence. Furthermore, the experimental results demonstrate that CNN+BiL has a significantly longer training time per epoch than CNN+MM; particularly, it is five times longer than that of CNN+MM, likely due to the parallel computation of the CNN and multi-head-attention, which enables simultaneous processing of multiple input elements or subtasks. In contrast, BiLSTM's sequential nature, where computations are performed step-by-step, may result in slower computations than parallelizable operations.
Finally, in the case of the regression task, there is little difference between CNN+MM and CNN+BiL; specifically, the PCCs of CNN+MM and CNN+BiL differ by 0.03% and –0.43% on AAC_BM and GAC_BM, respectively. Interestingly, although the loss weight ratio between the classification task and the regression task of the CNN+MM model was 1:0.6, this had little effect on the correlation coefficient of the regression. The reason may be that the CNN+MM model develops a more comprehensive and accurate understanding of the underlying data.
In summary, the CNN+MM classifier effectively captures sequence details on the AAC_BM and GAC_BM datasets, outperforming other models in terms of AUROC and AUPRC on the classification task.
Furthermore, this section presents a comparison of prediction performance between different models in single-task and multi-task settings. The experimental setup involved encoding sequences with one-hot encoding and applying the CNN, CNN+BiL, CNN+TF, CNN+MF, and CNN+MM models to predict modification sites on the independent datasets. The ratio of the classification loss to the regression loss is one of the hyperparameters and affects the performance of the models; only the training sets can be used when choosing hyperparameters. Therefore, when evaluating the various models on the independent test sets, each model used the optimal loss weight value obtained from the training set, as listed in the second column of Table 2.
As shown in Figure 4 and Table 3, except for the CNN+BiL model, the AUROC and the AUPRC scores for the multi-task model based on the two datasets were better than those for the single-task model. In addition, the multi-task model can also calculate the PCC at the same time, so the multi-task model is more efficient than the single-task model.
Datasets | Task type | Classifiers | AUROC | ACC (%) | Sen (%) | Pre (%) | MCC (%) | Spe (%) | F-1 (%) | AUPRC | PCC
AAC_IND | Single-task | CNN | 0.8579 | 78.63 | 81.03 | 77.32 | 57.32 | 76.23 | 79.13 | 0.8510 |
 | | CNN+BiL | 0.8829 | 80.04 | 79.90 | 80.13 | 60.09 | 80.19 | 80.01 | 0.8737 |
 | | CNN+TF | 0.8736 | 78.54 | 76.55 | 79.71 | 57.12 | 80.52 | 78.10 | 0.8642 |
 | | CNN+MF | 0.8763 | 78.97 | 77.72 | 79.72 | 57.96 | 80.23 | 78.71 | 0.8657 |
 | | CNN+MM | 0.8820 | 80.06 | 83.10 | 78.34 | 60.23 | 77.03 | 80.65 | 0.8724 |
 | Multi-task | CNN (λ = 1) | 0.8581 | 78.55 | 81.1 | 77.17 | 57.18 | 76.01 | 79.09 | 0.8516 | 0.5946
 | | CNN+BiL (λ = 1) | 0.8801 | 79.75 | 79.1 | 80.15 | 59.51 | 80.41 | 79.62 | 0.8682 | 0.6042
 | | CNN+TF (λ = 1) | 0.8768 | 80.44 | 83.1 | 78.91 | 60.97 | 77.79 | 80.95 | 0.8666 | 0.613
 | | CNN+MF (λ = 1) | 0.8816 | 80.12 | 89.35 | 75.42 | 61.29 | 70.88 | 81.80 | 0.8697 | 0.6053
 | | CNN+MM (λ = 0.6) | 0.8888 | 80.97 | 83.72 | 79.36 | 62.03 | 78.23 | 81.48 | 0.8775 | 0.6219
GAC_IND | Single-task | CNN | 0.8553 | 77.92 | 84.78 | 74.55 | 56.37 | 71.06 | 79.33 | 0.8500 |
 | | CNN+BiL | 0.8767 | 79.97 | 82.45 | 78.55 | 60.01 | 77.48 | 80.45 | 0.8667 |
 | | CNN+TF | 0.8699 | 77.26 | 92.94 | 70.75 | 57.41 | 61.58 | 80.34 | 0.8589 |
 | | CNN+MF | 0.8635 | 73.29 | 95.67 | 66.09 | 52.09 | 50.91 | 78.18 | 0.8509 |
 | | CNN+MM | 0.8761 | 80.20 | 84.23 | 77.94 | 60.59 | 76.16 | 80.96 | 0.8660 |
 | Multi-task | CNN (λ = 0.4) | 0.8560 | 78.28 | 83.45 | 75.63 | 56.87 | 73.11 | 79.35 | 0.8507 | 0.5935
 | | CNN+BiL (λ = 1) | 0.8729 | 78.87 | 79.63 | 78.45 | 57.75 | 78.12 | 79.03 | 0.8665 | 0.6124
 | | CNN+TF (λ = 0.4) | 0.8734 | 79.58 | 86.69 | 75.90 | 59.77 | 72.47 | 80.94 | 0.8622 | 0.6029
 | | CNN+MF (λ = 1) | 0.8795 | 80.20 | 83.59 | 78.28 | 60.53 | 76.80 | 80.85 | 0.8698 | 0.6202
 | | CNN+MM (λ = 0.6) | 0.8880 | 81.11 | 83.23 | 79.84 | 62.27 | 78.99 | 81.50 | 0.8783 | 0.6343
The comparison results demonstrate that CNN+MM, operating under the multi-task framework, outperforms the other models across various evaluation metrics, such as the AUROC, AUPRC, PCC, ACC, and MCC. Specifically, for the AAC_IND and GAC_IND sites, CNN+MM achieved AUROC values of 0.8888 and 0.8880, respectively, exhibiting better performance than the other methods. In contrast, CNN+BiL does not incorporate the multi-head-attention mechanism, potentially limiting its ability to capture global contextual information compared with CNN+MM. CNN+MF and CNN+TF contain the multi-head-attention layer; however, they both also contain the feed-forward layer, which cannot capture deeper global information as effectively as multi-head-attention layers, resulting in a less accurate understanding of long-term dependencies or global patterns in the sequence.
Considering the similarity between the two-stage classification tasks, we chose to leverage the feature extraction layer of the source domain-stage model to construct the target domain-stage model. Specifically, in the transfer learning process, we initialize the parameters of the target domain-stage model with the feature extraction layer (excluding the output layer) and the corresponding weights of the source domain-stage model. During training, we optimize all of the weights of the target domain-stage model without freezing them.
To evaluate the effectiveness of transfer learning compared to training from scratch, we also trained the same network without using the weights obtained from the source domain-stage model. Furthermore, to assess the performance of multi-task transfer learning against single-task fine-tuning, we also trained the same network indirectly through single-task transfer learning. We conducted three sets of comparative experiments on the GAC_hr_BM datasets; the results of these experiments are shown in Table 4.
As shown in Figure 5 and Table 4, the AUROC and AUPRC values for MTTLm6Asingle were 2.07% and 2.92% higher than those for MTTLm6Adirect. This improvement suggests that transfer learning enhances the model's performance relative to training without transfer learning, because transfer learning lets the model leverage knowledge gained from related tasks or domains and thereby generalize well to unseen data. Furthermore, the performance of the model trained through multi-task transfer learning was significantly better than that of the model trained through single-task transfer learning. Specifically, the AUROC and AUPRC values for MTTLm6A were 0.86% and 0.68% higher than those for MTTLm6Asingle, respectively. This result can be attributed to the complementary information provided by the combination of classification and regression tasks. While the classification task focuses on predicting discrete class labels, the regression task aims to estimate continuous values. By jointly training the model on both tasks, MTTLm6A can effectively utilize the complementary information from both tasks to improve its understanding of the data and make more accurate predictions.
Classifiers | AUROC | ACC (%) | Sen (%) | Pre (%) | MCC (%) | Spe (%) | F-1 (%) | AUPRC |
MTTLm6Adirect | 0.6750 | 62.33 | 61.69 | 62.49 | 24.81 | 62.97 | 62.09 | 0.6576
MTTLm6Asingle | 0.6957 | 63.74 | 60.28 | 64.77 | 27.69 | 67.21 | 62.44 | 0.6871
MTTLm6A | 0.7043 | 64.40 | 65.53 | 64.08 | 28.94 | 63.26 | 64.80 | 0.6938 |
MTTLm6Adirect refers to the model trained directly without transfer learning; MTTLm6Asingle indicates the model trained indirectly through single-task transfer learning. |
Finally, MTTLm6A was compared with other state-of-the-art approaches on the GAC_hr_IND datasets, including m6A-word2vec, MultiRM, and MTDeepM6A-2S. To make the comparison more convincing, we included the MTTLm6Asingle model in the evaluation.
As shown in Figure 6 and Table 5, the AUROC and AUPRC values for MTTLm6A were higher than those obtained for the other approaches. In particular, compared to MTDeepM6A-2S, the second-best performing model, MTTLm6A (which utilizes multi-task transfer learning) demonstrated an improvement of 0.37% in AUROC and 0.58% in AUPRC. This enhancement can be attributed to MTTLm6A's superior ability to capture long-range dependencies and global contextual information in the input sequence compared to MTDeepM6A-2S.
Classifiers | AUROC | ACC (%) | Sen (%) | Precision (%) | MCC (%) | Spe (%) | F-1 (%) | AUPRC |
m6A-word2vec | 0.6008 | 57.78 | 58.96 | 57.60 | 15.57 | 48.83 | 58.27 | 0.586 |
MultiRM | 0.6947 | 63.75 | 69.08 | 62.43 | 27.66 | 44.67 | 65.59 | 0.6922 |
MTTLm6Asingle | 0.7599 | 69.10 | 67.56 | 69.71 | 38.23 | 70.65 | 68.62 | 0.7709 |
MTDeepM6A-2S | 0.7676 | 70.44 | 66.81 | 72.04 | 40.98 | 74.07 | 69.32 | 0.7688 |
MTTLm6A | 0.7713 | 69.85 | 74.81 | 68.06 | 39.90 | 64.89 | 71.28 | 0.7746 |
Furthermore, the AUROC and AUPRC values for MTTLm6A were 1.14% and 0.37% higher than those for MTTLm6Asingle, respectively. The incorporation of multi-task learning in the MTTLm6A model allows for joint training on both the classification and regression tasks. This integration allows the model to learn shared representations and extract more informative features that benefit both tasks; by collectively optimizing these tasks, the model achieves improved overall performance. In contrast, MTTLm6Asingle focuses solely on the classification task, potentially limiting its ability to fully exploit the information in the data.
Additionally, MTTLm6A surpassed MultiRM by a notable margin, exhibiting AUROC and AUPRC improvements of 7.66% and 8.24%, respectively. This highlights the effectiveness of MTTLm6A in addressing the challenges posed by small-sample classification modeling problems.
In order to evaluate the reliability of the model, the m6A-word2vec, MultiRM, MTTLm6Asingle, MTDeepM6A-2S, and MTTLm6A models were each applied in 100 replicate experiments on the same independent m6A test set. After the 100 replicates, we tested the statistical significance of the AUROC differences between the tools by using Student's t-test [70]. The results are shown in Table 6.
Modification type | Classifiers | m6A-word2vec | MultiRM | MTTLm6Asingle | MTDeepM6A-2S | MTTLm6A
m6A | m6A-word2vec | | | | |
 | MultiRM | 0 | | | |
 | MTTLm6Asingle | 0 | 0 | | |
 | MTDeepM6A-2S | 0 | 0 | 0 | |
 | MTTLm6A | 0 | 0 | 0 | 0 |
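As an illustration, the significance test over the 100 replicate AUROC values can be run with SciPy; the result file names below are hypothetical placeholders for the per-replicate AUROC lists.

```python
import numpy as np
from scipy.stats import ttest_ind

# Each file holds 100 AUROC values, one per replicate run on the independent m6A test set.
auroc_mttlm6a = np.loadtxt("mttlm6a_auroc.txt")        # hypothetical result file
auroc_mtdeepm6a = np.loadtxt("mtdeepm6a2s_auroc.txt")  # hypothetical result file

t_stat, p_value = ttest_ind(auroc_mttlm6a, auroc_mtdeepm6a)
print(f"t = {t_stat:.3f}, p = {p_value:.3g}")
```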
In order to evaluate the generalization ability of the MTTLm6A model, MTTLm6A was compared against m6A-word2vec, MultiRM, MTTLm6Asingle, and MTDeepM6A-2S by using the training and independent sets of Homo sapiens m1A sites. As shown in Table 7, the AUROC and AUPRC values for MTTLm6A were higher than those obtained for the other approaches. In particular, compared to MTTLm6Asingle, the second-best performing model, MTTLm6A demonstrated an improvement of 1.47% in AUROC and 0.63% in AUPRC. This shows that multi-task learning is effective in improving model performance. In summary, the MTTLm6A model has good versatility in predicting different methylation sites in different species.
Classifiers | AUROC | ACC (%) | Sen (%) | Precision (%) | MCC (%) | Spe (%) | F-1 (%) | AUPRC |
m6A-word2vec | 0.9095 | 91.67 | 100.00 | 85.71 | 84.52 | 83.33 | 92.31 | 0.8722 |
MultiRM | 0.9126 | 91.23 | 99.12 | 85.61 | 83.50 | 83.33 | 91.87 | 0.8818 |
MTTLm6Asingle | 0.9143 | 90.79 | 98.25 | 85.50 | 82.50 | 83.33 | 91.43 | 0.8841 |
MTDeepM6A-2S | 0.894 | 91.23 | 99.12 | 85.61 | 83.50 | 83.33 | 91.87 | 0.8509 |
MTTLm6A | 0.929 | 91.23 | 99.12 | 85.61 | 83.50 | 83.33 | 91.87 | 0.8904 |
We have developed a user-friendly web server, accessible at http://47.242.23.141/MTTLm6A/index.php, to facilitate the utilization of the MTTLm6A model as a tool for predicting the base-resolution m6A sites. Simply type or paste the RNA sequence of interest into the designated input area. To receive the prediction results, kindly provide your email address in the corresponding box and click the "submit" button. After a brief calculation period, the prediction results will be presented in a clear and organized table format. This intuitive web server provides researchers with an efficient and convenient platform to leverage the MTTLm6A model to quickly predict the base-resolution m6A site.
To assess the impact of different loss weights on the source domain-stage model, we conducted an optimization process by using the grid search method with AAC_BM and GAC_BM datasets. Through this process, we identified an optimal weight ratio of 1:0.6. The evaluation metrics, as displayed in Table 8, validate the effectiveness of this weight configuration. This finding aligns with the conclusions drawn by Kendall et al. [71], who emphasized the importance of relative weighting in multi-task learning scenarios. Their research demonstrated that numerous DL applications benefit from incorporating multiple regression and classification objectives, but the performance of such systems relies heavily on the appropriate weighting assigned to each task's loss function. By determining the optimal loss weight ratio for our source domain-stage model, we aimed to enhance its predictive capabilities and ensure a balanced influence of the classification and regression tasks. This optimization process allows us to fully leverage the benefits of multi-task learning and maximize the performance of our model.
Datasets | Weight ratio | AUROC | ACC (%) | Sen (%) | Pre (%) | MCC (%) | Spe (%) | F-1 (%) | AUPRC | PCC
AAC_BM | 1:0.1 | 0.8753 | 79.3 | 80.59 | 78.56 | 59.05 | 78.01 | 79.56 | 0.8579 | 0.5894
 | 1:0.2 | 0.8745 | 77.95 | 74.85 | 79.80 | 56.9 | 81.05 | 77.25 | 0.8586 | 0.6019
 | 1:0.3 | 0.8788 | 78.74 | 77.67 | 79.36 | 58.38 | 79.81 | 78.51 | 0.8615 | 0.6093
 | 1:0.4 | 0.8734 | 76.56 | 70.76 | 80.05 | 54.77 | 82.36 | 75.12 | 0.8574 | 0.6126
 | 1:0.5 | 0.8745 | 78.16 | 77.85 | 78.34 | 57.62 | 78.48 | 78.10 | 0.8590 | 0.6078
 | 1:0.6 | 0.8793 | 79.57 | 80.94 | 79.06 | 59.98 | 78.56 | 79.99 | 0.8636 | 0.614
 | 1:0.7 | 0.8754 | 77.6 | 73.8 | 79.86 | 56.65 | 81.39 | 76.71 | 0.8587 | 0.6108
 | 1:0.8 | 0.8753 | 78.04 | 76.7 | 78.81 | 56.98 | 79.38 | 77.74 | 0.8592 | 0.6218
 | 1:0.9 | 0.8749 | 77.85 | 75.95 | 78.94 | 56.95 | 79.74 | 77.42 | 0.8584 | 0.6124
 | 1:1.0 | 0.8756 | 78.66 | 77.96 | 79.06 | 58.11 | 79.36 | 78.51 | 0.8595 | 0.5996
GAC_BM | 1:0.1 | 0.8746 | 79.62 | 81.97 | 78.30 | 59.59 | 77.28 | 80.09 | 0.8584 | 0.6011
 | 1:0.2 | 0.8771 | 79.65 | 80.76 | 78.88 | 59.54 | 77.54 | 80.29 | 0.8612 | 0.6033
 | 1:0.3 | 0.8761 | 79.58 | 80.27 | 79.18 | 59.56 | 78.9 | 79.72 | 0.8627 | 0.6095
 | 1:0.4 | 0.8764 | 79.21 | 83.78 | 76.77 | 59.31 | 74.65 | 80.12 | 0.8634 | 0.6144
 | 1:0.5 | 0.8746 | 78.62 | 82.57 | 76.52 | 58.25 | 74.67 | 79.43 | 0.8605 | 0.6136
 | 1:0.6 | 0.8772 | 79.06 | 78.69 | 79.28 | 58.48 | 79.43 | 79.80 | 0.8635 | 0.6156
 | 1:0.7 | 0.8733 | 78.8 | 77.89 | 79.33 | 57.94 | 79.71 | 78.61 | 0.8584 | 0.6187
 | 1:0.8 | 0.8753 | 79.1 | 81.77 | 77.63 | 58.93 | 76.44 | 79.65 | 0.8589 | 0.6150
 | 1:0.9 | 0.8766 | 79.60 | 80.99 | 78.8 | 59.39 | 78.21 | 79.88 | 0.8613 | 0.6218
 | 1:1.0 | 0.8756 | 78.66 | 77.96 | 79.06 | 58.11 | 79.36 | 78.51 | 0.8595 | 0.6220
To evaluate the effectiveness of our multi-task learning architecture, we conducted two separate experiments: single-task learning for the classification task and single-task learning for the regression task. Both experiments used the CNN+MM network. Table 9 presents the cross-validation results of the single-task classification model and the multi-task model based on two different benchmark datasets. Our findings reveal that, in the case of multi-task learning, the AUROC values for the classification task were 0.8794 and 0.8772 for AAC_BM and GAC_BM, respectively. In comparison, the AUROC values for the classification task in single-task learning were 0.8774 and 0.8769 for AAC_BM and GAC_BM, respectively. Therefore, the performance of the multi-task classification model surpassed that of the single-task classification model for both AAC_BM and GAC_BM. The reason may be that multiple related tasks help to regularize each other and a more robust representation can be learned; thus, multi-task learning is usually believed to improve network performance. These results align with a study by Ruder [72], which emphasizes that multi-task learning allows the model to focus its attention on relevant features, as other tasks provide additional evidence for determining the relevance or irrelevance of those features. By adopting a multi-task learning approach, our model benefits from the shared representation and complementary information across tasks, leading to improved classification performance. Specifically, the model adds NSES information, which helps identify poor-quality methylation sites. These findings underscore the effectiveness of our multi-task learning architecture in enhancing model performance and feature relevance assessment.
Datasets | Task (weight ratio) | AUROC | ACC (%) | Sen (%) | Pre (%) | MCC (%) | Spe (%) | F-1 (%) | AUPRC
AAC_BM | single-task | 0.8774 | 78 | 74.71 | 79.97 | 57.33 | 81.29 | 77.25 | 0.8613
 | multi-task (1:0.6) | 0.8794 | 79.57 | 80.94 | 79.06 | 59.98 | 78.56 | 79.99 | 0.8636
GAC_BM | single-task | 0.8769 | 79.76 | 79.61 | 79.85 | 59.7 | 79.91 | 79.73 | 0.8613
 | multi-task (1:0.6) | 0.8772 | 79.06 | 78.69 | 79.28 | 58.48 | 79.43 | 79.80 | 0.8635
Table 10 presents the PCC results for the cross-validation of the single-task regression model and the multi-task model on AAC_BM and GAC_BM. The results indicate that the multi-task model slightly outperforms the single-task regression model on both datasets. Specifically, the correlation coefficients for the regression task were 0.614 and 0.6156 for multi-task learning, while they were 0.6042 and 0.6 for single-task learning for AAC_BM and GAC_BM, respectively. Interestingly, despite the loss weight ratio between the classification and regression tasks of the CNN+MM model in the source domain-stage being 1:0.6, the impact on the correlation coefficients of the regressions is minimal. This suggests that the improved performance of our multi-task model relative to the single-task regression model can be attributed to several factors, including the utilization of complementary information, shared feature representation, regularization techniques, and knowledge transfer between tasks. By leveraging multi-task learning, our model benefits from the synergistic effects of multiple tasks, leading to enhanced PCCs. This highlights the advantages of incorporating related tasks and sharing representations, ultimately resulting in a more comprehensive and accurate understanding of the underlying data. By considering the interplay between tasks and the complementary nature of their information, we can leverage multi-task learning to further improve the performance of our model and achieve superior results across a range of metrics.
Datasets | PCC (multi-task regression model) | PCC (single-task regression model)
AAC_BM | 0.6140 | 0.6042
GAC_BM | 0.6156 | 0.6000
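For completeness, the PCC reported in Table 10 is the standard Pearson correlation between the regression head's predictions and the measured methylation values. A minimal sketch of how it can be computed (using SciPy here purely for illustration, with toy values) is shown below.

```python
import numpy as np
from scipy.stats import pearsonr

# y_true: measured methylation levels; y_pred: regression-head outputs (toy values)
y_true = np.array([0.12, 0.55, 0.80, 0.33, 0.91])
y_pred = np.array([0.20, 0.48, 0.75, 0.40, 0.85])

pcc, _ = pearsonr(y_true, y_pred)   # Pearson correlation coefficient
print(f"PCC = {pcc:.4f}")
```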
In summary, our results demonstrate that multi-task learning outperforms single-task learning for both the classification and the regression task on both the AAC_BM and GAC_BM datasets. The multi-task framework is also more efficient: a single shared model handles both tasks, so it accomplishes more with fewer resources and less computational overhead than training separate single-task models. Overall, these findings highlight the superiority of the multi-task model over the single-task model.
The contribution of this paper is the development of MTTLm6A, a novel predictor that combines multi-task learning and transfer learning on top of an improved transformer architecture to identify base-resolution mRNA m6A sites. Experimental results on Saccharomyces cerevisiae m6A and Homo sapiens m1A data show that MTTLm6A achieved AUROC values of 0.7713 and 0.929, respectively, outperforming state-of-the-art models and demonstrating strong generalization ability. One limitation remains: training in the source domain-stage requires samples annotated with NSES information, which is a prerequisite for the multi-task setup.
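The two-stage strategy summarized above (source-domain multi-task pre-training followed by target-domain fine-tuning on base-resolution data) follows the usual transfer-learning recipe of reusing the shared encoder weights and retraining the task head. The fragment below is a hypothetical sketch of that step; the checkpoint path is a placeholder and `MultiTaskM6A` refers to the illustrative model sketched earlier, not to the released MTTLm6A code.

```python
import torch

# Load weights pre-trained in the source domain-stage (hypothetical checkpoint path).
model = MultiTaskM6A()
model.load_state_dict(torch.load("source_domain_checkpoint.pt"))

# Freeze the shared encoder and fine-tune only the classification head
# on the base-resolution target-domain dataset (e.g., GAC_hr_BM).
for p in model.encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(model.cls_head.parameters(), lr=1e-4)
# ...standard training loop over the target-domain data goes here...
```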
Furthermore, since multi-task learning tends to benefit from additional tasks, we intend to investigate the characteristics of methylation-related parameters in greater depth. By incorporating more informative tasks into the learning framework, we plan to use MTTLm6A to infer additional base-resolution methylation sites from low-resolution methylation sequences of different species for further research.
In conclusion, the development of MTTLm6A, its promising performance, and the research directions outlined above contribute to the advancement of computational methods for identifying methylation sites, and demonstrate its potential for broader application and further refinement.
The authors declare that they have not used artificial intelligence tools in the creation of this article.
This work has been supported by the National Natural Science Foundation of China (31871337 and 61971422), and the "333 Project" of Jiangsu (BRA2020328).
The authors declare no conflict of interest.