1. Introduction
Non-Euclidean structured data exist in many forms in our lives, such as molecular structures in chemical analysis, sensor networks in communication systems and social networks in the social sciences. Traditional convolutional neural networks (CNNs) cannot perform local feature extraction on graph data, mainly because such data are unstructured and the neighborhood structure of each node may be unique. Researchers nevertheless hope to extend deep learning models to non-Euclidean domains, because deep learning is a proven, powerful tool for solving a range of problems in acoustics, computer vision and natural language processing.
In recent years, more and more researchers have paid attention to using deep learning methods to analyze graph data. These methods mainly fall into two categories. The first embeds nodes in a low-dimensional vector space, constraining proximity so as to learn a fixed-length representation for each node; the representative work is network embedding. The second applies different deep learning models to the graph. Representative work can be divided into five categories according to model structure and training strategy: graph recurrent neural networks, graph convolutional networks (GCNs), graph autoencoders, graph reinforcement learning and graph adversarial methods [1]. Among them, the GCN is the most prominent branch of graph deep learning. A GCN plays the same role as a CNN: it is a feature extractor. GCNs can process the non-Euclidean structured data that traditional deep learning models such as CNNs and recurrent neural networks cannot handle. The core of the GCN is introduced in Section 2.1.1 of this paper. The remainder of this paper is organized as follows. Section 1 introduces the development history, spectral domain methods and spatial domain methods of GCNs. Section 2 introduces some basic GCN models and graph pooling methods. Section 3 reviews the research progress of graph convolutional networks in the face of multiple challenges. Section 4 surveys applications of GCNs in various fields. Section 5 discusses future trends and draws conclusions.
1.1. Development history of GCNs
The original idea of the graph convolutional operation comes from the domain of graph signal processing [2,3,4]. Bruna et al. proposed the first CNN formulation for graphs, defined by convolution in the spectral domain based on the graph Laplacian [5]. Duvenaud et al. [6], Kipf and Welling [7] and Atwood and Towsley [8] proposed various GCN models in which the traditional graph filter is replaced by an adjacency matrix with self-loops; when updating network weights, propagation rules are used to calculate the output of each neural network layer. Defferrard et al. [9] extended this type of GCN model with fast localized spectral filters and efficient pooling operations. In recent years, there has been a surge in the application of convolution to graphs, and GCN-related models have appeared in many forms. Graph convolution can be divided into two branches: spectral convolution and spatial convolution. When a polynomial spectral kernel is used, the two approaches coincide.
1.2. Spectral convolution of GCNs
Spectral convolution uses the graph Fourier transform or its extensions to transform the node representation into the spectral domain, where the convolution theorem is then used to perform the convolution operation.
The spectral CNN proposed by Bruna et al. [5], which has a convolutional layer analogous to that of a classic Euclidean CNN, was the first to use the graph Laplacian matrix L to define graph convolution in the spectral domain. The graph convolutional operation $\ast_G$ is defined as follows:
$$u_1 \ast_G u_2 = Q\left((Q^T u_1) \odot (Q^T u_2)\right) \tag{1}$$

The Laplacian matrix L is defined as $L = D - A$, where D is the degree matrix and A is the adjacency matrix of the graph; this definition applies to all Laplacian matrices in this paper. $u_1, u_2 \in \mathbb{R}^N$ are two signals defined on the nodes, and N is the number of nodes. Q is the matrix of eigenvectors of L. Multiplying by $Q^T$ transforms the graph signals $u_1$ and $u_2$ into the spectral domain, i.e., the graph Fourier transform; multiplying by Q performs the inverse transform, and $\odot$ denotes elementwise multiplication.
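As a concrete illustration of Eq (1), here is a minimal NumPy sketch (function names such as `graph_laplacian` and `spectral_convolve` are our own, not from any library) that convolves two signals on a toy graph by explicit eigendecomposition of the Laplacian:

```python
import numpy as np

def graph_laplacian(A):
    """L = D - A, with D the diagonal degree matrix, as defined in the text."""
    return np.diag(A.sum(axis=1)) - A

def spectral_convolve(u1, u2, A):
    """Eq (1): u1 *_G u2 = Q((Q^T u1) ⊙ (Q^T u2)), Q = eigenvectors of L."""
    L = graph_laplacian(A)
    _, Q = np.linalg.eigh(L)   # O(N^3) eigendecomposition, the bottleneck noted below
    return Q @ ((Q.T @ u1) * (Q.T @ u2))

# Toy 4-node path graph and two random node signals.
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
rng = np.random.default_rng(0)
u1, u2 = rng.random(4), rng.random(4)
print(spectral_convolve(u1, u2, A))
```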
Applying different filters to different input-output signal pairs, the definition of the spectral convolutional layer is as follows:
$$u_j^{l+1} = \rho\left(\sum_{i=1}^{f_l} Q\, \Theta_{i,j}^{l}\, Q^T u_i^{l}\right), \quad j = 1, \cdots, f_{l+1} \tag{2}$$

where l is the layer index, $u_j^l \in \mathbb{R}^N$ is the jth signal of the nodes in the lth layer, and $f_l$ is the dimensionality of the hidden representation in the lth layer. $\Theta_{i,j}^{l}$ is a learnable filter, and $\rho(\cdot)$ is the activation function. Equation (2) indicates that the input signals pass through a set of learnable filters to gather information and then undergo a nonlinear transformation to produce the output signals.
Since the eigenvector matrix Q requires an explicit eigendecomposition of the graph Laplacian, the time complexity is $O(N^3)$. Even if the eigendecomposition is computed in advance, applying the filter still requires $O(N^2)$ time, which makes it unrealistic to extend this method to large graphs. On the other hand, because the filter depends on the eigenbasis Q, the convolutional layer cannot be unified across graphs of different structures and sizes (the dimensionality changes) and parameters cannot be shared, so the spectral method cannot be applied to graphs of different structures and sizes.
To solve the problem of a spectral CNN's high computational complexity and the filter not being local, Defferrard et al. [9] proposed ChebNet, which uses a polynomial filter of the form
$$\Theta(\Lambda) = \sum_{k=0}^{K} \theta_k \Lambda^k \tag{3}$$

where $\theta_0, \cdots, \theta_K$ are learnable parameters, and K is the order of the polynomial. $\Lambda$ is the diagonal matrix of eigenvalues of the graph Laplacian L. The K-order polynomial filter achieves strict localization by integrating the node features in the K-hop neighborhood [5], so that the number of learnable parameters decreases to $O(K) = O(1)$.
In order to further reduce the computational complexity, Chebyshev polynomials are used to approximate the spectral convolution. Using the recurrence $T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x)$ with $T_0(x) = 1$ and $T_1(x) = x$, (3) is rewritten as follows:

$$\Theta(\Lambda) = \sum_{k=0}^{K} \theta_k T_k(\tilde{\Lambda}) \tag{4}$$

where $\tilde{\Lambda} = \frac{2\Lambda}{\lambda_{max}} - I$ is the rescaled eigenvalue matrix, whose entries lie in the range $[-1, 1]$. $\lambda_{max}$ is the largest eigenvalue, $I \in \mathbb{R}^{N \times N}$ is the identity matrix, and $T_k(x)$ is the Chebyshev polynomial of order k. Equation (4) is essentially a K-order polynomial of the graph Laplacian, so it is K-localized; that is, a node is only affected by its K-order neighborhood, and the computational complexity of (4) becomes $O(|\mathcal{E}|)$, where $|\mathcal{E}|$ is the number of edges. Combining (4) and (2), the convolutional layer of ChebNet can be expressed as
$$u_j^{l+1} = \rho\left(\sum_{i=1}^{f_l} \sum_{k=0}^{K} \theta_{i,j,k}^{l}\, T_k(\tilde{L})\, u_i^{l}\right), \quad \tilde{L} = \frac{2L}{\lambda_{max}} - I \tag{5}$$

where $\theta_{i,j}^{l} = (\theta_{i,j,0}^{l}, \cdots, \theta_{i,j,K}^{l})$ is the parameter vector connecting the ith column of the input feature map to the jth column of the output feature map, i.e., the learnable filter.
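To make the Chebyshev recursion concrete, here is a minimal NumPy sketch (the helper name `chebyshev_filter` is ours) that applies the K-order filter of Eq (4) to a node signal using only matrix-vector products, with no eigendecomposition:

```python
import numpy as np

def chebyshev_filter(L, u, theta, lam_max=2.0):
    """Apply sum_k theta_k T_k(L~) u from Eq (4), with no eigendecomposition.

    Uses the recursion T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x); each step is one
    matrix-vector product, so the total cost is O(K |E|) on a sparse graph.
    """
    N = L.shape[0]
    L_tilde = (2.0 / lam_max) * L - np.eye(N)   # rescale eigenvalues into [-1, 1]
    T_prev, T_curr = u, L_tilde @ u             # T_0(L~)u and T_1(L~)u
    out = theta[0] * T_prev
    if len(theta) > 1:
        out = out + theta[1] * T_curr
    for k in range(2, len(theta)):
        T_next = 2.0 * (L_tilde @ T_curr) - T_prev
        out = out + theta[k] * T_next
        T_prev, T_curr = T_curr, T_next
    return out
```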
ChebNet and its extensions combine spectral convolution with spatial structure; spectral convolution is equivalent to spatial convolution when the spectral filter is polynomial or first order. For example, the GCN proposed by Kipf and Welling [7] further simplifies the filter by keeping only the first-order Chebyshev terms, and it can be regarded as both a spectral and a spatial convolution.
There are also spectral methods that do not use a Chebyshev polynomial expansion. For example, CayleyNet [10] defines graph convolution with a complex-coefficient Cayley polynomial and the Cayley transform:
$$\Theta(\Lambda) = \theta_0 + 2\,\mathrm{Re}\left\{\sum_{k=1}^{K} \theta_k (\theta_h \Lambda - iI)^k (\theta_h \Lambda + iI)^{-k}\right\} \tag{6}$$

where $i = \sqrt{-1}$ is the imaginary unit, and $\theta_h$ is a spectrum-scaling parameter. $\mathrm{Re}(\cdot)$ takes the real part of a complex value. Because of the nonlinearity of the Cayley transform, $\theta_h$ can be adjusted to detect the frequency bands of interest and obtain better results.
In addition, Levie et al. [11] conducted research on the transferability of spectral GCNs, proving that spectral GCNs always generalize to unseen signals on graphs. They overturned a misconception that using a spectral GCN is inappropriate when the data consist of many different graphs and many different signals on these graphs.
1.3. Spatial convolution of GCNs
Spatial convolution defines graph convolution directly in the vertex domain by aggregating feature information from neighboring nodes.
MoNet, proposed by Monti et al. [12], is a general GCN framework built by designing a universal patch operator to integrate signals around nodes. For the neighbors $y \in N(x)$ of node x on a graph, a d-dimensional pseudo-coordinate $u(x,y)$ is associated with each pair and fed into J learnable kernel functions $(\omega_1(u), \cdots, \omega_J(u))$. The patch operator can therefore be expressed as follows:
$$D_j(x)f = \sum_{y \in N(x)} \omega_j(u(x,y))\, f(y), \quad j = 1, \cdots, J \tag{7}$$

where $D_j(x)f$ denotes a patch on the manifold, and $f(y)$ is the signal value at node y. The graph convolution in the spatial domain based on the patch operator is defined as follows:
$$(f \ast_G g)(x) = \sum_{j=1}^{J} g_j\, D_j(x) f \tag{8}$$

where $\ast_G$ is the graph convolution operation, and $g_j$ is the convolution kernel. By properly selecting $u(x,y)$ and the kernel functions $\omega_j(u)$, many existing graph convolution models can be regarded as special cases of MoNet.
SplineCNN [13] also designs a universal patch operator to integrate local information but uses convolution kernels based on B-spline functions.
GraphSAGE [14] is a representation learning method based on aggregation functions, proposed by Hamilton et al. A fixed number of neighbors is uniformly sampled for each node, enabling minibatch training. Several aggregation functions are available: a mean aggregator, an LSTM aggregator and a pooling aggregator. GraphSAGE with the mean aggregator is similar to the GCN model [7].
WGCN [15] distinguishes the in-neighbor and out-neighbor weights of nodes according to the local topology of nodes, which captures the structural information of nodes. Nodes are embedded into the latent space, and higher-order dependencies and latent geometric relationships of nodes are obtained to learn nodes' representations.
Some studies [16,17,18,19] have shown that spatial graph convolutional networks are vulnerable to adversarial attacks, but it is unclear why the attacks succeed. Liang et al. [20] confirmed that it is non-robust aggregation functions that lead to the structural vulnerability of GCNs. The spatial graph convolution aggregation scheme, i.e., weighted mean, has a low breakdown point, and its output can be changed significantly by injecting a single edge. Then, Liang et al. [20] proposed trimmed mean and median aggregation functions with high breakdown points to improve GCNs' robustness against structural attacks.
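The robustness gap between these aggregation functions is easy to see numerically. Below is a small sketch (toy numbers and our own `aggregate` helper, not code from [20]) showing that a single injected high-value neighbor drags a mean aggregation far away while a median aggregation barely moves:

```python
import numpy as np

def aggregate(H, nbrs, how="mean"):
    """Aggregate the features of the given neighbors by mean or median."""
    feats = H[nbrs]
    return feats.mean(axis=0) if how == "mean" else np.median(feats, axis=0)

H = np.array([[1.0], [1.2], [0.9], [100.0]])   # node 3 plays an injected adversarial neighbor
print(aggregate(H, [0, 1, 2, 3], "mean"))      # ~[25.78]: dragged far from the clean values
print(aggregate(H, [0, 1, 2, 3], "median"))    # ~[1.1]: barely affected
```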
2. Graph convolutional neural network models
2.1. Basic models based on graph convolutional neural networks
2.1.1. Graph convolutional neural network
The graph convolutional neural network (GCN) proposed by Kipf and Welling [7] was applied to the semi-supervised node classification problem on graphs. Consider an undirected graph $G = (V, \xi, X)$, where V is the vertex set with $|V| = n$ vertices, $\xi$ is the edge set, $X = \{x_1, x_2, \cdots, x_n\}^T \in \mathbb{R}^{n \times c}$ is the feature matrix, and $x_i \in \mathbb{R}^c$ is the c-dimensional feature vector of node i. The goal of the GCN model is to use a small set of labeled nodes, combined with the graph structure, to predict the labels of the remaining unlabeled nodes.
The model uses a first-order Chebyshev polynomial in each GCN layer, keeping only the first two terms of the Chebyshev expansion to approximate the convolution on the graph, which yields the propagation rule of the graph convolutional network:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}}\, \tilde{A}\, \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \tag{9}$$

where $\tilde{A} = A + I_N$ is the adjacency matrix of the undirected graph G with self-loops, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$ is the corresponding diagonal degree matrix, $H^{(l)}$ is the hidden representation in the lth layer with $H^{(0)} = X$, $W^{(l)}$ is the trainable weight matrix of the layer, and $\sigma(\cdot)$ is the activation function. Stacking multiple GCN layers propagates features across multiple hops of adjacent nodes.
The GCN model uses a two-layer structure and applies a softmax classifier to the output features. Following the propagation rule, the forward calculation for semi-supervised node classification on a graph with symmetric adjacency matrix A takes the form
$$Z = f(X, A) = \mathrm{softmax}\left(\hat{A}\,\mathrm{ReLU}\left(\hat{A} X W^{(0)}\right) W^{(1)}\right) \tag{10}$$

where $\hat{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$, and $W^{(0)} \in \mathbb{R}^{C \times H}$ and $W^{(1)} \in \mathbb{R}^{H \times F}$ are the weight matrices from the input layer to the hidden layer and from the hidden layer to the output layer, respectively. The hidden layer has H features, C is the number of input channels, and F is the number of feature maps of the output layer. The first activation function is the rectified linear unit, and the second is the softmax function, $\mathrm{softmax}(x_i) = \frac{1}{\mathcal{Z}} \exp(x_i)$ with $\mathcal{Z} = \sum_i \exp(x_i)$, applied to every row. X is the input signal, with each sample carrying C-dimensional features. Each sample's output is a corresponding F-dimensional vector whose component f represents the probability of the sample belonging to class f.
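For reference, a minimal NumPy sketch of Eqs (9)-(10) follows; the weights here are stand-ins for trained parameters, and `normalize_adj` and `gcn_forward` are our own names:

```python
import numpy as np

def normalize_adj(A):
    """A^ = D~^{-1/2} (A + I) D~^{-1/2}, the renormalized adjacency of Eqs (9)-(10)."""
    A_tilde = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def row_softmax(X):
    e = np.exp(X - X.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(A, X, W0, W1):
    """Eq (10): Z = softmax(A^ ReLU(A^ X W0) W1), per-node class probabilities."""
    A_hat = normalize_adj(A)
    H = np.maximum(A_hat @ X @ W0, 0.0)         # hidden layer with ReLU
    return row_softmax(A_hat @ H @ W1)
```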
The graph convolutional layer of the GCN model is a special form of Laplacian smoothing, mixing the features of each node with those of its neighborhood. The smoothing operation makes the features of neighboring nodes similar, which greatly simplifies the classification task, but stacking many graph convolution layers over-smooths the output features, so that nodes from different clusters can no longer be distinguished effectively. The shallow GCN model also has limitations: many additional labeled nodes are required for validation, and it is constrained by the localized nature of the convolutional filter.
In addition, the GCN model has disadvantages such as poor scalability, inability to handle large-scale graphs and inability to handle directed graphs. The specific improvement measures will be explained in the Research Progress part of Section 3. Table 1 summarizes the advantages and disadvantages of several classic GCN models.
Table 1. An overview of the advantages and disadvantages of the classic GCN models.

| Type of model | Description | Advantages | Disadvantages |
| --- | --- | --- | --- |
| GCN | Applies the convolution operation of image processing to graph-structured data for the first time | Handles non-Euclidean structured data; captures global information of the graph; characterizes node and edge features well | Poor flexibility and scalability; limited to undirected graphs and shallow layers; training requires the structural information of the entire graph |
| GraphSAGE | Improves GCN scalability and training method | Overcomes the memory limits of GCN training; can handle large-scale graphs; the aggregator and weight-matrix parameters are shared across all nodes | Cannot process weighted graphs; neighbor sampling causes large gradient variance during backpropagation; the limited number of samples loses important local information of some nodes |
| GAT (graph attention network) | Adds an attention mechanism to assign importance to the edges between nodes | High computational efficiency; attention mechanism; no need to know the whole graph structure; applicable to both transductive and inductive learning | Prone to oversmoothing; suboptimal training mode; large number of parameters |
2.1.2. GraphSAGE
GraphSAGE [14] is an inductive learning framework. In its implementation, only edges between training samples are retained during training, and it consists of two steps: Sample and Aggregate. Sample determines how neighbors are sampled, and Aggregate combines the embeddings of neighboring nodes to update a node's own embedding. Figure 1 shows how GraphSAGE generates an embedding for a target node (red) and predicts for downstream tasks.
Figure 1. GraphSAGE [14] sampling and aggregation approach.
The first step samples the neighbors. In the second step, the sampled neighbors' embeddings are passed to the target node, and an aggregation function combines the neighbor information to update the node's embedding. The third step predicts the node's label from the updated embedding.
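A minimal sketch of one GraphSAGE layer with the mean aggregator follows; the uniform `sample_size` and the helper names are our simplifications, not the paper's reference implementation:

```python
import numpy as np

def sage_layer(A, H, W, sample_size=5, seed=0):
    """One GraphSAGE layer with the mean aggregator.

    For each node: uniformly sample neighbors, mean-aggregate their embeddings,
    concatenate with the node's own embedding and apply a ReLU transformation.
    W has shape (2F, F'), matching the concatenated input.
    """
    rng = np.random.default_rng(seed)
    N, F = H.shape
    out = np.empty((N, W.shape[1]))
    for v in range(N):
        nbrs = np.flatnonzero(A[v])
        if len(nbrs) > sample_size:
            nbrs = rng.choice(nbrs, size=sample_size, replace=False)
        h_nbr = H[nbrs].mean(axis=0) if len(nbrs) > 0 else np.zeros(F)
        h_cat = np.concatenate([H[v], h_nbr])   # self || aggregated neighbors
        out[v] = np.maximum(h_cat @ W, 0.0)
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-12)       # L2-normalize, as in the paper
```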
2.1.3. Graph attention network
Inspired by the attention mechanism, Velickovic et al. [21] proposed the attention-based GAT model for node classification tasks on graph-structured data, following the self-attention mechanism [22]. When each node updates its hidden representation, the attention over its neighbors must be calculated.
The model input is the set of node features $h = \{\vec{h}_1, \vec{h}_2, \cdots, \vec{h}_N\}$, $\vec{h}_i \in \mathbb{R}^F$, where N is the number of nodes and F is the number of features per node, i.e., the length of the feature vector. The output set is $h' = \{\vec{h}'_1, \vec{h}'_2, \cdots, \vec{h}'_N\}$, $\vec{h}'_i \in \mathbb{R}^{F'}$, so the number of features per node becomes $F'$. The input features undergo at least one linear transformation to obtain the output features, so a weight matrix $W \in \mathbb{R}^{F' \times F}$ is applied to the features of every node. The function a computing the attention coefficients $e_{ij}$ is a single-layer feedforward network parameterized by a weight vector $\vec{a} \in \mathbb{R}^{2F'}$ and followed by a LeakyReLU nonlinearity, so the attention mechanism takes the specific form

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^T \left[W\vec{h}_i \,\Vert\, W\vec{h}_j\right]\right)\right)}{\sum_{k \in N_i} \exp\left(\mathrm{LeakyReLU}\left(\vec{a}^T \left[W\vec{h}_i \,\Vert\, W\vec{h}_k\right]\right)\right)} \tag{11}$$

where $e_{ij} = a(W\vec{h}_i, W\vec{h}_j)$ is the attention coefficient, and $\Vert$ is the concatenation operation. The normalized attention coefficients $\alpha_{ij}$ are used to compute a linear combination of the corresponding features, which serves as the final output feature of each node. In addition to the weighted summation, a nonlinearity is applied, namely,
$$\vec{h}'_i = \sigma\left(\sum_{j \in N_i} \alpha_{ij} W \vec{h}_j\right) \tag{12}$$
To stabilize the model, a multi-head attention mechanism is also proposed: instead of a single function a, a set of K functions is used, each computing its own attention coefficients $\alpha_{ij}$ and its own weighted summation. In each convolutional layer, the K attention mechanisms work independently, their results are computed separately, and the outputs are concatenated to obtain the convolution result:
$$\vec{h}'_i = \mathop{\Big\Vert}_{k=1}^{K} \sigma\left(\sum_{j \in N_i} \alpha_{ij}^k W^k \vec{h}_j\right) \tag{13}$$

where $\alpha_{ij}^k$ is the attention coefficient computed by the kth attention function $a^k$. The whole process is shown in Figure 2.
Figure 2. An illustration of multi-head attention (with K = 3 heads) by node 1 on its neighborhood. Different arrow styles and colors denote independent attention computations. The aggregated features from each head are concatenated or averaged to obtain $\vec{h}'_1$.
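A minimal single-head NumPy sketch of Eqs (11)-(12) follows; the dense all-pairs computation and the split of $\vec{a}$ into source/destination halves are implementation conveniences of ours, not the paper's notation:

```python
import numpy as np

def gat_layer(A, H, W, a, slope=0.2):
    """Single-head GAT: alpha_ij = softmax_j(LeakyReLU(a^T [W h_i || W h_j]))."""
    Wh = H @ W                                   # (N, F') transformed features
    N, Fp = Wh.shape
    # Split a into the halves acting on W h_i and W h_j, so that
    # e_ij = LeakyReLU(src_i + dst_j) can be formed for all pairs at once.
    src = Wh @ a[:Fp]
    dst = Wh @ a[Fp:]
    e = src[:, None] + dst[None, :]
    e = np.where(e > 0, e, slope * e)            # LeakyReLU
    mask = (A + np.eye(N)) > 0                   # attend only to 1-hop neighbors + self
    e = np.where(mask, e, -1e9)
    att = np.exp(e - e.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True)   # row-wise softmax, Eq (11)
    return np.tanh(att @ Wh)                     # nonlinearity sigma, Eq (12)
```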
The characteristics and advantages of the GAT model are as follows: (1) Computation is efficient: the attention between each node and its neighbors can be computed in parallel across all edges, and the output features of all nodes can be computed in parallel. (2) It can be applied to graph nodes of different degrees by assigning different weights within the same neighborhood. The attention mechanism is shared across all edges, requires no matrix eigendecomposition and no prior knowledge of the global graph structure; this solves the problem that the GCN model must learn a Laplacian matrix tied to a specific graph structure. (3) It can be applied directly to inductive learning problems, since it does not assume any fixed ordering of the nodes within a neighborhood [14]; the model can therefore be applied to entirely unseen graphs. This compensates for the shortcoming that a GCN trained on one graph structure cannot be applied to other graph structures. (4) GAT can be recast as a special case of MoNet: select the pseudo-coordinate function $u(x,y) = f(x) \Vert f(y)$, where $f(x)$ denotes the (potentially MLP-transformed) features of node x and $\Vert$ is concatenation, and set the kernel function $\omega_j(u) = \mathrm{softmax}(\mathrm{MLP}(u))$, with the softmax executed over the entire neighborhood of the node. MoNet's patch operator is then similar to GAT's. Note that GAT uses node features for the similarity calculation, not the structural properties of nodes.
2.1.4. Other models
Zhuang et al. [23] proposed a dual graph convolutional network (DGCN) for graph-based semi-supervised learning that considers not only local consistency but also global consistency, encoded with a positive pointwise mutual information (PPMI) matrix. Hu et al. [24] proposed a deep hierarchical graph convolutional network (H-GCN) to address the problem that traditional neighborhood-aggregation models are limited to shallow layers and struggle to obtain global information. Bayesian GCN [25] and Self-Ensembling GCN (SEGCN) [26] can make full use of unlabeled nodes. The neighborhood adaptive graph convolutional network (NAGCN) [27] can effectively learn the representation of each node. Hyperbolic graph convolutional networks (HGCNs) [28] can learn inductive node representations for hierarchical or scale-free graph-structured data and demonstrate powerful ability to model graphs with hierarchical structure. HGCNs operate on hyperbolic manifolds via tangent spaces, which are only local approximations of a manifold. Dai et al. [29] proposed a hyperbolic-to-hyperbolic graph convolutional network (H2H-GCN) that works directly on hyperbolic manifolds; H2H-GCN avoids the distortion caused by the tangent-space approximation and preserves the global hyperbolic structure.
Figure 3 summarizes the performances of different GCN models on different graph datasets. CiteSeer, Cora and PubMed are citation network datasets, where each sample point is a scientific paper, and each paper cites at least one other paper or is cited by another paper, treating citation links between papers as (undirected) edges. NELL is a knowledge graph dataset with 55,864 relational nodes and 9891 entity nodes.
Figure 3. The performance of different GCN models on different graph datasets.
The results in Figure 3 show that for the same graph data, there are some differences in the node classification accuracy of different GCN models, but the differences are not significant. These results confirm the power of GCN models for graph data processing tasks.
2.2. Graph pooling method
The tasks on the graph can be roughly divided into two categories: node-level tasks and graph-level tasks. Node-level tasks, including node classification, link prediction and node recommendation, are related to nodes in the graph; and graph-level tasks, including graph classification and graph generation, are related to the entire graph. Graph convolutional neural networks generally add pooling operations in graph-level tasks and use pooling operators to reflect the hierarchical structure of the network.
Existing graph pooling methods are few and fall mainly into three types: topology-based pooling, global pooling and hierarchical pooling. Topology-based pooling considers only the topological structure of the graph; representative works include ChebNet [9] and the hybrid model proposed by Rhee et al. [30]. Global pooling considers only the features of the graph; representative works include the general framework for graph classification proposed by Gilmer et al. [31] (which treats graph neural networks (GNNs) as a message-passing scheme and uses the Set2Set method to obtain a representation of the entire graph) and SortPool [32], proposed by Zhang et al., which sorts the node embeddings according to the structure of the graph. The main motivation of hierarchical pooling is to build a model that learns a feature-based or topology-based node assignment in each layer, allowing the neural network to learn the assignment matrix end to end and obtain a scaled-down graph. Typical hierarchical pooling methods include DIFFPOOL [33], EigenPooling [34] and SAGPool [35]. These three common hierarchical pooling methods and their connections are described below.
DIFFPOOL [33] is a differentiable graph pooling module that combines GNNs with a pooling operation analogous to that of a CNN and can capture the hierarchical information of the graph. Its main mechanism is to learn, at each layer of a deep GNN, a differentiable cluster assignment $S^{(l)}$ for the nodes based on their features or topology, map the nodes to a set of clusters and use these clusters as the coarsened input to the next GNN layer. Figure 4 shows the working mechanism of DIFFPOOL.
Figure 4. The working mechanism of DIFFPOOL [33]. A GNN model is run at each layer to obtain node embeddings; the learned embeddings are then used to cluster the nodes together into a coarsened graph, and another GNN layer is run on the coarsened graph. The whole process is repeated for L layers, and the final output representation is used to classify the graph.
A GNN is run in each layer: the adjacency matrix and features $(A^{(l)}, X^{(l)})$ are input to obtain the node embedding $Z^{(l)} = \mathrm{GNN}_{l,\mathrm{embed}}(A^{(l)}, X^{(l)})$ and the cluster assignment matrix $S^{(l)} = \mathrm{softmax}(\mathrm{GNN}_{l,\mathrm{pool}}(A^{(l)}, X^{(l)}))$. The DIFFPOOL differentiable pooling layer then produces the coarsened input for the next layer, $(A^{(l+1)}, X^{(l+1)}) = \mathrm{DIFFPOOL}(A^{(l)}, Z^{(l)})$, where $X^{(l+1)} = S^{(l)T} Z^{(l)} \in \mathbb{R}^{n_{l+1} \times d}$ and $A^{(l+1)} = S^{(l)T} A^{(l)} S^{(l)} \in \mathbb{R}^{n_{l+1} \times n_{l+1}}$.
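A minimal NumPy sketch of one DIFFPOOL coarsening step follows; for brevity, the two GNNs of the paper are reduced here to single-layer propagations, and all helper names are ours:

```python
import numpy as np

def row_softmax(X):
    e = np.exp(X - X.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def diffpool(A, X, W_embed, W_pool):
    """One DIFFPOOL step; the number of clusters n_{l+1} is W_pool.shape[1]."""
    Z = np.maximum(A @ X @ W_embed, 0.0)    # Z = GNN_embed(A, X), reduced to one layer
    S = row_softmax(A @ X @ W_pool)         # S = softmax(GNN_pool(A, X)), shape (n_l, n_{l+1})
    X_next = S.T @ Z                        # coarsened cluster features
    A_next = S.T @ A @ S                    # coarsened cluster connectivity
    return A_next, X_next
```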
EigenPooling [34] builds on the idea of DIFFPOOL [33] and combines a pooling layer, whose pooling operator is based on the graph Fourier transform, with traditional GCN convolutional layers to form a graph neural network framework for graph classification. The pooling operator and a subgraph-based graph coarsening method jointly constitute the pooling layer, and the features of the entire graph are extracted hierarchically. EigenPooling uses clustering to cut the large graph into mutually disjoint subgraphs $G^{(k)}$; the assignment matrix S is determined by spectral clustering, and the feature matrix $X_{\mathrm{coar}}$ of the coarsened graph is determined by the graph Fourier transform. This differs from DIFFPOOL, where both the coarsened feature matrix and the assignment matrix are obtained through graph networks. Figure 5 shows an example of EigenPooling for graph classification.
Figure 5. An example of graph classification by EigenPooling [34]. In this example, the graph is coarsened three times to form a single supernode.
The assignment matrix of EigenPooling encodes the relationship between node $v_i$ and the subgraphs of graph G and is defined as

$$S[i,k] = 1 \;\text{ if and only if }\; v_i \in \Gamma^{(k)} \tag{14}$$

where $\Gamma^{(k)}$ is the list of all nodes in subgraph $G^{(k)}$. After graph G is coarsened, each subgraph becomes a supernode. The adjacency matrix of the coarsened graph $G_{\mathrm{coar}}$ is $A_{\mathrm{coar}} = S^T A_{\mathrm{ext}} S$, i.e., the adjacency matrix encoding the edges between supernodes. $A_{\mathrm{ext}} = A - A_{\mathrm{int}}$ is the adjacency matrix containing only the edges between subgraphs, where $A^{(k)}$ is the adjacency matrix of the connections within subgraph $G^{(k)}$ and $A_{\mathrm{int}}$ is the adjacency matrix containing only intra-subgraph edges (no edges between subgraphs).
The feature matrix X of graph G downsampled by $\Theta_l^T$ is denoted $X_l \in \mathbb{R}^{K \times d}$, whose kth row carries the downsampled information of the kth subgraph $G^{(k)}$. $\Theta_l$ is the matrix composed of the lth eigenvectors of the Laplacian matrices of all subgraphs. Concatenating all the downsampled $X_l$ gives the final downsampled result $X_{\mathrm{pooled}} \in \mathbb{R}^{K \times d \cdot N_{max}}$. To simplify the computation, take $H \ll N_{max}$; the feature matrix of the coarsened graph is then $X_{\mathrm{coar}} = [X_0, \cdots, X_H]$.
The number of parameters of DIFFPOOL depends on the number of nodes, and it has quadratic space complexity. Subsequently, Cangea et al. [36] introduced the hierarchical pooling gPool [37] to solve the complexity problem but did not consider the topology of the graph.
To further improve graph pooling, the Self-Attention Graph Pooling method (SAGPool) [35] was proposed on the basis of gPool. It considers the features and the topological structure of the graph at the same time and uses an end-to-end method to generate hierarchical representations with reasonable time and space complexity. Figure 6 shows the workflow of the SAGPool layer, which rates the importance of nodes through self-attention and selects the important nodes to stand in for the original graph, an operation similar to node classification.
SAGPool obtains the self-attention scores with graph convolution. If the convolution formula of Kipf and Welling [7] is used, the self-attention score $Z \in \mathbb{R}^{N \times 1}$ is calculated as

$$Z = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X \Theta_{att}\right) \tag{15}$$

where $\Theta_{att} \in \mathbb{R}^{F \times 1}$ is the only parameter of the SAGPool layer; the number of parameters is the same regardless of the size of the input graph.
SAGPool uses the node-selection method of gPool: even when the graph is small (few nodes), it still retains a fraction of the nodes of the input graph. The pooling ratio $k \in (0, 1]$ is a hyperparameter that determines the number of nodes to keep. The first $\lceil kN \rceil$ nodes are selected according to the value of Z:

$$\mathrm{idx} = \mathrm{top\text{-}rank}(Z, \lceil kN \rceil), \quad Z_{\mathrm{mask}} = Z_{\mathrm{idx}} \tag{16}$$

where $\mathrm{top\text{-}rank}(Z, \lceil kN \rceil)$ sorts by the values of Z and returns the indices of the first $\lceil kN \rceil$ nodes, $\mathrm{idx}$ denotes the indexing operation, and $Z_{\mathrm{mask}}$ is the feature attention mask.
The input graph is processed by the masking operation, as shown in Eq (17):

$$X' = X_{\mathrm{idx}}, \quad X_{\mathrm{out}} = X' \odot Z_{\mathrm{mask}}, \quad A_{\mathrm{out}} = A_{\mathrm{idx},\mathrm{idx}} \tag{17}$$

where $X_{\mathrm{idx}}$ is the row-indexed feature matrix (each row is the feature vector of one node), $\odot$ is the broadcast elementwise product, $X_{\mathrm{out}}$ is the new feature matrix, $A_{\mathrm{out}}$ is the new adjacency matrix, and $A_{\mathrm{idx},\mathrm{idx}}$ is the row- and column-indexed adjacency matrix.
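A minimal NumPy sketch tying Eqs (15)-(17) together follows; tanh stands in for the activation $\sigma$, and the helper names are ours:

```python
import numpy as np

def normalize_adj(A):
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def sag_pool(A, X, theta_att, k=0.5):
    """Eqs (15)-(17): score nodes by one graph convolution, keep the top ceil(kN)."""
    Z = np.tanh(normalize_adj(A) @ X @ theta_att).ravel()   # Eq (15), one score per node
    n_keep = int(np.ceil(k * A.shape[0]))
    idx = np.argsort(-Z)[:n_keep]                           # top-rank(Z, ceil(kN))
    X_out = X[idx] * Z[idx, None]                           # Eq (17), feature masking
    A_out = A[np.ix_(idx, idx)]                             # row- and column-indexed adjacency
    return A_out, X_out, idx
```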
3. Research progress of graph convolutional neural networks
Spectral-domain GCNs are very powerful, but they still have shortcomings: 1) Scalability is poor: training a GCN requires the adjacency matrix over all training and test nodes, so it is transductive and cannot handle large-scale graphs. 2) They are confined to undirected graphs, because the eigendecomposition of the Laplacian matrix used by GCNs requires the matrix to be symmetric; spectral convolution presupposes an undirected graph. 3) They are confined to shallow layers, because each GCN layer performs a special Laplacian smoothing, and stacking many layers makes the attributes of nodes in the same connected component too similar, over-smoothing the output features. Much current work is devoted to addressing these shortcomings. Spatial-domain graph convolution can compensate for them, but because it saves memory at the expense of time efficiency, its complexity is high. This section introduces the latest research progress on large-scale graph data, directed graphs, multiscale graph tasks, dynamic graphs and relational graphs. Table 2 summarizes GCN-related methods and applications for complex graph tasks.
Table 2. An overview of complex graph task methods.
3.1. Large-scale graph
Gao et al. proposed large-scale learnable graph convolutional networks (LGCNs) [38], which convert graph-structured data into one-dimensional grid data so that traditional convolution operations can be used on the graph. They introduced a learnable graph convolutional layer (LGCL) that automatically selects a fixed number of neighbor nodes for each feature: for every feature, the values among the neighbor nodes are sorted, and the top k values become that feature's k values for the target node; if there are not enough neighbor nodes, zeros are used as padding, yielding a matrix of the node's neighborhood information. Figure 7 shows how LGCL constructs grid data. For each feature, the top k values (the features of neighboring nodes sorted in descending order) are selected from the neighboring nodes; for example, the selection result {9, 6, 5, 3} for the first feature of the center node (orange) is obtained by taking the top 4 values of {9, 6, 5, 3, 0, 0}. Concatenating the center node's own feature vector with the k selected rows produces one-dimensional grid data with k + 1 positions and 3 channels. Finally, a 1-D convolutional neural network generates the final feature vector, and a new vector with five features becomes the new representation of the central node. To allow the LGCN model to be trained on large-scale graphs, a subgraph training strategy is proposed to reduce memory and computational overhead.
Figure 7. LGCL's process of constructing grid data [38]. The left part shows the selection of the top k (k = 4) values of each feature from the neighboring nodes; the right part shows the one-dimensional convolutional neural network.
The subgraph training strategy for large-scale graph data draws on the ideas of random sampling and patch cropping. Given a graph, some initial nodes are first sampled at random, and breadth-first search (BFS) is then used to iteratively expand adjacent nodes into a subgraph; over multiple iterations, higher-order neighbors of the initial nodes are added. Figure 8 shows one subgraph-selection process: first, three ($N_{init} = 3$) initial nodes (orange) are randomly sampled, with the aim of obtaining a subgraph of 15 ($N_s = 15$) nodes. BFS iteratively expands neighboring nodes into the subgraph. In the first iteration, 5 blue nodes are randomly selected from all first-order neighbors of the 3 initial nodes (excluding the initial nodes themselves); in the second iteration, 7 green nodes are randomly selected from the second-order neighbors. A subgraph of 3 + 5 + 7 = 15 nodes is thus obtained, together with its adjacency matrix, as the input of the LGCN for a training iteration. With this random subgraph cropping, large-scale graph data can be handled effectively.
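The sampling procedure just described can be sketched in a few lines; this is a simplified illustration under our own naming, with sizes following the Figure 8 example, not the LGCN reference code:

```python
import numpy as np

def sample_subgraph(A, n_init=3, n_total=15, seed=0):
    """Grow a subgraph from random seed nodes by BFS-style neighbor sampling."""
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    nodes = set(rng.choice(N, size=min(n_init, N), replace=False).tolist())
    frontier = set(nodes)
    while len(nodes) < n_total and frontier:
        # all not-yet-included neighbors of the current frontier
        cand = sorted({int(v) for u in frontier for v in np.flatnonzero(A[u])} - nodes)
        if not cand:
            break
        take = rng.choice(cand, size=min(len(cand), n_total - len(nodes)), replace=False)
        frontier = {int(t) for t in take}
        nodes |= frontier
    idx = np.array(sorted(nodes))
    return A[np.ix_(idx, idx)], idx   # subgraph adjacency and original node ids
```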
Chiang et al. proposed Cluster-GCN [39], a GCN algorithm based on graph clustering structure and SGD training that can train deep GCNs on large-scale graph data. The graph clustering algorithm constructs partitions of the nodes so that there are more links between nodes within the same partition than between nodes in different partitions, better capturing the clustering and community structure of the graph. Figure 9 shows the difference in neighborhood expansion between traditional graph convolution and the clustering method used by Cluster-GCN: Cluster-GCN avoids heavy neighborhood search and concentrates on the neighbors within each cluster.
Figure 9. The difference in neighborhood expansion between traditional graph convolution (left) and the clustering method used by Cluster-GCN (right) [39].
Another scheme for training deep GCNs on large-scale graph data, layer-dependent importance sampling (LADIES) [40], was proposed by Zou et al. LADIES selects neighbor nodes according to the nodes sampled in the previous layer, constructs a bipartite graph between the two layers and computes importance probabilities; a fixed number of nodes is then sampled according to these probabilities, and the process is applied recursively layer by layer to construct the whole computation graph. As a novel layer-conditional sampling scheme, LADIES not only avoids the exponential expansion of the receptive field but also guarantees the connectivity of the sampled adjacency matrix. LADIES can also resolve the redundant-computation problem of Cluster-GCN.
Wang et al. [41] proposed a binary graph convolutional network (Bi-GCN), which binarizes both the network parameters and the input node features and replaces the original matrix multiplications with binary operations for acceleration. Bi-GCN reduces the memory consumption of network parameters and input node attributes by roughly 30 times, so it can efficiently handle large-scale attributed graphs.
3.2. Directed graph
For directed graph processing, MotifNet [42] utilizes local graph motifs and uses motif-induced adjacencies in deep learning of graphs. Topology adaptive graph convolutional networks (TAGCN) [43], based on graph signal processing theory [3,4], process graph signals defined in the vertex domain and classify the entire graph signal. RCNNs [44] that deal with complex networks can also be used for directed networks.
In addition, in 2020, Li et al. proposed a fast directed graph convolutional network (FDGCN) [45], constructing a spectrum-based GCN of directed graphs. The spectrum-based GCN can make good use of the structural information of the graph. FDGCN is a scalable graph convolutional network based on the Laplacian of the directed graph. The convolution operator (fast local filter) has a linear relationship with the edges of the graph, so the directed graph can be directly calculated.
FDGCN combines a first-order Chebyshev polynomial approximation of the directed graph Laplacian spectral filter with an approximation of the fixed Perron vector to derive the directed graph convolution operator. Based on this fast local convolution operator, a two-layer FDGCN model for semi-supervised classification was designed, as shown in Figure 10. FDGCN includes the GCN as a special case and can be combined with the graph generation model MMSBM [57] for further improvement.
Figure 10. Fast directed graph convolutional network [45]. FDGCN consists of two layers: a hidden layer and an output layer. The hidden layer has h units, each holding an n-dimensional vector computed from the input features $X \in \mathbb{R}^{n \times c}$. The output layer has l units, where l is the number of categories.
3.3. Multiscale graph
For multiscale graphs with different step lengths, Abu-El-Haija et al. proposed N-GCN [46], a network model that combines graph convolutional networks (GCNs) [7] with a random-walk strategy. Figure 11 shows the structure of the N-GCN model. The GCN model is instantiated multiple times over the node pairs found at different distances of the random walk, with a power of the adjacency matrix supplied to each instantiation. Each instantiation operates on a different scale of the graph, is trained end to end and directly learns information about near or distant neighbors; the outputs of all instances are then fed, for the overall classification objective, into a classification subnetwork. The process in Figure 11 is as follows: compute the powers of $\hat{A}$ up to K, feed each power together with X into r GCNs, concatenate the outputs of all K × r GCNs column-wise into the fully connected layer, and finally output the N × C predictions over the known labels. The cross-entropy error of the predictions is used to update the parameters of the classification subnetwork and of all GCNs.
Figure 11. N-GCN model structure [46]. $\hat{A}$ is the normalized adjacency matrix, I is the identity matrix, X is the node feature matrix, ⊗ is the matrix multiplication operator, and C is the number of output channels per node, i.e., the size of the label space.
Later, Wan et al. proposed a multiscale dynamic GCN (MDGCN) [47] for hyperspectral image classification, as shown in Figure 12. MDGCN exploits the multiscale information of the hyperspectral image: dynamic graphs that are gradually refined during the convolution process are used to build input graphs at different neighborhood scales, making full use of the multiscale spectral-spatial feature information.
Figure 12. MDGCN model architecture [47]. (a) The original hyperspectral image. (b) Superpixel segmentation with the SLIC algorithm [58]. (c) The multiscale dynamic graph convolutional network layers: circles represent nodes, different colors represent different land-cover types, and the green lines represent edges whose weights are gradually updated as convolution operates on the nodes, thereby dynamically refining the graph. Three scales are shown, and softplus(·) is the activation function. (d) The classification result of the multiscale output, using a cross-entropy loss to penalize labeling errors between the output and the seed superpixels.
In addition, Liao et al. [48] proposed LanczosNet, and Luan et al. [49] offered two further GCN architectures that exploit multiscale information. LanczosNet uses the tridiagonal decomposition of the Lanczos algorithm to construct a low-rank approximation of the graph Laplacian, efficiently collecting multiscale information through fast approximate computation of matrix powers; a learnable spectral filter further improves model capacity. Luan et al. [49] showed that any convolution with a well-defined analytic spectral filter can be written as the product of a block Krylov matrix and a learnable parameter matrix of special form. Spectral graph convolution and deep GCNs are accordingly extended to block Krylov subspace form, and two architectures are proposed, Snowball and truncated Krylov, which connect multiscale information in different ways. Both can be extended to deeper structures, and under certain conditions the two architectures are equivalent.
3.4. Dynamic graph
For dynamic graphs, Manessi et al. [50] combined long short-term memory networks (LSTMs) and graph convolutional networks (GCNs) [7] to jointly learn long short-term dependencies and graph structure, capturing temporal information while properly handling structured data. The LSTM is a special recurrent neural network that improves learning of long short-term dependencies, and the GCN effectively processes graph-structural information. Pareja et al. [51] combined a GCN and a recurrent neural network to propose EvolveGCN, which uses the GCN to learn node representations and a GRU or LSTM to learn the GCN parameters, capturing the dynamics of dynamic graphs. EvolveGCN can cope with frequently changing graphs and does not need to know all node changes in advance. When the node feature information in a dataset is not rich enough, the graph structure plays the more important role and the LSTM variant performs better; otherwise, the GRU variant performs better. In addition, Qiu et al. [52] proposed a dynamic graph convolutional module (DGCM) for 2D multi-person pose estimation to handle large variations in human pose, and Song et al. [53] proposed a dynamic probabilistic graph convolution (DPG) model for facial action unit intensity estimation.
3.5. Relational graph
Schlichtkrull et al. [54] used GCNs to model relational data (graph data with many kinds of relations) and proposed the relational graph convolutional network (R-GCN), mainly to solve two knowledge-base completion tasks: link prediction and entity classification. Huang et al. [55] proposed a multi-relational graph convolutional network (MR-GCN) for semi-supervised node classification on multi-relational graphs, performing graph convolution in the spectral domain with an eigendecomposition of Laplacian tensors instead of the discrete Fourier transform; the spectral domain can be any unitary transform. Chen et al. [56] proposed a relational graph convolutional network (mRGCN) for zero-shot food ingredient recognition. mRGCN encodes a multi-relational graph (covering the ingredient hierarchy, ingredient co-occurrence and ingredient attributes), which facilitates transferring knowledge learned from familiar classes to unfamiliar classes so that unseen ingredients can be recognized in zero-shot learning.
4. Applications of graph convolutional neural networks
4.1. Computer vision
Graph convolutional neural networks for action recognition are a research hotspot in computer vision. The main idea is to treat bones as edges and joints as vertices so that the human skeleton can be operated on as a graph; Figure 13 shows the spatial-temporal graph of a skeleton sequence. Yan et al. used the OpenPose method to extract human skeleton information from video, constructed a skeleton spatial-temporal graph and proposed the spatial temporal graph convolutional network (ST-GCN) [59] to extract skeleton features from consecutive video frames for video classification. Tian et al. [60] introduced an attention mechanism and co-occurrence feature learning to improve ST-GCN, forming a multitask framework that includes an attention branch, a co-occurrence feature learning branch and ST-GCN. To capture high-level interaction features among the five parts of the human body and distinguish the subtle differences between easily confused actions, Chen et al. proposed a graph convolutional network [61] with a structure-based graph pooling (SGP) scheme and a joint-wise channel attention (JCA) module. SGP pools the human skeleton graph using prior knowledge of the human body structure, and the JCA module selectively focuses on distinctive joints and assigns different degrees of attention to different channels, improving the network's classification of confusable behaviors. Dong et al. proposed action recognition based on fusing graph convolutional networks with high-order features [62]: high-order spatial features of skeletal data (the relative distances between 3D joints) and high-order temporal features (velocity and acceleration) are extracted with deep networks. Chen et al. [63] proposed a multiscale spatiotemporal graph convolutional network (MST-GCN) for action recognition that learns effective action representations. MST-GCN is composed of a stack of basic blocks combining a multiscale spatial graph convolution (MS-GC) module and a multiscale temporal graph convolution (MT-GC) module; the MS-GC module captures local joint connections and non-local joint relationships for spatial modeling, and the MT-GC module effectively enlarges the temporal receptive field for long-term temporal dynamics. Table 3 presents the results of the above five methods on the NTU RGB+D dataset, which has two evaluation metrics: cross-subject (CS) and cross-view (CV). In the cross-subject evaluation, there are 40,320 clips for training and 16,560 for evaluation; in the cross-view evaluation, there are 37,920 and 18,960 clips, respectively. Table 3 summarizes the top-1 classification accuracy under the two metrics.
Figure 13. Multi-frame skeleton graph [59]. The blue nodes represent human joints, the green lines represent human bones (edges connecting joints), and the pink lines connect the same joint across successive frames.
Human pose estimation, a basic task in many computer vision applications, involves locating the key points of the human body in still images. The pose graph convolutional network (PGCN) [64] builds a directed graph between the key points of the human body according to the body structure and uses an attention mechanism to focus on their structured relationships; PGCN is trained to map the graph into a set of structure-aware key-point representations that encode both the body structure and the appearance of specific key points. The global reasoning graph convolutional network (GRR-GCN) [65] projects features from the original space into a non-Euclidean space to form a fully connected graph over the vertices, performs global relational reasoning on this graph with a graph convolutional network and globally learns the relationships among image regions. Table 4 presents the results of PGCN and GRR-GCN on the MPII dataset. The MPII Human Pose dataset uses the percentage of correct keypoints (PCK) metric, which reports the percentage of predicted keypoints falling within a normalized distance of the ground truth. A space-time separable graph convolutional network (STS-GCN) for pose forecasting was proposed by Sofianos et al. [66]; STS-GCN is the first model of human pose dynamics built from a single graph convolutional network and the first to factorize the space-time graph adjacency matrix. Zou et al. [67] proposed a new modulated GCN for 3D human pose estimation that disentangles the feature transformations of different nodes while keeping the model size small; in addition, it can model edges beyond the human skeleton, which significantly reduces the estimation error. Table 5 presents the results of STS-GCN and modulated GCN on the Human3.6M dataset under the MPJPE error metric, which measures the average Euclidean distance (in millimeters) between the ground truth and the prediction after aligning the root (hip) joint. In Table 5, STS-GCN_short-term (400) and STS-GCN_long-term (880) denote the MPJPE errors of short-term (10 frames, 400 ms) and long-term (22 frames, 880 ms) predictions of 3D joint positions on Human3.6M, respectively.
Table 4.
Comparisons of PCKh@0.5 scores of PGCN and GRR-GCN on the MPII test set.
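The MPJPE metric quoted for Table 5 is straightforward to compute: the mean Euclidean distance in millimeters after root-joint alignment. The sketch below evaluates it on toy data; the 17-joint layout is only assumed to be Human3.6M-like.

```python
# A minimal sketch of the MPJPE metric on toy data.
import numpy as np

def mpjpe(pred, gt, root=0):
    """pred, gt: arrays of shape (num_joints, 3) in millimeters."""
    pred = pred - pred[root]   # align prediction at the root (hip) joint
    gt = gt - gt[root]         # align ground truth at the root joint
    return np.linalg.norm(pred - gt, axis=1).mean()

rng = np.random.default_rng(0)
gt = rng.normal(size=(17, 3)) * 100.0          # 17 joints, Human3.6M-style
pred = gt + rng.normal(size=(17, 3)) * 10.0    # noisy prediction
print(f"MPJPE: {mpjpe(pred, gt):.1f} mm")
```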
Traffic flow prediction is of great significance to traffic management and public safety, yet traditional prediction methods often ignore the temporal and spatial dependencies of traffic flow. To address this, a spatiotemporal graph convolutional network [68] was proposed for time series prediction on traffic networks. The spatiotemporal graph convolutional network for metro systems (STGCNmetro) [69] stacks GCNs into a deep structure that captures the spatiotemporal dependencies of a city-wide metro network and integrates three temporal modes (recent, daily and weekly) to form the final prediction. In contrast to models that train a GCN only on data directly related to traffic flow, Zhao et al. proposed a spatiotemporal data fusion (STDF) architecture [70]: data indirectly related to traffic flow are fed into a spatial embedding by temporal convolution (SETON) module, which encodes each feature in both the temporal and spatial dimensions, while data directly related to traffic flow serve as input to the graph convolutional network; a fusion module then combines all features for the final prediction.
Traffic speed prediction is a crucial component of intelligent transportation systems. Ge et al. proposed a temporal graph convolutional network (GTCN) [71] composed of a spatiotemporal component and an external component. The spatiotemporal component combines k-order spectral graph convolution with dilated causal convolution to capture spatiotemporal dependencies, and the external component accounts for influencing factors such as road structure characteristics and social factors such as the day of the week. The global spatial-temporal graph convolutional network (CSTGCN) [72] consists of three spatiotemporal components with the same structure and one external component. The spatiotemporal components model the recent, daily and weekly spatiotemporal correlations of traffic data in the time dimension and combine local graph convolution with a global correlation mechanism to capture local and global spatial correlations in the space dimension; the external component extracts factors that affect traffic speed, such as holidays and weather conditions. Table 6 presents the results of GTCN and CSTGCN on the PeMSD4 and PeMSD7 datasets, which were collected every 30 seconds by the California Department of Transportation's Performance Measurement System (CalTrans PeMS) and aggregated from the raw data into 5-minute windows. Three widely used metrics evaluate performance: mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE); a short computation of these metrics follows the Table 6 caption.
Table 6.
Comparison of GTCN and CSTGCN for traffic prediction on PeMSD7 and PeMSD4 datasets.
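The three error metrics used in Table 6 can be stated in a few lines. The sketch below computes MAE, RMSE and MAPE on toy speed readings; the values are illustrative, not taken from PeMSD4/PeMSD7.

```python
# A minimal sketch of MAE, RMSE and MAPE on toy traffic-speed data.
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred, eps=1e-8):
    # eps guards against division by zero if a sensor reports 0 speed
    return np.mean(np.abs((y_true - y_pred) / (y_true + eps))) * 100

y_true = np.array([62.0, 58.5, 47.0, 65.2])   # ground-truth speeds
y_pred = np.array([60.1, 59.0, 50.3, 64.0])   # model predictions
print(f"MAE={mae(y_true, y_pred):.2f}  "
      f"RMSE={rmse(y_true, y_pred):.2f}  "
      f"MAPE={mape(y_true, y_pred):.2f}%")
```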
In biomedicine, graph convolutional networks are used in multiple tasks, such as disease-gene association identification, disease prediction, protein interface prediction and drug discovery. GCN-MF [73] discovers disease-gene associations by combining graph convolutional networks with matrix factorization: the GCN captures the nonlinear relationships between diseases and genes and exploits measured similarities. In addition, the fully connected graph convolutional network FCGCNMDA [74] predicts potential miRNA-disease associations, and GCNDA [75], built on fast learning with graph convolutional networks, predicts circRNA-disease associations. GCGCN [76] is a graph-convolution-based cancer survival prediction method that combines multi-genomic data with clinical data; it accounts for the heterogeneity between different types of data and makes full use of abstract high-level representations of the different data sources. Chen et al. [77] proposed a graph convolutional structure with a multilayer aggregation mechanism that suppresses oversmoothing while capturing deep structural information and increasing the similarity between nodes of the same type, achieving better performance in brain analysis and breast cancer prediction. Gopinath et al. [78] proposed a fully learnable brain surface analysis approach that processes multiple surface-valued data to output subject-level information, achieving state-of-the-art performance for ADNI stage classification and brain age prediction from cortical surface data.
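As a rough illustration of the GCN-plus-matrix-factorization pattern behind methods such as GCN-MF, the sketch below propagates features over a disease similarity graph and a gene similarity graph and scores candidate associations with an inner product. The graph sizes, random similarity graphs and weights are all toy assumptions, not the published architecture.

```python
# A minimal sketch of GCN-style embedding plus matrix-factorization scoring.
import torch
import torch.nn as nn

def gcn_pass(A, X, W):
    """One propagation step: row-normalized adjacency (with self-loops) x features x weights."""
    A_hat = A + torch.eye(A.size(0))
    deg_inv = A_hat.sum(1).reciprocal().diag()
    return torch.relu(deg_inv @ A_hat @ X @ W)

n_dis, n_gene, d_in, d_hid = 4, 6, 8, 16
A_dis = (torch.rand(n_dis, n_dis) > 0.5).float().triu(1); A_dis = A_dis + A_dis.T
A_gene = (torch.rand(n_gene, n_gene) > 0.5).float().triu(1); A_gene = A_gene + A_gene.T
X_dis, X_gene = torch.randn(n_dis, d_in), torch.randn(n_gene, d_in)
W_dis = nn.Parameter(torch.randn(d_in, d_hid))
W_gene = nn.Parameter(torch.randn(d_in, d_hid))

H_dis = gcn_pass(A_dis, X_dis, W_dis)       # disease embeddings
H_gene = gcn_pass(A_gene, X_gene, W_gene)   # gene embeddings
scores = torch.sigmoid(H_dis @ H_gene.T)    # predicted disease-gene association matrix
print(scores.shape)                         # torch.Size([4, 6])
```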
4.4. Recommender systems
Unlike traditional deep learning models, GCNs can exploit content information and graph structure simultaneously, so GCN-based methods are well suited to recommender systems; the main difficulty is scaling GCN-based node embedding to graphs with billions of nodes and tens of billions of edges. Ying et al. proposed PinSage [79], a random-walk-based GCN framework that improves scalability through dynamic, on-the-fly convolutions, a producer-consumer minibatch architecture and an efficient MapReduce inference pipeline; importance pooling and curriculum training further improve embedding performance.
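The random-walk idea underlying PinSage can be sketched compactly: short walks from a node define its most important neighbors, and the normalized visit counts can serve as importance-pooling weights. The toy graph and walk parameters below are illustrative assumptions.

```python
# A minimal sketch of random-walk-based neighbor importance selection.
import random
from collections import Counter

graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1, 4], 4: [2, 3]}

def important_neighbors(node, walks=200, walk_len=3, top_k=2, seed=0):
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(walks):
        cur = node
        for _ in range(walk_len):
            cur = rng.choice(graph[cur])
            if cur != node:
                visits[cur] += 1
    # Normalized visit counts double as importance-pooling weights.
    total = sum(visits.values())
    return [(n, c / total) for n, c in visits.most_common(top_k)]

for node in graph:
    print(node, important_neighbors(node))
```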
To improve a recommender system's ability to mine and analyze user preferences and behaviors, Xia et al. [80] proposed a dual-channel hypergraph convolutional network (DHCN) for session-based recommendation, which integrates self-supervision into network training and improves the recommendation task by maximizing the mutual information between the session representations learned by the two channels. Chen et al. [81] proposed SGCN, which explicitly removes irrelevant neighbors in the message-passing stage of GCNs, largely reducing the negative impact of noise on the recommender system.
The multi-social graph convolutional network (MSGCN) [82] is a friend recommendation model that learns users' latent characteristics from multiple social networks, addressing friend recommendation under sparse data. Figure 14 shows a multisocial graph. By integrating topology information from multiple social networks to enrich the target user representation, MSGCN converts friend recommendation into a personalized ranking problem solved with Bayesian theory (a minimal sketch of this multigraph integration follows the Figure 14 caption). The hybrid graph convolutional network with multi-head attention for POI recommendation (HGMAP) [83] recommends points of interest (POIs) that users are interested in but have not yet visited: two independent GCNs learn social influence and geographical constraints, and a multi-head attention encoder adaptively computes an attention score for each check-in to obtain the user's latent preference for unvisited POIs. The user preference and social representations together model user influence on POI recommendation.
Figure 14.
A multisocial graph with multiple social relationships [82], where the friend relationship is a directed graph and the others are undirected. Chat and team relationships reflect friendship to a certain extent; integrating them adds complementary signals, meets users' diversified needs and avoids the one-sidedness of recommendations built from a single relation.
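A minimal way to picture MSGCN-style multigraph integration is to propagate the same user features over each relation graph and fuse the per-relation embeddings; the sketch below does this with a softmax-weighted sum. The relation names, sizes and fusion scheme are illustrative assumptions, not the published model.

```python
# A minimal sketch of fusing embeddings from multiple social relation graphs.
import torch
import torch.nn as nn

def propagate(A, X):
    """Mean aggregation over a relation graph with self-loops."""
    A_hat = A + torch.eye(A.size(0))
    return A_hat.div(A_hat.sum(1, keepdim=True)) @ X

n_users, d = 5, 8
X = torch.randn(n_users, d)                               # shared user features
relations = {name: (torch.rand(n_users, n_users) > 0.6).float()
             for name in ["friend", "chat", "team"]}      # toy relation graphs

fusion = nn.Parameter(torch.ones(len(relations)))         # learnable relation weights
H = torch.stack([propagate(A, X) for A in relations.values()])
user_repr = (torch.softmax(fusion, 0)[:, None, None] * H).sum(0)
print(user_repr.shape)                                    # torch.Size([5, 8])
```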
The information extraction tasks of natural language processing include event detection, relation extraction and entity linking. Event detection aims to identify instances of specific event types in text; Nguyen et al. proposed a neural network model based on graph convolution over dependency trees with entity-mention-guided pooling [84] for this task. Relation extraction aims to detect the relationships between entities in text. Attention-guided graph convolutional networks (AGGCNs) [85] operate on the full dependency tree with a "soft pruning" strategy: a self-attention mechanism learns, end to end, the structural information useful for relation extraction. Hong et al. proposed an end-to-end GCN-based model [86] for the joint extraction of entities and relations. MSBD [87], built on a character graph convolutional network (CGCN) and a multi-head self-attention mechanism (MS), is an improved end-to-end model for joint entity and relation extraction: it recasts the problem as text tagging, which reduces the complexity of extraction and improves time efficiency.
In addition, text classification and sentiment analysis are crucial problems in natural language processing. Text GCN [88] applies graph convolution to text classification: it builds a heterogeneous word-document graph for a corpus based on word co-occurrence and document-word relations, transforming text classification into a node classification problem. NIP-GCN [89] introduces node interaction patterns (NIPs), derived from the link structure of the graph, into the GCN framework to provide prior information to the graph convolutional layers; combining this prior with an additional pairwise document similarity objective yields superior results compared with Text GCN.
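The graph-construction step of Text GCN is easy to illustrate: word-document edges are weighted by TF-IDF and word-word edges by co-occurrence pointwise mutual information (PMI), with only positive-PMI edges kept. The toy corpus below is an assumption for demonstration; Text GCN itself computes PMI over a sliding window rather than whole documents.

```python
# A minimal sketch of TF-IDF and PMI edge weights for a word-document graph.
import math
from collections import Counter

docs = [["graph", "convolution"],
        ["text", "classification"],
        ["graph", "convolution", "text"]]

# TF-IDF weights for word-document edges.
df = Counter(w for d in docs for w in set(d))
def tfidf(word, doc):
    tf = doc.count(word) / len(doc)
    return tf * math.log(len(docs) / df[word])

# PMI weights for word-word edges (co-occurrence within a document here).
pair_counts, word_counts = Counter(), Counter()
for d in docs:
    ws = set(d)
    word_counts.update(ws)
    pair_counts.update(frozenset(p) for p in
                       [(a, b) for a in ws for b in ws if a < b])

def pmi(a, b):
    p_ab = pair_counts[frozenset((a, b))] / len(docs)
    p_a, p_b = word_counts[a] / len(docs), word_counts[b] / len(docs)
    return math.log(p_ab / (p_a * p_b)) if p_ab > 0 else 0.0

print("tfidf(graph, doc0)     =", round(tfidf("graph", docs[0]), 3))
print("pmi(graph, convolution) =", round(pmi("graph", "convolution"), 3))
```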
Graph convolutional networks have also been applied to sentiment analysis. Xiao et al. [90] argued that semantic, syntactic and interactive information are all crucial to targeted sentiment analysis and proposed an attentional-encoding-based graph convolutional network (AEGCN) that combines semantic and syntactic information to predict the sentiment polarity of the target. Zhao et al. [91] proposed SDGCN, an aspect-level sentiment classification model based on graph convolution that effectively captures the sentiment dependencies between multiple aspects in one sentence. Jiang et al. [92] proposed HDGCN, which enhances multihop graph inference by aggregating direct and long-range dependency information into a convolution layer and concatenating the premise and hypothesis sentences into one long sentence. The DualGCN model [93] for aspect-based sentiment analysis integrates syntactic knowledge and semantic information through SynGCN and SemGCN modules and accurately captures semantic relevance between words by constraining the attention scores in the SemGCN module, determining the sentiment polarity of a given aspect in a sentence. Table 7 presents the results of these four methods on the two datasets of SemEval 2014 Task 4, which contain reviews from the laptop and restaurant domains; each review (one sentence) contains one or more aspects with corresponding sentiment polarities (positive, neutral or negative). In Table 7, BERT denotes a baseline that feeds the sentence-aspect pair into BERT and uses the [CLS] representation for prediction. Accuracy and macro-averaged F1 were selected for evaluation; a short computation of these metrics follows the Table 7 caption.
Table 7.
Test accuracy (%) and macro-F1 score with four sentiment analysis methods on aspect-based sentiment classification.
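For completeness, the two numbers reported in Table 7 can be computed as follows; the label arrays are toy data, not SemEval predictions.

```python
# A minimal sketch of accuracy and macro-averaged F1 over three polarities.
labels = ["positive", "neutral", "negative"]
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "positive", "neutral", "neutral"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(cls):
    tp = sum(t == p == cls for t, p in zip(y_true, y_pred))
    fp = sum(p == cls and t != cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

macro_f1 = sum(f1(c) for c in labels) / len(labels)   # unweighted class average
print(f"accuracy={accuracy:.3f}  macro-F1={macro_f1:.3f}")
```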
Graph convolutional networks are also widely used in social network analysis. An autoencoder-based graph convolutional network has been applied to online financial anti-fraud [94]: replacing the graph convolution matrix with an autoencoder module increases the network depth without causing oversmoothing. For political perspective detection of news media [95], a graph convolutional network encodes the social information spread through social networks. A spatially aware graph convolutional network (SAGCN) [96] performs multi-pedestrian trajectory prediction for autonomous driving by building an attention graph whose nodes represent pedestrians' temporal information and whose edges represent pairwise relationships between pedestrians. A seq2seq-based graph convolutional network [97] is used for location prediction: the seq2seq framework generates hidden and cell states from the historical trajectories, while the graph convolution captures spatial correlation and temporal dynamics. Graph convolutional networks can also measure privacy metrics of online social networks [98].
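As a rough sketch of the seq2seq-plus-GCN combination described for [97], the code below encodes each pedestrian's history with an LSTM and mixes the resulting hidden states over an interaction graph with one graph convolution. The fully connected interaction graph and all sizes are illustrative assumptions.

```python
# A minimal sketch of LSTM trajectory encoding plus graph-convolutional mixing.
import torch
import torch.nn as nn

n_ped, t_hist, d_in, d_hid = 4, 8, 2, 16        # 4 pedestrians, 8 steps, (x, y)
traj = torch.randn(n_ped, t_hist, d_in)         # historical trajectories

encoder = nn.LSTM(d_in, d_hid, batch_first=True)
_, (h, c) = encoder(traj)                       # hidden/cell states per pedestrian
H = h.squeeze(0)                                # (n_ped, d_hid)

# Pedestrian interaction graph: fully connected here for simplicity.
A = torch.ones(n_ped, n_ped)
A = A / A.sum(1, keepdim=True)                  # row-normalize

W = nn.Linear(d_hid, d_hid)
H_social = torch.relu(A @ W(H))                 # graph conv mixes neighbor states
print(H_social.shape)                           # torch.Size([4, 16])
```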
5. Conclusions
This paper presents an overview of graph convolutional networks and related approaches. First, the advantages and disadvantages of the basic GCN models are analyzed: GCNs have poor scalability, are limited to shallow networks and undirected graphs, and cannot handle complex graph data well. In view of these drawbacks, the paper reviews recent work that extends graph convolutional networks to complex graph tasks (such as large-scale graphs, directed graphs and multiscale graphs) and surveys the field's research progress. Finally, the achievements of graph convolutional networks in several application domains are summarized.
Graph convolutional networks can be applied to many domains with remarkable results. However, for complex graph data such as large-scale deep graphs, directed graphs and multiscale dynamic graphs, there is still room for improvement. Extending graph convolutional networks to complex graph tasks such as large-scale graphs, directed graphs, multiscale graphs and dynamic graphs is therefore a key direction for future research.
Acknowledgments
This work was supported by the Fundamental Research Funds for the Central Universities (No. 2019XKQYMS87).
Conflict of interest
The authors declare there is no conflict of interest.
Table 1.
An overview of the advantages and disadvantages of the classic GCN models.
GCN
Description: Applies the convolution operation from image processing to graph-structured data for the first time.
Advantages: ● Handles non-Euclidean structured data ● Captures global information of graphs ● Characterizes node and edge features well
Disadvantages: ● Poor flexibility and scalability; limited to undirected graphs and shallow layers ● Training requires the structural information of the entire graph
GraphSAGE
Description: Improves GCN scalability and training methods.
Advantages: ● Overcomes the memory and GPU-memory limitations of GCN training ● Can handle large-scale graphs ● The parameters of the aggregator and weight matrix are shared across all nodes
Disadvantages: ● Cannot process weighted graphs ● Neighbor sampling leads to large gradient variance during backpropagation ● The limited number of samples can lose important local information of some nodes
GAT (graph attention network)
Description: Adds an attention mechanism that assigns importance to the edges between nodes.
Advantages: ● High computational efficiency ● Attention mechanism ● No need to know the whole graph structure ● Can be used for both transductive and inductive learning
Disadvantages: ● Prone to oversmoothing ● Poor training mode ● Large number of parameters
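For concreteness, the propagation rule behind the GCN row of Table 1 is the layer-wise rule H' = σ(D̂^{-1/2} Â D̂^{-1/2} H W), where Â adds self-loops to the adjacency matrix and D̂ is its degree matrix. The sketch below applies one such layer to a toy three-node path graph.

```python
# A minimal sketch of one Kipf-Welling-style GCN layer on a toy graph.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, A, H):
        A_hat = A + torch.eye(A.size(0))          # add self-loops
        D_inv_sqrt = torch.diag(A_hat.sum(1).pow(-0.5))
        return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ self.linear(H))

A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])                  # 3-node path graph
H = torch.randn(3, 4)                             # node features
print(GCNLayer(4, 2)(A, H).shape)                 # torch.Size([3, 2])
```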