Research article

Inhibition of absence seizures in a reduced corticothalamic circuit via closed-loop control


  • Inhibition of spike-wave discharges (SWD) is thought to be associated with remission of seizure symptoms in absence epilepsy. In previous studies, pulse stimulation was applied directly to the brain as an effective means of inhibiting SWD. However, this method not only fails to provide real-time tracking control of the disease, but can also cause incalculable damage to the patient's brain tissue. To fill this gap, this work first studied the mitigation and elimination of SWD by applying single-pulse coordinated resetting stimulation (SCRS) to three different neural populations. Second, based on the 2I:2O corticothalamic model (2I:2O SCT model), four kinds of m:n on-off SCRS with the same period were compared, and control efficiency and pulse energy consumption were combined to evaluate these different stimulations. Finally, we further optimized the regulation strategies, including a weighted stimulation structure and closed-loop control. Simulation results show that the weighted stimulation and closed-loop control strategies proposed here further improve control performance by reducing energy consumption, which may make them more reliable in applications. Moreover, this study provides a new method for optimizing SCRS through weighted processing and closed-loop control of electrical pulses to alleviate the absence epileptic state.

    Citation: Yan Xie, Rui Zhu, Xiaolong Tan, Yuan Chai. Inhibition of absence seizures in a reduced corticothalamic circuit via closed-loop control[J]. Electronic Research Archive, 2023, 31(5): 2651-2666. doi: 10.3934/era.2023134




    Human action recognition is one of the main research areas in the study of computer vision and machine learning, aiming to understand and assign a label to each action [1,2]. It appears in a wide range of applications such as video surveillance systems [3,4], human-computer interaction and robotics for human behavior characterization [5].

    In general, most existing approaches in human action recognition fall into two main categories with respect to the type of input data: RGB video-based approaches [6,7,8] and skeleton-based methods [9,10,11,12,13,14,15]. The RGB video-based approaches are computationally expensive in processing the RGB pixel information of videos, while skeleton-based models are more efficient as they only need to process the 2D or 3D human joint coordinates. In addition, skeleton-based action recognition stands out due to its great potential in adapting to dynamic circumstances and illumination variation [16,17,18,19,20].

    It is well known that extracting effective features for background separation and representation learning in skeleton-based action recognition is a challenging problem. Traditional deep learning-based methods assemble skeletons as a set of independent joint features [21,22] or pseudo-images [23,24], i.e., they represent skeleton data as vector sequences or two-dimensional grids, and spatial-temporal correlations between the constructed features are then modeled and learned. However, such vector-based representations cannot fully exploit the intrinsic relationships of human joints, since they ignore the dependencies and inherent connections among related joints. Consequently, graph convolution networks (GCN) [25] have been used to capture inter-joint interactions, in which the natural connections of human joints are modeled as a spatial graph that corresponds to a graph convolution layer. As such, a GCN-based spatial-temporal model is built across different time steps.

    For robust action recognition, GCN-based skeleton graphs rely on multi-scale structural features whose scales are set subjectively [25]. However, such artificially designed graph structures fix connections that may not match actual motion, so the resulting models cannot effectively capture the spatio-temporal joint dependencies of complex regions. For example, there is usually a close relationship between the hands and legs in the "running" action, yet this relationship is difficult to obtain with an artificially constructed multi-scale structure. Recent approaches based on the self-attention mechanism (SA) can learn the interactions between joints [26,27], but these methods only verify the feasibility of SA experimentally. To address these problems, we propose a unified framework, namely SA-GCN, that integrates SA into the GCN for low-quality motion video data with a fixed viewing angle*. Compared with existing methods, this model fully exhibits the possible positive connections between different nodes, as shown in Figure 1.

    *This dataset includes the raw data without regularization, which is introduced in ST-GCN [25].

    Figure 1.  (a) Illustration of spatial-temporal GCN. (b) Illustration of mapping strategy used in SA, which exhibits the possible positive connections between different nodes.

    To the best of our knowledge, this is the first study to investigate the impact of the SA mechanism on GCNs in the area of action recognition (the latest and most representative study on SA-augmented GCNs mainly tackled the over-fitting and over-smoothing problems, e.g., see [28]). On the one hand, SA-GCN uses the GCN as the basic component of the attention mechanism to extract the weights of joint data in different frames of the same video. On the other hand, this component helps deepen the correlations of disconnected human joints, so that the learned long-distance temporal relationships are reflected in distant frames.

    Extensive experiments on the NTU-RGBD [29] and NTU-120 [30] datasets demonstrate the superior accuracy of SA-GCN in comparison with a range of state-of-the-art models across various skeleton-based action recognition tasks.

    In summary, this paper claims three main contributions:

    1) SA-GCN is a unified spatio-temporal graph convolution framework, which extracts better features by automatically learning the weight information between joint points at different scales. In addition, SA-GCN introduces a new SA mechanism with different scales for characterizing the multi-scale dependencies of joints, which greatly improves the learning of spatio-temporal features.

    2) The proposed SA mechanism combines the structural information of the graph and the self-learned fusion features.

    3) The proposed model achieves competitive performance for skeleton-based action recognition.

    Graph neural networks (GNNs) have shown great potential in processing irregular nodes in a graph [31]. Existing GNN models fall mainly into two categories: spectral-based methods [32] and spatial-based approaches [27]. The former is based on eigendecomposition and performs convolution operations in the graph spectral domain, so its computational efficiency is limited by the large cost of the decomposition. The latter (i.e., spatial-based GNNs) directly implements the convolution on graph nodes and their neighbors in the spatial domain, and is thus able to overcome the limitations of spectral-based methods. The method presented in this paper belongs to the second category.

    Multi-scale GNNs are often used to capture the features of adjacent points. For example, [33] obtains image features by aggregating higher-order polynomials of graph adjacency matrices, and [34] represents multi-scale features by raising the graph adjacency matrix to a higher power. Nonetheless, these approaches inevitably involve artificially designed structures for the adjacent points, and relying only on experimental verification cannot establish that such structures are optimal. Hence, the SA mechanism is proposed to reduce this dependence on human design.

    The attention mechanism [26] is critical in human cognition; it has revolutionized natural language processing and has been applied in image recognition [35], image synthesis [36] and video processing [37]. It is well known that the human visual system captures structural features through partial glimpses and salient objects, rather than processing the whole scene at once [38].

    Recently, attention has been proposed to improve the performance of convolutional neural networks (CNNs) for large-scale classification tasks. For example, a residual attention network (RAN) was introduced in an end-to-end training fashion [39], in which the attention modules are stacked and attention-aware features change adaptively as layers go deeper. Similarly, the channel-wise attention is incorporated into squeeze-and-excitation networks, which adaptively recalibrates the global average-pooled features by explicitly using interdependencies between channels [40].

    However, attention mechanisms have rarely been used in GCNs for action recognition. Considering the proven advantages of attention in computer vision tasks [41], it is natural to integrate SA into the GCN as done in this paper, which reinterprets mathematically the positional relationship of each feature fusion in the attention mechanism. Experiments on large-scale datasets further demonstrate that the SA mechanism can bring performance improvements.

    Traditional methods for skeleton-based action recognition can be divided into two types: handcrafted feature-based models [42] and deep learning-based approaches [43]. The performance of handcrafted feature-based methods is often unsatisfactory, since only local factors are used and the most important structural connections of the human body are ignored. The emergence of deep learning-based methods has significantly improved the performance of skeleton recognition. For example, spatial graph convolution and interleaved temporal convolution are used for the spatio-temporal characterization of skeleton data (ST-GCN) [25]. Similarly, a multi-scale module has been introduced to implement graph convolution modeling by raising the graph adjacency matrix of the skeleton to a higher power [44]. Multi-scale adjacency modeling can also be performed by generating human poses and adding spatial graph convolutions [45]. In [27], by using video frame segmentation, a set of videos is divided into four groups according to time and space for processing and combination, which expands the receptive field and has achieved great success in computer vision.

    Different from the above methods, the graph convolution method proposed in this paper uses a SA mechanism. More specifically, the proposed approach uses GCN as part of the SA mechanism instead of forcing them to be summed as in existing methods [26]. In this way, the proposed method can better re-express the weights of the connection matrix.

    In this section, we present the proposed SA-GCN framework in detail.

    Skeleton data usually consists of a sequence of frames, in which each frame has a set of human joint coordinates in 2D or 3D form [25]. Naturally, a spatial-temporal graph for skeleton data can be constructed, in which the nodes are human joint points and the edges include the natural connections of joints in human body structure over time in motion.

    The skeleton data is formulated as an undirected spatial-temporal graph $G=(V,E)$, where $V=\{v_{ti} \mid t=1,\ldots,T,\ i=1,\ldots,N\}$ is the set of nodes including all the joints, $T$ denotes the number of video frames, $N$ is the total number of joints, and $E$ is the edge set, which features both intra-body and inter-frame connections. Usually, $E$ is split into two subsets: one represents the intra-skeleton connections within each frame, and the other contains the inter-frame edges, which correspond to the same joint connections in consecutive frames. In fact, ST-GCN is composed of spatial graph convolution and temporal graph convolution.

    In the spatial dimension, the graph convolution operates on each node and its corresponding neighbor set. For each node $v_{ti}$, the neighbor set is given as:

    $B_S(v_{ti})=\{\,v_{tj} \mid d(v_{tj},v_{ti})\le D\,\},$

    where $d(v_{tj},v_{ti})$ is the shortest-path distance from $v_{tj}$ to $v_{ti}$. Setting $D=1$ gives the 1-neighbor set of each joint node: $d(v_{tj},v_{ti})=1$ if $v_{tj}$ is directly connected to $v_{ti}$, and $d=0$ only for the node itself.
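    For concreteness, the following minimal sketch (not from the authors' code; the joint count and edge list are hypothetical) builds the adjacency matrix of a toy five-joint skeleton and reads off the 1-neighbor sets for $D=1$:

```python
# Toy example: adjacency matrix and 1-neighbor sets B_S(v_ti) for D = 1.
import numpy as np

N = 5                                      # hypothetical number of joints
edges = [(0, 1), (1, 2), (1, 3), (3, 4)]   # hypothetical bone connections

A = np.zeros((N, N))
for i, j in edges:                         # undirected graph: symmetric entries
    A[i, j] = A[j, i] = 1

# With D = 1, the neighbor set of joint i is the joint itself (d = 0)
# plus all joints directly connected to it (d = 1).
neighbor_sets = {i: [i] + list(np.nonzero(A[i])[0]) for i in range(N)}
print(neighbor_sets)                       # e.g., joint 1 -> [1, 0, 2, 3]
```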

    According to the spatial partition strategies introduced in [25], the above neighbor set is divided into three labels: 1) the root node itself; 2) the centripetal subset, which contains the adjacent nodes closer to the center of gravity than the root; and 3) the centrifugal subset, which includes the adjacent nodes that are further away from the center of gravity than the root, as shown in Figure 1(a) (red, blue and orange represent the three division strategies, respectively). Hence, the spatial graph convolution is defined as:

    $f_{out}(v_{ti})=\sum_{v_{tj}\in B_S(v_{ti})}\frac{1}{Z_{ti}(v_{tj})}\,f_{in}(v_{tj})\,W\!\left(L_{ti}(v_{tj})\right)$ (3.1)

    where $f_{in}(\cdot)$ and $f_{out}(\cdot)$ represent the input and output features of this convolution layer, $v_{tj}$ denotes the $j$-th node of the graph on frame $t$, and $B_S(v_{ti})$ is the sampling area of the convolution, characterized by the vertices with a distance of at most 1 from the target vertex. $W(\cdot)$ is the weighting function, similar to convolution, which gives a weight vector based on the given data. $L_{ti}(\cdot)$ is the label function, which allocates a label from $0$ to $K-1$ to each node in $B_S(v_{ti})$; usually we can set $K=3$ [25]. $Z_{ti}(\cdot)$ denotes the number of nodes in the subset of $B_S(\cdot)$ with the label $L_{ti}$.
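    To fix the notation, a naive loop-based rendering of Eq (3.1) is sketched below. It is illustrative only: the partition labels and weights are hypothetical stand-ins, and practical implementations vectorize this computation as in Eq (3.2).

```python
# Naive sketch of Eq (3.1) for one frame, assuming D = 1 and K partition labels.
import numpy as np

def spatial_graph_conv(f_in, A, label, W):
    """f_in: (N, C_in) joint features of one frame; A: (N, N) adjacency;
    label: (N, N) partition labels L_ti(v_tj) in {0, ..., K-1};
    W: (K, C_in, C_out), one weight matrix per partition label."""
    N, _ = f_in.shape
    K, _, C_out = W.shape
    f_out = np.zeros((N, C_out))
    for i in range(N):
        nbrs = [i] + list(np.nonzero(A[i])[0])      # B_S(v_ti): root + 1-hop
        for j in nbrs:
            k = label[i, j]
            # Z_ti(v_tj): size of the subset of B_S(v_ti) sharing this label
            Z = sum(1 for m in nbrs if label[i, m] == k)
            f_out[i] += f_in[j] @ W[k] / Z
    return f_out
```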

    In practice, the skeletal sequence is usually denoted as a 3-order tensor of shape $C\times T\times N$, where $C$, $T$ and $N$ are respectively the numbers of channels, frames and joints. Thus, the implementation of Eq (3.1) can be transformed into the tensor format, with $f_{in}\in\mathbb{R}^{C_{in}\times T\times N}$ and $f_{out}\in\mathbb{R}^{C_{out}\times T\times N}$ representing the input and output feature maps, respectively. To perform multiplication of tensors, we reorder the elements of a tensor into a matrix through unfolding or flattening.
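    This bookkeeping can be made concrete with a short PyTorch snippet; the shapes follow the text, everything else is illustrative.

```python
# Unfolding a (C, T, N) skeleton tensor into a matrix, applying a joint-axis
# operator, and folding the result back. Shapes are illustrative.
import torch

C, T, N = 3, 300, 25
f_in = torch.randn(C, T, N)            # channels x frames x joints
X = f_in.reshape(C * T, N)             # flatten to a (CT) x N matrix
A = torch.rand(N, N)                   # some joint-to-joint operator
f_out = (X @ A).reshape(C, T, N)       # multiply on the joint axis, fold back
```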

    More technically, the connections between nodes in the skeleton graph are represented as an adjacency matrix $A\in\{0,1\}^{N\times N}$. The adjacency matrix corresponding to the $k$-th subset of the neighbor set $B_S(v_{ti})$ is denoted as $A_k$.

    Let $F_{in}\in\mathbb{R}^{N\times C_{in}}$ be the input features of all joints in each frame, where $C_{in}$ is the dimensionality of the input features, and let $F_{out}\in\mathbb{R}^{N\times C_{out}}$ be the output features of the spatial graph convolution, where $C_{out}$ is the dimensionality of the output features. Therefore, the implementation of the above Eq (3.1) can be rewritten as:

    $F_{out}=\sum_{k=0}^{K-1}M_k\odot\tilde{A}_k F_{in}W_k,$ (3.2)

    where $\tilde{A}_k=\Lambda_k^{-\frac{1}{2}}(A_k+I)\Lambda_k^{-\frac{1}{2}}\in\mathbb{R}^{N\times N}$ is the normalized adjacency matrix for each partition label. $\Lambda_k\in\mathbb{R}^{N\times N}$ is the diagonal degree matrix with $\Lambda_k^{ii}=\sum_j(A_k^{ij})+\alpha$, where $\alpha$ is a small number to avoid division by zero. $\odot$ is the element-wise multiplication. $M_k\in\mathbb{R}^{N\times N}$ and $W_k\in\mathbb{R}^{C_{in}\times C_{out}}$ are learnable weight matrices for each partition label, which capture the strength of connections (i.e., edge weights) and the feature importance, respectively.
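    A minimal sketch of Eq (3.2) may clarify the roles of the two learnable matrices; the number of subsets and all initializations are illustrative, not the authors' exact implementation.

```python
# Sketch of Eq (3.2): normalized adjacency per partition, masked by M_k.
import torch

def normalize(A_k, alpha=1e-4):
    A_hat = A_k + torch.eye(A_k.size(0))      # A_k + I
    deg = A_hat.sum(dim=1) + alpha            # Lambda_k^{ii} = sum_j A_k^{ij} + alpha
    D_inv_sqrt = torch.diag(deg.pow(-0.5))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt    # Lambda^{-1/2} (A_k + I) Lambda^{-1/2}

def spatial_gcn(F_in, A_list, M, W):
    """F_in: (N, C_in); A_list: K adjacency matrices, each (N, N);
    M: (K, N, N) learnable edge masks; W: (K, C_in, C_out)."""
    F_out = 0
    for k, A_k in enumerate(A_list):
        # element-wise mask on the normalized adjacency, then feature transform
        F_out = F_out + (M[k] * normalize(A_k)) @ F_in @ W[k]
    return F_out
```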

    In the temporal dimension, since the graph is given by the corresponding joints in two adjacent frames, the number of neighbors for each node is 2. Accordingly, the temporal graph convolution is similar to the classical convolutional operation, performed on the $T\times N$ output feature map mentioned above. Generally, the kernel size is set to 9, as in [10,11,25]. However, it should be emphasized that the receptive fields for the spatial and temporal dimensions are set artificially. Thus, a SA mechanism is needed to reduce the dependence on human interference.

    As illustrated in Section 3.1, the graph convolution can be realized by a standard 2D convolution and a multiplication between the resulting tensor and the normalized adjacency matrix in the spatial-temporal case. To automatically capture the relationships within this matrix, we introduce a SA operation into ST-GCN to focus on the important regions and increase the model's attention to important joint information.

    The SA function can be viewed as a mapping from a query and a set of key-value pairs to an output [8,26]. For an input tensor of shape $(C_{in},T,N)$, we flatten it to the corresponding matrix $X\in\mathbb{R}^{C_{in}T\times N}$ and perform single-head attention. Hence, the output of the SA mechanism can be formulated as:

    $\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V,$ (3.3)

    where $W^{Q},W^{K}\in\mathbb{R}^{C_{in}\times d_k}$ and $W^{V}\in\mathbb{R}^{C_{in}\times d_k}$ are learned linear transformations that map the input $X$ to queries $Q=XW^{Q}$, keys $K=XW^{K}$ and values $V=XW^{V}$. $d_k$ is the depth of the queries and keys.
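    For reference, Eq (3.3) amounts to the few lines below; the projection matrices are random placeholders, and the input is laid out with one row per joint so the products type-check.

```python
# Scaled dot-product self-attention, Eq (3.3); shapes are illustrative.
import torch
import torch.nn.functional as F

def attention(Q, K, V):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # QK^T / sqrt(d_k)
    return F.softmax(scores, dim=-1) @ V

N, C_in, d_k = 25, 64, 25
X = torch.randn(N, C_in)                            # flattened input features
W_Q, W_K, W_V = (torch.randn(C_in, d_k) for _ in range(3))
out = attention(X @ W_Q, X @ W_K, X @ W_V)          # (N, d_k)
```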

    Specifically, SA explores the correlations in a skeletal sequence, which are calculated for each position by a weighted sum over all positions. Moreover, the weight of each position in the similarity matrix is dynamic. Let $x$ denote the input signal and $y$ the output. The SA module can be written as:

    $y(x)=\sigma\big(h(f(x)\,g(x))+x\big),$ (3.4)

    where $f(x)=\mathrm{softmax}(\theta(x)^{T}\phi(x))$ is used to generate the similarity matrix, $\theta(x)=W_{\theta}x$, $\phi(x)=W_{\phi}x$, $g(x)=W_{g}x$, $h(x)=W_{h}x$, and $W_{h}$, $W_{\theta}$, $W_{\phi}$ and $W_{g}$ are learnable. $\sigma(\cdot)$ is the activation function. In fact, the similarity matrix here plays the role of the weighted adjacency matrix in Eq (3.2). Hence, Eq (3.4) is named SA-GCN.

    For each action category, the motion and the informative joints differ from frame to frame. Therefore, it is not enough to express the intra-skeleton connections in each frame and the inter-frame correlations by learning only a single global joint weight matrix as in ST-GCN (Eq (3.2)). To address this problem, we propose to learn the node weights of every frame via the attention mechanism.

    In traditional attention methods, the input is usually expressed as the sum of the input matrix and a position variable, as shown in Figure 2(a). It is formally defined as:

    $x_{out}=x_{in}+A$ (3.5)
    Figure 2.  The difference in transformer architecture between the traditional methods and the proposed approach.

    where $A$ represents the positional relationship of the input nodes, and $x_{in}$ and $x_{out}$ represent the original data and the data fused with the position variable.

    However, in this paper we do not use simple addition to fuse the input data and location variables. Instead, as shown in Figure 2(b), feature fusion is expressed as:

    $x_{out}=D^{norm}x_{in},$ (3.6)

    where $x_{in}$ and $x_{out}$ represent the input and output matrices of the attention mechanism. $D^{norm}$ denotes the Laplacian regularization matrix, which encodes the position information and can be formulated as $D^{norm}=\Lambda^{-\frac{1}{2}}(A+I)\Lambda^{-\frac{1}{2}}$, where $A$ is the adjacency matrix, $I$ is the identity matrix, and $\Lambda$ is the degree matrix of the nodes. In this paper, $D^{norm}$ is implemented as $M_k\odot\tilde{A}_k$.

    Compared to the traditional methods, the regularization matrix $D^{norm}$ in Eq (3.6) guarantees an adaptive choice of the position matrix instead of a manual setting. Simultaneously, we use the multiplication between the input matrix and the position matrix (i.e., $D^{norm}f_{in}$), rather than the addition of the two used in the traditional transformer architecture.
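    The multiplicative fusion of Eq (3.6) can be sketched as follows; the adjacency matrix and feature shapes are illustrative.

```python
# Sketch of Eq (3.6): premultiply the input by the Laplacian-normalized D^norm.
import torch

def fuse_position(x_in, A):
    """x_in: (N, C) joint features; A: (N, N) adjacency matrix."""
    A_hat = A + torch.eye(A.size(0))           # A + I
    D_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
    D_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # Lambda^{-1/2} (A + I) Lambda^{-1/2}
    return D_norm @ x_in                       # x_out = D^norm x_in
```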

    Next, we process the resulting matrix of Eq (3.6) to find the weight relationships between the different nodes of the input matrix. By integrating the attention module into Eq (3.6), the output can be converted into multiple data matrices containing different information streams, as shown in Figure 3. Therefore, the conversion formula for the input data is:

    $Q=D^{norm}x_{in}W^{Q},\quad K=D^{norm}x_{in}W^{K},\quad V=D^{norm}x_{in}W^{V},$ (3.7)
    Figure 3.  The flow chart of the attention mechanism, which illustrates how the input matrix is converted to data matrices with different information streams.

    where $W^{Q}$, $W^{K}$ and $W^{V}$ are the corresponding parameter matrices. $d_k$ in Eq (3.3) is set to the row dimension of the input data, that is, the number of joint points. Moreover, it is verified experimentally that using batch normalization in Eq (3.3) is more effective than the softmax function.
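    Putting Eqs (3.6) and (3.7) together, one possible rendering of the proposed attention branch is sketched below, with batch normalization substituted for softmax as reported above. The module layout and layer sizes are our assumptions, not the authors' released code.

```python
# Sketch of the proposed branch: Q, K, V from D^norm x_in, BatchNorm weighting.
import torch
import torch.nn as nn

class SAttention(nn.Module):
    def __init__(self, c_in, d_k, n_joints):
        super().__init__()
        self.W_q = nn.Linear(c_in, d_k, bias=False)
        self.W_k = nn.Linear(c_in, d_k, bias=False)
        self.W_v = nn.Linear(c_in, d_k, bias=False)
        self.bn = nn.BatchNorm1d(n_joints)     # used in place of softmax

    def forward(self, x_in, D_norm):
        z = D_norm @ x_in                      # Eq (3.6): multiplicative fusion
        Q, K, V = self.W_q(z), self.W_k(z), self.W_v(z)   # Eq (3.7)
        d_k = Q.size(-1)                       # the text sets d_k = number of joints
        scores = Q @ K.transpose(-2, -1) / d_k ** 0.5
        return self.bn(scores) @ V             # W_att = Attention(Q, K, V)
```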

    We then set $W_{att}=\mathrm{Attention}(Q,K,V)$. By using Eqs (3.3), (3.4) and (3.6), the proposed network can be written as:

    $f_{out}=\sigma(W_{att}).$ (3.8)

    We can see from Eq (3.8) that SA-GCN assembles a single attention module in a form similar to the spatial GCN (Eq (3.4)). Moreover, the proposed attention mechanism is heuristic, since the mathematical form of Eq (3.7) coincides with the input form of Eq (3.6) proposed in this paper. The flow chart of SA-GCN is summarized in Figure 4, in which the proposed per-frame weight matrix is combined with the original joint weight matrix $M_k$ of ST-GCN in Eq (3.2).

    Figure 4.  The spatio-temporal flow chart of SA-GCN.

    An important theoretical advantage of SA-GCN is that its attention mechanism is interpretable. Specifically, the standard GCN is used as a component of the attention mechanism, which gives a more principled derivation of how the positional variables are processed and strengthens the modeled dependencies between different joints. Existing methods only describe the relationship between key nodes and adjacent joints, whereas the features processed by the proposed SA more fully capture the positive influence between key points and all other nodes involved in the action (for example, the mapping strategy from (a) to (b) shown in Figure 1).

    The previous subsection (i.e., Section 3.3) focused on improving the spatial component of SA-GCN, in which the attention mechanism is used to extract the joint weights of different frames in a video. On the other hand, it is well recognized that temporal modeling is also essential for video action recognition. For temporal modeling, we adopt the classical 2D convolution with kernel size 9, as introduced in the original ST-GCN [25]. That is,

    $f_{out}=\mathrm{Conv2D}_{K\times 1}(f_{in}),$ (3.9)

    where $D=1$ (the spatial neighbor distance) and $K=9$ (the temporal kernel size).
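    In PyTorch terms, Eq (3.9) corresponds to an ordinary 2D convolution with a (9 × 1) kernel sliding over the time axis of the (C, T, N) feature map; the channel count below is illustrative.

```python
# Temporal convolution, Eq (3.9): kernel (9, 1), padding keeps T unchanged.
import torch
import torch.nn as nn

K = 9                                          # temporal kernel size
tcn = nn.Conv2d(in_channels=64, out_channels=64,
                kernel_size=(K, 1), padding=((K - 1) // 2, 0))

f_in = torch.randn(1, 64, 300, 25)             # (batch, C, T, N)
f_out = tcn(f_in)                              # shape preserved: (1, 64, 300, 25)
```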

    To deploy the proposed SA-GCN on video data, we build a concrete network structure upon the spatio-temporal graph convolution, as shown in Figure 4(a). Specifically, the network utilizes 10 SA-GCN blocks to perform feature fusion and feature extraction on the input spatio-temporal skeleton data, and the feature information is aggregated by global average pooling. Finally, the prediction is given by the fully connected layer and the softmax layer. Each SA-T block in Figure 4(b) is composed of a temporal graph convolution and a spatial graph convolution, to extract mixed spatio-temporal features; it also uses a residual structure to prevent overfitting and improve the generalization ability of the model, as shown in Figure 4(b).
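    A structural sketch of this pipeline is given below. SAGCNBlock is a hypothetical stand-in for one SA-T unit (spatial SA block plus temporal convolution); only the channel widths follow the text.

```python
# Structural sketch of the full network: 10 blocks, pooling, classifier.
import torch.nn as nn

class SAGCNBlock(nn.Module):
    """Placeholder for one SA-T unit: spatial SA-GCN + (9 x 1) temporal conv."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.spatial = nn.Conv2d(c_in, c_out, 1)   # stand-in for the SA branch
        self.temporal = nn.Conv2d(c_out, c_out, (9, 1), padding=(4, 0))

    def forward(self, x):
        return self.temporal(self.spatial(x))

class SAGCN(nn.Module):
    def __init__(self, num_classes=60,
                 channels=(64, 64, 64, 64, 128, 128, 128, 256, 256, 256)):
        super().__init__()
        blocks, c_prev = [], 3                     # 3D joint coordinates in
        for c in channels:
            blocks.append(SAGCNBlock(c_prev, c))
            c_prev = c
        self.blocks = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)        # global average pooling
        self.fc = nn.Linear(c_prev, num_classes)   # softmax applied at inference

    def forward(self, x):                          # x: (B, 3, T, N)
        x = self.blocks(x)
        return self.fc(self.pool(x).flatten(1))
```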

    In summary, SA-GCN jointly learns the total weight matrix and the weight matrix of each frame, which are expected to greatly improve the performance of the original ST-GCN. The next section will present the experimental results and illustrate the potential of SA-GCN.

    NTU-60 RGB+D Dataset. This is currently the largest and most widely used indoor action recognition dataset, containing 56,000 action clips across 60 action categories [29]. The clips were performed by 40 volunteers in age groups ranging from 10 to 35 years old. Each action was captured by three cameras at the same height with different horizontal angles: −45°, 0° and 45°. The dataset includes the 3D joint locations of each frame detected by Kinect depth sensors, where 25 joints are recorded for each subject of the skeleton sequences and each video has no more than 2 subjects. We evaluate the proposed model using two benchmarks derived from the NTU-60 RGB+D dataset, according to the metrics introduced in [25]. The two benchmarks are:

    1) Cross-subject (X-Sub/CS): The dataset for this benchmark is divided into a training set of 40,320 videos and a validation set of 16,560 videos, where the performers in the two sets are different.

    2) Cross-view (X-View/CV): This benchmark splits the dataset by camera view: the 37,920 videos in the training set are captured from the 0° and 45° views, and the 18,960 videos in the validation set are captured from the −45° view.

    NTU-120 RGB+D Dataset. This is a large-scale dataset for 3D skeleton-based action recognition, obtained from 106 distinct subjects and containing more than 114 thousand video samples and 8 million frames [30]. This dataset can be seen as an expansion of the NTU-60 RGB+D dataset in the number of performers and action categories. It has 120 action classes, including daily actions, mutual behaviors and health-related activities. Two benchmarks are derived from this dataset, namely X-Sub (cross-subject) and X-Set (cross-setup).

    1) X-Sub: The dataset consists of training and testing groups, where each group contains 53 subjects.

    2) X-Set: The dataset divides samples into training and testing groups, in which the even-setup samples are used for training and the odd-setup samples are used for testing.

    We follow these rules and report the top-1 accuracy on each benchmark in all experiments.

    The model proposed in this paper is composed of ten stacked SA-GCN blocks. The first four blocks have 64 output channels, the number of channels is doubled from the 5th to the 7th block, and the last three have 256 channels. The convolution stride is set to 2. For fairness of the experimental comparisons, the parameters and numbers of channels used in the different compared models are the same. Specifically, we train the model for 80 epochs using stochastic gradient descent (SGD) with Nesterov momentum (0.9), batch size 24 and an initial learning rate of 0.1; the learning rate is reduced tenfold at the 10th and 50th epochs. For video data with more than two people, only the first two people are selected, and all skeleton data is padded to T = 300. In addition, this paper adopts a data preprocessing method similar to ST-GCN, in which only the skeleton data stream is used for comparison and the influence of other data streams is not considered. All experiments were conducted in PyTorch on 4 TITAN X GPUs.
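    The optimization recipe can be summarized in a few lines; the learning-rate milestones below are an assumption based on the schedule described above.

```python
# Sketch of the training setup: SGD with Nesterov momentum and step decay.
import torch

params = [torch.nn.Parameter(torch.randn(3, 3))]   # stand-in for model parameters
optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9, nesterov=True)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[10, 50], gamma=0.1)     # assumed decay epochs

for epoch in range(80):
    # ... one pass over the training loader (batch size 24):
    #     optimizer.zero_grad(); loss.backward(); optimizer.step() ...
    scheduler.step()
```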

    In this section, we evaluate the proposed method on the NTU-60 and NTU-120 datasets against previous state-of-the-art approaches, including ST-GCN [25], two-stream adaptive GCN (2s-AGCN) [27] and GCN-NAS [46]. For fair comparisons, we use the data extraction method suggested by ST-GCN [25] in all experiments, i.e., raw data without preprocessing such as regularization.


    a ST-GCN [25]: It is a generic representation of skeleton sequences for action recognition by extending GCNs to a spatial-temporal graph model.

    b 2s-AGCN [27]: It is the two-stream adaptive GCN for skeleton-based action recognition, which models both the first-order and the second-order information simultaneously to improve the accuracy of recognition.

    c GCN-NAS [46]: It is an automatically designed GCN for skeleton-based action recognition based on neural architecture search, which uses a sampling- and memory-efficient evolution strategy to find the optimal architecture for recognition.

    d STAR [47]: It is an action recognition model based on sparse transformer.

    e TSTE [48]: It is a two-stream transformer encoder network based on spatio-temporal feature and shape transformation.

    f TAG [49]: It is a generalization of ST-GCN based on weak feature extraction.

    The proposed model improves the spatial convolution in ST-GCN by adding a SA mechanism, which realizes the application of the attention module in GCN and strengthens the ability of the model to extract joint weights.

    Table 1 summarizes the results on the two benchmarks of the NTU-60 dataset. The comparison is based on the same data processing method and the same skeleton data stream. We find that SA-GCN gives the best results on both X-View and X-Sub. For instance, compared to ST-GCN [25], the accuracy of SA-GCN increases by 3.2% on both X-View and X-Sub. On X-View, the accuracy of SA-GCN is 1.7% higher than that of 2s-AGCN [27] and GCN-NAS [46]. On X-Sub, SA-GCN achieves improvements of 5.6% and 2.0% over 2s-AGCN [27] and GCN-NAS [46], respectively.

    Table 1.  Skeleton-based action recognition performance on NTU-60 dataset. We report the accuracy on both X-Sub and X-View benchmarks.
    Methods X-View X-Sub
    ST-GCN[25] 88.3 81.5
    2s-AGCN[27] 89.8 79.1
    GCN-NAS[46] 89.8 82.7
    STAR[47] 89.0 83.4
    TSTE[48] 85.3 80.5
    TAG[49] 90.0 82.1
    SA-GCN(ours) 91.5 84.7


    The results on the NTU-120 dataset are shown in Table 2. SA-GCN outperforms ST-GCN and 2s-AGCN on both X-Set and X-Sub. Compared with GCN-NAS, SA-GCN is slightly lower on X-Set, which can be attributed to the higher complexity of the GCN-NAS model; in contrast, our model does not involve complex parameters and has essentially the same size as ST-GCN. Meanwhile, it should be emphasized that the attention mechanisms in SA-GCN all share parameters.

    Table 2.  Skeleton-based action recognition performance on NTU-120 dataset. We report the accuracy on both X-Sub and X-Set benchmarks.
    Methods X-Set X-Sub
    ST-GCN[25] 78.1 75.5
    2s-AGCN[27] 77.5 78.0
    GCN-NAS[46] 80.5 78.0
    STAR[47] 80.2 78.3
    TSTE[48] 67.5 66.6
    SA-GCN(ours) 79.2 78.5


    In this part, we study the influence of the components of SA-GCN on the NTU-60 dataset, as shown in Table 3. As can be seen from these experiments, the full model with both the SA module (SAM) and GCN consistently outperforms the variants with either component removed.

    Table 3.  Ablation study of the SA module (SAM) and GCN on the NTU-60 dataset.
    SAM GCN X-View X-Sub
    ✓ × 80.0 73.0
    × ✓ 88.3 81.5
    ✓ ✓ 91.5 84.7


    In this work, we have proposed a unified spatio-temporal SA-GCN for low-quality motion video data with a fixed viewing angle, where the designed SA module can itself be regarded as a GCN. Based on this module, the proposed model can extract features efficiently by learning the weights between joint points at different scales. Furthermore, the designed SA mechanism not only characterizes the multi-scale dependencies of joints, but also integrates the structural features of the graph with self-learned fusion features. Moreover, since the parameters in the attention mechanism are shared, the total number of parameters of the model does not increase significantly. Extensive experiments on the NTU-60 RGB+D and NTU-120 RGB+D datasets show that the proposed model achieves substantial improvements over mainstream methods. In the future, we will focus on optimizing the model structure and on applications to unconstrained motion videos.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This work was supported in part by the National Natural Science Foundation of China (62072312, 62071303, 61972264), in part by the Shenzhen Basic Research Project (JCYJ20210324094009026, JCYJ20200109105832261), the Project of the Educational Commission of Guangdong Province (2023KTSCX116) and the Natural Science Foundation of Guangdong Province (2023A1515011394).

    The authors declare there is no conflict of interest.



    [1] A. Vezzani, S. Balosso, T. Ravizza, Neuroinflammatory pathways as treatment targets and biomarkers in epilepsy, Nat. Rev. Neurol., 15 (2019), 459-472. https://doi.org/10.1038/s41582-019-0217-x doi: 10.1038/s41582-019-0217-x
    [2] W. O. Pickrell, M. P. Kerr, SUDEP and mortality in epilepsy: The role of routinely collected healthcare data, registries, and health inequalities, Epilepsy Behav., 103 (2020), 106453. https://doi.org/10.1016/j.yebeh.2019.106453
    [3] A. Ali, S. Wu, N. P. Issa, S. Rose, V. L. Towle, P. Warnke, et al., Association of sleep with sudden unexpected death in epilepsy, Epilepsy Behav., 76 (2017), 1-6. https://doi.org/10.1016/j.yebeh.2017.08.021 doi: 10.1016/j.yebeh.2017.08.021
    [4] E. C. Wirrell, R. Nabbout, I. E. Scheffer, T. Alsaadi, A. Bogacz, J. A. French, et al., Methodology for classification and definition of epilepsy syndromes with list of syndromes: Report of the ILAE Task Force on Nosology and Definitions, Epilepsia, 63 (2022), 1333-1348. https://doi.org/10.1111/epi.17237 doi: 10.1111/epi.17237
    [5] M. Bosak, A. Słowik, R. Kacorzyk, W. Turaj, Implementation of the new ILAE classification of epilepsies into clinical practice—A cohort study, Epilepsy Behav., 96 (2019), 28-32. https://doi.org/10.1016/j.yebeh.2019.03.045 doi: 10.1016/j.yebeh.2019.03.045
    [6] C. M. Korff, I. E. Scheffer, Epilepsy classification: a cycle of evolution and revolution, Curr. Opin. Neurol., 26 (2013), 163-167. https://doi.org/10.1097/WCO.0b013e32835ee58e doi: 10.1097/WCO.0b013e32835ee58e
    [7] S. A. Demin, O. Y. Panischev, S. F. Timashev, R. R. Latypov, Analysis of Interictal EEG Signal Correlations for Diagnostics of Epilepsy, Bull. Russ. Acad. Sci. Phys., 84 (2020), 1349-1353. https://doi.org/10.3103/S1062873820110088 doi: 10.3103/S1062873820110088
    [8] R. Sharma, R. B. Pachori, Classification of epileptic seizures in EEG signals based on phase space representation of intrinsic mode functions, Expert Syst. Appl., 42 (2015), 1106-1117. https://doi.org/10.1016/j.eswa.2014.08.030 doi: 10.1016/j.eswa.2014.08.030
    [9] H. Soh, R. Pant, J. J. LoTurco, A. V. Tzingounis, Conditional deletions of epilepsy-associated KCNQ2 and KCNQ3 channels from cerebral cortex cause differential effects on neuronal excitability, J. Neurosci., 34 (2014), 5311-5321. https://doi.org/10.1523/JNEUROSCI.3919-13.2014 doi: 10.1523/JNEUROSCI.3919-13.2014
    [10] P. A. Winkler, C. Vollmar, K. G. Krishnan, T. Pfluger, H. Brückmann, S. Noachtar, Usefulness of 3-D reconstructed images of the human cerebral cortex for localization of subdural electrodes in epilepsy surgery, Epilepsy Res., 41 (2000), 169-178. https://doi.org/10.1016/S0920-1211(00)00137-6 doi: 10.1016/S0920-1211(00)00137-6
    [11] J. T. Paz, T. J. Davidson, E. S. Frechette, B. Delord, I. Parada, K. Peng, et al., Closed-loop optogenetic control of thalamus as a tool for interrupting seizures after cortical injury, Nat. Neurosci., 16 (2013), 64-70. https://doi.org/10.1038/nn.3269 doi: 10.1038/nn.3269
    [12] N. Li, J. E. Downey, A. Bar-Shir, A. A. Gilad, P. Walczak, H. Kim, et al., Optogenetic-guided cortical plasticity after nerve injury, Proc. Natl. Acad. Sci. USA, 108 (2011), 8838-8843. https://doi.org/10.1073/pnas.1100815108 doi: 10.1073/pnas.1100815108
    [13] L. van Veen, K. R. Green, Periodic solutions to a mean-field model for electrocortical activity, Eur. Phys. J. Spec. Top., 223 (2014), 2979-2988. https://doi.org/10.1140/epjst/e2014-02311-y doi: 10.1140/epjst/e2014-02311-y
    [14] P. N. Taylor, Y. Wang, M. Goodfellow, J. Dauwels, F. Moeller, U. Stephani, et al., A computational study of stimulus driven epileptic seizure abatement, PLoS One, 9 (2014), e114316. https://doi.org/10.1371/journal.pone.0114316 doi: 10.1371/journal.pone.0114316
    [15] M. Chen, D. Guo, T. Wang, W. Jing, Y. Xia, P. Xu, et al., Bidirectional control of absence seizures by the basal ganglia: A computational evidence, PLoS Comput. Biol., 10 (2014), e1003495. https://doi.org/10.1371/journal.pcbi.1003495 doi: 10.1371/journal.pcbi.1003495
    [16] I. S. Cooper, A. R. Upton, I. Amin, Chronic cerebellar stimulation (CCS) and deep brain stimulation (DBS) in involuntary movement disorders, Appl. Neurophysiol., 45 (1982), 209-217. https://doi.org/10.1159/000101601 doi: 10.1159/000101601
    [17] S. F. Lempka, M. D. Johnson, S. Miocinovic, J. L. Vitek, C. C. McIntyre, Current-controlled deep brain stimulation reduces in vivo voltage fluctuations observed during voltage-controlled stimulation, Clin. Neurophysiol., 121 (2010), 2128-2133. https://doi.org/10.1016/j.clinph.2010.04.026 doi: 10.1016/j.clinph.2010.04.026
    [18] R. G. Cury, V. Fraix, A. Castrioto, M. A. P. Fernández, P. Krack, S. Chabardes, et al., Thalamic deep brain stimulation for tremor in Parkinson disease, essential tremor, and dystonia, Neurology, 89 (2017), 1416-1423. https://doi.org/10.1212/WNL.0000000000004295 doi: 10.1212/WNL.0000000000004295
    [19] V. A. Coenen, N. Allert, B. Mädler, A role of diffusion tensor imaging fiber tracking in deep brain stimulation surgery: DBS of the dentato-rubro-thalamic tract (drt) for the treatment of therapy-refractory tremor, Acta. Neurochir., 153 (2011), 1579-1585. https://doi.org/10.1007/s00701-011-1036-z doi: 10.1007/s00701-011-1036-z
    [20] R. Arya, H. M. Greiner, A. Lewis, F. T. Mangano, C. Gonsalves, K. D. Holland, et al., Vagus nerve stimulation for medically refractory absence epilepsy, Seizure, 22 (2013), 267-270. https://doi.org/10.1016/j.seizure.2013.01.008 doi: 10.1016/j.seizure.2013.01.008
    [21] Z. Ye, L. McQuillan, A. Poduri, T. E. Green, N. Matsumoto, H. C. Mefford, et al., Somatic mutation: the hidden genetics of brain malformations and focal epilepsies, Epilepsy Res., 155 (2019), 106161. https://doi.org/10.1016/j.eplepsyres.2019.106161 doi: 10.1016/j.eplepsyres.2019.106161
    [22] E. A. Van Vliet, E. Aronica, A. Vezzani, T. Ravizza, Neuroinflammatory pathways as treatment targets and biomarker candidates in epilepsy: emerging evidence from preclinical and clinical studies, Neuropath. Appl. Neuro., 44 (2018), 91-111. https://doi.org/10.1111/nan.12444 doi: 10.1111/nan.12444
    [23] T. Lundgren, J. Dahl, N. Yardi, L. Melin, Acceptance and commitment therapy and yoga for drug-refractory epilepsy: A randomized controlled trial, Epilepsy Behav., 13 (2008), 102-108. https://doi.org/10.1016/j.yebeh.2008.02.009 doi: 10.1016/j.yebeh.2008.02.009
    [24] C. Joint, D. Nandi, S. Parkin, R. Gregory, T. Aziz, Hardware-related problems of deep brain stimulation, Movement Disord., 17 (2002), S175-S180. https://doi.org/10.1002/mds.10161 doi: 10.1002/mds.10161
    [25] Z. Peng-Chen, T. Morishita, D. Vaillancourt, C. Favilla, K. D. Foote, M. S. Okun, et al., Unilateral thalamic deep brain stimulation in essential tremor demonstrates long-term ipsilateral effects, Parkinsonism Relat. Disord., 19 (2013), 1113-1117. https://doi.org/10.1016/j.parkreldis.2013.08.001 doi: 10.1016/j.parkreldis.2013.08.001
    [26] M. K. Hosain, A. Kouzani, S. Tye, Closed loop deep brain stimulation: an evolving technology, Australas. Phys. Eng. Sci. Med., 37 (2014), 619-634. https://doi.org/10.1007/s13246-014-0297-2 doi: 10.1007/s13246-014-0297-2
    [27] J. Volkmann, E. Moro, R. Pahwa, Basic algorithms for the programming of deep brain stimulation in Parkinson's disease, Movement Disord., 21 (2006), S284-S289. https://doi.org/10.1002/mds.20961 doi: 10.1002/mds.20961
    [28] D. Fan, Q. Wang, Closed-loop control of absence seizures inspired by feedback modulation of basal ganglia to the corticothalamic circuit, IEEE Trans. Neur. Sys. Reh., 28 (2020), 581-590. https://doi.org/10.1109/TNSRE.2020.2969426 doi: 10.1109/TNSRE.2020.2969426
    [29] C. Liu, J. Wang, B. Deng, X. Wei, H. Yu, H. Li, et al., Closed-loop control of tremor-predominant parkinsonian state based on parameter estimation, IEEE Trans. Neur. Sys. Reh., 24 (2016), 1109-1121. https://doi.org/10.1109/TNSRE.2016.2535358 doi: 10.1109/TNSRE.2016.2535358
    [30] B. Rosin, M. Slovik, R. Mitelman, M. Rivlin-Etzion, S. N. Haber, Z. Israel, et al., Closed-loop deep brain stimulation is superior in ameliorating parkinsonism, Neuron, 72 (2011), 370-384. https://doi.org/10.1016/j.neuron.2011.08.023 doi: 10.1016/j.neuron.2011.08.023
    [31] B. Hu, Q. Wang, Controlling absence seizures by deep brain stimulus applied on substantia nigra pars reticulata and cortex, Chaos Soliton. Fract., 80 (2015), 13-23. https://doi.org/10.1016/j.chaos.2015.02.014 doi: 10.1016/j.chaos.2015.02.014
    [32] C. Liu, J. Wang, H. Li, C. Fietkiewicz, K. A. Loparo, Modeling and analysis of beta oscillations in the basal ganglia, IEEE Trans. Neur. Net. Lear. Sys., 29 (2017), 1864-1875. https://doi.org/10.1109/TNNLS.2017.2688426 doi: 10.1109/TNNLS.2017.2688426
    [33] P. A. Robinson, P. N. Loxley, S. C. O'Connor, C. J. Rennie, Modal analysis of corticothalamic dynamics, electroencephalographic spectra, and evoked potentials, Phys. Rev. E, 63 (2001), 041909. https://doi.org/10.1103/PhysRevE.63.041909
    [34] V. K. Jirsa, H. Haken, Field theory of electromagnetic brain activity, Phys. Rev. Lett., 77 (1996), 960. https://doi.org/10.1103/PhysRevLett.77.960 doi: 10.1103/PhysRevLett.77.960
    [35] D. L. Rowe, P. A. Robinson, C. J. Rennie, Estimation of neurophysiological parameters from the waking EEG using a biophysical model of brain dynamics, J. Theor. Biol., 231 (2004), 413-433. https://doi.org/10.1016/j.jtbi.2004.07.004 doi: 10.1016/j.jtbi.2004.07.004
    [36] S. Coombes, Large-scale neural dynamics: simple and complex, NeuroImage, 52 (2010), 731-739. https://doi.org/10.1016/j.neuroimage.2010.01.045 doi: 10.1016/j.neuroimage.2010.01.045
    [37] D. Terman, J. E. Rubin, A. C. Yew, C. J. Wilson, Activity patterns in a model for the subthalamopallidal network of the basal ganglia, J. Neurosci., 22 (2002), 2963-2976. https://doi.org/10.1523/JNEUROSCI.22-07-02963.2002 doi: 10.1523/JNEUROSCI.22-07-02963.2002
    [38] M. E. Anderson, N. Postupna, M. Ruffo, Effects of high-frequency stimulation in the internal globus pallidus on the activity of thalamic neurons in the awake monkey, J. Neurophysiol., 89 (2003), 1150-1160. https://doi.org/10.1152/jn.00475.2002 doi: 10.1152/jn.00475.2002
    [39] A. Benazzouz, D. M. Gao, Z. G. Ni, B. Piallat, R. Bouali-Benazzouz, A. L. Benabid, Effect of high-frequency stimulation of the subthalamic nucleus on the neuronal activities of the substantia nigra pars reticulata and ventrolateral nucleus of the thalamus in the rat, Neuroscience, 99 (2000), 289-295. https://doi.org/10.1016/S0306-4522(00)00199-8 doi: 10.1016/S0306-4522(00)00199-8
    [40] Z. Wang, Q. Wang, Stimulation strategies for absence seizures: targeted therapy of the focus in coupled thalamocortical model, Nonlinear Dyn., 96 (2019), 1649-1663. https://doi.org/10.1007/s11071-019-04876-z doi: 10.1007/s11071-019-04876-z
    [41] D. Fan, Y. Zheng, Z. Yang, Q. Wang, Improving control effects of absence seizures using single-pulse alternately resetting stimulation (SARS) of corticothalamic circuit, Appl. Math. Mech., 41 (2020), 1287-1302. https://doi.org/10.1007/s10483-020-2644-8 doi: 10.1007/s10483-020-2644-8
    [42] M. Chen, D. Guo, M. Li, T. Ma, S. Wu, J. Ma, et al., Critical roles of the direct GABAergic pallido-cortical pathway in controlling absence seizures, PLoS Comput. Biol., 11 (2015), e1004539. https://doi.org/10.1371/journal.pcbi.1004539 doi: 10.1371/journal.pcbi.1004539
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
