Research article

Features gradient-based signals selection algorithm of linear complexity for convolutional neural networks

  • Recently, convolutional neural networks (CNNs) for classification from time domain data of multiple signals have been developed. Although some signals are important for correct classification, others are not. The calculation, memory, and data collection costs increase when data that include unimportant signals are taken as the CNN input layer. Therefore, identifying and eliminating non-important signals from the input layer is important. In this study, we proposed a features gradient-based signals selection algorithm (FG-SSA), which can be used for finding and removing non-important signals for classification by utilizing the features gradient obtained in the process of gradient-weighted class activation mapping (grad-CAM). Defining $n_s$ as the number of signals, the computational complexity of FG-SSA is linear time $O(n_s)$; i.e., it has a low calculation cost. We verified the effectiveness of the algorithm using the OPPORTUNITY dataset, an open dataset comprising acceleration signals of human activities. FG-SSA removed an average of 6.55 signals from a total of 15 (five triaxial sensors) while maintaining high generalization scores of classification. Therefore, FG-SSA can find and remove signals that are not important for CNN-based classification. In the process of FG-SSA, the degree of influence of each signal on each class estimation is quantified. Therefore, it is possible to visually determine which signals are effective for class estimation and which are not. FG-SSA is a white-box signal selection algorithm because we can understand why each signal was selected or removed. The existing method, Bayesian optimization, was also able to find superior signal sets, but its computational cost was approximately three times greater than that of FG-SSA. We therefore consider FG-SSA a low-computational-cost algorithm.

    Citation: Yuto Omae, Yusuke Sakai, Hirotaka Takahashi. Features gradient-based signals selection algorithm of linear complexity for convolutional neural networks[J]. AIMS Mathematics, 2024, 9(1): 792-817. doi: 10.3934/math.2024041




    Recently, many convolutional neural networks (CNNs) have been developed for classification in the time domain of signals, e.g., recognizing movement intention using electroencephalography (EEG) signals [1], human activity recognition using inertial signals [2,3], and swimming stroke detection using acceleration signals [4]. Although these studies used all signals, signals that are not important for correct classification may be included in the CNN input layer. Such signals worsen the classification accuracy and increase calculation costs, required memory, and data collection costs. Therefore, finding and removing non-important signals from the CNN input layer and creating a classification model with the minimum number of signals possible are crucial. However, many CNNs use all signals [4,5,6] or manually selected signals [7,8].

    We can visually find unimportant regions using gradient-weighted class activation mapping (grad-CAM) embedded in CNNs. Grad-CAM [9] uses the gradient of the feature maps output from the convolutional layer immediately before the fully connected layer to visualize, in the form of a heat map, which regions of the image influenced class estimation. For example, Figure 1 (A) shows the generation process of grad-CAM when an image of the digit "2" is input to a CNN. First, the gradient of the output vector with respect to the feature maps is computed. Then, the feature maps are weighted by the mean of their gradients and stacked. A heat map is generated by passing the weighted feature map through the ReLU function. From the result, we can understand that the CNN used the upper side of the image to estimate "2". Grad-CAM was proposed as an improvement of CAM [10]. CAM is a white-box method applicable only to CNNs without fully connected layers. In contrast, grad-CAM is applicable to various CNN model families [9]. The method is primarily used for image data, e.g., chest CT images [11], 3D positron emission tomography images [12] and chest X-ray images [13,14,15].
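    The weighting-and-ReLU step described above can be sketched in a few lines of NumPy. The shapes and random values below are purely illustrative stand-ins for real feature maps and their gradients, not quantities from the paper:

```python
import numpy as np

# Hypothetical shapes: n_f feature maps of size h x w from the last
# convolutional layer, and the gradient of the class score y_c with
# respect to each map (synthetic random values for illustration).
rng = np.random.default_rng(0)
n_f, h, w = 8, 7, 7
feature_maps = rng.random((n_f, h, w))      # A^k in the Grad-CAM paper
grads = rng.standard_normal((n_f, h, w))    # dy_c / dA^k

# Weight each map by the spatial mean of its gradient, sum over maps,
# then apply ReLU to keep only the positively contributing regions.
alpha = grads.mean(axis=(1, 2))             # one weight per feature map
cam = np.maximum(0.0, np.tensordot(alpha, feature_maps, axes=1))

assert cam.shape == (h, w) and (cam >= 0).all()
```

    The resulting `cam` array is the (unnormalized) heat map; upsampling it to the input resolution gives the visualization shown in Figure 1 (A).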

    Figure 1.  Schematic diagram of (A) image data-based grad-CAM and (B) multi signals-based grad-CAM. "C" and "FC" mean "convolution" and "fully-connected", respectively. The MNIST dataset is used for the figure (A). The red color represents high activation and the dark blue color represents low activation.

    Cases of applying grad-CAM to CNNs that input time domain data of signals, as shown in Figure 1 (B), are continuously increasing. For example, the classification of sleep stages [16], prognostication of comatose patients after cardiac arrest [17], classification of schizophrenia and healthy participants [18], and classification of motor imagery [19] are performed using EEG signal(s). Electrocardiogram (ECG) signals are used to detect nocturnal hypoglycemia [20] and predict 1-year mortality [21]. Acceleration signals have been used to detect hemiplegic gait [22] and human activity recognition [23].

    As these studies show, we can visually find unimportant signals for correct classification by applying grad-CAM to a CNN that inputs the time domain of the signals. In the example shown in Figure 1 (B), signals s1 and sn are strongly activated, but signal s2 is not. Therefore, s2 can be interpreted as unimportant for class estimation. However, observing all of these maps and finding unimportant signals is extremely difficult because one grad-CAM is generated per input datum. Therefore, we propose the features gradient-based signals selection algorithm (FG-SSA), which can be used to find and remove unimportant signals from a CNN input layer by utilizing the features gradient obtained in the calculation process of grad-CAM. The algorithm provides a signal subset consisting of only important signals. The computational complexity of the proposed algorithm is linear order $O(n_s)$ (i.e., FG-SSA has a low calculation cost), where $n_s$ is the number of all signals.

    The academic contributions of this study are twofold, which are specified as follows.

    ● We propose a method for quantifying the effect of signals on class estimation.

    ● We propose an algorithm to remove signals of low importance.

    This algorithm can remove unnecessary signals and enable CNNs and datasets to be smaller in size.

    The proposed algorithm, FG-SSA, is applied to CNNs that estimate class labels; it selects the signals to be used in the input layer. This is similar to the feature selection problem of deciding which features to use among all available features. The large number of feature combinations makes it difficult to find a globally optimal subset that maximizes validation performance. For example, the number of combinations of 10 out of 100 features exceeds $10^{13}$. Therefore, the feature selection problem is known to be NP-hard [24,25]. For NP-hard problems, finding a good feasible solution is preferable to seeking a global optimal solution [26]. Therefore, many methods for finding approximate solutions to the feature selection problem have been proposed, e.g., the within-class to between-class variance ratio [27,28], classification and regression trees [29], Lasso [30], ReliefF [31,32], out-of-bag error [33], and others. Recently, some studies have used Bayesian optimization for feature selection [34,35,36], which expresses whether to use a feature as a binary variable and evaluates combinations of features. These feature selection methods reduce the computation time of machine learning and improve prediction accuracy [37,38].
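    The combinatorial size quoted above can be verified directly with the standard library:

```python
import math

# Number of ways to choose 10 features out of 100.
n_patterns = math.comb(100, 10)
print(n_patterns)        # 17310309456440, i.e. more than 10**13
assert n_patterns > 10**13
```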

    The solution to the signal selection problem in CNNs also has a large number of patterns, obtained through the many possible signal combinations. However, the feature selection algorithms described above cannot be used for signal selection problems and are not suitable for CNNs. Compared with other machine learning methods (support vector machines, decision trees, k-nearest neighbors, and others), CNNs require more computation time to construct a model. In addition, CNNs have many hyperparameters (number of layers, size of kernel filters, etc.), which also require computational resources for tuning. Therefore, instead of searching for an appropriate signal set, all of the prepared signals are often used [39,40]. Hence, a method that can find an approximate solution at a small computational cost is needed for both the feature selection and signal selection problems.

    Several studies have attempted to address the signal selection problem. Some applied Bayesian optimization to signal selection [41,42]. Because Bayesian optimization is a method for finding optimal solutions of black-box functions, it can be applied to both feature selection and signal selection. However, it is not well suited to CNNs. Another approach is signal selection by group Lasso [43]. These signal selection algorithms are important, but they are difficult to interpret visually because they do not utilize grad-CAM. To the best of the authors' knowledge, these are the only previous studies on signal selection. Developing signal selection algorithms for CNNs is an important issue that has not been sufficiently studied. Therefore, we propose a signal selection algorithm, FG-SSA, that provides a visual interpretation of which signals contribute to which class estimation and is computationally inexpensive.

    We consider a situation in which the task is to estimate a class c belonging to class set C defined by

    $C = \{c_i \mid i = 1, \ldots, n_{cm}\},$ (3.1)

    where ncm is the number of classes. The measurement data of the initial signal set is defined by

    $S = \{s_i \mid i = 1, \ldots, n_s\},$ (3.2)

    where $n_s$ denotes the number of signals and $s_i$ denotes the signal identification names. Some of the $n_s$ signals are important for classification, whereas others are not. Therefore, finding a signal subset $S_{use}$ from which the unimportant signals in the full signal set $S$ have been removed is crucial. In this study, we provide an algorithm for finding such a subset of signals $S_{use} \subseteq S$. The applicable targets of the proposed method are all tasks of solving classification problems using CNNs by inputting the time domain of multiple signals.

    An example of CNN with multiple signals as input is shown in Figure 2. This figure shows the CNN solving the problem of classifying 10 human motion labels from 15 acceleration signals mentioned in Subsection 4.2; however, the proposed algorithm can be applied to various problems. The data for the input layer are in the form of a matrix w×ns. w is the window length of the sliding window method [44] and ns=|S| is the number of signals.

    Figure 2.  An example of a CNN with multiple signals as input. This is the CNN using the initial signal set S and class set C defined by Equations (4.1) and (4.2) described in Subsection 4.2. In other words, it is the CNN structure for recognizing 10 human activities from 15 acceleration signals.

    Subsequently, the input data are convolved using kernel filters with a time-directional convolution size $s_c$. The number of generated feature maps is $n_f$ because the number of filters is $n_f$. Only time-directional convolution is adopted to avoid mixing one signal with others in the convolution process. The input data are convolved by these $n_c$ convolution layers, and the CNN generates the feature maps shown as "extracted feature maps" in Figure 2. Subsequently, the CNN generates the output vector using fully connected layers and the softmax function. We define the output vector y as

    $y = [y_c] \in \mathbb{R}^{|C|}, \quad c \in C, \quad \sum_{c \in C} y_c = 1.$ (3.3)

    The estimated class $c^\ast$ is

    $c^\ast = \arg\max_{c \in C} \{ y_c \mid c \in C \}.$ (3.4)

    Herein, we only define the output vector y because it appears in the grad-CAM definition. We explain grad-CAM for time domain signals in the next subsection.
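    A toy numerical example of Eqs (3.3) and (3.4), using hypothetical logits for $|C| = 4$ classes (the values are illustrative only):

```python
import numpy as np

# Toy logits from the final fully connected layer for |C| = 4 classes.
logits = np.array([1.0, 3.0, 0.5, 2.0])

# Softmax gives the output vector y (Eq 3.3): nonnegative, sums to one.
y = np.exp(logits - logits.max())
y /= y.sum()

c_star = int(np.argmax(y))   # estimated class index (Eq 3.4)
assert abs(y.sum() - 1.0) < 1e-9 and c_star == 1
```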

    We defined the vertical size of the feature map before the fully connected layer as sf, as shown in Figure 2. Let us denote the kth feature map of signal s as

    $f_{s,k} = [f_{s,k}^1 \;\; f_{s,k}^2 \;\; \cdots \;\; f_{s,k}^{s_f}] \in \mathbb{R}^{s_f}, \quad s \in S, \; k \in \{1, \ldots, n_f\}.$ (3.5)

    In the case of CNNs for image-data-based classification, the form of feature maps is a matrix. However, in the case of CNNs consisting of time-directional convolution layers, the form of the feature maps is a vector. We defined the effect of the feature map fs,k on the estimation class c as

    $\alpha_{s,k}^c = \frac{1}{s_f} \sum_{j=1}^{s_f} \frac{\partial y_c}{\partial f_{s,k}^j}.$ (3.6)

    Then, we applied the grad-CAM of signal s to estimation class c in the vector form of

    $Z_s^c = \mathrm{ReLU}\left( \frac{1}{n_f} \sum_{k=1}^{n_f} \alpha_{s,k}^c f_{s,k} \right), \quad Z_s^c \in \mathbb{R}_{\geq 0}^{s_f}.$ (3.7)

    The activated region of each signal can be understood by calculating $Z_s^c$ for all signals $s \in S$.

    In this study, we refer to $Z_s^c$ as the "time-directional grad-CAM" because it is calculated by summing partial derivatives along the time direction. It is defined by a minor modification of the basic grad-CAM for image data [9]. In the case of the basic grad-CAM [9], the CNN generates one grad-CAM per input datum. In contrast, in the case of the time-directional grad-CAM, the CNN generates as many grad-CAMs as there are signals ($n_s$) for each input datum.
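    Given the feature maps of one signal and their gradients, Eqs (3.6) and (3.7) reduce to a few array operations. The following NumPy sketch uses synthetic values; the shapes and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_f, s_f = 16, 12   # number of feature maps, map length (time direction)

# Hypothetical per-signal quantities: feature maps f_{s,k} (vectors of
# length s_f) and the gradients dy_c / df_{s,k}^j, here synthetic.
f = rng.random((n_f, s_f))
grad = rng.standard_normal((n_f, s_f))

alpha = grad.mean(axis=1)                               # Eq (3.6): one weight per map
z = np.maximum(0.0, (alpha[:, None] * f).mean(axis=0))  # Eq (3.7): time-directional grad-CAM

assert z.shape == (s_f,) and (z >= 0).all()
```

    Repeating this per signal yields the $n_s$ one-dimensional heat maps visualized in Figure 3.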

    Figure 3 shows examples of $Z_s^c$ calculated using the CNN presented in Figure 2 and the dataset described in Subsection 4.2. From top to bottom, the results correspond to $c^\ast$ = Stand N, Stand L, Stand R, and Walk N. Notably, an estimated class $c^\ast$ is an element of C. We can determine which signals are important by viewing the time-directional grad-CAMs shown in Figure 3. For example, signals from the left arm and left shoe are not used for estimating the Stand N class. Moreover, left arm signals are not used for estimating the Stand R class. Therefore, the time-directional grad-CAM is effective for finding unimportant signals.

    Figure 3.  Examples of time-directional grad-CAMs of 15 acceleration signals for four estimation classes (Stand N, Stand L, Stand R, and Walk N) calculated by the CNN illustrated in Figure 2 using the dataset described in Subsection 4.2. The red color represents high activation and the dark blue color represents low activation.

    Although we can find unimportant signals by viewing the time-directional grad-CAM, the result varies for each input datum. Therefore, viewing all the grad-CAMs is difficult when the data size is large. Herein, we quantify the importance of signal s for classification based on $\alpha_{s,k}^c$ defined in Eq (3.6).

    Here, we denote the input dataset of size ndat as

    $X = \{X_i \mid i = 1, \ldots, n_{dat}\}, \quad X_i \in \mathbb{R}^{w \times n_s}.$ (3.8)

    The size of the ith input data $X_i$ is $w \times n_s$, where w is the window length and $n_s$ the number of signals. When we define the input dataset of estimation class $c \in C$ as $X^c$, we can represent the input dataset X as

    $X = \bigcup_{c \in C} X^c.$ (3.9)

    Using the set $X^c$, we define the importance of the signal $s \in S$ to the estimation class $c \in C$ as

    $L_s(X^c) = \frac{1}{|X^c|} \sum_{X_i \in X^c} g_s^c(X_i),$ (3.10)

    where

    $g_s^c(X_i) = \frac{1}{n_f} \sum_{k=1}^{n_f} \beta_{s,k}^c(X_i), \quad \beta_{s,k}^c(X_i) = \begin{cases} \alpha_{s,k}^c(X_i), & \text{if } \alpha_{s,k}^c(X_i) \geq 0 \\ 0, & \text{otherwise} \end{cases}, \quad X_i \in X^c.$ (3.11)

    In addition, $\alpha_{s,k}^c(X_i)$ is the $\alpha_{s,k}^c$ obtained when the input data $X_i$ are fed to the CNN, and c is the estimated class. Therefore, $L_s(X^c)$ represents the importance of signal s to class c based on the grad-CAM. Notably, we ignore terms with negative partial derivatives to extract only the positive effect on classification, as shown in Eq (3.11). Moreover, using $L_s(X^c)$, we can obtain the matrix

    $I^{mat}(X^{c_1}, X^{c_2}, \ldots, X^{c_{n_{cm}}}) = \begin{bmatrix} L_{s_1}(X^{c_1}) & L_{s_1}(X^{c_2}) & \cdots & L_{s_1}(X^{c_{n_{cm}}}) \\ \vdots & \vdots & \ddots & \vdots \\ L_{s_{n_s}}(X^{c_1}) & L_{s_{n_s}}(X^{c_2}) & \cdots & L_{s_{n_s}}(X^{c_{n_{cm}}}) \end{bmatrix} = [L_s(X^c)] \in \mathbb{R}_{\geq 0}^{|S| \times |C|}, \quad s \in S, \; c \in C.$ (3.12)

    As shown in Eq (3.9), since $X = X^{c_1} \cup X^{c_2} \cup \cdots \cup X^{c_{n_{cm}}}$ is satisfied, we define

    $I^{mat}(X) \stackrel{\text{def}}{=} I^{mat}(X^{c_1}, X^{c_2}, \ldots, X^{c_{n_{cm}}}).$ (3.13)

    We refer to $I^{mat}(X)$ as the "signals importance matrix (SIM)" because the matrix comprises the importance of all signals and classes computed from the input dataset X. The effect of each signal on all classes can be understood by calculating and viewing the SIM $I^{mat}(X)$.
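    The construction of the SIM from the per-input gradient weights, Eqs (3.10)-(3.12), can be sketched as follows; all sizes and values are synthetic placeholders, not quantities from the experiments:

```python
import numpy as np

rng = np.random.default_rng(2)
n_s, n_cm, n_dat, n_f = 4, 3, 20, 8   # small hypothetical sizes

# Synthetic alpha_{s,k}^c(X_i): one gradient-based weight per input X_i,
# signal s, and feature map k, for each estimated class c.
sim = np.zeros((n_s, n_cm))
for c in range(n_cm):
    alpha = rng.standard_normal((n_dat, n_s, n_f))
    beta = np.where(alpha >= 0.0, alpha, 0.0)   # Eq (3.11): keep positive part
    g = beta.mean(axis=2)                       # g_s^c(X_i), Eq (3.11)
    sim[:, c] = g.mean(axis=0)                  # L_s(X^c), Eq (3.10)

# sim is the signals importance matrix (SIM) of Eq (3.12).
assert sim.shape == (n_s, n_cm) and (sim >= 0).all()
```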

    Although $I^{mat}(X)$ includes important information, summarizing the SIM is necessary to find signals that are unimportant for all classes. Therefore, by averaging each row of the SIM over all classes, we define the importance of signal s as

    $I_s(X) = \frac{1}{|C|} \sum_{c \in C} I_{s,c}^{mat}(X),$ (3.14)

    where $I_{s,c}^{mat}(X)$ is the value in row s and column c of $I^{mat}(X)$. By calculating $I_s(X)$ for all signals s, we obtain the vector

    $I^{vec}(X) = [I_s(X)] \in \mathbb{R}_{\geq 0}^{|S|}.$ (3.15)

    We refer to Ivec(X) as the "signals importance vector (SIV)" because it is the vector that consists of the signal importance.
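    Starting from a small hypothetical SIM, the SIV of Eq (3.14) is obtained by averaging each row over the classes; argmin/argmax over the SIV then pick out the least and most important signals:

```python
import numpy as np

# Hypothetical SIM: rows are signals, columns are classes (Eq 3.12).
sim = np.array([[0.9, 0.8, 0.7],
                [0.1, 0.2, 0.3],
                [0.5, 0.4, 0.6]])

siv = sim.mean(axis=1)       # Eq (3.14): average each row over all classes
s_min = int(np.argmin(siv))  # least important signal
s_max = int(np.argmax(siv))  # most important signal

print(s_min, s_max)  # signal 1 is least important, signal 0 most important
```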

    Calculated examples of the SIM and SIV are shown in Figure 4, which expresses the effect of each individual signal on class estimation by the CNN. These values were calculated using the CNN for Condition A described in Section 5. The SIM results (columns 1 to 10 in Figure 4) include various important findings. First, we can confirm that the Right arm X signal is used for estimating right-hand movements (Stand R, Walk R, and Sit R). For left-hand movement (Stand L, Walk L) estimation, the signals from the Back sensor respond strongly. Since the Back sensor is attached to the body trunk, trunk movement is presumably used to estimate the movement of the left hand. The Left arm X signal responds to another left-hand movement (Sit L) estimation. In addition, the Left shoe Y and Right shoe Y signals respond strongly to the motion estimation of Walk N. Based on these results, the SIM is considered reasonable. However, since some results are not easily interpretable, removing unnecessary signals using FG-SSA is important.

    Figure 4.  Example of signal importance for all classes expressed by the SIM $I^{mat}(X)$ and SIV $I^{vec}(X)$, calculated for the CNNs of Condition A described in Section 5. Columns 1 to 10 are the SIM and column 11 is the SIV ("All classes"). The values are standardized from zero to one in each column. The greater the cell's value, the greater the relevance of signal s to estimation class c.

    Moreover, we can identify the signals that are important for all classes by viewing the SIV (column 11 in Figure 4). From the SIV, we denote the minimum-importance signal $s_{min}$ and the maximum-importance signal $s_{max}$ as

    $(s_{min}, \; s_{max}) = \left( \arg\min_{s \in S} I_s(X), \; \arg\max_{s \in S} I_s(X) \right).$ (3.16)

    We expect to maintain the estimation accuracy even when removing smin because it is a minimum importance signal. In contrast, we expect the accuracy to decrease when we remove smax and re-learn the CNN because it is the most important signal.

    Algorithm 1 Features gradient-based signals selection algorithm (FG-SSA)
    Input:
      Training dataset $X^{train}$
      Validation dataset $X^{valid}$
      Initial signals set $S$
      Maximum number of used signals $\gamma$
    Output: Used signals set $S_{use}$
    1: Initialization of a signals set $S_0 \leftarrow S$
    2: for $t = 0$ to $|S| - 1$ do
    3:   Learn a CNN on the training dataset $X^{train}_{S_t}$
    4:   Calculate the validation accuracy $A(X^{valid}_{S_t})$
    5:   Find the minimum-importance signal $s_{min} \leftarrow \arg\min_{s \in S_t} I_s(X^{valid}_{S_t})$; see Equation (3.16)
    6:   Remove the signal: $S_{t+1} \leftarrow S_t \setminus \{s_{min}\}$
    7: end for
    8: $S_{use} \leftarrow \arg\max_{S' \in \{S_0, \ldots, S_{|S|-1}\}} A(X^{valid}_{S'})$, s.t. $|S_{use}| \leq \gamma$
    9: return $S_{use}$
    Notably, $X^{train}_{S_t}$ and $X^{valid}_{S_t}$ denote all input data belonging to the training and validation datasets, respectively, restricted to the signal set $S_t$.


    In this study, we proposed an algorithm to find a desirable signal subset $S_{use} \subseteq S$ by removing unimportant signals. The proposed method is presented in Algorithm 1. The main inputs to the algorithm are the training dataset $X^{train}$, the validation dataset $X^{valid}$, and the initial signal set S. To avoid data leakage, the test dataset is not used in the algorithm. The elements of set S are the signal identification names defined in Eq (3.2). The first procedure creates the initial signal set $S_0$, which consists of all signals (line 1). The next step develops a CNN using the training dataset $X^{train}_{S_0}$ consisting of $S_0$ (line 3). Subsequently, we measure the validation accuracy $A(X^{valid}_{S_0})$ using the validation dataset $X^{valid}_{S_0}$ (line 4). Next, we determine the most unimportant signal $s_{min} \in S_0$ based on Eq (3.16) (line 5). Finally, we obtain the next signal set $S_1$ by removing $s_{min}$ from $S_0$ (line 6).

    This procedure is repeated until the number of signals reaches one (i.e., $t = |S| - 1$), and the validation accuracies are recorded. The algorithm returns the signal subset $S_{use}$ leading to the maximum validation accuracy (lines 8 and 9). Notably, the signal subset leading to the maximum accuracy can be the initial signal set itself (i.e., $S_{use} = S$). When the main purpose is to achieve maximum accuracy, adopting the initial signal set as the optimal subset is a better option if $S_{use} = S$ occurs. However, some cases require a decrease in the number of signals. Therefore, we prepared the constraint that the number of adopted signals is $\gamma$ or fewer (i.e., $|S_{use}| \leq \gamma$). This is a hyperparameter of the proposed algorithm.
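    The loop of Algorithm 1 can be sketched in Python as follows. Here, `train_and_score` and `signal_importance` are hypothetical stand-ins for CNN training/validation and for computing $I_s$ on the validation set, so only the greedy elimination logic is shown:

```python
import numpy as np

rng = np.random.default_rng(4)

def train_and_score(signals):
    """Hypothetical stand-in for CNN training + validation accuracy."""
    return rng.random()

def signal_importance(signals):
    """Hypothetical stand-in for the importance I_s on the validation set."""
    return {s: rng.random() for s in signals}

def fg_ssa(all_signals, gamma):
    """Greedy backward elimination following the structure of Algorithm 1."""
    current = list(all_signals)
    history = []                                  # (signal set, accuracy) pairs
    for _ in range(len(all_signals)):             # |S| iterations -> O(n_s) CNNs
        acc = train_and_score(current)
        history.append((list(current), acc))
        imp = signal_importance(current)
        s_min = min(imp, key=imp.get)             # least important signal
        current = [s for s in current if s != s_min]
    feasible = [(sig, acc) for sig, acc in history if len(sig) <= gamma]
    return max(feasible, key=lambda t: t[1])[0]   # best set with |S_use| <= gamma

s_use = fg_ssa([f"s{i}" for i in range(1, 16)], gamma=10)
assert 1 <= len(s_use) <= 10
```

    Each iteration trains exactly one CNN, which is where the linear complexity of the next subsection comes from.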

    When the size of the initial signal set S is $n_s$, the number of CNNs developed in Algorithm 1 is as follows:

    $T(n_s) = \sum_{t=1}^{n_s} 1 = n_s.$ (3.17)

    That is, the computational complexity of Algorithm 1 is $O(n_s)$. This means that even if the number of signals $n_s$ increases, the signal subset is returned in a realistic time.

    Generally, the total number of patterns for choosing m from $n_s$ signals is $_{n_s}\mathrm{C}_m$. Because m, the number of signals leading to the maximum validation accuracy, is not known in advance, the total number of input-layer combinations $T_{bs}(n_s)$ satisfies

    $T_{bs}(n_s) = \sum_{m=1}^{n_s} {}_{n_s}\mathrm{C}_m \geq \max_m \; {}_{n_s}\mathrm{C}_m = {}_{n_s}\mathrm{C}_{n_s/2}.$ (3.18)

    Therefore, finding the optimal signal subset using a brute-force search is difficult. The computation time tends to be large because the CNN includes other hyperparameters. From this viewpoint, a fast algorithm such as the proposed method is important.
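    For the $n_s = 15$ signals used later in this study, the gap between a brute-force search and the linear scan of Algorithm 1 can be computed directly:

```python
import math

n_s = 15
# Brute force: every nonempty subset of the signals is a candidate input layer.
brute_force = sum(math.comb(n_s, m) for m in range(1, n_s + 1))
print(brute_force)   # 32767 = 2**15 - 1 candidate input layers
print(n_s)           # FG-SSA trains only n_s = 15 CNNs instead

assert brute_force == 2**n_s - 1
```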

    Herein, we examine the reliability of $s_{min}$ and $s_{max}$ defined in Eq (3.16). We expect the estimation accuracy to be maintained even if the signal $s_{min}$ is removed because it is the most unimportant signal. In contrast, because the signal $s_{max}$ is the most important, the estimation accuracy may decrease when it is removed. To verify this hypothesis, we gradually removed the signals $s_{min}$ and $s_{max}$ using the processes in lines 2 to 7 of Algorithm 1. In addition, we repeatedly developed CNNs and recorded their validation accuracies. We performed the verification using a total of 100 seeds because the training process of a CNN depends on randomness. The adopted layer structure of the CNN is explained in the Appendix. The number of epochs was 300.

    Herein, we concretely define the initial signal set S and the class set C described in Subsection 3.1 using real data. We used the "OPPORTUNITY Activity Recognition" dataset. This is a dataset for activity recognition, and it was used in the "Activity Recognition Challenge" held by IEEE in 2011 [45,46]. The dataset is regarded as reliable because it has been used for the performance evaluation of machine learning in several studies [47,48]. The dataset contains data from multiple inertial measurement units, 3D acceleration sensors, ambient sensors, and 3D localization information. Four participants performed a natural execution of daily activities. The activity annotations include locomotion (e.g., sitting, standing, and walking) and left- and right-hand actions (e.g., reach, grasp, release, etc.). Details of the dataset are described in [45,46].

    In this study, we used five triaxial acceleration sensors from all the measurement devices (sampling frequency: 32 Hz). The attachment points of the sensors on the human body are "back," "right arm," "left arm," "right shoe," and "left shoe." The total number of signals was 15 because we adopted five triaxial sensors (5×3=15). We adopted data splitting using the sliding window method with window length w=60 and sliding length 60. One window was approximately 2 s long since the sampling frequency was 32 Hz. Searching for the optimal window length is important because it is a hyperparameter that affects the estimation accuracy [44,49,50]. However, we did not tune the window length because the aim of this study was to provide a signal-selection algorithm.
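    The non-overlapping sliding-window split (w = 60, slide 60) can be sketched as follows; the raw array is a synthetic placeholder for the real recordings:

```python
import numpy as np

# Hypothetical raw recording: 15 signals sampled at 32 Hz for one minute.
fs, n_s, w = 32, 15, 60
raw = np.random.default_rng(5).random((fs * 60, n_s))

# Non-overlapping sliding windows: length 60, slide 60 (about 2 s each).
n_win = raw.shape[0] // w
windows = raw[: n_win * w].reshape(n_win, w, n_s)   # inputs X_i of size w x n_s

assert windows.shape == (32, 60, 15)
```

    Because the slide length equals the window length, no sample appears in two windows, which is what keeps the later train/validation/test splits independent.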

    Subsequently, we assigned a motion class label to each window based on human activity. The class labels are combinations of locomotion ("Stand," "Walk," "Sit," and "Lie") and three hand activities ("R": moving the right hand, "L": moving the left hand, and "N": not moving the hands). For example, the class label "Sit R" refers to sitting with right-hand motion.

    Table 1 lists the results of applying the described procedures to all the data. It indicates that the data size for left-hand motions is small, presumably because nearly all participants were right-handed (notably, we could not find a description of the participants' dominant arms in the explanation of the OPPORTUNITY dataset). Moreover, data belonging to Lie R and Lie L are absent. Therefore, these classes were removed from the estimation task (i.e., the total number of classes was 10).

    Table 1.  Sample sizes of each class label.
    Labels Samples Rates [%]
    Stand N 1070 20.68
    Stand L 193 3.73
    Stand R 642 12.41
    Walk N 1405 27.16
    Walk L 63 1.22
    Walk R 150 2.90
    Sit N 571 11.04
    Sit L 130 2.51
    Sit R 536 10.36
    Lie N 414 8.00
    Lie L 0 0.00
    Lie R 0 0.00
    Total 5174 100.00


    Subsequently, we randomly split all the data into a training dataset (80%) and a test dataset (20%). Moreover, we assigned 20% of the training dataset to the validation dataset. The training, validation, and test datasets were independent because we adopted the sliding window method for the same window and slide length (60 steps). We defined the signal set S, which has 15 elements, and the class set C, which has 10 elements, as follows:

    S = {Back X, Back Y, Back Z, Right arm X, Right arm Y, Right arm Z, Left arm X, Left arm Y, Left arm Z, Right shoe X, Right shoe Y, Right shoe Z, Left shoe X, Left shoe Y, Left shoe Z}, (4.1)
    C = {Stand N, Stand L, Stand R, Walk N, Walk L, Walk R, Sit N, Sit L, Sit R, Lie N}. (4.2)

    In other words, the CNNs solve a 10-class classification problem from 15 signals, and the algorithm removes unimportant signals while maintaining the estimation accuracy. Although the sets S and C represent acceleration signals and human activities, respectively, the proposed algorithm can be applied to other diverse signals, such as EEG and ECG.

    First, we indicate the validation accuracies when the most unimportant signal, smin, is gradually removed, as shown in Figure 5 (A). From left to right in Figure 5 (A), the number of deleted signals increases (i.e., the number of signals in the CNN input layer decreases). The leftmost and rightmost results are obtained using all signals and only one signal as the CNN input layer, respectively. The results show that even if six signals were deleted, the validation accuracy did not decrease. Moreover, the accuracy significantly decreases by removing seven or more signals. In this case, although CNNs estimate class labels from 15 signals in the set S, six signals are unnecessary.

    Figure 5.  (A) Maximum validation accuracy in 300 epochs when the most unimportant signal $s_{min}$ is gradually removed and the CNNs are re-learned. (B) Average removal timings of each signal; the smaller the value, the earlier the signal was removed. The results in both (A) and (B) are average values over 100 seeds, and the error bars are standard deviations. The p values in (A) represent the results of a two-sided t test, and the dashed line represents the timing at which the validation accuracy statistically decreases.

    We calculated the average removal timing of each of the 15 signals to determine the unnecessary signals; Figure 5 (B) shows the results. The lower the value, the earlier the signal is removed in the procedure of Algorithm 1. The results indicate that the signals of the right and left shoes are removed earlier than those of the other sensors, whereas some signals from the back, right arm, and left arm are retained until late. Therefore, we can regard the shoe signals as unimportant for this classification task. One might expect shoe sensors to be essential for classifying walking motions; however, even without them, we consider that the back sensor, attached near the center of mass of the body, contributes to classifying walking motions because these motions are periodic.

    Next, we show the validation accuracies when the most important signal, smax, is gradually removed, in Figure 6 (A). The validation accuracy decreases statistically after removing even one of the most important signals; that is, removing smax leads to worse accuracy. The average removal timing of each signal is shown in Figure 6 (B). From this figure, we confirm that the signals of back X, right arm X, and left arm Y are removed early, whereas the shoe sensor signals are not. These tendencies are the opposite of those observed when removing the most unimportant signal smin.

    Figure 6.  (A) Maximum validation accuracy in 300 epochs when the most important signal smax is gradually removed and CNNs are re-learned. (B) Average removed timings of each signal. In contrast to that shown in Figure 5, the most important signals are removed.

    In summary, (1) the estimation performance is maintained even if we remove the most unimportant signal smin, and (2) the performance decreases when we remove the most important signal smax. Therefore, the signal importance Is(X) defined in Eq (3.14) is reliable.

    Next, we confirmed the effect of FG-SSA (Algorithm 1) on the generalization performance of the classification. To this end, we prepared the following three conditions for the CNNs:

    ● Condition A: CNN using all signals (i.e., FG-SSA is not used).

    ● Condition B: CNN with FG-SSA applied using γ=ns.

    ● Condition C: CNN with FG-SSA applied using γ=9.

    Condition A means that the CNN does not remove any signals but uses the full signal set S. Condition B refers to the CNN using the signal subset Suse obtained by FG-SSA with ns given as the constraint parameter γ. Under this condition, the algorithm is allowed to return S as Suse when the maximum validation accuracy is achieved using the full signal set S; in other words, a case wherein no signals are removed can occur. Condition C implies that at most nine signals are adopted for the CNN input layer (i.e., |Suse| ≤ 9). In other words, six or more signals were deleted because the initial number of signals ns was 15. The value of the hyperparameter γ=9 was determined by referring to the result shown in Figure 5 (A).
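The selection loop used under Conditions B and C can be sketched as follows. This is a minimal backward-elimination sketch, not the paper's full Algorithm 1: `train_and_score` and `importance` are hypothetical stand-ins for CNN re-learning and the grad-CAM-based signal importance Is(X).

```python
def fg_ssa(signals, train_and_score, importance, gamma):
    # Sketch of the FG-SSA elimination loop.
    # train_and_score(subset) -> validation accuracy of a CNN on subset.
    # importance(subset)      -> dict: signal -> importance I_s(X).
    # gamma: maximum allowed size of the returned subset S_use.
    # Exactly len(signals) CNNs are trained, i.e., linear cost O(n_s).
    current = list(signals)
    history = []                                   # (subset, accuracy)
    while current:
        history.append((list(current), train_and_score(current)))
        if len(current) == 1:
            break
        imp = importance(current)
        current.remove(min(current, key=imp.get))  # drop s_min
    feasible = [(sub, acc) for sub, acc in history if len(sub) <= gamma]
    return max(feasible, key=lambda p: p[1])[0]    # best feasible subset
```

With γ = ns the loop may return the full set (Condition B); with γ = 9 it must return at most nine signals (Condition C).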

    We developed CNNs using the signal sets determined by FG-SSA and adopted the optimal epochs leading to the maximum validation accuracy, searching epochs from 1 to 300. Subsequently, the generalization performance was measured using the test dataset, which was used only at this stage (i.e., it was not used for the parameter search). We developed CNNs for Conditions A, B, and C with a total of 100 seeds to remove the effect of randomness.

    By developing CNNs under Conditions A, B, and C over a total of 100 seeds, the average numbers of signals used were 15.00, 11.94, and 8.35, respectively. In other words, averages of 3.06 and 6.65 signals were removed by FG-SSA under Conditions B and C, respectively. The generalization performance (F score, precision, and recall) measured on the test dataset for each condition is shown in Figure 7. These are histograms of the 100 CNNs developed using 100 random seeds.

    Figure 7.  Histograms of estimation score using the test dataset (total 100 seeds). The macro averages of 10 classes of F score, precision, and recall from left to right, and Conditions A, B, and C from top to bottom.

    The results indicate that the generalization scores are nearly the same for Conditions A, B, and C; moreover, the p values of the two-sided t test were not statistically significant. The confusion matrices obtained by the CNNs for each condition are shown in Figure 8. The values were averaged over the 100 seed results and standardized so that each row sums to 1. The confusion matrices are nearly identical: although the proposed algorithm removed some signals, the generalization errors did not increase. Therefore, we consider that FG-SSA can find and remove signals that are not important for CNN-based classification.

    Figure 8.  Confusion matrices of the Conditions A, B, and C using the test dataset. The values were averaged by 100 seed results and standardized, where the summation in each row was 1.
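The row standardization used in Figures 8 (each row summing to 1, i.e., per-class recall rates) can be sketched as follows; the confusion matrix here is a toy two-class example, not the paper's data.

```python
def row_normalize(cm):
    # Standardize a confusion matrix so that each row sums to 1.
    out = []
    for row in cm:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out

cm = [[8, 2], [3, 7]]   # toy 2-class confusion matrix (raw counts)
norm = row_normalize(cm)
```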

    From another viewpoint, the number of correct classifications of left-hand motions (Stand L, Walk L, and Sit L) was small under all conditions. We adopted weighted cross-entropy-based learning for the CNNs because the original data size of the left-hand motions is small, as shown in Table 1. Nevertheless, we consider that appropriate classification could not be performed because the data diversity of these motions is low. Although we believe that nearly all participants were right-handed, to the best of our knowledge the OPPORTUNITY dataset does not sufficiently document the participants' dominant arms.

    Note that this study also validates FG-SSA on a dataset other than OPPORTUNITY: an experiment that classifies swimming styles from signals measured by a single inertial sensor. The details of this experiment and the results obtained are presented in Appendix B. The experimental results also indicate that FG-SSA is effective for signal selection.

    From the results described above, it was confirmed that signals can be removed without degrading the generalization performance by using FG-SSA. In this section, we compare the generalization performance obtained with the signal subsets selected under Condition C in Section 5 against that obtained with signal subsets selected by other methods, and thereby evaluate the effectiveness of the proposed method. Because Condition C selects at most nine of the 15 signals, the same constraint was imposed on the existing methods.

    Bayesian optimization (BO) and random search (RS) were adopted as the existing methods. BO is a search algorithm for combinatorial optimization problems and is sometimes applied to hyperparameter tuning in machine learning and deep learning [51,52]. In this approach, the next hyperparameters to be verified are determined based on the validation accuracies obtained with the previously evaluated hyperparameters. BO can also be applied to feature selection in classification problems [34,35,36], i.e., selecting the features to be used for classification from all features prepared in advance. This is analogous to choosing a signal subset from all signals; accordingly, BO has also been applied to signal selection in several studies [41,42]. Therefore, in this study, we adopted BO as an existing method for comparison with FG-SSA.

    As shown in Eq (3.18), when selecting some signals from ns signals, the number of selectable combinations becomes enormous. In contrast, as shown in Algorithm 1, FG-SSA obtains the signal subset to be used by constructing only ns CNNs. Although a larger number of BO iterations increases the chance of discovering a desirable signal subset, a fair comparison between FG-SSA and BO requires the same number of CNN constructions. Therefore, we set the number of BO iterations to ns; as mentioned in Section 5, ns is 15 in this experiment. We adopted the tree-structured Parzen estimator (TPE) [53] as the specific BO method and Optuna [54] as the implementation framework. TPE is designed to reach a desirable solution within a small number of iterations [55].
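The gap between the combinatorial search space and the linear cost of FG-SSA can be made concrete. For ns = 15 and at most nine selected signals, the number of candidate subsets versus the number of CNNs FG-SSA trains is:

```python
import math

n_s, s_max = 15, 9
# Number of candidate subsets when choosing 1..s_max of n_s signals.
n_subsets = sum(math.comb(n_s, k) for k in range(1, s_max + 1))
# FG-SSA instead trains one CNN per elimination step: linear in n_s.
n_fg_ssa = n_s
```

Exhaustive search would require 27,823 CNN constructions, while FG-SSA requires only 15.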

    Moreover, RS was adopted as a baseline. This method repeats the process of randomly selecting signals and constructing a CNN a specified number of times, and uses the signal subset with the highest validation accuracy. Here, we repeated the process of selecting up to nine signals from the 15 signals ns times. In other words, the numbers of CNN constructions for RS, BO, and FG-SSA are the same.

    The details of the existing methods are described in Algorithm 2. The inputs of this algorithm are the training dataset Xtrain, the validation dataset Xvalid, the initial signal set S defined in Eq (4.1), the maximum number of selected signals Smax, the number of iterations T, and the search method M ∈ {BO, RS}, where BO means TPE-based Bayesian optimization and RS means random search. In Algorithm 2, a signal subset St consisting of Smax signals is extracted by the "SignalsSelection" function (line 2 in Algorithm 2, via the BO or RS method). We then develop a CNN using the training dataset X_train^(St) restricted to the signal subset St and measure the validation accuracy A(X_valid^(St)) on the correspondingly restricted validation dataset. We repeat this from t=1 to t=T (i.e., T times) and select the signal subset leading to the maximum validation accuracy. This subset, Suse ⊆ S, is returned to the user by Algorithm 2. Then, we develop a CNN using the signal subset Suse and measure the generalization scores (F score, precision, and recall) on the test dataset. The layer structure and learning conditions of the CNN are the same as in the previous experiment described in Section 5.

    In general, for both BO and RS, better signal subsets can be found as the number of iterations T increases. However, building a CNN takes a long time; thus, it is undesirable to increase T. As shown in Eq (3.17), the strength of FG-SSA is that it is a fast algorithm. Therefore, we set the number of iterations T of BO and RS and the maximum number of selected signals Smax to the same values as those of FG-SSA (Section 5, Condition C), i.e., T=15 and Smax=9. We ran Algorithm 2 100 times while changing the random seed to remove the effect of randomness (the same applies to FG-SSA under Condition C).

    Algorithm 2 Existing signals selection algorithm
    Input:
      Training dataset Xtrain
      Validation dataset Xvalid
      Initial signals set S
      Maximum number of using signals Smax
      Number of iterations T
      Search method M ∈ {BO, RS}
    Output: Using signals set Suse
    1: for t = 1 to T do
    2:   St ← SignalsSelection(S, Smax, M)
    3:   Learning a CNN by the training dataset X_train^(St)
    4:   Calculating validation accuracy A(X_valid^(St))
    5: end for
    6: Suse ← argmax_{S' ∈ {S1, …, ST}} A(X_valid^(S'))
    7: return Suse
    Note: "SignalsSelection(S, Smax, M)" selects and returns a subset St ⊆ S consisting of Smax signals from the initial signals set S based on method M. When M is "Bayesian optimization (BO)", the tree-structured Parzen estimator is used for signal selection. When M is "random search (RS)", signals are randomly selected.
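Algorithm 2 can be sketched in a few lines of Python. Only the RS branch is implemented here; a BO branch would instead propose subsets with a TPE model (e.g., via Optuna) conditioned on previous accuracies. `train_and_score` is a hypothetical stand-in for CNN training plus validation-accuracy measurement.

```python
import random

def signals_selection(S, s_max, method, rng):
    # Line 2 of Algorithm 2; only the RS branch is sketched here.
    if method == "RS":
        return rng.sample(S, rng.randint(1, s_max))
    raise NotImplementedError(method)   # BO branch omitted in this sketch

def algorithm2(S, s_max, T, train_and_score, method="RS", seed=0):
    # Evaluate T candidate subsets and keep the best one (lines 1-6).
    rng = random.Random(seed)
    scored = []
    for _ in range(T):
        St = signals_selection(S, s_max, method, rng)
        scored.append((St, train_and_score(St)))
    return max(scored, key=lambda p: p[1])[0]

S = [f"s{i}" for i in range(15)]
score = lambda sub: len(sub) / 15       # toy accuracy proxy
S_use = algorithm2(S, s_max=9, T=15, train_and_score=score)
```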

    The generalization scores of FG-SSA, BO, and RS measured on the test dataset are shown in Table 2. The results consist of the F score, precision, and recall of the 10 classes belonging to the class set C defined in Eq (4.2). The bold and underlined numbers indicate the highest and lowest scores, respectively. The values in the cells are the average scores of 100 runs, and the p values are the results of the two-sided t test between FG-SSA and BO.

    Table 2.  Estimation scores of FG-SSA, BO, and RS obtained by the test dataset (average results of 100 seeds). p values mean the two-sided t test between FG-SSA and BO. The bold numbers mean the highest values and the underlined numbers mean the lowest values.
               | F score              | Precision            | Recall
    Class      | FG-SSA  BO    RS   p | FG-SSA  BO    RS   p | FG-SSA  BO    RS   p
    Stand N    | .671  .668  .642  n.s. | .721  .702  .701  *    | .635  .643  .600  n.s.
    Stand L    | .281  .260  .235  *    | .282  .280  .244  n.s. | .299  .259  .252  **
    Stand R    | .597  .585  .530  *    | .560  .549  .503  n.s. | .653  .638  .577  n.s.
    Walk N     | .859  .852  .838  **   | .882  .880  .847  n.s. | .840  .828  .832  *
    Walk L     | .247  .233  .184  n.s. | .259  .234  .192  n.s. | .254  .244  .195  n.s.
    Walk R     | .402  .380  .289  *    | .345  .322  .257  **   | .501  .483  .357  n.s.
    Sit N      | .586  .594  .534  n.s. | .679  .686  .644  n.s. | .531  .536  .476  n.s.
    Sit L      | .349  .352  .312  n.s. | .426  .427  .373  n.s. | .322  .319  .302  n.s.
    Sit R      | .686  .688  .657  n.s. | .627  .619  .593  n.s. | .772  .785  .756  n.s.
    Lie N      | .989  .983  .975  **   | .988  .981  .973  **   | .990  .986  .976  **
    Macro ave. | .567  .560  .520  *    | .577  .568  .533  **   | .580  .572  .532  *
    **: p<.01, *: p<.05, n.s.: otherwise (FG-SSA vs. BO)


    Based on Table 2, the baseline RS has the lowest scores, presumably because random signal selection is not strategic with respect to improving the generalization performance. In addition, FG-SSA obtained better scores than BO in more classes, and although BO scored better in some classes, those differences were not statistically significant. Furthermore, looking at the macro averages, FG-SSA scored higher than BO and RS on all three indicators. Accordingly, FG-SSA can be interpreted as discovering signal subsets that lead to higher generalization performance than BO and RS; it is thus an excellent method in terms of performing good signal selection within a small number of iterations.

    Because the number of CNN constructions in FG-SSA is 15, the number of Bayesian optimization iterations was set to 15 in the previous analyses, and, as shown in Table 2, FG-SSA demonstrated higher estimation performance than BO. However, increasing the number of BO iterations might find a signal set that yields scores similar to those of FG-SSA. To test this hypothesis, the generalization performance was measured with 15, 30, and 45 BO iterations (BO15, BO30, and BO45). Table 3 shows the results. The generalization performance increases slightly as the number of BO iterations increases: FG-SSA performed significantly better than BO15, whereas no significant differences were observed between FG-SSA and BO45 for any index. Therefore, the signal sets obtained by BO45 and FG-SSA may be of similar quality. However, the computation time of BO45 was approximately three times that of FG-SSA. Therefore, FG-SSA can find a good signal set at a lower computational cost than the existing method BO. Note that the computational environment used in this experiment was OS: Ubuntu 20.04 LTS, CPU: Xeon W-2225 (4.1 GHz), RAM: 32 GB DDR4-2933, GPU: NVIDIA RTX A6000 (48 GB).

    Table 3.  Estimation scores of FG-SSA, BO15, BO30, and BO45 obtained by the test dataset and the corresponding calculation times (average and standard deviation of 100 seeds). The bold numbers indicate the highest values, and the underlined numbers indicate the lowest values. p values are obtained from the two-sided t test between FG-SSA and BO.
    Method | Macro F score (Ave., p) | Macro precision (Ave., p) | Macro recall (Ave., p) | Calculation time [sec]
    FG-SSA | .567 ± .021   -    | .577 ± .022   -    | .580 ± .022   -    | 2179.53 ± 8.37
    BO15   | .560 ± .023   *    | .568 ± .024   **   | .572 ± .025   *    | 2024.54 ± 9.27
    BO30   | .563 ± .020   n.s. | .572 ± .022   n.s. | .573 ± .022   *    | 4101.60 ± 29.97
    BO45   | .564 ± .020   n.s. | .572 ± .022   n.s. | .574 ± .021   n.s. | 6174.26 ± 41.47
    **: p<.01, *: p<.05, n.s.: otherwise (FG-SSA vs. other methods)


    In this study, we described the following topics to find and remove unimportant signals for CNN-based classification:

    ● (1) The signal importance indices SIM Imat(X) and SIV Ivec(X) are explained in Subsection 3.4.

    ● (2) An algorithm of linear complexity O(ns) for obtaining the signal subset Suse from the initial signal set S by finding and removing unimportant signals (see Algorithm 1).

    Although the proposed algorithm performed well on the OPPORTUNITY dataset, it has some limitations, which are explained below and will be addressed in future work.

    ● (A) The results described in this paper were obtained from limited cases. Although we assume that the algorithm can be applied to data other than acceleration signals (e.g., EEG and ECG), it is not validated.

    ● (B) In this experiment, the size of the initial signal set was 15. In some situations, the number of signals may be much larger. It is therefore important to determine the relationship between the number of unimportant signals and the effectiveness of FG-SSA.

    ● (C) We assume that the algorithm may extract a signal subset weighted toward a specific target class by introducing class weights into the signal importance Is(X) defined in Eq (3.14). In other words, we denote the class-weighted importance of signal s as

    Is(X; w) = (1/|C|) ∑_{c∈C} wc I^mat_{s,c}(X), (7.1)

    where wc is the weight of class c and w is the class weight vector, which is defined as

    w = [wc] ∈ R^{|C|},  s.t.  ∑_{c∈C} wc = 1. (7.2)

    We note that an algorithm based on Is(X; w) can return a signal subset weighted toward specific classes. However, this effect has not been confirmed in this study and will be examined in future work.
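Eqs (7.1) and (7.2) can be sketched directly. The importance values below are hypothetical illustrations, not results from the paper.

```python
def weighted_signal_importance(I_mat, w):
    # Eq (7.1): I_s(X; w) = (1/|C|) * sum_c w_c * I^mat_{s,c}(X)
    # I_mat: dict signal -> dict class -> importance I^mat_{s,c}(X)
    # w: dict class -> weight, constrained by Eq (7.2) to sum to 1.
    assert abs(sum(w.values()) - 1.0) < 1e-9
    n_classes = len(w)
    return {s: sum(w[c] * per_class[c] for c in w) / n_classes
            for s, per_class in I_mat.items()}

# Hypothetical importance values for two signals over two classes;
# weighting entirely toward "walk" emphasizes walk-relevant signals.
I_mat = {"back_x": {"walk": 0.8, "sit": 0.2},
         "shoe_y": {"walk": 0.4, "sit": 0.4}}
I_w = weighted_signal_importance(I_mat, {"walk": 1.0, "sit": 0.0})
```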

    The CNN layer structure is illustrated in Figure 2. First, an input layer accepts data of size w × ns (w: window length; ns: number of signals). Then, nc=3 convolution layers generate nf feature maps using kernel filters of convolution size sc; the convolutions act only in the time direction. Subsequently, the generated feature maps are transformed into vector form by the flatten layer, and the vector dimensions are gradually reduced to 200, 100, and 50. Finally, an output vector y of 10 dimensions is generated, and classification is performed. The total number of CNN layers is nine (one input layer, three convolution layers, one flatten layer, three dense layers, and one classification layer). The activation function of each layer is ReLU.
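The size of the vector entering the flatten layer follows from the standard convolution output-length formula. The sketch below assumes stride 1 and no padding, which the paper does not state explicitly, and uses the adopted values w = 60, sc = 10, and nf = 10.

```python
def conv_out_len(L, kernel, stride=1, padding=0):
    # Standard 1-D convolution output-length formula.
    return (L + 2 * padding - kernel) // stride + 1

w, s_c, n_f = 60, 10, 10       # window length, kernel size, feature maps
L = w
for _ in range(3):             # three convolution layers in time direction
    L = conv_out_len(L, s_c)   # 60 -> 51 -> 42 -> 33
flat_dim = L * n_f             # vector size entering the flatten layer
```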

    We performed a hyperparameter search over the convolution size sc, the number of kernel filters nf, and the learning rate r. Specifically, CNNs were trained for 300 epochs using the training and validation datasets described in Subsection 3.1. We adopted a weighted cross-entropy loss based on the inverse of the class sample sizes because the sample sizes are imbalanced, as shown in Table 1.
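The inverse-sample-size weighting can be sketched as follows. The paper only states that the weights are based on the inverse of the class sample sizes; the normalization (weights summing to the number of classes) and the toy counts are assumptions for illustration.

```python
def inverse_frequency_weights(counts):
    # Per-class loss weights proportional to inverse sample sizes,
    # normalized here so that the weights sum to the number of classes
    # (this normalization choice is an assumption, not from the paper).
    inv = {c: 1.0 / n for c, n in counts.items()}
    scale = len(counts) / sum(inv.values())
    return {c: v * scale for c, v in inv.items()}

# Toy counts: the rarer class receives the proportionally larger weight.
w = inverse_frequency_weights({"Stand N": 40, "Stand L": 10})
```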

    The parameter candidates are as follows:

    sc ∈ {5, 10, 15},  nf ∈ {5, 10},  r ∈ {10^−3, 10^−4}. (8.1)

    The combination of hyperparameters leading to the maximum validation accuracy was (sc, nf, r) = (10, 10, 10^−4) (accuracy: 0.726). Therefore, the adopted CNN architecture is as follows:

    ● Layer 1: Conv. layer, the kernel filter: (sc,nf)=(10,10)

    ● Layer 2: Conv. layer, the kernel filter: (sc,nf)=(10,10)

    ● Layer 3: Conv. layer, the kernel filter: (sc,nf)=(10,10)

    ● Layer 4: Flatten layer based on the final conv. layer's output size

    ● Layer 5: Dense layer of 200 dimensions

    ● Layer 6: Dense layer of 100 dimensions

    ● Layer 7: Dense layer of 50 dimensions

    ● Layer 8: Classification layer

    ● Note: the activation function for all layers except the classification layer is ReLU.

    We adopted this architecture for all CNNs used in this study. Although CNNs have many other hyperparameters, we avoided excessive tuning because the aim of this study was to provide a solution to the signal selection problem.
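The exhaustive search over the candidates in Eq (8.1) covers 3 × 2 × 2 = 12 combinations and can be sketched as follows; `acc` is a hypothetical accuracy table standing in for actual CNN training.

```python
import itertools

# Candidate grids from Eq (8.1)
SC, NF, R = [5, 10, 15], [5, 10], [1e-3, 1e-4]

def grid_search(validation_accuracy):
    # Exhaustive search over all combinations of (s_c, n_f, r).
    combos = list(itertools.product(SC, NF, R))
    best = max(combos, key=validation_accuracy)
    return best, len(combos)

# Hypothetical accuracy table standing in for actual CNN training.
acc = lambda t: 0.726 if t == (10, 10, 1e-4) else 0.5
best, n = grid_search(acc)
```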

    The effectiveness of the proposed method should be validated on multiple datasets, not just one. Therefore, FG-SSA was also applied to the task of automatically classifying swimming styles (backstroke, breaststroke, butterfly, and front crawl) from signals measured by an inertial sensor using a CNN. Swimming style classification with inertial sensors is a typical research topic in sports engineering (e.g., [56,57]). The subjects were 16 Japanese university students, each of whom selected two of the four swimming styles. The pool length was 25 m, and each swimmer swam one round trip (50 m in total). A sensor attached to the lower back of each swimmer measured triaxial acceleration and angular velocity (six signals in total). The sampling frequency of the inertial sensor was 100 Hz, the acceleration range was ±5 G, and the angular velocity range was ±1500 dps. All signals were standardized to a minimum of 0 and a maximum of 1. To verify whether FG-SSA can eliminate unnecessary signals and select only important ones, we additionally prepared nine meaningless signals generated from uniform random numbers in the range 0 to 1. Including these nine meaningless signals, the total number of signals was 3+3+9=15.

    Of the 32 swims (16 subjects × 2 swims), 5 had measurement errors (the sensor fell off while swimming); therefore, usable data were available for 27 swims. The sliding window method was applied to the data with the window length and slide length both set to 1 s; because they are the same size, the cut waveforms did not overlap. This process resulted in sample sizes of 91, 145, 332, and 205 for backstroke, breaststroke, butterfly, and front crawl, respectively. Of the available data, 80% was used as training data and 20% as test data; in addition, 20% of the training data was used as validation data. Since the data splitting is random, the performance may be higher (or lower) than the average tendency due to random effects. To obtain reliable results, the experiment was run with a total of 100 random seeds.

    Figure 9 (A) shows the maximum validation accuracies in 300 epochs when FG-SSA is applied to this task and unnecessary signals are removed one by one. In the first half of the FG-SSA procedure, the estimation performance improves as more unnecessary signals are deleted, reaching a maximum when the number of deletions is 9; in the second half, the estimation performance degrades. This indicates that the estimation performance worsens both when unnecessary signals remain in the input layer and when important signals are removed. Figure 9 (B) shows the removal timing of each signal: the randomly generated meaningless signals are removed early in the algorithm, whereas the triaxial acceleration and angular velocity signals are removed later. This result indicates that FG-SSA can efficiently find and remove only the meaningless signals while keeping the important ones.

    Figure 9.  (A) Maximum validation accuracy in 300 epochs when the most unimportant signal smin is gradually removed by FG-SSA and CNNs are re-learned. (B) Average removed timings of each signal. The results of both (A) and (B) are average values of 100 seeds, and the error bars present standard deviations. The p values in (A) represent the results of two-sided t test.

    The number of signals deleted by line 8 of Algorithm 1 was 9.66 ± 1.88 (average and standard deviation over 100 seeds). In other words, the average number of signals selected by FG-SSA as the CNN input layer was 15 − 9.66 = 5.34. Next, the generalization performance was measured on the test dataset for CNNs using all signals (Condition A, 15 signals) and for CNNs using the signal sets selected by FG-SSA (Condition B, 5.34 signals on average). The results are shown in Figure 10. When all signals were used (Condition A), confusion between "breaststroke" and "butterfly" was observed; when the unnecessary signals were removed by FG-SSA (Condition B), this misclassification was greatly reduced. Therefore, it can be concluded that signal reduction by FG-SSA is beneficial for improving the generalization performance.

    Figure 10.  Confusion matrices of the Conditions A and B using the test dataset ("Ba": backstroke, "Br": breaststroke, "Bu": butterfly, and "Fr": front crawl). The values were averaged over 100 seed results and standardized, where the summation in each row was 1.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This work was supported in part by JSPS Grant-in-Aid for Scientific Research (C) (Grant Nos. 21K04535 and 23K11310), and JSPS Grant-in-Aid for Young Scientists (Grant No. 19K20062). This work was also supported by the Tokyo City University Prioritized Studies.

    The authors declare no conflicts of interest.



    [1] N. Shahini, Z. Bahrami, S. Sheykhivand, S. Marandi, M. Danishvar, S. Danishvar, et al., Automatically identified EEG signals of movement intention based on CNN network (end-to-end), Electronics, 11 (2022), 3297. https://doi.org/10.3390/electronics11203297 doi: 10.3390/electronics11203297
    [2] T. Zebin, P. J. Scully, K. B. Ozanyan, Human activity recognition with inertial sensors using a deep learning approach, Proceedings IEEE Sensors, (2017), 1–3. https://doi.org/10.1109/ICSENS.2016.7808590
    [3] W. Xu, Y. Pang, Y. Yang, Y. Liu, Human activity recognition based on convolutional neural network, Proceedings of the International Conference on Pattern Recognition, (2018), 165–170. https://doi.org/10.1109/ICPR.2018.8545435
    [4] Y. Omae, M. Kobayashi, K. Sakai, T. Akiduki, A. Shionoya, H. Takahashi, Detection of swimming stroke start timing by deep learning from an inertial sensor, ICIC Express Letters Part B: Applications ICIC International, 11 (2020), 245–251. https://doi.org/10.24507/icicelb.11.03.245 doi: 10.24507/icicelb.11.03.245
    [5] D. Sagga, A. Echtioui, R. Khemakhem, M. Ghorbel, Epileptic seizure detection using EEG signals based on 1D-CNN approach, Proceedings of the 20th International Conference on Sciences and Techniques of Automatic Control and Computer Engineering, (2020), 51–56. https://doi.org/10.1109/STA50679.2020.9329321
    [6] N. Dua, S. N. Singh, V. B. Semwal, Multi-input CNN-GRU based human activity recognition using wearable sensors, Computing, 103 (2021), 1461–1478. https://doi.org/10.1007/s00607-021-00928-8 doi: 10.1007/s00607-021-00928-8
    [7] Y. H. Yeh, D. P. Wong, C. T. Lee, P. H. Chou, Deep learning-based real-time activity recognition with multiple inertial sensors, Proceedings of the 2022 4th International Conference on Image, Video and Signal Processing, (2022), 92–99. https://doi.org/10.1145/3531232.3531245
    [8] J. P. Wolff, F. Grützmacher, A. Wellnitz, C. Haubelt, Activity recognition using head worn inertial sensors, Proceedings of the 5th International Workshop on Sensor-based Activity Recognition and Interaction, (2018), 1–7. https://doi.org/10.1145/3266157.3266218
    [9] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: Visual explanations from deep networks via gradient-based localization, Int. J. Comput. Vision, 128 (2016), 336–359. https://doi.org/10.1109/ICCV.2017.74 doi: 10.1109/ICCV.2017.74
    [10] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, A. Torralba, Learning deep features for discriminative localization, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2016), 2921–2929.
    [11] M. Kara, Z. Öztürk, S. Akpek, A. A. Turupcu, P. Su, Y. Shen, COVID-19 diagnosis from chest ct scans: A weakly supervised CNN-LSTM approach, AI, 2 (2021), 330–341. https://doi.org/10.3390/ai2030020 doi: 10.3390/ai2030020
    [12] M. Kavitha, N. Yudistira, T. Kurita, Multi instance learning via deep CNN for multi-class recognition of Alzheimer's disease, 2019 IEEE 11th International Workshop on Computational Intelligence and Applications, (2019), 89–94. https://doi.org/10.1109/IWCIA47330.2019.8955006
    [13] J. G. Nam, J. Kim, K. Noh, H. Choi, D. S. Kim, S. J. Yoo, et al., Automatic prediction of left cardiac chamber enlargement from chest radiographs using convolutional neural network, Eur. Radiol., 31 (2021), 8130–8140. https://doi.org/10.1007/s00330-021-07963-1 doi: 10.1007/s00330-021-07963-1
    [14] T. Matsumoto, S. Kodera, H. Shinohara, H. Ieki, T. Yamaguchi, Y. Higashikuni, et al., Diagnosing heart failure from chest X-ray images using deep learning, Int. Heart J., 61 (2020), 781–786. https://doi.org/10.1536/ihj.19-714 doi: 10.1536/ihj.19-714
    [15] Y. Hirata, K. Kusunose, T. Tsuji, K. Fujimori, J. Kotoku, M. Sata, Deep learning for detection of elevated pulmonary artery wedge pressure using standard chest X-ray, Can. J. Cardiol., 37 (2021), 1198–1206. https://doi.org/10.1016/j.cjca.2021.02.007 doi: 10.1016/j.cjca.2021.02.007
    [16] M. Dutt, S. Redhu, M. Goodwin, C. W. Omlin, SleepXAI: An explainable deep learning approach for multi-class sleep stage identification, Appl. Intell., 53 (2023), 16830–16843. https://doi.org/10.1007/s10489-022-04357-8 doi: 10.1007/s10489-022-04357-8
    [17] S. Jonas, A. O. Rossetti, M. Oddo, S. Jenni, P. Favaro, F. Zubler, EEG-based outcome prediction after cardiac arrest with convolutional neural networks: Performance and visualization of discriminative features, Human Brain Mapp., 40 (2019), 4606–4617. https://doi.org/10.1002/hbm.24724 doi: 10.1002/hbm.24724
    [18] C. Barros, B. Roach, J. M. Ford, A. P. Pinheiro, C. A. Silva, From sound perception to automatic detection of schizophrenia: An EEG-based deep learning approach, Front. Psychiatry, 12 (2022), 813460. https://doi.org/10.3389/fpsyt.2021.813460 doi: 10.3389/fpsyt.2021.813460
    [19] Y. Yan, H. Zhou, L. Huang, X. Cheng, S. Kuang, A novel two-stage refine filtering method for EEG-based motor imagery classification, Front. Neurosci., 15 (2021), 657540. https://doi.org/10.3389/fnins.2021.657540 doi: 10.3389/fnins.2021.657540
    [20] M. Porumb, S. Stranges, A. Pescapè, L. Pecchia, Precision medicine and artificial intelligence: A pilot study on deep learning for hypoglycemic events detection based on ECG, Sci. Rep-UK., 10 (2020), 170. https://doi.org/10.1038/s41598-019-56927-5 doi: 10.1038/s41598-019-56927-5
    [21] S. Raghunath, A. E. U. Cerna, L. Jing, D. P. vanMaanen, J. Stough, D. N. Hartzel, et al., Prediction of mortality from 12-lead electrocardiogram voltage data using a deep neural network, Nat. Med., 26 (2020), 886–891. https://doi.org/10.1038/s41591-020-0870-z doi: 10.1038/s41591-020-0870-z
    [22] H. Shin, Deep convolutional neural network-based hemiplegic gait detection using an inertial sensor located freely in a pocket, Sensors, 22 (2022), 1920. https://doi.org/10.3390/s22051920 doi: 10.3390/s22051920
    [23] G. Aquino, M. G. Costa, C. F. C. Filho, Explaining one-dimensional convolutional models in human activity recognition and biometric identification tasks, Sensors, 22 (2022), 5644. https://doi.org/10.3390/s22155644 doi: 10.3390/s22155644
    [24] R. Ge, M. Zhou, Y. Luo, Q. Meng, G. Mai, D. Ma, et al., McTwo: A two-step feature selection algorithm based on maximal information coefficient, BMC Bioinformatics, 17 (2016), 142. https://doi.org/10.1186/s12859-016-0990-0 doi: 10.1186/s12859-016-0990-0
    [25] T. Naghibi, S. Hoffmann, B. Pfister, Convex approximation of the NP-hard search problem in feature subset selection, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, (2013), 3273–3277. https://doi.org/10.1109/ICASSP.2013.6638263
    [26] D. S. Hochba, Approximation algorithms for NP-hard problems, ACM SIGACT News, 28 (1997), 40–52. https://doi.org/10.1145/261342.571216 doi: 10.1145/261342.571216
    [27] C. Yun, J. Yang, Experimental comparison of feature subset selection methods, Seventh IEEE International Conference on Data Mining Workshops, (2007), 367–372. https://doi.org/10.1109/ICDMW.2007.77
    [28] W. C. Lin, Experimental study of information measure and inter-intra class distance ratios on feature selection and orderings, IEEE T. Syst. Man Cy-S, 3 (1973), 172–181. https://doi.org/10.1109/TSMC.1973.5408500 doi: 10.1109/TSMC.1973.5408500
    [29] W. Y. Loh, Classification and regression trees, Data Mining and Knowledge Discovery, 1 (2011), 14–23. https://doi.org/10.1002/widm.8 doi: 10.1002/widm.8
    [30] M. R. Osborne, B. Presnell, B. A. Turlach, On the lasso and its dual, J. Comput. Graph. Stat., 9 (2000), 319–337. https://doi.org/10.1080/10618600.2000.10474883
    [31] R. J. Palma-Mendoza, D. Rodriguez, L. de Marcos, Distributed Relieff-based feature selection in spark, Knowl. Inf. Syst., 57 (2018), 1–20. https://doi.org/10.1007/s10115-017-1145-y doi: 10.1007/s10115-017-1145-y
    [32] Y. Huang, P. J. McCullagh, N. D. Black, An optimization of Relieff for classification in large datasets, Data Knowl. Eng., 68 (2009), 1348–1356. https://doi.org/10.1016/j.datak.2009.07.011 doi: 10.1016/j.datak.2009.07.011
    [33] R. Yao, J. Li, M. Hui, L. Bai, Q. Wu, Feature selection based on random forest for partial discharges characteristic set, IEEE Access, 8 (2020), 159151–159161. https://doi.org/10.1109/ACCESS.2020.3019377 doi: 10.1109/ACCESS.2020.3019377
    [34] M. Mori, R. G. Flores, Y. Suzuki, K. Nukazawa, T. Hiraoka, H. Nonaka, Prediction of Microcystis occurrences and analysis using machine learning in high-dimension, low-sample-size and imbalanced water quality data, Harmful Algae, 117 (2022), 102273. https://doi.org/10.1016/j.hal.2022.102273 doi: 10.1016/j.hal.2022.102273
    [35] Y. Omae, M. Mori, E2H distance-weighted minimum reference set for numerical and categorical mixture data and a Bayesian swap feature selection algorithm, Mach. Learn. Know. Extr., 5 (2023), 109–127. https://doi.org/10.3390/make5010007 doi: 10.3390/make5010007
    [36] R. Garriga, J. Mas, S. Abraha, J. Nolan, O. Harrison, G. Tadros, et al., Machine learning model to predict mental health crises from electronic health records, Nat. Med., 28 (2022), 1240–1248. https://doi.org/10.1038/s41591-022-01811-5 doi: 10.1038/s41591-022-01811-5
    [37] G. Chandrashekar, F. Sahin, A survey on feature selection methods, Comput. Electr. Eng., 40 (2014), 16–28. https://doi.org/10.1016/j.compeleceng.2013.11.024 doi: 10.1016/j.compeleceng.2013.11.024
    [38] N. Gopika, M. Kowshalaya, Correlation based feature selection algorithm for machine learning, Proceedings of the 3rd International Conference on Communication and Electronics Systems, (2018), 692–695. https://doi.org/10.1109/CESYS.2018.8723980
    [39] L. Fu, B. Lu, B. Nie, Z. Peng, H. Liu, X. Pi, Hybrid network with attention mechanism for detection and location of myocardial infarction based on 12-lead electrocardiogram signals, Sensors, 20 (2020), 1020. https://doi.org/10.3390/s20041020 doi: 10.3390/s20041020
    [40] F. M. Rueda, R. Grzeszick, G. A. Fink, S. Feldhorst, M. T. Hompel, Convolutional neural networks for human activity recognition using body-worn sensors, Informatics, 5 (2018), 26. https://doi.org/10.3390/informatics5020026 doi: 10.3390/informatics5020026
    [41] T. Thenmozhi, R. Helen, Feature selection using extreme gradient boosting bayesian optimization to upgrade the classification performance of motor imagery signals for BCI, J. Neurosci. Meth., 366 (2022), 109425. https://doi.org/10.1016/j.jneumeth.2021.109425 doi: 10.1016/j.jneumeth.2021.109425
    [42] R. Garnett, M. A. Osborne, S. J. Roberts, Bayesian optimization for sensor set selection, Proceedings of the 9th ACM/IEEE International Conference on Information Processing in Sensor Networks, (2019), 209–219. https://doi.org/10.1145/1791212.1791238
    [43] E. Kim, Interpretable and accurate convolutional neural networks for human activity recognition, IEEE T. Ind. Inform., 16 (2020), 7190–7198. https://doi.org/10.1109/TII.2020.2972628 doi: 10.1109/TII.2020.2972628
    [44] M. Jaén-Vargas, K. M. R. Leiva, F. Fernandes, S. B. Goncalves, M. T. Silva, D. S. Lopes, et al., Effects of sliding window variation in the performance of acceleration-based human activity recognition using deep learning models, PeerJ Comput. Sci., 8 (2022), e1052. https://doi.org/10.7717/peerj-cs.1052 doi: 10.7717/peerj-cs.1052
    [45] R. Chavarriaga, H. Sagha, A. Calatroni, S. T. Digumarti, G. Tröster, J. D. R. Millán, et al., The opportunity challenge: A benchmark database for on-body sensor-based activity recognition, Pattern Recogn. Lett., 34 (2013), 2033–2042. https://doi.org/10.1016/j.patrec.2012.12.014 doi: 10.1016/j.patrec.2012.12.014
    [46] H. Sagha, S. T. Digumarti, J. D. R. Millán, R. Chavarriaga, A. Calatroni, D. Roggen, et al., Benchmarking classification techniques using the opportunity human activity dataset, 2011 IEEE International Conference on Systems, Man and Cybernetics, (2011), 36–40. doi: 10.1109/ICSMC.2011.6083628
    [47] A. Murad, J. Y. Pyun, Deep recurrent neural networks for human activity recognition, Sensors, 17 (2017), 2556. https://doi.org/10.3390/s17112556 doi: 10.3390/s17112556
    [48] J. B. Yang, M. N. Nguyen, P. P. San, X. L. Li, S. Krishnaswamy, Deep convolutional neural networks on multichannel time series for human activity recognition, Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, (2015), 3995–4001.
    [49] O. Banos, J. M. Galvez, M. Damas, H. Pomares, I. Rojas, Window size impact in human activity recognition, Sensors, 14 (2014), 6474–6499. https://doi.org/10.3390/s140406474 doi: 10.3390/s140406474
    [50] T. Tanaka, I. Nambu, Y. Maruyama, Y. Wada, Sliding-window normalization to improve the performance of machine-learning models for real-time motion prediction using electromyography, Sensors, 22 (2022), 5005. https://doi.org/10.3390/s22135005 doi: 10.3390/s22135005
    [51] J. Wu, X. Y. Chen, H. Zhang, L. D. Xiong, H. Lei, S. H. Deng, Hyperparameter optimization for machine learning models based on bayesian optimization, J. Electron. Sci. Technol., 17 (2019), 26–40. https://doi.org/10.11989/JEST.1674-862X.80904120 doi: 10.11989/JEST.1674-862X.80904120
    [52] P. Doke, D. Shrivastava, C. Pan, Q. Zhou, Y. D. Zhang, Using CNN with bayesian optimization to identify cerebral micro-bleeds, Mach. Vision Appl., 31 (2020), 1–14. https://doi.org/10.1007/s00138-020-01087-0 doi: 10.1007/s00138-020-01087-0
    [53] J. Bergstra, R. Bardenet, Y. Bengio, B. Kegl, Algorithms for hyper-parameter optimization, Adv. Neural Inf. Process. Syst., 24 (2011), 2546–2554.
    [54] T. Akiba, S. Sano, T. Yanase, T. Ohta, M. Koyama, Optuna: A next-generation hyperparameter optimization framework, Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, (2019), 2623–2631, https://optuna.readthedocs.io/en/stable/. doi: 10.1145/3292500.3330701
    [55] H. Makino, E. Kita, Stochastic schemata exploiter-based AutoML, 2021 IEEE International Conference on Data Mining Workshops, (2021), 238–245. https://doi.org/10.1109/ICDMW53433.2021.00037
    [56] P. Siirtola, P. Laurinen, J. Roning and H. Kinnunen, Efficient accelerometer-based swimming exercise tracking, IEEE SSCI 2011: Symposium Series on Computational Intelligence, (2011), 156–161. https://doi.org/10.1109/CIDM.2011.5949430
    [57] G. Brunner, D. Melnyk, B. Sigfússon, R. Wattenhofer, Swimming style recognition and lap counting using a smartwatch and deep learning, 2019 International Symposium on Wearable Computers, (2019), 23–31. https://doi.org/10.1145/3341163.3347719
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
