1. Introduction
Attention deficit hyperactivity disorder (ADHD) is one of the most common neurodevelopmental disorders, estimated to affect 5–7% of children worldwide [1]. It is characterized by core symptoms of inattention, hyperactivity, impulsivity, and emotional dysregulation, which impair everyday functioning. Affected children also have difficulty regulating impulses and sustaining focus on a single activity at a time. The timely and accurate diagnosis of ADHD is critical for early intervention and treatment access, which can improve long-term outcomes [2]. However, diagnosis remains challenging, often based on subjective behavioral observations due to the lack of reliable biomarkers [3]. Emerging research has sought to identify neurophysiological features that could serve as objective indicators of ADHD pathology. Electroencephalography (EEG) is a non-invasive technique that directly measures electrical brain activity and has shown promise for probing neural dysfunction in ADHD [4]. Prior studies have reported EEG abnormalities in children with ADHD, including an altered theta/beta ratio [5], a reduced P300 event-related potential, and EEG power differences [6]. Advanced analytical approaches, such as machine learning applied to EEG data, may enable robust classification of ADHD.
Delayed diagnosis and treatment of ADHD can have detrimental effects on individuals, potentially leading to the development of more extensive mental health disorders, difficulties in interpersonal relationships and work, engagement in criminal behavior, and substance abuse. The detrimental consequences of untreated ADHD have been extensively reported, with negative impacts on scholastic achievement [7], social interactions [8], occupational prospects [9], and overall mortality rates [10].
The use of machine learning (ML) approaches for the diagnosis of ADHD in individuals aged 17 years and above is a contemporary strategy for addressing this concern. Knowledge-based systems are frequently employed in medical settings where interpretability is essential. These systems represent knowledge explicitly, using tools such as production or if-then rules, which enables the system to reason toward conclusions and explain its rationale to the user [11]. A hybrid approach was employed to leverage the advantages of ML-based methodologies while maintaining the interpretability of knowledge-based systems [12]. This approach integrates patterns derived from machine-learning algorithms with the expertise provided by clinicians, resulting in a unified framework that optimally combines both.
1.1. Contribution
ADHD is a complex disorder characterized by a diverse array of symptoms, as mentioned previously. Timely intervention and accurate identification offer the potential to modify neural connections and improve symptoms. However, due to the multifaceted nature of ADHD, the presence of co-occurring disorders, and a global shortage of diagnostic professionals, the identification of ADHD is frequently delayed. It is therefore crucial to explore alternative approaches that could make early detection more effective, such as deep learning techniques, which have the potential to augment existing diagnostic methods and contribute to more efficient and timely identification of ADHD.
This system aims to automate ADHD classification using EEG. The method employs neural signal analysis to discriminate children with ADHD from typically developing children through rigorous automated classification. The successful implementation of this framework could significantly aid clinical diagnosis and facilitate early access to treatment for individuals with ADHD.
Furthermore, this work contributes to the ongoing efforts to enhance ADHD diagnostics and deepen our understanding of the underlying neural mechanisms of the disorder. To address the research gaps identified in previous investigations, an integrated system was devised, combining a convolutional neural network (CNN) with bidirectional long short-term memory (BiLSTM) networks, along with a gated recurrent unit-Transformer (GRU-Transformer) block. This integrated system aims to improve upon existing approaches and advance the field of ADHD classification.
By utilizing deep learning techniques and developing an EEG-based framework, we strive to enhance the accuracy and efficiency of ADHD identification, ultimately benefiting individuals with ADHD and the clinical community.
1.2. Background of research
The National Survey of Children's Health (NSCH) for 2016–2017 found that 6,630 children (aged 3 to 17) were officially diagnosed with ADHD; on average, children are diagnosed with ADHD at 12.4 years of age. Researchers have employed diverse machine-learning classifiers [13,14,15,16,17,18] to identify children who might be at risk of developing ADHD.
Uluyagmur-Ozturk and colleagues [19] conducted a study investigating the relationship between psychological health and diagnoses of autism spectrum disorder (ASD) and ADHD among young people in Turkey. The study included 61 participants sourced from the Marmara University Medical Center: 30 children diagnosed with ADHD, 18 classified with ASD, and 13 typically developing children, with ages ranging from 9.22 to 10.50 years. To classify individuals into the respective groups, the researchers employed five machine-learning approaches: decision tree (DT), random forest (RF), support vector machine (SVM), k-nearest neighbor (KNN), and AdaBoost (AB) algorithms. The AB algorithm achieved the highest accuracy, 80%, demonstrating its effectiveness in this classification task.
Slobodin et al. [20] conducted a study utilizing a continuous performance test (CPT) to examine signs of ADHD in children and adolescents. The study involved 458 participants aged 6 to 12 years, divided into different age groups; the mean age was 8.7 years (standard deviation 1.8 years), and 46.51% of the children were diagnosed with ADHD. Statistical analysis revealed no significant age difference between individuals with and without ADHD (p = 0.94). Various machine-learning classifiers were employed to predict ADHD, trained on 60% of the data and evaluated on the remaining 40%. The classifiers achieved 87.0% accuracy, 89.0% sensitivity, and 84.0% specificity.
Morrow et al. [21] conducted a study examining the impact of therapy on children and adolescents diagnosed with ADHD. The study employed four distinct machine-learning classifiers to identify relevant characteristics of children seeking medical treatment for ADHD: Classification and Regression Trees (CART), Logistic Regression (LR), Endpoint Detection and Response (EDR), and Deep Net classifiers. The Deep Net-based classifier performed best, with an area under the curve (AUC) of 0.72, outperforming the CART, EDR, and LR classifiers.
The utilization of EEG data as a diagnostic tool for ADHD remains a topic of debate, primarily due to the lack of standardized criteria for its usage and interpretation. Moghaddari et al. [22] addressed this issue by implementing a CNN model to diagnose ADHD in children based on EEG readings. A total of 61 participants took part in the study, 31 diagnosed with ADHD and 30 typically developing. The collected EEG signals were first preprocessed to eliminate unwanted artifacts and background noise, then converted into RGB images and processed by a 13-layer CNN. The model's performance was evaluated using 5-fold cross-validation, yielding an average validation accuracy of 99.06%.
Tosun et al. [23] employed long short-term memory (LSTM) networks with power spectral density (PSD) and spectral entropy (SE) features. The proposed method was evaluated using an 80:20 hold-out validation strategy, achieving an accuracy of 92.15%.
In their study, Khoshnoud et al. [24] investigated a group of young individuals diagnosed with ADHD, employing nonlinear EEG analysis techniques. The researchers used two measures, the largest Lyapunov exponent (LLE) and approximate entropy (ApEn), to evaluate the nonlinear characteristics of the EEG signals. A probabilistic neural network (PNN) was employed for classification, yielding an accuracy of 87.5%.
Chen et al. [25] developed a deep learning model to accurately identify individuals with ADHD within the pediatric population. Raw EEG data were translated into visual representations and fed into a CNN model. The dataset comprised EEG recordings from 102 children and adolescents: 51 typically developing and 51 diagnosed with ADHD. The deep learning model achieved an accuracy of 94.67%.
Tenev et al. [26] presented a technique to speed up the detection and classification of individuals diagnosed with ADHD. Their study included 117 individuals: 67 diagnosed with ADHD and 50 healthy controls. The participants' EEG signals were analyzed using SVMs combined with a voting mechanism, and the approach achieved an overall accuracy of 82.3%.
Saini et al. [27] tested ML models for predicting ADHD using EEG data collected from 77 children with ADHD and 80 typically developing children; feature selection was performed with principal component analysis (PCA), and classification with KNN.
Dubreuil-Vall et al. [28] conducted a study involving EEG data collection from a group of 40 volunteers, including 20 individuals diagnosed with ADHD and 20 healthy controls. The study aimed to develop a procedure for identifying ADHD using EEG data. The researchers utilized spectrogram visuals generated from the EEG data as inputs for a CNN model. The findings of the study revealed an accuracy of 88% for data classification.
Tor et al. [29] used a mix of nonlinear features, empirical mode decomposition (EMD), and discrete wavelet transform (DWT) decomposition methods for predicting ADHD. They analyzed EEG data obtained from 123 children and adolescents, 45% of whom exhibited symptoms of ADHD. By leveraging these analytical approaches, the researchers aimed to differentiate and classify individuals based on their specific diagnoses.
In a recent study, Loh et al. [30] undertook a comprehensive analysis comparing various computerized diagnostic approaches for ADHD. The researchers reviewed published research on these techniques to understand their strengths and limitations. The comparative analysis shed light on the distinctions and similarities among the evaluated approaches, providing valuable insights into the advancements made in computerized ADHD diagnosis.
The aforementioned techniques were devised to automate the diagnosis of ADHD through the analysis of physiological and imaging data. Table 1 presents a summary of previous studies focusing on the detection of ADHD.
2. Materials and methods
We implement a rigorous methodology pipeline to classify children with ADHD versus healthy controls using EEG data. EEG provides a direct measure of cortical activity and has shown utility as a biomarker for various neurological and psychiatric conditions.
The proposed framework includes the critical steps for robust EEG-based classification: preprocessing, feature extraction, feature selection, classifier optimization, and performance evaluation. Preprocessing involves filtering and artifact removal to isolate neural signals. Informative features that capture distinguishing characteristics are then extracted from the clean EEG data, as shown in Figure 1. Feature selection eliminates redundant variables and improves generalizability. The selected features are then used to train a machine-learning classifier to label subjects as either ADHD or healthy.
2.1. EEG Dataset
The EEG data employed in this study were sourced from a standard dataset, which consisted of 61 children diagnosed with ADHD and 60 typically developing controls. Further details regarding the dataset can be found in this section.
The dataset, collected by Nasrabadi, A.M., is accessible online at the following link: "https://ieee-dataport.org/open-access/eeg-data-adhd-control-children" (accessed on 25 August 2023).
2.1.1. Participants
The dataset consisted of 61 children diagnosed with ADHD and 60 typically developing children. Additional details regarding the dataset can be found in Table 2. The ADHD cohort was recruited from referrals to the psychiatric clinic. The diagnosis of ADHD was made by an experienced child and adolescent psychiatrist, following the DSM-IV criteria. The control group, comprising 50 participants from an all-boys school and 10 from an all-girls school in Tehran, was carefully selected. A psychiatrist evaluated the control group to ensure the absence of any neurological or psychiatric disorders [41].
2.1.2. EEG data acquisition
The EEG signals were collected using a 19-channel digital acquisition system (SD-C24) at a sampling rate of 128 Hz with 16-bit resolution. Recording was conducted during a visual attention task comprising 20 images with varying numbers of characters (5–16 per image). The images were sufficiently large and randomly distributed to elicit sustained visual attention, and each image was displayed immediately after the participant's response to maintain continuous visual stimulation; the duration of each recording therefore depended on the child's performance. The 19 scalp electrodes were placed according to the internationally established 10–20 system [41], which ensures standardized coverage of all brain regions. The anterior region electrodes included Fp1, Fp2, F7, F3, Fz, F4, and F8. The posterior region was covered by electrodes T5, P3, Pz, P4, T6, O1, and O2. The central region comprised C3 and T3 in the left hemisphere and C4 and T4 in the right hemisphere. Finally, reference electrodes A1 and A2 were placed on the left and right earlobes, respectively, as shown in Figure 2. This spatial arrangement and color coding allowed for a clear visualization of the electrode positions across the scalp and their correspondence to the underlying cortical regions. The 10–20 placement optimized the recording of brain electrical activity relevant for analysis and classification.
The EEG signal activity of both the Control and ADHD groups was visualized, with each subplot depicting 5 seconds of data from three randomly chosen electrodes: Fp1, Cz, and Pz. Figure 3 presents these plots, showcasing the temporal variation in signal amplitude for the selected electrodes. The left column corresponds to the Control group, while the right column corresponds to the ADHD group.
Figure 4 displays the visualization of raw EEG signals for a subset of channels (Fp1, Fp2, C3, C4, O1, and O2). The EEG time series data is presented over a 10-second window, providing a visual representation of the recorded EEG activity.
Short-time Fourier transform (STFT) time-frequency analysis was performed on selected EEG channels. STFT analysis offers several benefits in ADHD research. First, it can characterize non-stationary signals in both the temporal and frequency domains, allowing transient events to be detected and changes in frequency bands to be tracked over time, which is crucial for understanding the neurological aspects of ADHD. Second, the STFT supports the examination of event-related potentials, spectral analysis, and abnormality detection, capabilities that are valuable for differentiating ADHD patients from typically developing individuals. Its fine-grained time-frequency resolution enhances the localization of neuronal events, making it useful for tasks such as artifact removal and for developing sophisticated analysis techniques like machine-learning models for ADHD classification. The STFT is preferred over alternatives such as the wavelet transform and multitaper methods in this context because of its balanced time-frequency resolution, easy interpretability, and computational efficiency; its uniform analytic grid is well suited to capturing the temporal dynamics of EEG signals. This analysis generates spectrograms for each channel, providing insight into how frequency content evolves over time. Figure 5 illustrates the STFT representations of the EEG channels.
The spectrograms obtained from the analysis depict the power distribution of each frequency band across different time points, enabling the identification of EEG patterns associated with cognitive or neurological events. For instance, fluctuations in the alpha band (8–13 Hz) may indicate a state of relaxation, while alterations in the beta band (13–30 Hz) could be indicative of active thinking or focused attention.
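To make the analysis concrete, the following is a minimal sketch of the per-channel STFT spectrogram computation using SciPy; the 1-second window with 50% overlap and the synthetic placeholder signal are our assumptions, not parameters reported in the study.

```python
# Sketch: STFT spectrogram of one EEG channel (window settings assumed).
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import stft

FS = 128                                   # dataset sampling rate (Hz)
eeg = np.random.randn(FS * 10)             # placeholder for a 10 s channel

# 1 s windows with 50% overlap (assumed); returns frequencies, times, STFT.
f, t, Z = stft(eeg, fs=FS, nperseg=FS, noverlap=FS // 2)

plt.pcolormesh(t, f, np.abs(Z), shading="gouraud")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.title("STFT spectrogram of one EEG channel")
plt.show()
```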
2.2. Preprocessing
We implement a rigorous data preprocessing and feature engineering pipeline to develop a robust EEG-based diagnostic system for ADHD. Raw EEG signals were filtered into clinically relevant frequency bands (delta, theta, alpha, and beta) using fourth-order Butterworth band-pass filters. Notch filters were also applied to eliminate power-line interference.
Filtering enabled the extraction of band-specific features corresponding to the neuronal oscillations associated with ADHD pathophysiology; established EEG biomarkers of ADHD include heightened theta and reduced beta activity. Preserving frequency-specific information is critical for modeling the complex spectral signatures of ADHD versus healthy brains.
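A minimal sketch of this filtering stage is shown below, assuming the dataset's 128 Hz sampling rate, 50 Hz mains interference, and typical clinical band edges; these specifics, the zero-phase filtering, and the notch quality factor are our assumptions rather than the authors' exact settings.

```python
# Sketch: fourth-order Butterworth band-pass plus notch filtering of EEG.
import numpy as np
from scipy.signal import butter, iirnotch, filtfilt

FS = 128  # sampling rate of the dataset (Hz)
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def bandpass(x, low, high, fs=FS, order=4):
    """Zero-phase fourth-order Butterworth band-pass filter."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def notch(x, freq=50.0, q=30.0, fs=FS):
    """Notch filter suppressing power-line interference (50 Hz assumed)."""
    b, a = iirnotch(freq, q, fs)
    return filtfilt(b, a, x)

eeg = np.random.randn(FS * 10)  # placeholder for one 10 s EEG channel
clean = notch(eeg)
band_signals = {name: bandpass(clean, lo, hi) for name, (lo, hi) in BANDS.items()}
```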
2.2.1. Feature extraction approach
The preprocessed EEG signals were divided into 2-second segments using a sliding window with 50% overlap between consecutive windows. This segmentation allows discriminative features to be localized in time. Descriptive features capturing signal characteristics in both the time and frequency domains were extracted for each 2-second window. Statistical features based on the raw signal values in the time domain were calculated to capture variability, distribution, and complexity: standard deviation, skewness, kurtosis, and the Hjorth parameters represent the variance, symmetry, tailedness, and complexity of the distribution, respectively. This combination of statistical and spectral features provides a multidimensional profile encapsulating both time-based morphology and frequency-based brain activity relevant to EEG analysis, and computing the values on short windows enables dynamic fluctuations to be tracked over time.
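A sketch of this windowed feature extraction is given below; the window arithmetic (256 samples at 128 Hz, 128-sample step) follows the text, while the exact feature list beyond those named is an assumption.

```python
# Sketch: 2 s sliding windows (50% overlap) with time-domain statistics.
import numpy as np
from scipy.stats import skew, kurtosis

FS = 128
WIN = 2 * FS        # 2-second window = 256 samples
STEP = WIN // 2     # 50% overlap = 128-sample step

def hjorth(x):
    """Hjorth activity (variance), mobility, and complexity of a signal."""
    dx, ddx = np.diff(x), np.diff(np.diff(x))
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

def window_features(channel):
    """Per-window std, skewness, kurtosis, and Hjorth parameters."""
    feats = []
    for start in range(0, len(channel) - WIN + 1, STEP):
        w = channel[start:start + WIN]
        activity, mobility, complexity = hjorth(w)
        feats.append([np.std(w), skew(w), kurtosis(w),
                      activity, mobility, complexity])
    return np.asarray(feats)

features = window_features(np.random.randn(FS * 60))  # placeholder channel
```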
2.2.2. Feature selection approach
Robust feature selection was critical for extracting the most discriminative biomarkers from the complex, multidimensional EEG data and for enhancing the performance of our machine- and deep-learning models. A two-stage feature-screening pipeline, combining an unsupervised filter technique with a supervised selection technique to identify optimal ADHD-relevant features, was implemented.
In the first stage, PCA was applied as a filter-based approach to reduce dimensionality and derive a lower-dimensional feature subspace that captured 95% of the variance. By eliminating redundant and irrelevant features, PCA enhanced the signal-to-noise ratio in the input data for more efficient modeling.
A supervised feature selector using Chi-square testing was subsequently applied to refine the PCA-filtered features. The Chi-square statistic assessed the association between each feature and the ADHD/control classes, ranking features by discriminative power. Only the top-ranked Chi-square features, those exhibiting the strongest diagnostic relevance, were retained.
This two-step approach combines a supervised selection technique (Chi-square) with an unsupervised filter method (PCA) to extract both generalized and class-specific predictive information from EEG data. The research experiments demonstrated significant performance gains from coupling Chi-square and PCA rather than using either in isolation.
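A minimal sketch of the two-stage screening follows. Because the chi-square test requires non-negative inputs, the PCA components are min-max scaled before scoring; that rescaling step and the number of retained features (k) are our assumptions.

```python
# Sketch: PCA (95% variance) followed by chi-square ranking of components.
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2

def screen_features(X_train, y_train, X_test, k=50):
    # Stage 1: keep the principal components explaining 95% of the variance.
    pca = PCA(n_components=0.95)
    Xtr, Xte = pca.fit_transform(X_train), pca.transform(X_test)

    # chi2 requires non-negative values, so rescale components to [0, 1].
    scaler = MinMaxScaler().fit(Xtr)
    Xtr, Xte = scaler.transform(Xtr), scaler.transform(Xte)

    # Stage 2: retain the k components most associated with the labels.
    selector = SelectKBest(chi2, k=min(k, Xtr.shape[1]))
    return selector.fit_transform(Xtr, y_train), selector.transform(Xte)
```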
2.2.3. Data splitting
The final dataset was split into training and testing sets using a ratio of 80:20. While 80% of the data was allocated to the training set, which is used to fit the machine-learning models, the remaining 20% was assigned to the testing set, which provided an unbiased evaluation of the model's performance on new unseen data.
2.2.4. Handling imbalanced class
In the original dataset, the ADHD class constituted the majority class, with significantly more samples than the control group. This imbalance could lead the learning algorithms to focus primarily on the ADHD class and compromise the accuracy of the control group. Figure 6 shows class distribution of the original dataset.
To mitigate this issue, the Synthetic Minority Oversampling Technique (SMOTE) was applied to the training data after preprocessing to address the class imbalance. SMOTE generates new synthetic samples for the minority class (the control group) to improve its representation: for each minority-class sample, synthetic instances are created along the line segments joining it to its nearest minority-class neighbors [8].
SMOTE oversampling was implemented on the training data to generate new synthetic control samples. This achieved a more balanced class distribution between the ADHD and control groups in the training set. Balanced representation will enable the machine- and deep-learning models to learn the nuances of both classes more effectively. The testing set was kept untouched to provide an unbiased estimate of real-world performance. The balanced training data is expected to build more robust classifiers with equal emphasis on each class and improved sensitivity. This approach ensures a meticulous and unbiased evaluation of the model's capabilities, promising reliable performance assessments in real-world scenarios.
Each trial lasted a varying amount of time depending on how long each child took to count the animals and enter their response; the shortest trial, from a control participant, lasted 50 seconds, while the longest, from an ADHD patient, lasted 285 seconds. The resulting imbalance had to be addressed to enable thorough model training. SMOTE finds the k nearest neighbors of each minority-class sample in feature space and creates synthetic samples along the line segments connecting the minority sample to its chosen neighbors, continuing until the minority class reaches the desired representation. To prevent bias, SMOTE was applied exclusively to the training data; the untouched test subset preserved the original class imbalance, allowing an honest assessment of the model's generalizability to real-world imbalanced conditions. Balancing only the EEG training set thus allowed robust model tuning while leaving the EEG test set unaltered.
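The split-then-balance procedure can be sketched as follows (the 80:20 split from Section 2.2.3, then SMOTE on the training portion only); the placeholder data, stratification, and random seeds are our assumptions.

```python
# Sketch: 80:20 split, then SMOTE oversampling of the training set only.
import numpy as np
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))        # placeholder feature matrix
y = rng.integers(0, 2, size=1000)      # placeholder labels (0=control, 1=ADHD)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# Synthesize minority-class (control) samples in the training set only;
# the test set keeps its original class distribution.
X_train_bal, y_train_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)
```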
2.3. Proposed model
Machine-learning algorithms, including naive Bayes, support vector machines, and ensemble methods, as well as deep neural networks such as convolutional and gated recurrent architectures, were applied to the EEG biomarker dataset.
2.3.1. Machine learning
Machine-learning algorithms build mathematical models from sample data, known as training data, to make predictions or decisions without being specifically coded for the task [42]. Machine-learning approaches are commonly categorized as supervised, unsupervised, or reinforcement learning based on the nature of the problem and data labeling. In supervised learning, the training data comprise examples with known output labels, and the algorithms learn to map inputs to outputs [43].
A range of standard supervised learning algorithms was implemented for comparative benchmarking on the ADHD classification task. The models encompass probabilistic classifiers such as naive Bayes, margin-based classifiers such as support vector machines, and ensemble methods such as random forest, gradient boosting, and model stacking. Naive Bayes provides a probabilistic framework for classification, assuming feature independence to estimate class probabilities via Bayes' theorem [44]. Support vector machines are commonly used for classification and regression tasks [45]. The goal of SVMs is to find the optimal separating hyperplane that maximizes the margin between classes in a high-dimensional space. SVMs do not use all training points to define the hyperplane; rather, they select a subset of points near the class boundaries called support vectors. These support vectors are the critical elements that determine the hyperplane and have the greatest influence on the solution.
A stacking ensemble was implemented by training a high-level CatBoost classifier on the combined predictions from base random forest and Light Gradient Boosting Machine (LightGBM) models to improve predictive performance. These diverse algorithms provide complementary modeling strengths for assessing a comprehensive set of approaches for robust ADHD classification from EEG data.
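A sketch of this ensemble is shown below, with random forest and LightGBM base learners feeding a CatBoost meta-learner via scikit-learn's StackingClassifier; all hyperparameters are illustrative assumptions, not the authors' settings.

```python
# Sketch: stacking ensemble with a CatBoost meta-learner.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("lgbm", LGBMClassifier(n_estimators=200, random_state=42)),
    ],
    final_estimator=CatBoostClassifier(iterations=200, verbose=0,
                                       random_state=42),
    cv=5,  # out-of-fold base predictions train the meta-learner
)
# Usage: stack.fit(X_train_bal, y_train_bal); stack.predict(X_test)
```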
2.3.2. Deep learning algorithm
Deep learning constitutes a subfield of machine learning focused on architecting artificial neural networks with multiple layers to automatically learn representations and extract hierarchical features from data [46]. This facilitates the development of highly capable predictive models for complex tasks, including medical diagnoses.
In this study, the CNN architecture used 1D convolutions to learn a hierarchical feature representation directly from the raw EEG signals and was trained end-to-end on the labeled data for the diagnostic classification task.
The model comprises convolutional layers that automatically learn spatial feature representations, interspersed with max pooling, dropout, and fully connected layers [47,48,49,50]. The input EEG signals are fed into 1D convolutions: temporal correlations are captured by two consecutive Conv1D layers with 64 filters and a kernel size of 3, with rectified linear unit (ReLU) activation providing the nonlinear transformations. To improve generalization, dropout with a rate of 0.5 was applied between convolutional blocks, and max pooling layers reduced dimensionality while retaining significant features. The convolutional feature maps were flattened into a 1D vector and connected to a series of dense layers with progressively fewer units for high-level reasoning, again with ReLU activation. Two nodes in the final Softmax output layer allowed binary categorization into ADHD and control groups. Figure 7 depicts the model architecture.
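A Keras sketch of this path is given below; the layer sequence follows the description (two Conv1D(64, 3) layers, ReLU, 0.5 dropout, max pooling, dense layers narrowing to a two-unit Softmax), while the input length and dense-layer widths are assumptions.

```python
# Sketch: the CNN path trained end-to-end on windowed EEG segments.
from tensorflow.keras import layers, models

def build_cnn(input_len=256, n_features=1):
    inputs = layers.Input(shape=(input_len, n_features))
    x = layers.Conv1D(64, 3, activation="relu")(inputs)
    x = layers.Conv1D(64, 3, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)   # width assumed
    x = layers.Dense(64, activation="relu")(x)    # width assumed
    outputs = layers.Dense(2, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```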
The convolutional-LSTM path also applies 1D convolutions to obtain local features, using a Conv1D layer with 128 filters. This is followed by max pooling and a bidirectional LSTM with 64 units to learn temporal dependencies in both directions. The LSTM output is processed by a 1024-unit dense layer and dropout before being concatenated with the other paths' outputs.
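The corresponding path can be sketched as follows; the kernel size and dropout rate are assumptions, while the filter count, LSTM width, and 1024-unit dense layer follow the text.

```python
# Sketch: the convolutional-BiLSTM path (output concatenated downstream).
from tensorflow.keras import layers, models

def build_conv_bilstm(input_len=256, n_features=1):
    inputs = layers.Input(shape=(input_len, n_features))
    x = layers.Conv1D(128, 3, activation="relu")(inputs)  # kernel size assumed
    x = layers.MaxPooling1D(2)(x)
    x = layers.Bidirectional(layers.LSTM(64))(x)          # forward + backward
    x = layers.Dense(1024, activation="relu")(x)
    x = layers.Dropout(0.5)(x)                            # rate assumed
    return models.Model(inputs, x)
```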
Within the context of CNNs, the symbol $F$ denotes a convolution kernel or filter, while $i$ and $j$ index the rows and columns of an image $I$, respectively. Convolving the input image with the kernel yields a new two-dimensional output. The procedure involves decomposing the image into individual neurons, which are then flattened along the y and z dimensions. Each layer of the network is equipped with a set of $x$ filters designed to detect and identify traits; feature maps of size $X$ are generated by layer $L$ and annotated appropriately.
The term $B_L^i$ denotes the bias matrix, whereas $F_L^{i,j}$ signifies the filter that connects the $j$th feature map within the layer.
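The BiLSTM component, in turn, is governed by the standard LSTM gate equations (the forward pass is shown; the backward pass applies the same computations to the reversed sequence):

$$
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C [h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$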
The equations above are applied in both the sequential forward and sequential backward passes that together constitute the BiLSTM model. The BiLSTM network may be conceptualized as a gated cell that assesses input data and determines its retention depending on its significance or weight. It comprises three fundamental components: the input gate, the forget gate, and the output gate. The forget gate $f_t$ determines which states should be retained in memory or discarded; the input gate $i_t$ adjusts the value by considering the incoming signals; and the output gate $o_t$ facilitates the transmission of the cell state to adjacent neurons. The architecture comprises a logistic layer and an additional layer responsible for generating a novel vector that is then combined with the existing state. In the context of a recurrent neural network (RNN), the input $x_t$ is processed by the hidden layer using the weight matrix $W$, resulting in the final output $y_t$. The LSTM model incorporates a memory cell, whose hidden state $h_t$ serves as a pivotal component regulated by the three gates.
2.3.3. GRU-Transformer approach
The model implements GRU layers for sequence modeling [51]. The input layer accommodates the time-series EEG feature matrix, which consists of a single channel. Two sequential GRU layers with 1,024 and 128 units were stacked to learn temporal relationships, with the second GRU layer returning the full sequence for further modeling. GRUs contain gating units that modulate information flow, enabling them to capture long-range dependencies better than vanilla RNNs. The final Softmax output layer produces normalized predicted probabilities for the two target classes, ADHD and healthy control. The model's architecture is shown in Figure 8.
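In its standard form, the GRU computes the following at each time step (notation matching the description below):

$$
\begin{aligned}
\mu_t &= \sigma\left(W_{\mu} x_t + V_{\mu} h_{t-1} + b_{\mu}\right) \\
r_t &= \sigma\left(W_r x_t + V_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + V_h (r_t \odot h_{t-1}) + b_h\right) \\
h_t &= (1 - \mu_t) \odot h_{t-1} + \mu_t \odot \tilde{h}_t, \qquad o_t = h_t
\end{aligned}
$$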
Here, $x_t$ denotes the input, $o_t$ the output, $\mu_t$ the output of the update gate, and $r_t$ the output of the reset gate; $\odot$ denotes the Hadamard product, and $V$, $W$, and $b$ are the parameters (weight matrices and biases).
The GRU encoder and Transformer path uses a recurrent GRU layer to produce embeddings of the input sequence: 32 GRU units encode a 32-dimensional vector at each timestep. Multi-head self-attention with 2 heads is then applied, allowing the GRU embeddings to attend to each other based on learned relationships; residual connections and layer normalization stabilize training. The attention outputs are flattened to a 1D vector.
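A Keras sketch of this path follows; the per-head key dimension is an assumption, while the 32-unit GRU, two attention heads, residual connection, layer normalization, and flattening follow the text.

```python
# Sketch: the GRU encoder + multi-head self-attention (Transformer) path.
from tensorflow.keras import layers, models

def build_gru_transformer(input_len=256, n_features=1):
    inputs = layers.Input(shape=(input_len, n_features))
    g = layers.GRU(32, return_sequences=True)(inputs)  # 32-dim embeddings
    attn = layers.MultiHeadAttention(num_heads=2, key_dim=16)(g, g)
    x = layers.LayerNormalization()(g + attn)          # residual + layer norm
    x = layers.Flatten()(x)
    return models.Model(inputs, x)
```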
2.3.4. Proposed system
A novel deep neural network architecture that combines convolutional, recurrent, and attention-based models for improved sequence classification has been developed. The core of the model is the integration of the CNN, CNN-LSTM, and GRU-Transformer blocks through concatenation of their outputs. This merging allows the model to leverage the diverse representations learned by each path within a unified architecture: the concatenated feature vector encapsulates local spatial-temporal correlations from the CNN, long-term dependencies from the LSTM, and global relationships from the Transformer's attention mechanism. This combined representation is fed into additional dense layers for final classification: a 1024-unit dense layer learns nonlinear combinations of the concatenated features, dropout regularization prevents overfitting, and the final Softmax output layer predicts class probabilities. By concatenating the complementary outputs of the CNN, RNN, and Transformer paths, the model achieves robust sequence classification; the integrated architecture leverages the strengths of each block (convolutional features, recurrence, and attention) to represent the input data in multiple ways for improved accuracy. The goal of this multi-path, concatenated design is to enhance classification performance compared with single-path models: by merging diverse spatial, temporal, and attention-based representations, the model can capture nuances in the data that might be missed by CNN, RNN, or Transformer architectures alone. Figure 9 displays the integrated model, and Table 3 shows its parameters.
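Putting the three paths together, a compact, self-contained sketch of the merged architecture is shown below; layer widths not stated in the text are assumptions.

```python
# Sketch: CNN, Conv-BiLSTM, and GRU-Transformer paths concatenated, then a
# 1024-unit dense layer, dropout, and a two-unit Softmax classifier.
from tensorflow.keras import layers, models

def build_integrated(input_len=256, n_features=1):
    inputs = layers.Input(shape=(input_len, n_features))

    # CNN path: local spatial-temporal correlations.
    c = layers.Conv1D(64, 3, activation="relu")(inputs)
    c = layers.Conv1D(64, 3, activation="relu")(c)
    c = layers.MaxPooling1D(2)(c)
    c = layers.Flatten()(c)

    # Conv-BiLSTM path: long-term temporal dependencies.
    r = layers.Conv1D(128, 3, activation="relu")(inputs)
    r = layers.MaxPooling1D(2)(r)
    r = layers.Bidirectional(layers.LSTM(64))(r)
    r = layers.Dense(1024, activation="relu")(r)

    # GRU-Transformer path: global relationships via self-attention.
    g = layers.GRU(32, return_sequences=True)(inputs)
    a = layers.MultiHeadAttention(num_heads=2, key_dim=16)(g, g)
    g = layers.Flatten()(layers.LayerNormalization()(g + a))

    merged = layers.Concatenate()([c, r, g])
    x = layers.Dense(1024, activation="relu")(merged)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(2, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_integrated()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```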
2.4. Experimental
In this section, we delineate the experimental framework used to develop and validate the proposed EEG-based ADHD classification models. Detailed results are presented and analyzed to benchmark the efficacy of the proposed approach against state-of-the-art methods. The rigorous experimental pipeline provides critical insights into the real-world viability of using EEG data and machine learning for enhanced ADHD screening, and the developed framework could inform best practices for applying these techniques to improve the diagnosis of neurological conditions.
2.4.1. System setup
The experiments were conducted on a laptop workstation with a Core i7 CPU and 8 GB of RAM, which provided sufficient computing capability for efficient model training and evaluation. Models were implemented using TensorFlow [15] and Scikit-learn, open-source frameworks for model design, training, and testing. TensorFlow supports GPU acceleration, which significantly expedites neural network computations compared with CPU-only environments.
2.4.2. Evaluation metrics
Assessing the performance of models is crucial to comprehending their proficiency [16]. Several evaluation metrics exist, such as accuracy, sensitivity, precision, recall, F1-score, receiver operating characteristic (ROC) curve, and confusion matrix. Each metric provides unique insights into the strengths and weaknesses of the model. Thorough evaluation using diverse metrics offers a comprehensive portrayal of model efficacy.
2.4.3. Confusion matrix
The confusion matrix is an essential evaluation tool for binary classification systems, summarizing predictive performance across the test dataset. Its components are True Positives (TP): ADHD cases correctly classified as positive; False Positives (FP): control cases incorrectly predicted as ADHD (positive); True Negatives (TN): control cases correctly classified as negative; and False Negatives (FN): ADHD cases incorrectly classified as controls (negative). By tabulating these counts, the confusion matrix facilitates quantitative assessment of the classifier's ability to discriminate between the ADHD and control classes and exposes critical errors, false positives (controls predicted as ADHD) and false negatives (ADHD predicted as controls), enabling the identification of learning deficiencies.
2.4.4. Accuracy
Accuracy is calculated as the ratio of correct predictions to total predictions, as shown in Eq (12). It serves as a commonly used metric for assessing the performance of models.
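In terms of the confusion-matrix counts defined above, this is:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (12)$$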
2.4.5. Sensitivity
Sensitivity is defined as the proportion of actual positives accurately detected, as in Eq (13). It quantifies the true-positive rate in binary classification and measures the model's proficiency in identifying positive cases (ADHD) without type II errors (false negatives).
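In terms of the confusion-matrix counts:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \quad (13)$$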
2.4.6. F1-Score
The F1-score is the harmonic mean of precision and recall, synthesizing the two metrics into a single value that provides a balanced evaluation of model performance. A high F1-score indicates strong precision and recall, meaning few false positive and false negative predictions; unlike accuracy alone, it offers a nuanced portrait of classification proficiency. The F1-score is calculated via Eq (14).
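Equivalently, in terms of the confusion-matrix counts:

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,TP}{2\,TP + FP + FN} \quad (14)$$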
2.4.7. Specificity
Specificity measures a binary classifier's ability to correctly identify negatives. It is quantified as the ratio of true negatives (TN) to total negatives and evaluates the model's competence in avoiding false positives, i.e., negative instances mistakenly classified as positive. Specificity was calculated using Eq (15).
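In terms of the confusion-matrix counts:

$$\text{Specificity} = \frac{TN}{TN + FP} \quad (15)$$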
2.4.8. Receiver operating characteristics
The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR), showing the model's discrimination ability. It illustrates the tradeoff between true and false positives: TPR measures the correct identification of positives, while FPR measures the misclassification of negatives. This illuminates balanced model performance beyond accuracy alone.
3. Results
3.1. Machine learning classification results
We evaluated three machine-learning classifiers (Gaussian naive Bayes, SVM, and a stacking ensemble) to improve the accuracy of ADHD diagnosis from EEG biomarkers. PCA was first used for feature selection, retaining 95% of the data variance. With the PCA-reduced features alone, the SVM model achieved the highest diagnostic performance: 94.86% accuracy, 96.33% sensitivity, 93.02% specificity, 95.42% F1-score, and 98.71% AUC. These results demonstrate SVM's robust ability to accurately categorize both ADHD and non-ADHD cases from EEG data. Figure 10 shows the confusion matrix and ROC curve, and Table 4 presents the SVM classification report.
Supplementing PCA with Chi-square for more targeted feature selection was also evaluated: SVM attained 92.8% accuracy, 94.3% sensitivity, 91.0% specificity, 93.6% F1-score, and 97.8% AUC using the combined PCA + Chi-square feature set, indicating that the pared-down feature set still retained most of the diagnostically relevant information in the complex EEG biomarkers. Figure 11 shows the confusion matrix and ROC curves, while Table 5 presents the SVM classification report.
However, SVM remained the top individual classifier, indicating it is presently the most effective standalone machine learning technique for EEG-based ADHD detection when combined with Chi-square and PCA feature selection. These results clearly demonstrate that machine learning, especially SVM architectures, can significantly enhance ADHD diagnostic accuracy compared to conventional methods (see Table 6).
3.2. Results of integrating deep learning with PCA
This study assessed an integrated deep learning model, comprising CNN, CNN-BiLSTM, and GRU-Transformer blocks, for the automated diagnosis of ADHD using EEG biomarkers. Table 7 presents the classification report of the integrated deep learning model with PCA feature selection. PCA initially reduced feature dimensionality, retaining 95% of the data variance. Using just the PCA-derived features, the integrated model attained 95% accuracy, 95% recall, and 94% F1-score. This underscores the aptitude of the integrated deep learning model for mining predictive ADHD patterns from the EEG data.
Figure 12 shows the training and testing accuracy of the integrated model with PCA. Throughout training, two key metrics were used to assess performance, training accuracy and validation accuracy, while the training and validation losses likewise track the model's learning progress. In this experiment, training accuracy increased from 75% to 99%, and validation accuracy started at 55% and rose steadily to approximately 95%. The validation loss decreased over the course of training from 1.3 to 0.4.
3.3. Results of the integrated deep learning system with Chi-square
Adding Chi-square for enhanced feature selection raised the integrated deep learning model's accuracy, precision, recall, and F1-score to 96%. This highlights the utility of Chi-square for isolating the most diagnostically relevant biomarkers. Table 8 presents the performance of the integrated deep learning system with Chi-square.
The integrated deep learning model decisively surpassed the individual machine-learning classifiers across all diagnostic metrics, indicating greater proficiency in extracting discriminative EEG patterns. Coupling the integrated deep learning system with Chi-square feature selection further bolstered detection accuracy. These results clearly demonstrate the viability of deep neural networks for enhancing ADHD diagnosis when combined with robust feature engineering. Figure 13 displays the performance of Chi-square with the integrated deep learning model.
To assess the efficacy of the proposed algorithms, the confusion matrix, a widely used tool for evaluating classification tasks, was examined. The deep learning model's confusion matrix is shown in Figure 14.
To gain a deeper understanding of the classification efficacy of the proposed integrated deep learning system, it is essential to examine its performance when trained and tested with the PCA and Chi-square methodologies. With PCA, the integrated model correctly identified 1395 participants as control and 1804 as ADHD, with 77 misclassifications. With Chi-square, it correctly classified 1413 participants as control and 1823 as ADHD, with only 58 misclassifications.
4. Discussion
We presented an integrated deep learning model for automated EEG-based detection of ADHD, achieving state-of-the-art accuracy through optimized data preprocessing and feature engineering. The developed SVM model attained 94.86% accuracy on the ADHD dataset when combined with PCA for feature selection, as presented in Table 9. This surpassed prior works, including Alim et al. (2023) [49], who reached 93.2% accuracy using a Gaussian SVM model without specialized feature engineering.
The developed integrated deep learning model, coupled with Chi-square and PCA for enhanced feature screening, achieved comparable state-of-the-art accuracy (95%), outperforming previous deep-learning techniques, as presented in Table 9. This includes the graph neural network proposed by Ekhlasi et al. (2022) [50], which obtained 91.2% and 90% accuracy in the theta and delta EEG bands without inputs tailored for ADHD detection.
These gains highlight the efficacy of the data preprocessing pipeline and custom feature selection techniques for extracting the most discriminative biomarkers from complex, high-dimensional EEG data. The two-stage PCA and Chi-square feature screening enabled the SVM and the integrated deep learning model to better capture predictive ADHD patterns, yielding significant performance improvements over conventional approaches. Additionally, proactive balancing of the training data addressed class imbalance, further improving the models' learning of salient ADHD characteristics. The specialized data wrangling and feature engineering optimizations were critical to unlocking the full diagnostic potential of machine and deep learning for EEG-based ADHD detection.
Figure 15 displays the ROC curves of the integrated model with PCA and Chi-square feature selection. The ROC-AUC measure provides a fuller representation of a classification model's performance and gauges its diagnostic capacity; as shown by the ROC-AUC, different machine learning and deep learning algorithms can successfully identify and diagnose ADHD. The CNN-based models with PCA and Chi-square scored an AUC of 99%. Figure 16 compares the developed integrated deep learning model with existing ADHD AI models developed by other researchers [52,53].
5. Conclusions
We present a machine-learning and deep-learning framework for automated discrimination between children with ADHD and healthy children using EEG. The framework comprises several key components: preprocessing, feature extraction, feature selection, and classification. The developed models achieve cutting-edge accuracy, highlighting the power of optimized machine-learning pipelines to improve ADHD diagnosis compared with traditional approaches. On the ADHD-EEG dataset, the SVM approach combined with PCA feature selection achieved 94.86% accuracy, significantly outperforming earlier machine-learning models. In addition, the proposed integrated deep learning model achieves a high accuracy of 95% with PCA, and 96% when combined with Chi-square.
These findings emphasize the significance of feature optimization and data wrangling for extracting the most diagnostically useful biomarkers from complex EEG data. Specialized preprocessing, class balancing, and Chi-square with PCA-based feature selection enable strong discrimination between ADHD and typical neurological patterns.
This work highlights the potential of EEG and machine learning as valuable tools for aiding clinical ADHD evaluations. The optimized models presented in this study have the potential to provide reliable supplementary support for diagnosis, especially in challenging cases. Further validation across diverse patient cohorts would be beneficial.
Ultimately, the utilization of these models could enable earlier and more targeted interventions to improve outcomes for individuals with ADHD. The framework developed in this study lays the foundation for future translational initiatives aiming to maximize the diagnostic utility of machine learning and neurophysiological data.
Acknowledgments
The authors extend their appreciation to the King Salman Center for Disability Research for funding this work through Research Group no. KSRG-2023-500.
Data availability statement
Available online: https://ieee-dataport.org/open-access/eeg-data-adhd-control-children (accessed on 25 August 2023)
Conflicts of interest
The authors declare no conflicts of interest.