Research article

Hybrid optimized artificial neural network using Latin hypercube sampling and Bayesian optimization for detection, classification and location of faults in transmission lines

  • This paper introduces a novel hybrid approach that integrates Latin hypercube sampling (LHS) and Bayesian optimization for optimizing artificial neural networks (ANNs) for fault detection, classification, and location in transmission lines. The proposed method improves the accuracy and efficiency of fault diagnosis in power systems, representing a significant step forward compared with conventional approaches. The test system is a 400 kV, 50 Hz, 300 km transmission system, and the simulations were carried out in the MATLAB/Simulink environment. LHS was used to determine well-spread initial points, which then seeded the Bayesian optimization that refined the learning rate and number of training epochs. For fault detection, the model was evaluated on a dataset of 168 cases and correctly identified every normal and faulty instance, achieving 100% detection accuracy. For fault classification, the ANN model, trained on a dataset of 495 instances, produced perfect regression coefficients across the training, testing, and validation subsets; on unseen data, it correctly classified all 154 cases, for a 100% F1 score. For fault location, accuracy ranged from 99.826% to 99.999%, with a mean absolute percentage error (MAPE) of 0.053%, a mean square error (MSE) of 0.0083, and a mean absolute error (MAE) of 0.0717. A detailed analysis across diverse fault types confirmed the model's consistent precision in various fault scenarios. The trained models were also deployed on three different transmission lines and remained highly accurate in all cases tested. In summary, the hybrid optimized ANN model, combining the strengths of LHS and Bayesian optimization, represents an advancement in power system fault analysis, delivering improved efficiency and reliability.

    Citation: Abdul Yussif Seidu, Elvis Twumasi, Emmanuel Assuming Frimpong. Hybrid optimized artificial neural network using Latin hypercube sampling and Bayesian optimization for detection, classification and location of faults in transmission lines[J]. AIMS Electronics and Electrical Engineering, 2024, 8(4): 508-541. doi: 10.3934/electreng.2024024




    Traffic signs give drivers instructions and information that are indispensable for protecting human life and property. Automatic traffic sign recognition is therefore an important computer vision task [1,2,3]. In recent years, driver assistance systems (DAS) [4,5,6,7] have developed rapidly, and traffic sign recognition is one of their integrated functions. Normally, professional equipment is mounted on top of a vehicle to capture traffic sign images. Under real-world conditions, these images are distorted by various natural and human factors, including vehicle speed, weather, damaged signs and viewing angles. Hence, we apply data augmentation to simulate such conditions. These techniques include rotation, cropping, scaling, lighting variations and weather changes, precisely the distortions that can otherwise degrade the performance of a traffic sign recognition network.

    Traffic sign recognition is a well-known problem in computer vision. A substantial body of literature [8,9,10] has studied traffic sign detection, and reviews are available in [11,12]. In [13], an algorithm based on Gaussian kernel support vector machines (SVMs) [14] is used for traffic sign classification; experimental results show that it is robust under various conditions, including translation, rotation and scaling. Ayoub [15] presented a random forest classifier that gave satisfactory results on the Swedish Traffic Signs Dataset. Lu [16] proposed a graph embedding approach that preserves the sparse representation property by using the L2,1-norm; experiments demonstrate that it outperformed previous traffic sign recognition approaches.

    In this paper, we introduce an interference module inspired by [17], which dynamically aggregates information by adapting the representation of information to the semantic content of an image. In quantum mechanics, an entity is represented by a wave function containing both amplitude and phase [18]. We describe each entity generated by the convolutional filters as a wave so that the information aggregation procedure becomes dynamic. The amplitude is a real-valued feature representing the content of each entity, while the phase term is represented as a complex value. These wave-like entities interfere with one another, and entities with close phases tend to enhance each other. The whole framework is constructed by stacking the interference module and channel-mixing multi-layer perceptrons (MLPs). Figure 1 summarizes the architecture. We further conduct ablation experiments and analyze the performance of the proposed model. The final experimental results demonstrate a clear advantage over existing architectures (shown in Table 2).

    Figure 1.  The structure chart of our proposed model. k is the number of the block. BN refers to batch normalization.

    The contributions of our work are as follows: we propose a novel method (WiNet) for traffic sign recognition. A new convolution structure is introduced for learning multi-scale features, and it is successfully combined with a wave function. Our model achieves the best performance compared with several concurrent models based on ResMLP [19], ResNet50, PVT [20] and ViT [21]. We also test the robustness of all models on datasets with a sufficient number of synthetic samples. Furthermore, we analyze how the type of representation affects the overall wave interference network, which dynamically aggregates information according to the semantic content of an image.

    The rest of the paper is organized as follows: Section 2 reviews related methods applied to traffic sign recognition. Section 3 introduces the formulation and architecture of the proposed model. We present experiments and implementation details in Section 4, and further analyze the effectiveness of different modules and activation functions in Section 5. Finally, we draw conclusions in Section 6.

    In recent decades, many methods have been proposed for traffic sign recognition. We briefly review the related literature [22,23], which can be divided into traditional methods, methods based on convolutional neural networks (CNNs) and methods with attention mechanisms.

    Traditional methods for traffic sign recognition: Before CNNs, traffic sign recognition depended on hand-crafted features. In [15], different combinations of four features were compared: the histogram of oriented gradients (HOG), Gabor, local binary pattern (LBP) and local self-similarity (LSS). The authors tested the proposed method on the Swedish Traffic Signs Dataset. Machine learning methods have also been used to solve related problems, such as random forests, logistic regression [24] and SVMs [25].

    CNN for traffic sign recognition: With the rapid growth of memory and computational power, CNN-based architectures have become the mainstream in computer vision, and many works have used CNNs for traffic sign classification. The committee-of-CNNs approach [26] obtains a high recognition rate of 99.15%, above the human recognition rate of 98.98%. In the GTSRB competition in 2011, multi-scale CNNs [27] made full use of local and global features and set a new record with an error rate of 1.03%. The authors of [28] proposed a novel deep network for traffic sign classification that achieves outstanding performance on GTSRB, using spatial transformer layers [29] and a modified version of the inception module [30]. The well-designed inception module allows the network to classify intraclass samples precisely, while the spatial transformer layer improves the network's robustness to deformations such as translation, rotation and scaling of input images.

    Attention mechanism for traffic sign recognition: The attention mechanism can adaptively select important information from the input features. Owing to its strong performance, the attention mechanism and its variants have been applied to a variety of tasks; the improvements include channel attention [31], spatial attention [32] and global attention [33]. The paper [34] proposed an attention-based convolutional pooling neural network (ACPNN), in which convolutional pooling replaces max pooling. Validated on the German Traffic Sign Recognition Benchmark (GTSRB), the ACPNN proved robust against external noise and increased recognition accuracy. Based on the ice environment traffic sign recognition benchmark (ITSRB), the authors of [35] proposed an attention network for high-resolution traffic sign classification (PFANet), reaching 93.57% accuracy while matching the newest and most effective networks on the GTSRB.

    In this section, we describe our proposed network for traffic sign recognition in detail. First, we describe the interference module. Then, we describe the overall structure of the network. Finally, we describe our data augmentation techniques.

    Figure 2.  The structure of the interference module. In the representing and aggregating structure, a 3×3 depthwise convolution is used to aggregate information.

    In recent years, many CNN-based neural network modules have been proposed, such as [36,37]. In the interference module, the convolution-based feature extraction structure adopts residual-like connections within a single residual block [38]. After the input is normalized, it is split into two subsets with the same spatial size and number of channels, denoted by $X_1$ and $X_2$. Each $X_i$ passes through a 3×3 convolution, denoted by $K_i(\cdot)$, where $i \in \{1, 2\}$. $X_2$ is added to the output of $K_1(\cdot)$ and then fed into $K_2(\cdot)$. Finally, a 1×1 convolution ensures that the channel number matches that of the following module. The output is calculated as

    $$E_i = \mathrm{Conv}_{1\times 1}\big(K_2(K_1(X_1) + X_2)\big), \quad i = 1, 2, \dots, n. \tag{1}$$

    We denote the output by $E_i$, which captures feature information from a larger receptive field. In deep neural networks, different channels often contain the complete content of different objects [39]; the 1×1 convolution assigns corresponding weights to the different channels. It is worth noting that the feature extraction structure achieves adaptability not only in the spatial dimension [36] but also in the channel dimension [36].
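    For concreteness, the following minimal TensorFlow sketch implements the extraction structure of Eq (1) as we read it. Details not stated in the text, such as placing batch normalization inside the block, equal channel splits, and plain (activation-free) 3×3 convolutions for K1 and K2, are our assumptions rather than the authors' released code.

```python
import tensorflow as tf
from tensorflow.keras import layers

class ExtractionBlock(layers.Layer):
    """Sketch of Eq (1): normalize, split channels into X1 and X2,
    apply K1 (3x3 conv) to X1, add the result to X2, apply K2
    (3x3 conv), then restore the channel count with a 1x1 conv."""

    def __init__(self, channels, **kwargs):
        super().__init__(**kwargs)
        self.norm = layers.BatchNormalization()
        self.k1 = layers.Conv2D(channels // 2, 3, padding="same")  # K1
        self.k2 = layers.Conv2D(channels // 2, 3, padding="same")  # K2
        self.proj = layers.Conv2D(channels, 1)                     # Conv_1x1

    def call(self, x, training=False):
        x = self.norm(x, training=training)
        x1, x2 = tf.split(x, num_or_size_splits=2, axis=-1)  # equal halves
        return self.proj(self.k2(self.k1(x1) + x2))          # Eq (1)
```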

    The output $E_i$ is the feature map extracted by the convolutional filters. To describe the feature map concretely, we refer to each feature as an entity. Each entity is then converted into a wave; after all waves have interfered with one another, the new feature map is generated and fed into the next stage.

    An entity is represented as a wave $B_j$ with both amplitude and phase information. A wave can be formulated as

    $$B_j = |A_j| \odot e^{i\theta_j}, \quad j = 1, 2, \dots, n, \tag{2}$$

    where $B_j$ is the $j$-th wave, $i$ is the imaginary unit with $i^2 = -1$, $|\cdot|$ denotes the absolute value operation, and $\odot$ denotes element-wise multiplication. The amplitude $|A_j|$ represents the information of each entity. The periodic function $e^{i\theta_j}$ keeps its values distributed over a fixed range. $\theta_j$ is the phase and indicates the current location of an entity within a period.

    In Eq (2), a wave is represented as a complex value. To embed it in the module, we expand it with Euler's formula, which gives

    $$B_j = |A_j| \odot \cos\theta_j + i|A_j| \odot \sin\theta_j, \quad j = 1, 2, \dots, n. \tag{3}$$

    To implement the above equation, we must specify how both the amplitude and the phase are expressed. The amplitude $|A_j|$ is a real-valued feature; in practice, the absolute value operation is not actually applied. A feature map $x_j \in \mathbb{R}^{N \times C}$ is taken as the input, and we use a plain channel_FC operation [17] to estimate the amplitude. Specifically, a Tanh activation is adopted to gain nonlinearity. We found that Tanh activation achieves significantly better performance than ReLU activation. As Figure 3 shows, a wave with a phase has a direction; since the value range of Tanh is (−1, 1), positive and negative outputs can represent different directions, which helps the model perform better. Comparisons between Tanh and ReLU activations are shown in Table 6.

    Figure 3.  The superposition of two waves with different phases. The dashed lines describe waves with different initial phases. The solid lines describe the superposition results of two waves.

    Having presented the relevant mathematical expressions, we now introduce concrete expressions for estimating the amplitude and phase and for the superposition of two waves. The channel_FC operation is defined as

    $$\mathrm{Channel\_FC}(x_j, W^c) = W^c x_j, \quad j = 1, 2, \dots, n, \tag{4}$$

    where the learnable parameter $W^c$ is the weight. The phase plays an important role in the whole module: it indicates the current location of an entity within a period. We adopt the simplest method to estimate the phase, representing it by the output of the Tanh activation.

    To dynamically adjust the relationships between different entities rather than relying on fixed parameters alone, we use the token_FC operation [17] to aggregate information. The token_FC is formulated as

    $$\mathrm{Token\_FC}(x_j, W^t) = \sum_k W^t_{jk} x_k, \quad j = 1, 2, \dots, n, \tag{5}$$

    where the learnable parameter $W^t$ is the weight, a feature map $x_k \in \mathbb{R}^{N \times C}$ is taken as the input, and $j$ indexes the $j$-th entity. Finally, the real-valued output $O_j$ can be written as

    $$O_j = \sum_k W^t_{jk} x_k \cos\theta_k + W^i_{jk} x_k \sin\theta_k, \quad j = 1, 2, \dots, n, \tag{6}$$

    where the learnable parameters $W^t$ and $W^i$ are the weights. As Figure 3 shows, when two waves have similar phases, the output $O_j$ tends to be enhanced; entities with the same semantic content in the input feature map are thus aggregated together, as visualized in Figure 7.
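    Under our reading of Eqs (2)-(6), a compact TensorFlow sketch of the wave generation and aggregation follows. The scaling of the Tanh output to (−pi, pi) and the treatment of entities as flattened spatial positions are our assumptions for illustration.

```python
import math
import tensorflow as tf
from tensorflow.keras import layers

class WaveAggregation(layers.Layer):
    """Sketch of Eqs (2)-(6) on inputs shaped (batch, N entities, C).
    Amplitude: plain channel_FC (Eq (4)). Phase: Tanh-activated
    channel_FC. Aggregation: token_FC weights over the cosine and
    sine branches (Eqs (5) and (6))."""

    def __init__(self, channels, num_entities, **kwargs):
        super().__init__(**kwargs)
        self.amplitude = layers.Dense(channels, use_bias=False)      # W^c
        self.phase = layers.Dense(channels, activation="tanh", use_bias=False)
        self.token_cos = layers.Dense(num_entities, use_bias=False)  # W^t
        self.token_sin = layers.Dense(num_entities, use_bias=False)  # W^i

    def call(self, x):
        amp = self.amplitude(x)              # |A_j|, content of each entity
        theta = math.pi * self.phase(x)      # theta_j, location in a period
        real = amp * tf.cos(theta)           # |A_j| (.) cos(theta_j)
        imag = amp * tf.sin(theta)           # |A_j| (.) sin(theta_j)
        # O_j = sum_k W^t_jk x_k cos(theta_k) + W^i_jk x_k sin(theta_k):
        # transpose so Dense mixes over the entity axis, then restore.
        out = self.token_cos(tf.transpose(real, [0, 2, 1])) \
            + self.token_sin(tf.transpose(imag, [0, 2, 1]))
        return tf.transpose(out, [0, 2, 1])
```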

    In the channel dimension, the 1×1 convolution helps fuse information among different channels. To enhance information fusion in the spatial dimension, we use MLPs to exchange information among different entities, applying batch normalization before the MLPs and a residual connection around them. As Table 6 illustrates, the MLPs significantly improve performance compared with the same network without them.
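    A complete block, stacking the interference module with the channel-mixing MLP described above, might then look like the following sketch; the 4x expansion ratio and GELU activation inside the MLP are our assumptions.

```python
class WiBlock(layers.Layer):
    """Sketch of one block: interference module and channel-mixing
    MLP, each preceded by batch normalization and wrapped in a
    residual connection, mirroring the description above."""

    def __init__(self, channels, num_entities, **kwargs):
        super().__init__(**kwargs)
        self.bn1 = layers.BatchNormalization()
        self.bn2 = layers.BatchNormalization()
        self.interference = WaveAggregation(channels, num_entities)
        self.mlp = tf.keras.Sequential([
            layers.Dense(4 * channels, activation="gelu"),
            layers.Dense(channels),
        ])

    def call(self, x, training=False):
        x = x + self.interference(self.bn1(x, training=training))
        return x + self.mlp(self.bn2(x, training=training))
```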

    In terms of network architecture, our model is a simple hierarchical structure with 4 stages. Each stage reduces the output spatial resolution, i.e., $H/2 \times W/2$, $H/4 \times W/4$, $H/8 \times W/8$ and $H/16 \times W/16$, where $H$ and $W$ are the height and width of the input image. The number of output channels increases as the resolution decreases. The detailed configuration is given in Table 1.

    Table 1.  The detailed setting of WiNet.
    Stage Output size Blocks
    1 H/2×W/2×64 2
    2 H/4×W/4×128 3
    3 H/8×W/8×256 4
    4 H/16×W/16×512 2


    At the beginning of each stage, we downsample the input, controlling the downsampling rate via the convolution stride. In the first stage, a 6×6 convolution with stride 2 embeds an input image of shape $H \times W \times 3$. In each of the following three stages, a 3×3 convolution with stride 2 downsamples the input. All other layers in a stage keep the same output size. We apply global average pooling to the feature map from the last stage and then a linear classifier to predict the logits.
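    Putting the pieces together, a minimal sketch of the 4-stage hierarchy of Table 1 follows, reusing the WiBlock sketch above; flattening spatial positions into entities is our assumption about the data layout.

```python
def build_winet(num_classes, input_shape=(64, 64, 3)):
    """Sketch of Table 1: a 6x6 stride-2 embedding, a 3x3 stride-2
    convolution between stages, (2, 3, 4, 2) blocks with
    (64, 128, 256, 512) channels, and a global-average-pooled
    linear classifier."""
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(64, 6, strides=2, padding="same")(inputs)  # stage 1 embed
    for stage, (c, b) in enumerate(zip([64, 128, 256, 512], [2, 3, 4, 2])):
        if stage > 0:
            x = layers.Conv2D(c, 3, strides=2, padding="same")(x)  # downsample
        h, w = x.shape[1], x.shape[2]
        for _ in range(b):
            y = layers.Reshape((h * w, c))(x)   # image -> sequence of entities
            y = WiBlock(c, h * w)(y)
            x = layers.Reshape((h, w, c))(y)    # sequence -> image
    x = layers.GlobalAveragePooling2D()(x)
    return tf.keras.Model(inputs, layers.Dense(num_classes)(x))
```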

    In general, it is difficult to judge which model performs better when authors evaluate their models on different datasets. To enrich the set of traffic signs, some authors sample images from multiple datasets [40,41,42], while many others use private datasets [43,44] instead of public ones [45,46]. For fairness, we adopt the training/test split of the CTSRD database, which ensures that all models are evaluated on the same benchmark. We also apply several data augmentation techniques to extend the training set and address the challenges of realistic scenarios.

    To demonstrate the generalization of our model, we also train it on another well-known benchmark. The German Traffic Sign Recognition Benchmark (GTSRB) [47] consists of more than 50,000 images covering more than 40 classes of traffic signs. We consider only images with a size of at least 30 pixels; smaller images are discarded on account of their low human visual identification rate. All retained images are resized to 64×64 resolution. Many models [48,49,50,51] have obtained good results on this dataset, which makes comparisons between different models more convincing.
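    A small preprocessing sketch for the filtering and resizing step just described; the class-subfolder layout of .ppm files follows the public GTSRB release, so adjust the paths to your own copy.

```python
from pathlib import Path
from PIL import Image

def load_gtsrb_images(root, min_side=30, target=(64, 64)):
    """Keep only images whose smaller side is at least 30 pixels
    and resize the survivors to 64x64, as described above."""
    images, labels = [], []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        for img_path in class_dir.glob("*.ppm"):
            img = Image.open(img_path)
            if min(img.size) < min_side:
                continue  # discard low-resolution samples
            images.append(img.resize(target, Image.BILINEAR))
            labels.append(int(class_dir.name))
    return images, labels
```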

    Data augmentation has proven its value in deep networks: it effectively expands the size of the training set, an important factor when training models, since a sufficient amount of training data helps tune millions of learnable parameters. In real-world situations, traffic signs may be distorted in shape by human factors, and traffic sign images may contain appearance distortions such as changes in brightness and contrast. To simulate such physical changes, we apply image processing techniques to each image at random and expand the original training set to five times its size. Each image is augmented by exactly one of the available methods, which ensures that every augmentation operation is applied with the same probability. Samples of original images and augmented results are presented in Figure 4. Among the many image processing libraries available, we use Albumentations for all operations except the rain effect. Briefly, we perform four classes of augmentations in the data preprocessing stage (a minimal pipeline sketch follows the list):

    Figure 4.  Several synthetic examples of traffic-sign instances.

    ● Random weather. We apply data augmentation to achieve weather effects such as sunny, foggy, rainy and snowy. Synthetic samples keep the size and aspect ratio of input images.

    ● Random blur. We apply data augmentation to blurred original images. Methods of data augmentation include Blur, MotionBlur, GaussianBlur and MedianBlur in Albumentations.

    ● Random affine. Geometric transformations for data augmentation are common and effective methods. PiecewiseAffine, Affine and RandomResizedCrop in Albumentations are selected to generate synthetic samples.

    ● Gauss noise. GaussNoise in Albumentations generates a matrix of random values. Moreno-Barea [52] found that adding noise to images can make models more robust on nine datasets from the UCI repository [53].
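    The sketch below assembles the four families into one Albumentations pipeline in which each image receives exactly one equally likely operation. Parameter values, and the substitution of A.RandomRain for the externally generated rain effect mentioned above, are our assumptions.

```python
import albumentations as A

# One transform per image, drawn uniformly from the four families.
augment = A.OneOf([
    A.OneOf([A.RandomFog(), A.RandomSnow(), A.RandomRain(),
             A.RandomSunFlare()], p=1.0),                  # random weather
    A.OneOf([A.Blur(), A.MotionBlur(), A.GaussianBlur(),
             A.MedianBlur(blur_limit=5)], p=1.0),          # random blur
    A.OneOf([A.PiecewiseAffine(), A.Affine(),
             # newer Albumentations versions use size=(64, 64) instead
             A.RandomResizedCrop(height=64, width=64)], p=1.0),  # random affine
    A.GaussNoise(p=1.0),                                   # Gauss noise
], p=1.0)

# Usage: expand the training set five-fold with augmented copies.
# aug_img = augment(image=img)["image"]
```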

    In this section, we conduct several analytical experiments to explore our proposed model. To make the comparison fair, we carry out all experiments under the same settings. We report our results on CTSRD, where our model achieves significant improvements over other state-of-the-art models. In addition, we verify the effectiveness of the Tanh activation and the MLPs, and result visualizations further illustrate the model's behavior.

    We implemented our experiments on a PC with an Intel i5-11400H CPU, an NVIDIA GeForce RTX 3050 GPU and 16 GB of RAM. All models are implemented in TensorFlow. During training, all models use the same hyperparameters for fairness: we train with Adam at an initial learning rate of 10−4 and, owing to GPU memory limitations, a mini-batch size of 64. Considering the resolution of the input images, each image is resized to 64 pixels in both width and height. To fuse spatial information, we empirically set the window size to 3 in the interference module. All models are trained for 60 epochs on the original dataset; when data augmentation greatly enlarges the dataset, we retrain all models for 30 epochs.
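    In sketch form, the shared training configuration looks as follows. The random tensors merely stand in for the CTSRD images, and 58 classes is our assumption about CTSRD's category count.

```python
import numpy as np
import tensorflow as tf

num_classes = 58                                    # assumed for CTSRD
x_train = np.random.rand(256, 64, 64, 3).astype("float32")  # placeholder data
y_train = np.random.randint(0, num_classes, size=256)

model = build_winet(num_classes)  # from the architecture sketch above
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(x_train, y_train, batch_size=64, epochs=60)
# With the five-fold augmented training set, retrain for 30 epochs instead.
```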

    Table 2 presents the comparison of WiNet with other models, including ResMLP, ResNet50, PVT and ViT. WiNet achieves the best accuracy, outperforming ResMLP, ResNet50, PVT and ViT by 5.8%, 2.4%, 23.5% and 11%, respectively. This clearly demonstrates that our model has a strong ability to extract features: although ResNet50 has fewer parameters and higher throughput, WiNet performs better. In particular, WiNet reaches 100% accuracy when augmented images are added to the training set.

    Table 2.  Comparisons with some other models on CTSRD.
    Model    Params    Throughput (image/s)    Top-1 Accuracy (%), original dataset    Top-1 Accuracy (%), original + synthetic examples
    ResMLP 14.52 M 13 94.0 97.2
    ResNet50 23.7 M 19 97.4 99.7
    PVT 12.3 M 13 76.3 89.5
    ViT 85.8 M 11 88.8 95.7
    ours 26.1 M 14 99.8 100


    We give quality histograms in Figure 5, and the corresponding statistics (median, mean and standard deviation for all models) are provided in Table 3. The distribution of predicted probabilities for our model is closer to 1 than for the other models trained on CTSRD under the same settings. The statistics confirm this: WiNet has higher median and mean values and a smaller spread.
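    For reference, the Table 3 statistics can be reproduced along these lines, under our reading of "predicted probability" as the softmax probability assigned to each test sample's true class.

```python
import numpy as np
import tensorflow as tf

def probability_stats(model, x_test, y_test):
    """Median, mean and standard deviation of the probability the
    model assigns to the true class of each test image."""
    logits = model.predict(x_test, verbose=0)
    probs = tf.nn.softmax(logits, axis=-1).numpy()
    p_true = probs[np.arange(len(y_test)), y_test]
    return np.median(p_true), p_true.mean(), p_true.std()
```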

    Figure 5.  Quality histogram about the predicted probability of all models on CTSRD.
    Table 3.  Quality histogram statistics of all models on CTSRD.
    Model Median Mean Std. Dev.
    ResMLP 0.99 0.96 0.109
    ResNet50 1.00 0.99 0.047
    PVT 0.95 0.85 0.184
    ViT 0.99 0.94 0.121
    ours 1.00 0.99 0.034


    To analyze the model's performance further, we apply data augmentation (weather, Gauss, blur and affine) to the testing set and then make predictions. The augmentation shifts the distribution of the testing set away from that of the training set, so it can be used to test robustness. Table 4 lists the predicted results on this perturbed testing set, where our model performs well. We then use the models trained on the training set with synthetic samples to predict the same testing set. As Table 5 shows, WiNet surpasses ResNet50 with clear margins of +2.71, +3.52, +2.21 and +2.76 points for weather, Gauss, blur and affine, respectively. This indicates that our model's effectiveness is fully realized on a larger dataset.
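    A sketch of this robustness protocol follows, reusing the augmentation families defined earlier; the placeholder tensors stand in for the real test split, and transform parameters are assumptions.

```python
import numpy as np
import albumentations as A

x_test = np.random.randint(0, 255, (8, 64, 64, 3), dtype=np.uint8)  # placeholder
y_test = np.random.randint(0, 58, 8)                                # placeholder

families = {
    "weather": A.OneOf([A.RandomFog(), A.RandomSnow(), A.RandomRain(),
                        A.RandomSunFlare()], p=1.0),
    "gauss":   A.GaussNoise(p=1.0),
    "blur":    A.OneOf([A.Blur(), A.MotionBlur(), A.GaussianBlur(),
                        A.MedianBlur(blur_limit=5)], p=1.0),
    "affine":  A.OneOf([A.PiecewiseAffine(), A.Affine()], p=1.0),
}
for name, aug in families.items():
    x_perturbed = np.stack([aug(image=img)["image"] for img in x_test])
    _, acc = model.evaluate(x_perturbed, y_test, verbose=0)  # trained model above
    print(f"{name}: {acc:.2%}")
```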

    Table 4.  Results of testing robustness on CTSRD.
    Model Weather Gauss Blur Affine
    ResMLP 44.08% 53.01% 51.86% 54.71%
    ResNet50 70.26% 65.05% 64.09% 65.75%
    PVT 35.66% 32.25% 32.80% 37.46%
    ViT 48.14% 47.34% 47.09% 48.35%
    ours 73.37% 61.48% 60.48% 62.64%

    Table 5.  Results of the model on testing set and training set with synthetic samples.
    Model Weather Gauss Blur Affine
    ResNet50 91.22% 91.57% 90.97% 91.27%
    ours 93.93% 95.09% 93.18% 94.03%


    Beyond accuracy, it is crucial to test the generalization of our model by training on a different dataset. We divided the GTSRB data into training and validation sets according to the original proportion and trained the model for 30 epochs with batch size 64, learning rate 0.0001 and the Adam optimizer. In Figure 6, the solid curves show the accuracy and the dashed curves show the loss. The highest accuracies on both the training set (21,792 images) and the validation set (6893 images) reach 1.00. The overall accuracy trends for both sets increase with the epochs, and after around 16 epochs only slight fluctuations remain. The loss curves show similar but opposite trends, and the losses on both sets are very small. These curves show that our proposed model attains low loss and high accuracy, indicating good generalization rather than mere fitting of CTSRD.

    Figure 6.  Loss and accuracy curves of the training and validation processes. The blue and red lines represent the training and validation processes, respectively. The annotations indicate the best result and the corresponding epoch.

    In this section, we ablate some design components to explore their effectiveness. For a more intuitive feeling, we draw some feature maps with heatmaps in the aggregating process.

    To evaluate the design components, we take WiNet with Tanh activation and MLPs as the baseline model. WiNet-Relu is built by replacing the Tanh activation with ReLU, and WiNet-noMLP is the baseline model with the MLPs removed. Ablation results are reported in Table 6. Both components clearly play an important part in learning effective parameters: Tanh activation improves accuracy by about 1.2% compared with WiNet-Relu, and without MLPs the performance drops by 0.4 percentage points (99.8% vs 99.4%).

    Table 6.  Ablation study with different fusions.
    Model WiNet-Relu WiNet-noMLP WiNet
    Top-1 Accuracy 98.5% 99.4% 99.8%


    Diagrams are an important tool for intuitive understanding. In Section 3.1 (Figure 3 and Eq (6)), we analyzed the superposition of two waves with different phases. To better understand the effect of the representation type, we visualize the feature maps of a traffic sign. From Figure 7, we can clearly see that feature maps of the first stage with similar content are aggregated together: similar parts of the image are assigned closer phases and gradually stand out during aggregation, giving the visualization a strong sense of depth. As the network layers deepen, the model extracts more abstract features.

    Figure 7.  The visualized feature maps of a traffic sign.
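    A typical way to produce such visualizations, sketched for any Keras model; the layer name is whatever identifier you assign in your own model, not one defined by the paper.

```python
import matplotlib.pyplot as plt
import tensorflow as tf

def show_feature_heatmaps(model, image, layer_name, n=8):
    """Run one image through a truncated model and render the first
    n channels of the chosen layer as heatmaps, Figure 7 style."""
    sub = tf.keras.Model(model.input, model.get_layer(layer_name).output)
    fmap = sub.predict(image[None], verbose=0)[0]   # (H, W, C)
    for i in range(n):
        plt.subplot(2, n // 2, i + 1)
        plt.imshow(fmap[..., i], cmap="jet")        # one channel per panel
        plt.axis("off")
    plt.tight_layout()
    plt.show()
```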

    We propose a novel deep learning architecture for traffic sign recognition whose mechanism for aggregating information differs from existing transformer, CNN and MLP architectures. In the proposed approach, we successfully combine a new CNN-based module with a wave function. First, we obtain multi-scale feature representations with the CNN-based module; then, we use channel_FC operations to estimate the amplitude and phase information. Amplitude and phase are the key parameters that dynamically modulate the relationships between entities with similar content. Extensive experimental evaluations with different strategies demonstrate the advantages of the proposed architecture and shed light on how it works. In future work, we will explore how to apply this information representation in other fields, or use it directly to preprocess raw data. We also hope our work encourages others to draw new ideas from physical phenomena.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This work is jointly supported by the National Natural Science Foundation of China under Grant 61976055, the special fund for education and scientific research of Fujian Provincial Department of Finance under Grant GY-Z21001 and Scientific Research Foundation of Fujian University of Technology under Grant GY-Z22071.

    The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.



    [1] Ogar VN, Hussain S, Gamage KA (2023) The use of artificial neural network for low latency of fault detection and localisation in transmission line. Heliyon 9: e13376. https://doi.org/10.1016/j.heliyon.2023.e13376
    [2] Goni MO, Nahiduzzaman M, Anower MS, Rahman MM, Islam MR, Ahsan M, et al. (2023) Fast and Accurate Fault Detection and Classification in Transmission Lines using Extreme Learning Machine. e-Prime - Advances in Electrical Engineering, Electronics and Energy 3: 100107. https://doi.org/10.1016/j.prime.2023.100107 doi: 10.1016/j.prime.2023.100107
    [3] Naji HA, Fayadh RA, Mutlag AH (2023) ANN-based Fault Location in 11 kV Power Distribution Line using MATLAB. 2023 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology, JEEIT, 134–139. https://doi.org/10.1109/JEEIT58638.2023.10185849.
    [4] Venkata P, Pandya V, Vala K, Sant AV (2022) Support vector machine for fast fault detection and classification in modern power systems using quarter cycle data. Energy Reports 8: 92–98. https://doi.org/10.1016/j.egyr.2022.10.279 doi: 10.1016/j.egyr.2022.10.279
    [5] Ahmadipour M, Othman MM, Bo R, Salam Z, Ridha HM, Hasan K (2022) A novel microgrid fault detection and classification method using maximal overlap discrete wavelet packet transform and an augmented Lagrangian particle swarm optimization-support vector machine. Energy Reports 8: 4854–4870. https://doi.org/10.1016/j.egyr.2022.03.174 doi: 10.1016/j.egyr.2022.03.174
    [6] Aiswarya R, Nair DS, Rajeev T, Vinod V (2023) A novel SVM based adaptive scheme for accurate fault identification in microgrid. Electr Pow Syst Res 221: 109439. https://doi.org/10.1016/j.epsr.2023.109439 doi: 10.1016/j.epsr.2023.109439
    [7] De Santis E, Rizzi A, Sadeghian A (2018) A cluster-based dissimilarity learning approach for localized fault classification in Smart Grids. Swarm Evol Comput 39: 267–278. https://doi.org/10.1016/j.swevo.2017.10.007 doi: 10.1016/j.swevo.2017.10.007
    [8] Bhuyan A, Panigrahi BK, Pal K, Pati S (2022) Convolutional Neural Network Based Fault Detection for Transmission Line. 2022 International Conference on Intelligent Controller and Computing for Smart Power, ICICCS, 1-4. https://doi.org/10.1109/ICICCSP53532.2022.9862446.
    [9] Omar AMS, Osman MK, Ibrahim MN, Hussain Z, Abidin AF (2020) Fault classification on transmission line using LSTM network. Indonesian Journal of Electrical Engineering and Computer Science 20: 231–238. https://doi.org/10.11591/ijeecs.v20.i1.pp231-238 doi: 10.11591/ijeecs.v20.i1.pp231-238
    [10] Biswas S, Nayak PK, Panigrahi BK, Pradhan G (2023) An intelligent fault detection and classification technique based on variational mode decomposition-CNN for transmission lines installed with UPFC and wind farm. Electr Pow Syst Res 223: 109526. https://doi.org/10.1016/j.epsr.2023.109526 doi: 10.1016/j.epsr.2023.109526
    [11] Mirshekali H, Keshavarz A, Dashti R, Hafezi S, Shaker HR (2023) Deep learning-based fault location framework in power distribution grids employing convolutional neural network based on capsule network. Electr Pow Syst Res 223: 109529. https://doi.org/10.1016/j.epsr.2023.109529 doi: 10.1016/j.epsr.2023.109529
    [12] Tabari M, Sadeh J (2022) Fault location in series-compensated transmission lines using adaptive network-based fuzzy inference system. Electr Pow Syst Res 208: 107800. https://doi.org/10.1016/j.epsr.2022.107800 doi: 10.1016/j.epsr.2022.107800
    [13] Naidu K, Ali MS, Abu Bakar AH, Tan CK, Arof H, Mokhlis H (2020) Optimized artificial neural network to improve the accuracy of estimated fault impedances and distances for underground distribution system. PLoS One 15: e0227494. https://doi.org/10.1371/journal.pone.0227494.
    [14] Iliyaeifar MM, Hadaeghi A (2023) Extreme learning machine-based fault location approach for terminal-hybrid LCC-VSC-HVDC transmission lines. Electr Pow Syst Res 221: 109487. https://doi.org/10.1016/j.epsr.2023.109487 doi: 10.1016/j.epsr.2023.109487
    [15] Kanwal S, Jiriwibhakorn S (2023) Artificial Intelligence Based Faults Identification, Classification, and Localization Techniques in Transmission Lines-A Review. IEEE Latin America Transactions 21: 1291-1305. https://doi.org/10.1109/TLA.2023.10305233 doi: 10.1109/TLA.2023.10305233
    [16] Yadav A, Dash Y (2014) An Overview of Transmission Line Protection by Artificial Neural Network: Fault Detection, Fault Classification, Fault Location, and Fault Direction Discrimination. Advances in Artificial Neural Systems 2014: 230382. https://doi.org/10.1155/2014/230382 doi: 10.1155/2014/230382
    [17] Kanwal S, Jiriwibhakorn S (2024) Advanced Fault Detection, Classification, and Localization in Transmission Lines: A Comparative Study of ANFIS, Neural Networks, and Hybrid Methods. IEEE Access 12: 49017–49033. https://doi.org/10.1109/ACCESS.2024.3384761 doi: 10.1109/ACCESS.2024.3384761
    [18] Ravesh NR, Ramezani N, Ahmadi I, Nouri H (2022) A hybrid artificial neural network and wavelet packet transform approach for fault location in hybrid transmission lines. Electr Pow Syst Res 204: 107721. https://doi.org/10.1016/j.epsr.2021.107721 doi: 10.1016/j.epsr.2021.107721
    [19] Hannan MA, How DN, Lipu MH, Ker PJ, Dong ZY, Mansur M, et al. (2020) SOC Estimation of Li-ion Batteries with Learning Rate-Optimized Deep Fully Convolutional Network. IEEE T Power Electr 36: 7349–7353. https://doi.org/10.1109/TPEL.2020.3041876 doi: 10.1109/TPEL.2020.3041876
    [20] Liu S, Ghosh R, Min JT, Motani M (2022) Optimizing Learning Rate Schedules for Iterative Pruning of Deep Neural Networks. arXiv preprint arXiv: 2212.06144.
    [21] Xu C, Liu S, Huang Y, Huang C, Zhang Z (2021) Over-the-air Learning Rate Optimization for Federated Learning. 2021 IEEE International Conference on Communications Workshops, ICC Workshops, 1-7. https://doi.org/10.1109/ICCWorkshops50388.2021.9473663.
    [22] Yasusi K (2016) Optimizing Neural-network Learning Rate by Using a Genetic Algorithm with Per-epoch Mutations. 2016 International Joint Conference on Neural Networks (IJCNN), 1472-1479. IEEE.
    [23] Hsieh HL, Shanechi MM (2018) Optimizing the learning rate for adaptive estimation of neural encoding models. PLoS Comput Biol 14: e1006168. https://doi.org/10.1371/journal.pcbi.1006168.
    [24] Dai Z, Yu H, Low BK, Jaillet P (2019) Bayesian Optimization Meets Bayesian Optimal Stopping. International conference on machine learning, 1496-1506.
    [25] Serizawa T, Fujita H (2020) Optimization of Convolutional Neural Network Using the Linearly Decreasing Weight Particle Swarm Optimization. https://doi.org/10.11517/pjsai.JSAI2022.0_2S4IS2b03.
    [26] Sheikholeslami R, Razavi S (2017) Progressive Latin Hypercube Sampling: An efficient approach for robust sampling-based analysis of environmental models. Environmental Modelling and Software 93: 109–126. https://doi.org/10.1016/j.envsoft.2017.03.010 doi: 10.1016/j.envsoft.2017.03.010
    [27] Shields MD, Zhang J (2016) The generalization of Latin hypercube sampling. Reliab Eng Syst Saf 148: 96–108. https://doi.org/10.1016/j.ress.2015.12.002 doi: 10.1016/j.ress.2015.12.002
    [28] Wu J, Chen XY, Zhang H, Xiong LD, Lei H, Deng SH (2019) Hyperparameter optimization for machine learning models based on Bayesian optimization. Journal of Electronic Science and Technology 17: 26–40. https://doi.org/10.11989/JEST.1674-862X.80904120 doi: 10.11989/JEST.1674-862X.80904120
    [29] Thelen A, Zohair M, Ramamurthy J, Harkaway A, Jiao W, Ojha M, et al. (2023) Sequential Bayesian optimization for accelerating the design of sodium metal battery nucleation layers. J Power Sources 581: 233508. https://doi.org/10.1016/j.jpowsour.2023.233508 doi: 10.1016/j.jpowsour.2023.233508
    [30] Shin S, Lee Y, Kim M, Park J, Lee S, Min K (2020) Deep neural network model with Bayesian hyperparameter optimization for prediction of NOx at transient conditions in a diesel engine. Eng Appl Artif Intell 94: 103761. https://doi.org/10.1016/j.engappai.2020.103761 doi: 10.1016/j.engappai.2020.103761
    [31] Burrage K, Burrage P, Donovan D, Thompson B (2015) Populations of models, experimental designs and coverage of parameter space by Latin Hypercube and Orthogonal sampling. Procedia Computer Science, 1762–1771. https://doi.org/10.1016/j.procs.2015.05.383.
    [32] Shu Z, Jirutitijaroen P (2011) Latin hypercube sampling techniques for power systems reliability analysis with renewable energy sources. IEEE T Power Syst 26: 2066–2073. https://doi.org/10.1109/TPWRS.2011.2113380 doi: 10.1109/TPWRS.2011.2113380
    [33] Deutsch JL, Deutsch CV (2012) Latin hypercube sampling with multidimensional uniformity. J Stat Plan Inference 142: 763–772. https://doi.org/10.1016/j.jspi.2011.09.016 doi: 10.1016/j.jspi.2011.09.016
    [34] Nguyen V (2019) Bayesian optimization for accelerating hyper-parameter tuning. Proceedings - IEEE 2nd International Conference on Artificial Intelligence and Knowledge Engineering, AIKE, 302–305. https://doi.org/10.1109/AIKE.2019.00060.
    [35] Singh D, Singh B (2022) Feature wise normalization: An effective way of normalizing data. Pattern Recogn 122: 108307. https://doi.org/10.1016/j.patcog.2021.108307
    [36] Islam MJ, Ahmad S, Haque F, Reaz MB, Bhuiyan MA, Islam MR (2022) Application of Min-Max Normalization on Subject-Invariant EMG Pattern Recognition. IEEE T Instrum Meas 71: 1-12. https://doi.org/10.1109/TIM.2022.3220286 doi: 10.1109/TIM.2022.3220286
    [37] Szabó S, Holb IJ, Abriha-Molnár VE, Szatmári G, Singh SK, Abriha D (2024) Classification Assessment Tool: A program to measure the uncertainty of classification models in terms of class-level metrics. Appl Soft Comput 155: 111468. https://doi.org/10.1016/j.asoc.2024.111468 doi: 10.1016/j.asoc.2024.111468
    [38] Chicco D, Jurman G (2020) The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21: 1-13. https://doi.org/10.1186/s12864-019-6413-7 doi: 10.1186/s12864-019-6413-7
    [39] Borkhade AD (2014) Transmission line fault detection using wavelet transform. International Journal on Recent and Innovation Trends in Computing and Communication 2: 3138-3142.
    [40] Teja ON, Ramakrishna MS, Bhavana GB, Sireesha K (2020) Fault Detection and Classification in Power Transmission Lines using Back Propagation Neural Networks. Proceedings of the International Conference on Smart Electronics and Communication (ICOSEC 2020), 1150–1156.
    [41] Fan R, Liu Y, Huang R, Diao R, Wang S (2018) Precise Fault Location on Transmission Lines Using Ensemble Kalman Filter. IEEE T Power Deliver 33: 3252–3255. https://doi.org/10.1109/TPWRD.2018.2849879 doi: 10.1109/TPWRD.2018.2849879
    [42] Arranz R, Paredes Á, Rodríguez A, Muñoz F (2021) Fault location in Transmission System based on Transient Recovery Voltage using Stockwell transform and Artificial Neural Networks. Electr Pow Syst Res 201: 107569. https://doi.org/10.1016/j.epsr.2021.107569 doi: 10.1016/j.epsr.2021.107569
    [43] Moradzadeh A, Teimourzadeh H, Mohammadi-Ivatloo B, Pourhossein K (2022) Hybrid CNN-LSTM approaches for identification of type and locations of transmission line faults. Int J Electr Power 135: 107563. https://doi.org/10.1016/j.ijepes.2021.107563 doi: 10.1016/j.ijepes.2021.107563
    [44] Shang B, Luo G, Li M, Liu Y, Hei J (2023) Transfer learning-based fault location with small datasets in VSC-HVDC. Int J Electr Power 151: 109131. https://doi.org/10.1016/j.ijepes.2023.109131 doi: 10.1016/j.ijepes.2023.109131
    [45] Pouabe Eboule PS, Pretorius JHC, Mbuli N (2018) Artificial Neural Network Techniques apply for Fault detecting and Locating in Overhead Power Transmission Line. 2018 Australasian Universities Power Engineering Conference (AUPEC), 1-6
    [46] Said A, Saad MH, Eladl SM, Elbarbary ZS, Omar AI, Saad MA (2023) Support Vector Machine Parameters Optimization for 500 kV Long OHTL Fault Diagnosis. IEEE Access 11: 3955–3969. https://doi.org/10.1109/ACCESS.2023.3235592 doi: 10.1109/ACCESS.2023.3235592
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
