
In the field of substation operation and maintenance, real-time detection and precise segmentation of tools play an important role in maintaining the safe operation of the power grid and guiding operators to work safely. To improve the accuracy and real-time performance of semantic segmentation of substation operation and maintenance tools, we propose an improved lightweight real-time semantic segmentation network based on the efficient image cascade network (ICNet) architecture. The network uses multiscale branches and cascaded feature fusion units to extract rich multilevel features. We designed a semantic purification module to deal with the redundant and conflicting information that arises in multiscale feature fusion. A lightweight backbone network is used in the feature extraction stage at different resolutions, and a recursive gated convolution is used in the upsampling stage to achieve high-order spatial interactions, thereby improving segmentation accuracy. Due to the lack of a substation tool semantic segmentation dataset, we constructed one. Training and testing on this dataset showed that the proposed model improves the accuracy of tool detection while ensuring real-time performance. Compared with currently popular semantic segmentation networks, it achieves better real-time performance and accuracy, and it provides a new semantic segmentation method for embedded platforms.
Citation: Guozhong Liu, Qiongping Tang, Changnian Lin, An Xu, Chonglong Lin, Hao Meng, Mengyu Ruan, Wei Jin. Semantic segmentation of substation tools using an improved ICNet network[J]. Electronic Research Archive, 2024, 32(9): 5321-5340. doi: 10.3934/era.2024246
At this stage, the construction of the energy internet is constantly upgrading and accelerating. New equipment, technologies, systems, and operation management models are continually being promoted and applied in substations, posing new challenges to operators' skills and behavioral standardization. Good real-time perception of tools helps people and tools interact better, which is conducive to the smooth conduct of substation operation and maintenance. Many researchers have conducted related studies. For instance, context-aware recognition methods can identify people, tools, and actions together [1]; multiview anomaly detection provides complementary information for a single instance [2]; and a multigranularity cross-domain alignment framework can enhance the generalization ability of anomaly segmentation [3]. To meet multiple functional requirements simultaneously during substation operation and maintenance, real-time perception of tool objects is particularly important.

Semantic segmentation assigns a category label to each pixel of an image and supports scene understanding and object detection; it is an integral part of the perception and recognition of autonomous mobile systems [4]. In substation operation and maintenance, it can be applied to on-site workpiece defect detection, safety monitoring, recognition of operators' body movements, and tool perception. Early traditional semantic segmentation used low-level features such as grayscale, spatial texture, color, and geometric shape to divide an image into parts, mainly via threshold-based [5], edge-based [6], region-based [7,8], and graph-based [9] methods. Although these methods are fast, they require manually designed feature extractors and perform poorly on complex scenes.

In contrast, semantic segmentation methods based on deep learning have shown strong feature extraction capabilities. Long et al. [10] proposed the fully convolutional network (FCN) in 2014, replacing the fully connected layers of a convolutional neural network (CNN) with convolutional layers to achieve pixel-level dense prediction, laying the foundation for the rapid development of semantic segmentation [11]. However, an FCN with a lightweight backbone has insufficient receptive fields for large objects and oversized receptive fields for small objects, leading to incomplete segmentation of large objects and coarse segmentation of small ones, which hurts accuracy [12]. Subsequent models such as PSPNet and DeepLabv2 therefore introduced pyramid pooling modules and dilated convolutions to extract multiscale context information and improve accuracy [13,14]. U-Net uses an encoder-decoder structure to capture context and restore positional information, gradually recovering object details and image resolution and improving segmentation accuracy [15]. These networks achieve high accuracy, but they have many parameters and complex structures [16,17]; as a result, many high-performing segmentation models cannot be deployed well on mobile platforms with limited computing resources, and traditional semantic segmentation cannot meet the real-time perception needs of substation operation and maintenance tool images.
Considering both the accuracy and real-time performance required for substation operation and maintenance tools, building a lightweight semantic segmentation network is particularly important [18,19,20]. Paszke et al. proposed ENet, which uses lightweight convolution blocks and simple upsampling to form an asymmetric encoder-decoder structure [21]. Although this simple structure is fast, its accuracy is poor. ERFNet improved on ENet by converting standard convolutions into asymmetric convolution combinations and removing the skip connections between encoder and decoder layers; while maintaining high segmentation accuracy, it improved the computational efficiency and robustness of the model [22]. DANet used the self-attention mechanism to improve the model's perception of contextual information at different scales [23]. Li and Wu built a lightweight semantic segmentation model with parallel feature processing for autonomous driving road scenes, combining a MobileNetV2 backbone with a dual attention mechanism and an atrous spatial pyramid pooling module; their results showed that the model offers both real-time performance and high precision [24]. We can therefore conclude that, to improve the inference speed of a semantic segmentation network, a lightweight feature extraction network and efficient convolutions can be used to meet real-time requirements, while an attention module can be introduced to balance accuracy and segmentation speed. ICNet used a pyramid pooling module to fuse multiscale context information and divided the network into three branches: low, medium, and high resolution [25]. It used the low-resolution branch to complete semantic segmentation and the higher-resolution branches to refine the results and improve accuracy. The feature maps extracted by each branch were fused by the cascade feature fusion (CFF) module, and the final segmentation result was obtained by upsampling. In addition, cascaded labels guided the training of each branch, which sped up convergence and prediction and improved real-time performance. ICNet has performed well in segmenting remote sensing images [26], street view images, and lesion images [27], but so far it has not been applied to substation operation and maintenance. In this paper, we integrate the attention mechanism into ICNet to solve the semantic segmentation of tools for substation operation and maintenance. The main contributions of this paper are as follows:
● A self-built semantic segmentation dataset of substation electroscope pens, together with the use of the lightweight PP-LCNet [28] network in the low-, medium-, and high-resolution feature extraction branches of ICNet.
● The construction of a semantic purification module combining spatial and channel attention, added after the CFF module of ICNet to filter out the redundant and conflicting information introduced by fusing features at different scales.
● The introduction of recursive gated convolution in the upsampling stage of ICNet, which extends the second-order interactions of self-attention to arbitrary order and improves segmentation quality.
We first introduce the semantic segmentation architectures and models related to our approach in Section 2. Section 3 presents the details of the proposed model and method, Section 4 presents and analyzes the experimental results, and Section 5 draws conclusions.
In this section, we introduce the ICNet model on which our method is based, as well as the efficient convolution and attention modules we adopt.
ICNet is a lightweight semantic segmentation network with fast detection speed and low memory consumption, which suits the strict real-time requirements and limited hardware of substation operation and maintenance tool segmentation. It divides the network into three branches: low, medium, and high resolution. The input of the low-resolution branch is 1/4 the size of the original image; after a heavy CNN backbone, a pyramid pooling module enlarges the receptive field to obtain features 1/32 the size of the original image. This branch extracts the semantic information of the entire image. The input of the medium-resolution branch is 1/2 the size of the original image; after a light CNN backbone, its output is fused with that of the low-resolution branch by the CFF module to obtain a feature map 1/16 the size of the original image. Parameters are shared between the medium- and low-resolution branches to reduce execution time. The high-resolution branch only needs to focus on fine features: the original image is its input, and after a light CNN backbone, its output passes through the CFF module together with the medium-resolution output to obtain a feature map 1/8 the size of the original image. After three upsampling steps, this map is expanded to the original image size to yield the final segmentation.
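As a structural aid, the following PyTorch sketch traces the three-branch cascade just described; the stub convolutions stand in for the real backbones, pyramid pooling, and CFF units, so only the resolutions and fusion order are faithful to ICNet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def _stub(c_in, c_out, stride):
    # Placeholder branch backbone: one strided conv standing in for the real CNN.
    return nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1)

class ICNetSketch(nn.Module):
    """Schematic ICNet forward pass; stubs replace the real backbones and CFF units."""
    def __init__(self, num_classes=2, c=32):
        super().__init__()
        self.low_cnn = _stub(3, c, 8)       # 1/4 input  -> 1/32-scale semantic features
        self.mid_cnn = _stub(3, c, 8)       # 1/2 input  -> 1/16-scale features
        self.high_cnn = _stub(3, c, 8)      # full input -> 1/8-scale fine features
        self.cff1 = nn.Conv2d(2 * c, c, 1)  # stands in for the CFF fusion unit
        self.cff2 = nn.Conv2d(2 * c, c, 1)
        self.classifier = nn.Conv2d(c, num_classes, 1)

    def _fuse(self, cff, coarse, fine):
        # Upsample the coarser map to the finer one's size, concatenate, fuse.
        coarse = F.interpolate(coarse, size=fine.shape[2:], mode='bilinear',
                               align_corners=False)
        return cff(torch.cat([coarse, fine], dim=1))

    def forward(self, x):
        x_mid = F.interpolate(x, scale_factor=0.5, mode='bilinear', align_corners=False)
        x_low = F.interpolate(x, scale_factor=0.25, mode='bilinear', align_corners=False)
        f = self._fuse(self.cff1, self.low_cnn(x_low), self.mid_cnn(x_mid))
        f = self._fuse(self.cff2, f, self.high_cnn(x))
        logits = self.classifier(f)         # 1/8-scale class scores
        return F.interpolate(logits, size=x.shape[2:], mode='bilinear',
                             align_corners=False)  # back to the input resolution
```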
The key to real-time lightweight semantic segmentation is balancing speed and precision, which is usually achieved with efficient convolutions. Efficient convolutions in common use include depthwise separable convolution and grouped convolution.
Depthwise separable convolution splits a traditional convolution kernel into a channel-by-channel (depthwise) convolution and a point-by-point (pointwise) convolution. In the channel-by-channel convolution, each n×n kernel is responsible for exactly one channel, and each channel is convolved by only one kernel. The output feature map of the channel-by-channel convolution is taken as the input of the point-by-point convolution, and the final output is obtained with N 1×1 kernels, where N is the number of output channels. The number of parameters required by a depthwise separable convolution is only 1/N + 1/n² of the standard convolution. Therefore, while maintaining accuracy, the number of network parameters and the computational load can be greatly reduced, improving computational efficiency. This yields a better balance between precision and speed and is more conducive to deployment on embedded platforms with low computing power.
Grouped convolution divides the input feature map and the convolution kernels into m groups along the channel dimension, and each group is convolved separately. The number of parameters required by a grouped convolution is only 1/m of the standard convolution. In addition, the coupling between the grouped outputs is low, which can further improve the learning ability of the model.
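Both parameter-count rules (1/N + 1/n² for depthwise separable convolution, 1/m for grouped convolution) can be checked numerically; the sketch below uses illustrative channel and kernel sizes, not values from the paper.

```python
import torch.nn as nn

def params(m):
    return sum(p.numel() for p in m.parameters())

C_in, C_out, n, m = 64, 128, 3, 4  # example sizes: n*n kernel, m groups

standard = nn.Conv2d(C_in, C_out, n, padding=1, bias=False)
depthwise_separable = nn.Sequential(
    # channel-by-channel: one n*n kernel per input channel (groups = C_in)
    nn.Conv2d(C_in, C_in, n, padding=1, groups=C_in, bias=False),
    # point-by-point: C_out 1*1 kernels mixing the channels
    nn.Conv2d(C_in, C_out, 1, bias=False),
)
grouped = nn.Conv2d(C_in, C_out, n, padding=1, groups=m, bias=False)

print(params(standard))             # 73,728 = 64*128*3*3
print(params(depthwise_separable))  # 8,768  = (1/128 + 1/9) * 73,728
print(params(grouped))              # 18,432 = 73,728 / 4 groups
```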
To improve the performance of a semantic segmentation model, attention modules can be used. Attention weights the informative parts of the input and is commonly divided into channel attention and spatial attention: channel attention focuses on which feature channels carry meaningful information, while spatial attention focuses on where the informative regions lie in the image.
SENet proposed the SE attention module, which compresses the spatial dimensions of the input feature map by global average pooling to obtain global per-channel features [29]. Two fully connected layers then produce per-channel weights that model the relationships between channels, and the normalized weights are applied to each channel's features, emphasizing the most informative channels and suppressing unimportant ones. However, the SE module ignores spatial information, which describes the size and position of objects in the input image and indicates where the segmentation target lies; this is also crucial for semantic segmentation. The CBAM module consists of a spatial attention module (SAM) and a channel attention module (CAM). CAM uses global average pooling and global max pooling together to reduce the information loss caused by pooling [30]. SAM performs average pooling and max pooling along the channel axis, concatenates the two, and generates a spatial attention map through a convolutional layer, which helps with target localization.
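For reference, here are compact PyTorch renderings of the two CBAM sub-modules described above; the reduction ratio and the 7×7 spatial kernel follow common CBAM defaults and are not specified in this paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CBAM-style channel attention: avg- and max-pooled descriptors share an MLP."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return x * torch.sigmoid(avg + mx)   # reweight channels

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: pool along channels, then a k*k conv."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
```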
As shown in Figure 2, the EICNet model in this paper is developed from the ICNet model. Unlike ICNet, which uses a light CNN for the medium- and high-resolution branches and a heavy CNN for the low-resolution branch, EICNet uses the lightweight PP-LCNet as the backbone of all three branches to reduce the number of parameters and complete the initial feature extraction. It adds the semantic purification module (SCM) after the CFF module to resolve conflicts between semantics at different scales and compensate for the shortcomings of CFF, and it replaces the convolutions in the upsampling stage with recursive gated convolutions, which realize spatial interactions of arbitrary order through gating and a recursive design, further improving segmentation accuracy.
The PP-LCNet backbone is a lightweight, high-performance convolutional neural network whose basic module is the depthwise separable convolution, which has few parameters and low computational cost. The module can be deeply optimized by the Intel CPU acceleration library, so its inference speed is significantly faster than other lightweight convolution modules. The EICNet model proposed in this paper therefore uses the PP-LCNet backbone to extract initial features; its working principle is shown in Figure 3. As Figure 3 shows, PP-LCNet adopts the h-swish activation function from MobileNetV3 in place of ReLU, which makes decision boundaries more distinct, speeds up fitting, and significantly improves training [31]. An SE module is introduced in the last depthwise separable convolution blocks to enhance important features and suppress unnecessary ones, and at the end of the network the 3×3 convolution kernels are replaced with 5×5 kernels to enlarge the receptive field. These changes add little computational overhead while greatly reducing operations and latency, so the model balances accuracy and real-time performance, is more lightweight, and is better suited to real-time perception of substation operation and maintenance tools.
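Below is a minimal sketch of the PP-LCNet building block as described (depthwise convolution with h-swish, an optional SE module, then a pointwise convolution); layer counts, channel widths, and the exact SE placement are assumptions based on the PP-LCNet paper [28], not the authors' code.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """SE module: global average pool, two 1x1 convs, sigmoid channel reweighting."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)

class DepthwiseSeparableBlock(nn.Module):
    """PP-LCNet-style block; the last-stage blocks would use kernel_size=5 and use_se=True."""
    def __init__(self, c_in, c_out, kernel_size=3, stride=1, use_se=False):
        super().__init__()
        self.dw = nn.Sequential(
            nn.Conv2d(c_in, c_in, kernel_size, stride,
                      padding=kernel_size // 2, groups=c_in, bias=False),
            nn.BatchNorm2d(c_in),
            nn.Hardswish(),          # h-swish in place of ReLU
        )
        self.se = SqueezeExcite(c_in) if use_se else nn.Identity()
        self.pw = nn.Sequential(
            nn.Conv2d(c_in, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.Hardswish(),
        )

    def forward(self, x):
        return self.pw(self.se(self.dw(x)))
```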
The semantic differences between features at different scales cannot be ignored: directly fusing them introduces a large amount of redundant and conflicting information, which weakens the expressive power of multiscale features. In this paper, the semantic purification module (SCM) is introduced after CFF to filter the noise generated by processing at different scales, as shown in Figure 4. SCM consists of a spatial attention module and a channel attention module. (1) The spatial attention module splits the input feature maps into two groups. The first group compresses the channel dimension by average pooling to obtain target location information, extracts features through a convolutional layer, and multiplies the sigmoid-activated result with the original feature map to obtain a new feature map. The other group, after normalization, convolution, and nonlinear activation, is multiplied with the original feature map to produce a new feature map that refines the localization information [32]. (2) The channel attention module focuses on the importance of each channel and identifies information meaningful for segmentation. The input feature map undergoes global max pooling and global average pooling, followed by the SE-style dimensionality reduction and expansion; the two results are added, passed through a nonlinear activation, and multiplied with the original feature map to obtain the new feature map [33]. In addition, compared with other attention modules, SCM introduces grouped convolution, which makes the module even more lightweight.
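Since the exact SCM implementation is not given, the following sketch is one hypothetical composition consistent with the description: a lightweight grouped convolution, spatial attention for localization, then channel attention. It reuses the ChannelAttention and SpatialAttention sketches defined earlier; the composition order and the grouped-convolution placement are assumptions.

```python
import torch.nn as nn

class SCMSketch(nn.Module):
    """Hypothetical SCM composition (assumes ChannelAttention / SpatialAttention above).
    The grouped 3x3 conv stands in for the lightweight grouped convolution the paper mentions."""
    def __init__(self, channels, groups=4):
        super().__init__()
        self.group_conv = nn.Conv2d(channels, channels, 3, padding=1,
                                    groups=groups, bias=False)
        self.spatial = SpatialAttention()        # refine target localization
        self.channel = ChannelAttention(channels)  # keep informative channels

    def forward(self, x):
        return self.channel(self.spatial(self.group_conv(x)))
```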
gnConv has three main advantages. First, it is efficient: the convolution-based implementation avoids the quadratic complexity of self-attention, and gradually increasing the channel width during the spatial interactions realizes high-order interactions with bounded complexity. Second, it is extensible: gnConv extends the second-order spatial interactions of self-attention to arbitrary order, further improving modeling ability, and it is compatible with convolution kernels of various sizes and with different spatial mixing strategies. Third, it is translation-equivariant: gnConv fully inherits the translation equivariance of standard convolution, introducing a beneficial inductive bias for visual tasks and avoiding the asymmetry caused by local attention. Therefore, in this paper, the gated convolution gnConv is chosen for the upsampling convolutions of the EICNet model. As Figure 5 shows, gnConv consists of three parts: linear projection (fully connected) layers, a depthwise separable convolution layer, and element-wise multiplication, and it improves network capability by realizing high-order spatial interactions [34]. Assuming the input feature of the gated convolution is x ∈ R^{H×W×C}, the implementation of y = gnConv(x) involves the following four key steps:
Step 1: The fully connected layer ϕ_in raises the channel dimension of x and slices the result along the channel axis to obtain p_0, q_0, ..., q_{n−1}:
$$\left[p_0^{H\times W\times C_0},\ q_0^{H\times W\times C_0},\ \ldots,\ q_{n-1}^{H\times W\times C_{n-1}}\right] = \phi_{in}(x) \in \mathbb{R}^{H\times W\times \left(C_0+\sum_{0\le k\le n-1} C_k\right)} \tag{1}$$
In this formula, H and W are the feature height and width, C is the number of feature channels, and ϕ_in is the input fully connected layer. The output is then split into two parts: p_0 and q_0, q_1, ..., q_{n−1}.
Step 2: q_0, q_1, ..., q_{n−1} pass through the depthwise separable convolution layer:
$$q_k' = f_k(q_k) \in \mathbb{R}^{H\times W\times C_k}, \quad k = 0, 1, \ldots, n-1 \tag{2}$$
Step 3: Each q′_k is dot-multiplied with g_k(p_k); the interaction order of p increases by one at each step, realizing the spatial interaction:
$$p_{k+1} = q_k' \odot g_k(p_k)/\alpha \in \mathbb{R}^{H\times W\times C_k}, \quad k = 0, 1, \ldots, n-1 \tag{3}$$
In this formula, g_k is a fully connected layer used to raise the channel dimension, and α is a scaling factor: the output features of each step are scaled by 1/α to stabilize training.
Note that after the dot product, p_{k+1} still has the channel dimension C_k of the current iteration, so it must be raised to C_{k+1} for the next iteration. The formula for g_k is:
$$g_k = \begin{cases} \mathrm{linear}(C_0, C_0), & k = 0 \\ \mathrm{linear}(C_{k-1}, C_k), & 1 \le k \le n-1 \end{cases} \tag{4}$$
To ensure that the high-order interactions do not introduce excessive computational cost, C_k is constrained; its concrete value is given by Eq (5):
$$C_k = \frac{C}{2^{n-1-k}} \tag{5}$$
In this formula, the exponent n−1−k makes the channel dimensions C_k increase from C/2^{n−1} to C as k grows, from coarse to fine.
Step 4: The fully connected layer maps the spatial interaction result to the specified dimension to obtain the final gnConv output:
$$y = \phi_{out}(p_{n-1}) \in \mathbb{R}^{H\times W\times C_{n-1}} \tag{6}$$
In this formula, ϕout is the output fully connected layer.
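Putting steps 1–4 together, the following sketch implements gnConv along the lines of the public HorNet design [34]; the 7×7 depthwise kernel and the scale 1/α = 1/3 are common defaults, not values stated in this paper.

```python
import torch
import torch.nn as nn

class gnConvSketch(nn.Module):
    """Recursive gated convolution following Eqs (1)-(6); a simplified HorNet-style sketch."""
    def __init__(self, dim, order=3, scale=1.0 / 3):
        super().__init__()
        self.order = order
        # Channel widths C_k = C / 2^(n-1-k), coarse to fine (Eq (5)).
        self.dims = [dim // 2 ** (order - 1 - k) for k in range(order)]
        self.phi_in = nn.Conv2d(dim, self.dims[0] + sum(self.dims), 1)   # Eq (1), total 2C
        self.dwconv = nn.Conv2d(sum(self.dims), sum(self.dims), 7,
                                padding=3, groups=sum(self.dims))        # Eq (2)
        # g_k: 1x1 convs raising C_k -> C_{k+1} between interaction steps (Eq (4)).
        self.gks = nn.ModuleList(
            nn.Conv2d(self.dims[k], self.dims[k + 1], 1) for k in range(order - 1))
        self.phi_out = nn.Conv2d(self.dims[-1], dim, 1)                  # Eq (6)
        self.scale = scale                                               # 1/alpha

    def forward(self, x):
        p_and_q = self.phi_in(x)
        p, q = torch.split(p_and_q, [self.dims[0], sum(self.dims)], dim=1)
        q_list = torch.split(self.dwconv(q) * self.scale, self.dims, dim=1)
        p = p * q_list[0]                          # first-order interaction (Eq (3))
        for k in range(self.order - 1):
            p = self.gks[k](p) * q_list[k + 1]     # raise the order by one per step
        return self.phi_out(p)
```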
The above design shows that gnConv performs interactions in a coarse-to-fine manner: fewer channels are used to compute lower-order spatial interactions, while higher-order features are mapped to higher-dimensional features to learn richer patterns. The overall computational complexity of gnConv can be divided into three parts: the linear projections, the depthwise separable convolution, and the recursive gate. The linear projections are mainly ϕ_in and ϕ_out; by default, ϕ_in raises the channel dimension to 2C, while ϕ_out maps within the same dimension. The computational complexity of the linear projections is therefore:
$$\mathrm{FLOPs}(\phi_{in}) = 2 \times H \times W \times C^2 \tag{7}$$
$$\mathrm{FLOPs}(\phi_{out}) = H \times W \times C^2 \tag{8}$$
Taking ϕ_in as an example, the linear mapping at each pixel costs 2 × C² operations, and there are H × W pixels in total, so the overall complexity is 2 × H × W × C².
Depthwise separable convolution: let K be the kernel size of the depthwise separable convolution. Its computational complexity is:
$$\mathrm{FLOPs}(\mathrm{DSConv}) = H \times W \times K^2 \times \sum_{k=0}^{n-1} \frac{C}{2^{n-1-k}} = 2 \times H \times W \times C \times K^2 \times \left(1 - \frac{1}{2^n}\right) \tag{9}$$
The recursive gate comprises the dot multiplications and the functions g_k:
$$\mathrm{FLOPs}(g_k) = H \times W \times \sum_{k=0}^{n-2} \frac{C}{2^{n-1-k}} \cdot \frac{C}{2^{n-2-k}} = \frac{2}{3} \times H \times W \times C^2 \times \left(1 - \frac{1}{4^{n-1}}\right) \tag{10}$$
$$\mathrm{FLOPs}(\mathrm{Mul}) = H \times W \times \sum_{k=0}^{n-1} \frac{C}{2^{n-1-k}} = 2 \times H \times W \times C \times \left(1 - \frac{1}{2^n}\right) \tag{11}$$
$$\mathrm{FLOPs}(\mathrm{Recursive\ gate}) = \mathrm{FLOPs}(g_k) + \mathrm{FLOPs}(\mathrm{Mul}) = H \times W \times C \times \left(\frac{2}{3} \times C \times \left(1 - \frac{1}{4^{n-1}}\right) + 2 - \frac{1}{2^{n-1}}\right) \tag{12}$$
Ultimately, the overall computational complexity of gnConv is:
$$\begin{aligned}\mathrm{FLOPs}(\mathrm{gnConv}) &= \mathrm{FLOPs}(\phi_{in}) + \mathrm{FLOPs}(\phi_{out}) + \mathrm{FLOPs}(\mathrm{DSConv}) + \mathrm{FLOPs}(\mathrm{Recursive\ gate}) \\ &= H \times W \times C \times \left(C \times \left(\frac{11}{3} - \frac{2}{3} \times \frac{1}{4^{n-1}}\right) + 2K^2 \times \left(1 - \frac{1}{2^n}\right) + 2 - \frac{1}{2^{n-1}}\right) \\ &< H \times W \times C \times \left(2K^2 + \frac{11}{3}C + 2\right)\end{aligned} \tag{13}$$
In this formula, K is the kernel size of the depthwise separable convolution.
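The closed forms in Eqs (9)–(13) can be checked numerically against the direct sums; the sizes below (H = W = 56, C = 64, K = 7, n = 3) are illustrative choices.

```python
# Numeric check of Eqs (7)-(13): the closed forms agree with the direct sums.
H, W, C, K, n = 56, 56, 64, 7, 3
dims = [C / 2 ** (n - 1 - k) for k in range(n)]  # C_k = C / 2^(n-1-k)

phi_in = 2 * H * W * C ** 2                                      # Eq (7)
phi_out = H * W * C ** 2                                         # Eq (8)
dsconv = H * W * K ** 2 * sum(dims)                              # Eq (9), direct sum
assert dsconv == 2 * H * W * C * K ** 2 * (1 - 1 / 2 ** n)

gate = H * W * sum(dims[k] * dims[k + 1] for k in range(n - 1))  # Eq (10): g_k maps C_k -> C_{k+1}
assert abs(gate - (2 / 3) * H * W * C ** 2 * (1 - 1 / 4 ** (n - 1))) < 1e-3

mul = H * W * sum(dims)                                          # Eq (11)
total = phi_in + phi_out + dsconv + gate + mul                   # Eq (13), left-hand side
bound = H * W * C * (2 * K ** 2 + (11 / 3) * C + 2)              # Eq (13), upper bound
assert total < bound
print(total, "<", bound)   # 64,124,928 < ~67,168,938 for these sizes
```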
In dense prediction tasks, recursive gated convolution better captures the high-order interactions between features at different levels, thereby recovering missing details. For object detection and semantic segmentation, it can significantly reduce the number of parameters while achieving better segmentation accuracy.
To verify the practical performance of the network, this paper takes the electroscope pen as the segmentation target and constructs a field-captured semantic segmentation dataset of substation tools (electroscope pens). The images were collected in actual substation operation and maintenance settings, covering outdoor and control-room scenes with single and multiple workers (763 images in total) and a wide range of real-world usage scenarios. The dataset is divided into 400, 109, and 254 images for training, validation, and testing, respectively, which adequately meets the requirements for experimentation and analysis, and it will be expanded gradually as the number of tools to be detected increases. We use two data augmentation techniques during the experiments: random mirroring and random scaling by a factor between 0.5 and 2. These techniques enhance the model's generalization, reduce the risk of overfitting, and improve performance and robustness. The original images of the existing target detection dataset were first annotated with the semantic segmentation tool Labelme to obtain JSON files, which were then converted into the corresponding label maps. Some original images and their visualized label maps are shown in Figure 6. As Figure 6 shows, the electroscope pen occupies only a small proportion of the whole image, so tool recognition places high demands on the model's feature extraction and noise filtering abilities.
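The Labelme JSON-to-label-map conversion can be scripted directly from the stored polygon annotations; the sketch below is a minimal version, and the class-name-to-ID mapping is a hypothetical example.

```python
import json
import numpy as np
from PIL import Image, ImageDraw

CLASS_IDS = {"electroscope_pen": 1}   # hypothetical class-name -> label-id mapping

def labelme_json_to_mask(json_path):
    """Rasterize Labelme polygon annotations into a single-channel label map
    (assumes polygon-type shapes as produced by Labelme; background stays 0)."""
    with open(json_path) as f:
        ann = json.load(f)
    mask = Image.new("L", (ann["imageWidth"], ann["imageHeight"]), 0)
    draw = ImageDraw.Draw(mask)
    for shape in ann["shapes"]:
        if shape["label"] in CLASS_IDS:
            polygon = [tuple(pt) for pt in shape["points"]]
            draw.polygon(polygon, fill=CLASS_IDS[shape["label"]])
    return np.array(mask)
```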
The experiments ran on Windows 11 Professional with an Intel(R) Core(TM) i7-12700 CPU @ 2.1 GHz and an NVIDIA GeForce RTX 3090 graphics card with 24 GB of video memory. The PyTorch deep learning framework was adopted, specifically PyTorch 1.10 + Python 3.8 + CUDA 11.6. During training, the batch size was set to 16 and the base learning rate to 0.01; a polynomial learning rate decay strategy was used, with momentum 0.9, a maximum of 30K iterations, and weight decay 0.0001. To increase sample diversity and ensure the generalization ability of the trained model, we randomly mirrored the original images of substation operation and maintenance tools and randomly scaled them by factors of 0.5–2.
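The two augmentations can be applied jointly to an image and its label map; the sketch below assumes CHW tensors and uses nearest-neighbor interpolation for the labels so that class IDs are not blended.

```python
import random
import torchvision.transforms.functional as TF
from torchvision.transforms import InterpolationMode

def augment(image, label):
    """Random mirroring plus random 0.5-2x scaling, applied jointly to a CHW
    image tensor and its label map (a minimal sketch of the augmentations used)."""
    if random.random() < 0.5:                      # random horizontal mirror
        image, label = TF.hflip(image), TF.hflip(label)
    scale = random.uniform(0.5, 2.0)               # random scale factor in [0.5, 2]
    h, w = image.shape[-2:]
    new_size = [int(h * scale), int(w * scale)]
    image = TF.resize(image, new_size)             # bilinear for the image
    label = TF.resize(label, new_size,
                      interpolation=InterpolationMode.NEAREST)  # nearest for labels
    return image, label
```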
The evaluation indicators used in the analysis are mean intersection over union (mIoU), mean accuracy (mAcc), and frames per second (FPS). mIoU measures the similarity between segmentation results and ground-truth labels, mAcc is the average accuracy over all categories, and FPS measures the running speed of the model. For all three, larger values indicate a better model. The calculation formulas are as follows:
$$\mathrm{mIoU} = \frac{1}{N+1}\sum_{i=0}^{N} \frac{TP}{TP+FP+FN} \tag{14}$$
$$\mathrm{mAcc} = \frac{1}{N+1}\sum_{i=0}^{N} \frac{TP+TN}{TP+TN+FP+FN} \tag{15}$$
$$\mathrm{FPS} = \frac{K}{T} \tag{16}$$
In these formulas, N+1 is the number of categories including the background; TP is the number of positive samples predicted as positive, TN the number of negative samples predicted as negative, FP the number of negative samples predicted as positive, and FN the number of positive samples predicted as negative. K is the number of inference images and T is the time the inference takes.
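Eqs (14) and (15) can be computed from a confusion matrix accumulated over the test set; the sketch below implements them exactly as written, with rows indexing ground truth and columns indexing predictions.

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """Accumulate an (N+1)x(N+1) confusion matrix from flattened label maps
    (num_classes counts the background class)."""
    mask = (gt >= 0) & (gt < num_classes)
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def miou_macc(cm):
    """mIoU and mAcc per Eqs (14) and (15); diagonal entries are per-class TP."""
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp           # positives predicted as other classes
    fp = cm.sum(axis=0) - tp           # other classes predicted as this class
    tn = total - tp - fp - fn
    miou = np.nanmean(tp / (tp + fp + fn))                 # Eq (14)
    macc = np.nanmean((tp + tn) / (tp + tn + fp + fn))     # Eq (15)
    return miou, macc
```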
To verify the semantic segmentation effect of the proposed algorithm on substation operation and maintenance tools, the algorithm and the traditional semantic segmentation ANN model [35] were each tested on the tool semantic segmentation dataset; the results are shown in Table 1. For the EICNet model, mIoU is 69.86% and mAcc is 73.59%; for the ANN model, mIoU and mAcc are 69.93% and 72.43%, respectively. Compared with ANN, the mIoU of EICNet decreases by only 0.07% while mAcc increases by 1.16%. This indicates that EICNet greatly reduces inference memory while essentially maintaining accuracy: the memory required for inference is only 13.89% of ANN's. At the same time, EICNet reaches an inference speed of 31.12 FPS, 8.41 times that of ANN. Adopting the EICNet model therefore matches the segmentation quality of traditional semantic segmentation while remarkably reducing inference memory and improving running speed, which is conducive to deployment on embedded platforms and meets the real-time requirements of substation tool identification.
| Algorithm | mIoU/% | mAcc/% | Memory/GB | FPS |
|-----------|--------|--------|-----------|-------|
| ANN | 69.93 | 72.43 | 9.50 | 3.70 |
| Proposed | 69.86 | 73.59 | 1.32 | 31.12 |
To verify the efficiency and superiority of the proposed lightweight algorithm, EICNet and four other lightweight semantic segmentation models (CCNet [36], CGNet [37], DANet, and ERFNet) were further tested on the substation operation and maintenance tool segmentation dataset; the results are shown in Table 2. EICNet performs best, with the largest mIoU and mAcc values (69.86% and 73.59%), while CGNet performs worst, with the smallest values (62.35% and 63.35%, respectively). Compared with the four lightweight models, the mIoU of EICNet is 7.51% higher than that of CGNet, its inference memory is 84.09% lower, and its inference speed is slightly higher. Its mIoU is 3.48% higher than ERFNet's, and its inference speed is 2.07 times that of ERFNet. In summary, EICNet not only significantly improves accuracy but also offers better inference speed, making it highly competitive.
| Algorithm | mIoU/% | mAcc/% | Memory/GB | FPS |
|-----------|--------|--------|-----------|-------|
| CCNet | 65.12 | 66.21 | 10.5 | 3.32 |
| CGNet | 62.35 | 63.35 | 8.30 | 30.51 |
| DANet | 66.78 | 68.17 | 7.40 | 2.66 |
| ERFNet | 66.38 | 68.36 | 6.04 | 15.28 |
| Proposed | 69.86 | 73.59 | 1.32 | 31.12 |
To further compare the EICNet model with other lightweight semantic segmentation models visually, typical substation operation and maintenance scenarios, such as an outdoor high-climbing operation, an outdoor low-climbing operation, and a control room inspection, were selected for testing. From Figure 7: (1) The first and third rows show that, under dim outdoor light and a complex background, only the EICNet and CGNet models can accurately identify the area of the electroscope pen, while the other lightweight models fail to detect the target. The IoU for EICNet is 41.92%, whereas for CGNet it is only 29.35%: EICNet effectively identifies the region from the hand to the tip of the electroscope pen, while CGNet's recognition range is small, extracting only about a third of the main pole in the middle of the pen. (2) Rows two and five show indoor training and actual work scenarios of substation maintenance personnel. Taking row two as an example, the EICNet model fits the area of the electroscope pen with clear segmentation boundaries (IoU 58.68%, Acc 96.10%). The CCNet model recognizes only parts of the pen, with unclear boundaries (53.80%/95.71%). The CGNet model locates the starting position more accurately than CCNet, but its segmentation area is discontinuous, with an IoU much lower than the other models (only 24.2%, with Acc 93.16%). The DANet model (57.50%/96.02%) and the ERFNet model (53.10%/93.64%) recognize the starting position accurately, but their predictions are too dispersed. (3) Row four shows an outdoor low-climbing maintenance scene. Here the EICNet model localizes most accurately, with an IoU of 28.44% and an Acc of 96.20%. The CCNet and ERFNet models still fit poorly and cannot segment the specified area. The DANet model (11.56%/95.33%) identifies the tip and upper part of the pen in bright background areas. The CGNet model performs well (21.52%/95.86%), but the starting position is unclear.
Overall, the CGNet, ERFNet, and DANet models are demanding about scene brightness, and the CGNet and ERFNet models segment poorly against complex backgrounds. Under bright light, the ERFNet and DANet models segment well but leave a large, poorly delineated dispersion area at the edge of the electroscope pen's main pole. CGNet adapts to dark scenes but suffers from discontinuous segmentation areas. The EICNet model effectively removes noise and obtains target location information through the semantic purification module SCM, and the gated convolution gnConv exploits high-order interaction information to improve accuracy. EICNet can therefore adapt to different lighting conditions and relatively complex environments, producing a finer semantic segmentation of the tools.
The EICNet structure uses the PP-LCNet backbone as the initial feature extraction network; the semantic purification module SCM, added after feature fusion, effectively removes noise, yields target position information, and resolves conflicts between different semantics; and the convolutions in the upsampling stage are replaced by recursive gnConv, which enables spatial interactions of any order through gating and recursion, further improving segmentation accuracy. To validate the effects of these three modules in EICNet, six module combinations were designed for ablation experiments on the substation maintenance tool segmentation dataset. The experimental results are shown in Table 3.
| Method | PP-LCNet | SCM | gnConv | mIoU/% | mAcc/% |
|--------|----------|-----|--------|--------|--------|
| A (ICNet) | | | | 66.75 | 68.60 |
| B | √ | | | 67.06 | 68.81 |
| C | √ | √ | | 67.48 | 69.86 |
| D | √ | | √ | 68.02 | 70.20 |
| E | | √ | √ | 68.68 | 71.42 |
| F (Ours) | √ | √ | √ | 69.86 | 73.59 |
In Table 3, method A is the original, unmodified ICNet model. Method B replaces PSPNet with PP-LCNet in the feature extraction network. On the basis of method B, method C adds the semantic purification module SCM, while method D adds the gated convolution gnConv. Method E examines SCM and gnConv with PSPNet still as the backbone, and method F is the EICNet model proposed in this paper. (1) Comparing models A, B, E, and F shows that using PP-LCNet as the feature extraction backbone not only lightens the network but also improves accuracy relative to PSPNet: on its own, PP-LCNet raises mIoU by 0.31% and mAcc by 0.21%, and with SCM and gnConv in place, switching to PP-LCNet raises mIoU by 1.18% and mAcc by 2.17%, an overall improvement of approximately 0.745%/1.19%. (2) Comparing models B, C, D, and F shows that, with PP-LCNet, the SCM module improves mIoU and mAcc by approximately 0.42%/1.05%; with PP-LCNet and gnConv, adding SCM raises mIoU by 1.84% and mAcc by 3.39%, an overall improvement of approximately 1.13%/2.22%. (3) The same comparison shows that, with PP-LCNet, the gnConv module improves mIoU and mAcc by approximately 0.96%/1.39%; with PP-LCNet and SCM, adding gnConv raises mIoU by 2.38% and mAcc by 3.73%, an overall improvement of approximately 1.67%/2.56%. (4) These results show that the contribution of each module matches the design expectations: PP-LCNet, as a lightweight network, has the weakest effect; the SCM module, which primarily acts as a filter, has a stronger effect; and gnConv, which leverages high-order interaction information to improve accuracy, has the most obvious effect.
The segmentation prediction visualization is shown in Figure 9.
In Figure 9, the first and third rows are scenes with dim outdoor light, and the second and fourth rows are scenes with bright indoor light. Under dim light and a complex background, ICNet can hardly identify the area of the electroscope pen, whereas using PP-LCNet instead of PSPNet locks onto that area much better. In bright indoor scenes, ICNet also produces missing blocks in its segmentation. The semantic purification module and gated convolution repair the large-area segmentation errors of the original network: they segment well targets that occupy only a small area of the image, and they can even distinguish railings that are close to the electroscope pen and similar to it in shape and color. The visualization results intuitively show that the proposed image semantic segmentation method obtains satisfactory results and improves segmentation accuracy.
To improve the accuracy and real-time performance of semantic segmentation for substation operation and maintenance tools, this paper designed the lightweight semantic segmentation model EICNet. The PP-LCNet backbone network serves as the initial feature extraction network of the whole structure; the semantic purification module SCM effectively removes noise so that target location information can be obtained; and the gated convolution gnConv is introduced to exploit high-order interaction information and improve accuracy. While ensuring real-time operation of the whole network, the model effectively reduces the number of network parameters and integrates multilevel feature information to improve accuracy.
The ablation experiments with different module combinations led to the following conclusions: PP-LCNet performs better than PSPNet as the feature extraction backbone; the semantic purification module SCM filters out the conflicting information of multiscale feature fusion; and the gated convolution gnConv enables high-order interactions, improving accuracy. When the PP-LCNet backbone, SCM, and gnConv are applied together, the model's mIoU and mAcc improve by 3.11% and 4.99%, respectively, over the original ICNet model. These conclusions show that, in the EICNet model, the PP-LCNet backbone combined with SCM and gnConv meets the accuracy requirements for lightweight semantic segmentation of substation operation and maintenance tools.
The EICNet model and the traditional semantic segmentation ANN model were used to segment single- and multi-worker scenes captured outdoors and in the control room of an actual substation. Compared with the ANN model, mIoU decreased by only 0.07% while mAcc increased by 1.16%. Measured on the same hardware platform, the improved model achieves an inference speed of 31.12 frames per second (FPS), 9.37 times faster than CCNet and above the relatively efficient lightweight network CGNet (30.51 FPS). EICNet also outperforms CGNet with a 7.51% higher mIoU and a 10.24% higher mAcc, a significant accuracy gain. In conclusion, the EICNet model offers fast inference, a low memory footprint, and the ability to meet real-time detection requirements.
Typical substation operation and maintenance scenarios (outdoor high-climbing operations, outdoor low-climbing operations, control room inspections, etc.) were selected to test and compare the lightweight semantic segmentation models. The results show that the CCNet, CGNet, DANet, and ERFNet models segment poorly, since they cannot fully identify the electroscope pen and are demanding about scene brightness. The EICNet model has the largest mIoU and mAcc values and the highest segmentation accuracy; it can effectively identify the area from the hand to the tip of the electroscope pen under different lighting conditions and complex backgrounds, achieving high-precision, real-time semantic segmentation of substation operation and maintenance tools.
This study has made progress on lightweight semantic segmentation of substation operation and maintenance tools, with the PP-LCNet-based EICNet model meeting the precision requirements of lightweight tool segmentation. Future work will start from small-sample settings, increase the types of substation operation and maintenance tools covered, realize lightweight small-sample semantic segmentation for digital substation operation and maintenance, and provide a scientific theoretical basis for the subsequent safe operation and reasonable development of substation operation and maintenance.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
This research was funded by the Science and Technology Project of State Grid Corporation of China, grant number 52272221001Z.
The authors declare there is no conflict of interest.
[1] | Z. Q. Cheng, Q. Dai, S. Li, T. Mitamura, A. Hauptmann, Gsrformer: Grounded situation recognition transformer with alternate semantic attention refinement, in Proceedings of the 30th ACM International Conference on Multimedia, (2022), 3272–3281. https://doi.org/10.1145/3503161.3547943 |
[2] | H. Wang, Z. Q. Cheng, J. Sun, X. Yang, X. Wu, H. Y. Chen, et al., Debunking free fusion myth: Online multi-view anomaly detection with disentangled product-of-experts modeling, in Proceedings of the 31st ACM International Conference on Multimedia, (2023), 3277–3286. https://doi.org/10.1145/3581783.3612487 |
[3] | J. Zhang, X. Wu, Z. Q. Cheng, Q. He, W. Li, Improving anomaly segmentation with multi-granularity cross-domain alignment, in Proceedings of the 31st ACM International Conference on Multimedia, (2023), 8515–8524. https://doi.org/10.1145/3581783.3611849 |
[4] | S. Gupta, P. Arbeláez, R. Girshick, J. Malik, Indoor scene understanding with RGB-D images: Bottom-up segmentation, object detection and semantic segmentation, Int. J. Comput. Vision, 112 (2015), 133–149. https://doi.org/10.1007/s11263-014-0777-6 |
[5] | X. M. Zhang, Z. Y. Li, Y. Zheng, Multi-threshold image segmentation based on combining fisher criterion and potential function, J. Comput. Appl., 32 (2012), 2843–2847. https://doi.org/10.3724/SP.J.1087.2012.02843 |
[6] | P. Liu, A. M. Yang, A method of region based color image segmentation, Comput. Eng. Appl., 43 (2007), 37–39. https://doi.org/10.3321/j.issn:1002-8331.2007.06.012 |
[7] | C. Li, Z. Qu, Review of image edge detection algorithms based on deep learning, J. Comput. Appl., 40 (2020), 3280–3288. https://doi.org/10.11772/j.issn.1001-9081.2020030314 |
[8] | J. Song, Y. Yu, Q. Luo, Cross-layer fusion feature based on richer convolutional features for edge detection, J. Comput. Appl., 40 (2020), 2053–2058. https://doi.org/10.11772/j.issn.1001-9081.2019112057 |
[9] | S. J. Zhai, Research on Image Segmentation Based on Optimization Theory, Ph.D thesis, Hunan Normal University, 2018. |
[10] | J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2015), 3431–3440. https://doi.org/10.1109/CVPR.2015.7298965 |
[11] | A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, Commun. ACM, 60 (2017), 84–90. https://doi.org/10.1145/3065386 |
[12] | J. J. Qiao, Z. Q. Cheng, X. Wu, W. Li, J. Zhang, Real-time semantic segmentation with parallel multiple views feature augmentation, in Proceedings of the 30th ACM International Conference on Multimedia, (2022), 6300–6308. https://doi.org/10.1145/3503161.3547786 |
[13] | H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 2881–2890. https://doi.org/10.1109/CVPR.2017.660 |
[14] | L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs, IEEE Trans. Pattern Anal. Mach. Intell., 40 (2017), 834–848. https://doi.org/10.1109/TPAMI.2017.2699184 |
[15] | O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in Proceedings of the 2015 International Conference on Medical Image Computing and Computer-Assisted Intervention, (2015), 234–241. https://doi.org/10.1007/978-3-319-24574-4_28 |
[16] | C. Peng, T. Tian, C. Chen, X. Guo, J. Ma, Bilateral attention decoder: A lightweight decoder for real-time semantic segmentation, Neural Networks, 137 (2021), 188–199. https://doi.org/10.1016/j.neunet.2021.01.021 |
[17] | Y. Liu, Z. Zhang, S. Pei, J. H. Wu, L. H. Liang, Z. R. Ma, Faulty insulator segmentation method in infrared image based on deep learning, Electr. Meas. Instrum., 59 (2022), 63–68. |
[18] | Z. Hu, S. Bao, C. Xu, H. Wang, Semantic segmentation algorithm for remote sensing buildings based on DeepLabv3+, J. Comput. Appl., 41 (2021), 71–75. |
[19] | X. Tang, W. Tu, K. Li, J. Cheng, DFFNet: An iot-perceptive dual feature fusion network for general real-time semantic segmentation, Inf. Sci., 565 (2021), 326–343. https://doi.org/10.1016/j.ins.2021.02.004 |
[20] | Y. Wang, H. Liu, H. Wang, Y. Qian, Lightweight building semantic segmentation method based on remote sensing images, Comput. Eng. Design, 43 (2022), 2646–2653. https://doi.org/10.16208/j.issn1000-7024.2022.09.032 |
[21] | A. Paszke, A. Chaurasia, S. Kim, E. Culurciello, Enet: A deep neural network architecture for real-time semantic segmentation, preprint, arXiv: 1606.02147. https://doi.org/10.48550/arXiv.1606.02147 |
[22] | E. Romera, J. M. Alvarez, L. M. Bergasa, R. Arroyo, Erfnet: Efficient residual factorized convnet for real-time semantic segmentation, IEEE Trans. Intell. Transp. Syst., 19 (2017), 263–272. https://doi.org/10.1109/TITS.2017.2750080 |
[23] | F. Xiong, X. Zhang, X. Han, L. Kuang, H. Liu, J. Jia, Research on improved semantic segmentation of remote sensing, Comput. Eng. Appl., 58 (2022), 185–190. https://doi.org/10.3778/j.issn.1002-8331.2011-0021 |
[24] | S. Li, T. Wu, Lightweight semantic segmentation of road scenes for autonomous driving, Comput. Eng. Appl., 59 (2023). https://doi.org/10.3778/j.issn.1002-8331.2206-0433 |
[25] | H. Zhao, X. Qi, X. Shen, J. Shi, J. Jia, Icnet for real-time semantic segmentation on high-resolution images, in Proceedings of the European Conference on Computer Vision (ECCV), (2018), 405–420. https://doi.org/10.1007/978-3-030-01219-9_25 |
[26] | S. Liu, H. Ye, K. Jin, H. Cheng, CT-UNet: Context-transfer-UNet for building segmentation in remote sensing images, Neural Process. Lett., 53 (2021), 4257–4277. https://doi.org/10.1007/s11063-021-10592-w |
[27] | A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, preprint, arXiv: 1704.06857. https://doi.org/10.48550/arXiv.1704.06857 |
[28] | C. Cui, T. Gao, S. Wei, Y. Du, R. Guo, S. Dong, PP-LCNet: A lightweight CPU convolutional neural network, preprint, arXiv: 2109.15099. https://doi.org/10.48550/arXiv.2109.15099 |
[29] | K. Zhou, Q. Yang, Y. Wang, J. Zhang, An improved SSD algorithm based on pressure plate status recognition, Electr. Meas. Instrum., 58 (2021), 69–76. https://doi.org/10.19753/j.issn1001-1390.2021.01.010 |
[30] | Q. Yao, S. Bie, J. Yu, Q. Chen, A bearing fault diagnosis method combining improved inception V2 module and CBAM, J. Vib. Eng., 35 (2022), 949–957. https://doi.org/10.16385/j.cnki.issn.1004-4523.2022.04.019 |
[31] | H. Wang, X. Ge, Lightweight DeepLabv3+ building extraction method from remote sensing images, Remote Sens. Natural Resour., 34 (2022), 128–135. https://doi.org/10.6046/zrzyyg.2021219 |
[32] | D. Liu, Z. Liang, Y. Sun, Micro-expression recognition method based on spatial attention mechanism and optical flow features, J. Comput.-Aided Design Comput. Graphics, 33 (2021), 1541–1552. https://doi.org/10.3724/SP.J.1089.2021.18569 |
[33] | Z. Lyu, X. Xu, F. Zhang, Lightweight attention mechanism module based on squeeze and excitation, J. Comput. Appl., 42 (2022), 2353–2360. https://doi.org/10.11772/j.issn.1001-9081.2021061037 |
[34] | Y. Rao, W. Zhao, Y. Tang, J. Zhou, S. N. Lim, J. Lu, Hornet: Efficient high-order spatial interactions with recursive gated convolutions, preprint, arXiv: 2207.14284. |
[35] | Y. Liu, F. Zheng, B. Fan, TV news automatic segmentation base on text and audio-visual multi-modal features information, Comput. Eng. Appl., 43 (2007), 190–194. https://doi.org/10.3321/j.issn:1002-8331.2007.35.057 |
[36] | P. Wang, L. Liu, H. Zhang, T. Wang, CGNet: A cascaded generative network for dense point cloud reconstruction from a single image, Knowledge-Based Syst., 223 (2021), 107057. https://doi.org/10.1016/j.knosys.2021.107057 |
[37] | Q. You, W. Xu, K. Zhang, L. Zhang, X. Yi, D. Yao, C. Wang, et al., ccNET: Database of co-expression networks with functional modules for diploid and polyploid Gossypium, Nucleic Acids Res., 45 (2017), D1090–D1099. https://doi.org/10.1093/nar/gkw910 |