
People are increasingly active on social media, sharing their thoughts, beliefs, concerns and experiences. Consequently, a huge amount of useful information is produced that can help solve many health-related problems, such as mental health [1], health surveillance [2], public safety and policy [3,4], healthcare [5,6] and gender vulnerability [7,8]. User demographics provide social media-based research with essential information that can help study an issue from diverse perspectives. However, on most social media platforms, user information such as gender is considered private and is therefore not freely available.
The COVID-19 pandemic has exacerbated global socio-economic inequalities, revealing troubling patterns in how crises affect people differently according to their gender, patterns that do not bode well for future resilience. Integrating governance at widening levels and mitigating the limited economic options of women are two examples of systemic challenges that require attention for human futurity. However, in many cases, even the data required to document and understand these challenges is not available. This paper addresses these systemic imperatives by providing a model for extracting users' gender on social media, helping researchers identify the elements of promising emergent governance frameworks to address local- and global-scale socio-ecological challenges that disproportionately impact women.
Although many previous studies have focused on extracting user information such as gender from text data [9,10,11,12,13], very few of them have considered using images. Combining image and text classification methods for finding users' gender can significantly increase the classification accuracy [14,15]. In this paper, we propose a multimodal approach to identify social-media users' gender by combining text and image processing and adapting transformers.
Transformers are novel deep learning models that use a self-attention mechanism to identify and learn the significant parts of the content [16]. The attention mechanism is a technique that enhances and highlights important parts of the content while de-emphasizing other parts [17]. Self-attention is an attention mechanism that finds important tokens and their relations by comparing the content with itself [18]. A token is usually a single word in natural language processing (NLP) and a group of pixels, known as a patch, that is processed together in computer vision. Since transformers process content as sequences of tokens, they are suitable for both text and image processing [19,20].
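To make the mechanism concrete, the following toy sketch (our own illustration, not the formulation used by any particular model in this paper) computes single-head scaled dot-product self-attention over a few random token embeddings; every array name and size is a placeholder.

```python
import numpy as np

def scaled_dot_product_self_attention(x, w_q, w_k, w_v):
    """Toy single-head self-attention: x has shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # project tokens to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])       # compare every token with every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ v                            # weighted sum emphasizes important tokens

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                           # e.g., 5 tokens (words or image patches)
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(scaled_dot_product_self_attention(x, w_q, w_k, w_v).shape)   # (5, 8)
```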
Transformers were initially used for NLP and later for computer vision. Before transformers, recurrent neural network (RNN) models such as long short-term memory (LSTM) networks and gated recurrent units (GRUs), with attention layers added on top of them, were commonly used for NLP, while convolutional neural networks (CNNs) dominated vision. In 2017, transformers were introduced, keeping the attention layers and dropping the RNN part to speed up training for NLP. Recently, transformers have also performed very well in image recognition. BERT [21] and ViT [22] are among the first models built using transformers and trained for text and image classification, respectively.
BERT, which has become very popular for NLP, was developed in 2018 to improve on GPT by looking at sequences of text in a bidirectional way. GPT is a transformer-based model that was proposed by OpenAI in 2018, trained in an unsupervised manner and then fine-tuned for specific supervised NLP tasks [23]. GPT includes 12 layers of transformer decoders with masked self-attention. For unsupervised learning, the model was pretrained for next-token prediction using an unpublished book dataset. The model was then fine-tuned on labeled datasets for tasks such as classification, textual entailment and sentiment analysis. This training technique is extremely favorable to NLP developers, since it performs very well when little labeled data is available.
BERT was presented in two different configurations, BERTBASE and BERTLARGE, which respectively include twelve layers of transformers with twelve-headed bidirectional self-attention and twenty-four layers of transformers with sixteen-headed bidirectional self-attention. Both models were trained in an unsupervised manner for language modelling and next-sentence prediction, using a large corpus gathered from books and Wikipedia pages. This time-consuming, computationally expensive pre-training phase allows BERT to learn contextual embeddings for tokens, i.e., words. BERT can then be fine-tuned in a supervised manner to perform different NLP tasks such as question answering and language understanding.
Soon after, other models were developed to improve BERT. RoBERTa trains the BERT model with different hyperparameters, longer sequences and a larger batch size. Moreover, it applies dynamic masking for masked language modeling (MLM) rather than the static masking used in BERT, and achieves significantly better results on different datasets [24]. XLNet replaces the autoencoding model of BERT with an autoregressive model and obtains better results than BERT and RoBERTa [25]. ELECTRA substitutes the MLM pretraining method used in BERT with a replaced-token-detection method and outperforms the previous models in terms of accuracy while having lower computational complexity [26].
After NLP, transformers were adapted to building vision models that operate on sequences of pixels or patches. Image GPT (iGPT) and ViT were the first vision models built with transformers. iGPT was developed in 2020 by OpenAI and trained in three different sizes, iGPT-S, iGPT-M and iGPT-L, which include 76 million, 455 million and 1.4 billion parameters, respectively. Since finding the relations between individual pixels is prohibitively complex in terms of memory and computation, iGPT reduces the resolution and color space of an image and then applies generative training on sequences of pixels using transformers [27]. ViT was developed in 2020 and published in 2021 by researchers from Google's Brain Team [28]. To decrease memory and computation complexity, ViT divides an image into 16 × 16 pixel sections for processing. Thus, in ViT a token is a 16 × 16 pixel piece of an image. Next, a learnable embedding vector is assigned to each token and, along with positional embeddings, fed into a transformer architecture. Three different models are defined and trained for ViT, namely, ViT-Base, ViT-Large and ViT-Huge, which respectively include twelve layers of transformers with twelve-head self-attention, twenty-four layers of transformers with sixteen-head self-attention and thirty-two layers of transformers with sixteen-head self-attention. The models have been pre-trained for image classification on different datasets, including ImageNet, ImageNet-21k and JFT-300M, reaching up to 99.74% accuracy. The authors found that when trained on large datasets (14–300 million images), ViT outperforms CNN-based models such as ResNet [29] and EfficientNet [30].
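As an illustration of ViT's patch-based tokenization described above, the sketch below splits a random placeholder image into non-overlapping 16 × 16 patches and applies a random linear embedding plus positional embeddings; the weights are placeholders of ours, not those of any pretrained ViT.

```python
import numpy as np

patch = 16
img = np.random.rand(224, 224, 3)                      # placeholder RGB image

# Split into non-overlapping 16x16 patches -> (14*14, 16*16*3) flattened tokens
h, w, c = img.shape
patches = img.reshape(h // patch, patch, w // patch, patch, c)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)
print(patches.shape)                                   # (196, 768)

# Linear patch embedding plus positional embeddings (random placeholders here)
d_model = 768
w_embed = np.random.randn(patch * patch * c, d_model) * 0.02
pos_embed = np.random.randn(patches.shape[0], d_model) * 0.02
tokens = patches @ w_embed + pos_embed                 # sequence of 196 patch tokens
print(tokens.shape)                                    # (196, 768)
```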
Afterwards, different vision models were proposed and built on top of ViT for image classification. DeiT [31] was the first work to successfully train transformer-based models using mid-sized datasets (i.e., 1.2 million samples of ImageNet rather than the 300 million images of JFT). A CNN was used as a teacher model for DeiT to distill useful representations of the input images. Hard and soft labeling were explored for this distillation approach, with hard distillation found to perform somewhat better. The Swin transformer [32] builds hierarchical feature maps by merging image patches; it performs local attention using window partitioning and uses a shifted-window approach to capture cross-window connections. Several works have suggested augmenting ViT with CNN architectures [33,34,35]. The convolutional vision transformer (CvT) [36] introduces CNNs into ViT to capture spatial structures and low-level details of image patches. CvT has a hierarchical design in which the sequence length progressively decreases while the token width increases. LeViT [37] uses CNNs for image processing and feature extraction and passes the outcome as input to a hierarchical ViT architecture. ViT has also been adapted to other image processing tasks such as object detection [38,39], segmentation [40] and image generation [41].
Previously, some works have used only profile images to predict user gender [42], while others have gathered several images posted by the users on social media to infer their gender [43,44]. In this work, we use transformer models to explore both methods and compare them in terms of accuracy. We use a gender classifier dataset available on Kaggle [45] and the PAN-2018 dataset [46,47] to build gender classification models based on profile images and on image content posted by the user on social media, respectively. We fine-tune three vision models, i.e., ViT, LeViT and Swin transformer, to predict gender based on Twitter profile images (the Kaggle dataset) and on ten different images posted by a user on Twitter (the PAN-18 dataset), respectively. In addition, we fine-tune three NLP models, i.e., BERT, RoBERTa and ELECTRA, for text-based gender recognition using approximately 100 tweets posted by the user, for both the Kaggle and PAN-18 datasets. We found that concatenating several tweets improves the accuracy of the text-classification model. Likewise, concatenating several images posted on Twitter improves the accuracy of the image-classification model. Eventually, we combined the image- and text-classification models and reached accuracies of 88.11 and 89.24% using transformers for the Kaggle and PAN-18 datasets, respectively. Our contribution is threefold:
● We have fine-tuned and compared different state-of-the-art transformer-based vision and text models for classification and evaluated their statistical significance using the Mann-Whitney U test.
● We have completed the publicly available dataset on Kaggle and provided approximately 100 tweet IDs per user for the female, male and brand classes. Therefore, we provide a rich dataset that future works can build on. Our dataset is publicly available at [48].
● Our work is extendable to other social media platforms such as Facebook and Reddit. This work paves the way for other research that requires gender information of social media users for studying health-related issues.
We compared our model with state-of-the-art models and found that our multimodal method is superior to other methods in terms of accuracy.
In the following, Section 2 includes the literature review. Sections 3 and 4 present our proposed method and numerical results, respectively. A discussion is provided in Section 5, followed by conclusion and future work in Section 6.
Finding gender from text has been investigated using different approaches [9,10,11,12,13]. Vashisth and Meehan [9] used different NLP methods for gender detection from tweets, including bag of words (BoW) created with term frequency-inverse document frequency (TF-IDF), word embeddings using W2Vec and GloVe, logistic regression, support vector machines (SVM) and Naïve Bayes. They concluded that word embeddings have the highest performance for gender recognition. Ikae and Savoy [10] compared different machine learning methods for gender detection from tweets, including logistic regression, decision tree, k-nearest neighbors (KNN), SVM, Naïve Bayes, neural networks and random forest, on seven different datasets. They concluded that neural networks and random forest perform best among the different approaches. The authors in [12] used n-grams as well as unigrams to tokenize sentences. They applied five different machine learning algorithms, Naïve Bayes, sequential minimal optimization (SMO), logistic regression, random forest and J48, to text for gender recognition and found that a combination of 1- to 4-grams with SMO produces the best accuracy.
The studies mentioned above have only used text for gender recognition and have not considered image data. The authors in [49] were the first to use profile images for gender detection. They stacked different approaches, namely, the Microsoft discussion graph tool (DGT) using the users' usernames, Face++ using their profile images, and SVMLight using their tweets. However, they combined pre-existing methods and did not train or fine-tune any model. In [44], VGG, a well-known CNN-based image recognition model, was fine-tuned for gender detection of Twitter users. In [50], text and images were used for predicting the gender of Twitter users. In the image classification method, a CNN is trained for gender recognition. The text classification method applies TF-IDF to the hashtags and uses latent Dirichlet allocation (LDA) to find the topics that the user is interested in. The results show that the combined method has higher accuracy.
Some studies have focused on image classification techniques for gender recognition. For example, the authors in [51] propose a method for gender detection using images. First, they use a CNN for feature extraction. Next, they apply a self-joint attention model for feature fusion. Finally, they use two fully connected neural network layers with ReLU and SoftMax activation functions and one average pooling layer to predict the gender. In [52], a method using gated residual attention networks has been proposed for gender recognition from images and tested on five different datasets. In [53], different CNNs are trained for gender recognition using different methods such as KNN, decision tree, SVM and SoftMax for feature extraction. The results of the CNN methods are combined by majority voting to increase the accuracy. The authors in [11] used posts, comments and replies on Facebook for gender recognition. They compared BERT with different machine learning and deep learning algorithms such as Naïve Bayes, Naïve Bayes Multinomial, SVM, decision tree, random forest, KNN, RNN and CNN. The results show that BERT has the highest performance among the different methods.
Some studies have combined both text and image classification models and employed transformers for gender recognition. The authors in [54] designed a model for gender identification of Twitter users that combines three models: a multi-classifier for basic features (e.g., name, description), a multi-classifier for advanced features (i.e., the k top words of tweets) and a ResNet-18 classifier for users' profile images. Among the different methods (i.e., decision tree, SVM, AdaBoost, gradient boosting and random forest) used for the multi-classifiers and for combining the models, gradient boosting has the highest accuracy. In [55], a multimodal approach using both text and images is proposed for the gender detection of Twitter users. The text classification part uses BERTBASE and the image classification part uses EfficientNet, a CNN-based approach for image recognition. The two methods are then combined to attain a higher accuracy. In [13], the gender of Twitter users is predicted using their names, descriptions, tweets and profile colors. SVM, BERT and BLSTM were applied to user descriptions, and BERT performed better than SVM and BLSTM. The different approaches were then combined to improve the accuracy.
Some of the methods mentioned above have used transformers for text classification for author profiling and have found that transformers outperform other methods. However, transformer-based text-classification models have not been combined with transformer-based image recognition models for demographic information extraction. In this paper, we use transformer models to improve the performance and accuracy of text classification by combining it with image classification for gender recognition of Twitter users.
Two datasets were used to conduct this study. The first dataset, which was released in 2016 and is freely available on Kaggle, includes a link to the profile image and one random tweet for each of 20,050 different Twitter users [45]. The dataset has four labels for the users: female, male, brand and unknown. The second dataset is PAN-18, which was released in 2018 and includes 100 tweets and 10 images posted by Arabic-, English- and Spanish-speaking Twitter users. In this work, powerful transformer-based models were fine-tuned, tested, combined and compared on both datasets. In the following, the datasets, methods and models are explained.
The Kaggle dataset, which can be downloaded from [45], includes a link to the profile image and a random tweet of 20,050 different Twitter users. However, a single tweet does not carry much information and is not enough for training a strong gender classification model. Moreover, most of the profile image links no longer worked. Therefore, similar to other works [42,56], we gathered more tweets and the updated profile image links of the Twitter users in the Kaggle dataset. First, all the users with the unknown label were removed. Then, using a Twitter API Academic Researcher account and the usernames provided by the Kaggle dataset, user IDs and subsequently updated profile image links and approximately 100 different tweets posted by each user were gathered. The tweet IDs retrieved for each user are available online [48]. In compliance with Twitter's privacy policy, only the tweet IDs and user IDs could be publicly released [57]. To obtain the text and other metadata, e.g., creation date and location, the tweet IDs need to be hydrated [58]. Tweets were cleaned: hyperlinks and mentions were removed and punctuation was fixed. Emojis were preserved since they carry valuable information that machine learning models can significantly benefit from. After balancing the dataset, 2943 records of each class, i.e., female, male and brand, were obtained.
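The exact cleaning script is not reproduced in the text; the snippet below is a plausible sketch of the described steps (removing hyperlinks and mentions, tidying punctuation, keeping emojis), with regular expressions chosen by us rather than taken from the released code.

```python
import re

def clean_tweet(text: str) -> str:
    """Remove hyperlinks and @mentions, tidy punctuation spacing, keep emojis and hashtags."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # drop hyperlinks
    text = re.sub(r"@\w+", " ", text)                    # drop mentions
    text = re.sub(r"\s+([,.!?])", r"\1", text)           # no stray space before punctuation
    return re.sub(r"\s+", " ", text).strip()             # collapse whitespace

print(clean_tweet("Loving this 😍 thanks @friend!! see https://t.co/xyz"))
# -> "Loving this 😍 thanks!! see"
```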
The PAN-18 dataset, which can be downloaded from [47] after permission is granted, contains 100 tweets and 10 images posted by 2500, 4900 and 5200 Arabic-, English- and Spanish-speaking Twitter users, respectively. Among these users, 1000, 1900 and 2200 belong to the Arabic, English and Spanish testing sets and the rest to the training sets, respectively. All the users have been labelled by gender, i.e., female or male. Half of the users in the training and testing sets are female and the other half are male, so the datasets are completely balanced. The tweets include emojis and were already cleaned. We used only the English datasets to train and test our models. Each of the 10 images of a user carries some information that can help the model separate the two genders. We found that concatenating several images and feeding them into the base model for fine-tuning significantly increases the accuracy. The reason is that an image created from several images carries more information about the user and can help classify the gender with higher confidence. Since nine images can be concatenated to create a square image, nine of the ten images of a user were selected for concatenation. This was repeated ten times per user, each time leaving a different image out. The final image was resized to 224 × 224 pixels to be compatible with the transformer models. For our work to be reproducible, we have provided the code ("concatenate_images.py" in [48]) to generate the exact image combinations that were used for training the models. We found that the accuracy of the model fine-tuned using the concatenated images is up to 16.92% higher than that of the model fine-tuned using the single original images.
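The released "concatenate_images.py" script is the authoritative implementation; the following is only a minimal sketch of the described procedure, assuming a 3 × 3 grid layout and leaving one image out at a time. Tile sizes and ordering are our own choices.

```python
from PIL import Image

def concatenate_nine(images, size=224):
    """Tile nine images into a 3x3 grid and resize the grid to size x size pixels."""
    tile = size // 3                                   # edge length of each tile
    grid = Image.new("RGB", (3 * tile, 3 * tile))
    for idx, img in enumerate(images):
        row, col = divmod(idx, 3)
        grid.paste(img.convert("RGB").resize((tile, tile)), (col * tile, row * tile))
    return grid.resize((size, size))

def build_user_grids(image_paths):
    """For a user with ten images, build ten grids, each leaving a different image out."""
    imgs = [Image.open(p) for p in image_paths]
    return [concatenate_nine([im for j, im in enumerate(imgs) if j != i])
            for i in range(len(imgs))]
```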
Deep learning models such as transformers are advantageous over other machine learning models only when a large dataset is fed to them. Often, labelled data is not available or is very limited. In cases where a large amount of data is not accessible, fine-tuning a pre-trained deep learning model can help reach the desired accuracy. To this end, we have fine-tuned ViT-Base, LeViT and the Swin transformer for gender recognition of users based on their Twitter profile images (the Kaggle dataset) and based on ten different images that they have posted on Twitter (the PAN-18 dataset).
We split the Kaggle dataset into balanced train, validation and test sets with 7332, 498 and 999 users, respectively. Three models, namely, ViT, LeViT and the Swin transformer, were fine-tuned to classify the images into three classes: female, male and brand.
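A minimal sketch of how one of the vision models could be fine-tuned with the Hugging Face transformers library is shown below; the checkpoint name, dummy dataset and training arguments are illustrative assumptions, not the authors' exact configuration.

```python
import torch
from torch.utils.data import Dataset
from transformers import ViTForImageClassification, Trainer, TrainingArguments

class DummyImageDataset(Dataset):
    """Placeholder dataset; in practice pixel_values come from a ViT image processor."""
    def __init__(self, n=8):
        self.n = n
    def __len__(self):
        return self.n
    def __getitem__(self, i):
        return {"pixel_values": torch.randn(3, 224, 224),   # one 224x224 RGB image
                "labels": torch.tensor(i % 3)}               # 0=female, 1=male, 2=brand

model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",                     # illustrative checkpoint
    num_labels=3,
)

args = TrainingArguments(output_dir="vit-gender",
                         per_device_train_batch_size=16,
                         num_train_epochs=1,                 # a few epochs suffice when fine-tuning
                         learning_rate=2e-5)

Trainer(model=model, args=args, train_dataset=DummyImageDataset()).train()
```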
For the PAN-18 dataset, 100 users were pulled out of the train set and allocated to the validation set. For each user, ten concatenated images were created. All ten images created for each user in the training set were used for fine-tuning the same three models (ViT, LeViT and Swin transformer). The accuracy of the models fine-tuned using the concatenated images was up to 16.92% higher compared to the models fine-tuned using the original images. Next, for each user the results of the ten concatenated images were combined using two fully connected neural network layers (Figure 1).
Before training each model, the cross-validation datasets were used for hyperparameter optimization with the WandB (Weights & Biases) library. We found that fine-tuning the deep-learning models was not sensitive to the hyperparameters. The reason is that they have already been trained on a large dataset and require only a few more epochs to be fine-tuned. However, the fully connected neural network that combines the results of the ten different concatenated images was highly sensitive to the hyperparameters, since it was being trained from scratch. Most importantly, it was sensitive to the optimizer and performed well with the Adam or AdamW optimizers but very poorly with the stochastic gradient descent (SGD) optimizer. Also, we found that smaller learning rates (≤ 0.001) work better when training the stacked neural networks. Table 1 shows the best hyperparameters used for training the stacked layers that combine the ten different concatenated images created for the PAN-18 dataset.
Model | Batch size | Dropout | Hidden layer size | Optimizer | Learning rate |
ViT | 16 | 0.5 | 8 | AdamW | 0.001 |
LeViT | 16 | 0.2 | 8 | AdamW | 0.0001 |
Swin Transformer | 16 | 0.2 | 5 | AdamW | 0.001 |
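The exact layer sizes and activations of the stacked network are not fully specified in the text; the sketch below shows one plausible PyTorch realization of the two fully connected layers that combine the vision model's scores for a user's ten concatenated images, using the ViT row of Table 1 (hidden size 8, dropout 0.5, AdamW, learning rate 0.001) and an assumed ReLU activation.

```python
import torch
import torch.nn as nn

class StackedCombiner(nn.Module):
    """Two fully connected layers fusing the per-image predictions of one user.

    Input: the vision model's class scores for the user's ten concatenated images,
    flattened into one vector (10 images x num_classes scores).
    """
    def __init__(self, num_images=10, num_classes=2, hidden=8, dropout=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_images * num_classes, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, per_image_scores):             # shape (batch, 10, num_classes)
        return self.net(per_image_scores.flatten(1))

combiner = StackedCombiner()
optimizer = torch.optim.AdamW(combiner.parameters(), lr=1e-3)   # Table 1 values for ViT
scores = torch.randn(4, 10, 2)                        # dummy batch of four users
print(combiner(scores).shape)                         # torch.Size([4, 2])
```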
Some Twitter users may not have an image suitable for detecting their gender. However, we are able to retrieve the tweets of most Twitter users. Therefore, training a text classification model for gender recognition can help extract the gender of more users and increase the performance of the model. In both the Kaggle and PAN-18 datasets, one hundred tweets are available for each user and are used to fine-tune three transformer-based models, namely, BERTBASE, RoBERTa and ELECTRA, for gender recognition.
We found that longer text samples result in higher accuracy. Therefore, concatenating several tweets and using them for training significantly increases the accuracy of the models. Since the number of tokens fed into BERTBASE cannot exceed 512, at most ten tweets could be concatenated. Thus, for each user we created ten concatenated tweets and used them to fine-tune the models. This increased the accuracy of the model by 28.8 and 27.9% for the Kaggle and PAN-18 datasets, respectively. The model had three outputs, female, male and brand, for the Kaggle dataset and two outputs, female and male, for the PAN-18 dataset. The outputs of the model for each of the concatenated tweets of a user were combined using two fully connected layers. Figure 2 shows the text-classification model for the (A) Kaggle and (B) PAN-18 datasets. Similar to the image-classification models, fine-tuning the text-classification models was not sensitive to the hyperparameters. However, the stacked layer was highly sensitive to the hyperparameters, especially the optimizer, and performed poorly with the SGD optimizer. Moreover, lower learning rates provided higher accuracy. Table 2 shows the hyperparameters optimized using the WandB library for the stacked fully connected network of the two datasets.
Dataset | Model | Batch size | Dropout | Hidden layer size | Optimizer | Learning rate
The Kaggle dataset | BERT | 16 | 0.1 | 10 | Adam | 0.001 |
RoBERTa | 16 | 0.1 | 5 | Adam | 0.001 | |
ELECTRA | 16 | 0.2 | 10 | Adam | 0.001 | |
The PAN-18 dataset | BERT | 32 | 0.2 | 10 | Adam | 0.001 |
RoBERTa | 32 | 0.1 | 5 | Adam | 0.001 | |
ELECTRA | 32 | 0.2 | 10 | Adam | 0.001 |
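As an illustration of the tweet-concatenation step described before Table 2, the sketch below groups a user's roughly one hundred cleaned tweets into ten longer samples and tokenizes them within BERT's 512-token limit; the grouping strategy and checkpoint name are our own assumptions.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # illustrative checkpoint

def concatenate_tweets(tweets, n_groups=10, max_length=512):
    """Split a user's ~100 cleaned tweets into ten groups and join each group
    into one longer training sample that fits BERT's 512-token limit."""
    group_size = max(1, len(tweets) // n_groups)
    groups = [tweets[i:i + group_size] for i in range(0, len(tweets), group_size)][:n_groups]
    samples = [" ".join(g) for g in groups]
    # Truncation acts as a safety net in case a group exceeds 512 tokens.
    return tokenizer(samples, truncation=True, max_length=max_length,
                     padding="max_length", return_tensors="pt")

batch = concatenate_tweets([f"example tweet number {i}" for i in range(100)])
print(batch["input_ids"].shape)                                  # torch.Size([10, 512])
```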
For each of the Kaggle and PAN-18 datasets, the image- and text-classification models were combined using a neural network of two stacked layers. A SoftMax layer was placed at the top of the model to obtain the final outputs. Figure 3 shows the complete model for (A) the Kaggle and (B) the PAN-18 datasets. Since the model was built from three image-classification models and three text-classification models, nine different combinations were possible. Table 3 shows the optimized hyperparameters of the final stacked neural network for the nine different combinations. Our code is available at [48].
Dataset | Vision model | NLP model | Batch size | Dropout | Hidden layer size | Optimizer | Learning rate
The Kaggle dataset | ViT | BERT | 16 | 0.2 | 5 | AdamW | 10-5 |
RoBERTa | 16 | 0.2 | 5 | AdamW | 10-5 | ||
ELECTRA | 16 | 0.2 | 5 | AdamW | 10-5 | ||
LeViT | BERT | 16 | 0.2 | 5 | AdamW | 10-5 | |
RoBERTa | 16 | 0.2 | 5 | AdamW | 10-5 | ||
ELECTRA | 16 | 0.2 | 5 | AdamW | 10-5 | ||
Swin Transformer | BERT | 16 | 0.5 | 5 | AdamW | 10-5 | |
RoBERTa | 16 | 0.5 | 5 | AdamW | 10-5 | ||
ELECTRA | 16 | 0.5 | 5 | AdamW | 10-5 | ||
The PAN-18 dataset | ViT | BERT | 8 | 0.5 | 5 | AdamW | 10-5 |
RoBERTa | 8 | 0.5 | 5 | AdamW | 10-5 | ||
ELECTRA | 8 | 0.5 | 5 | AdamW | 10-5 | ||
LeViT | BERT | 16 | 0.2 | 8 | AdamW | 10-5 | |
RoBERTa | 16 | 0.2 | 8 | AdamW | 10-5 | ||
ELECTRA | 16 | 0.2 | 8 | AdamW | 10-5 | ||
Swin Transformer | BERT | 16 | 0.5 | 5 | AdamW | 10-5 | |
RoBERTa | 16 | 0.5 | 5 | AdamW | 10-5 | ||
ELECTRA | 16 | 0.5 | 5 | AdamW | 10-5 |
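A sketch of the fusion network described before Table 3 is given below for the Kaggle setting (three classes); the hidden size, dropout and learning rate follow the ViT-BERT row of Table 3, while the ReLU activation and the use of class scores as inputs are our assumptions.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Fuses the class scores of the image model and the text model with two
    fully connected layers followed by a softmax (layer details are illustrative)."""
    def __init__(self, num_classes=3, hidden=5, dropout=0.2):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(2 * num_classes, hidden),       # image scores + text scores
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, num_classes),
            nn.Softmax(dim=-1),
        )

    def forward(self, image_scores, text_scores):
        return self.fusion(torch.cat([image_scores, text_scores], dim=-1))

fusion = MultimodalFusion()                           # Kaggle setting: female/male/brand
optimizer = torch.optim.AdamW(fusion.parameters(), lr=1e-5)   # per Table 3
out = fusion(torch.randn(4, 3), torch.randn(4, 3))    # dummy batch of four users
print(out.shape, out.sum(dim=-1))                     # class probabilities summing to 1
```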
Each of the image-classification models was fine-tuned ten times so that their statistical significance could be evaluated and compared. The different text-classification models were likewise trained and built ten times. Figure 4 compares the statistical significance of the different (A) image- and (B) text-classification models through the Mann-Whitney U test. In most cases, a p-value lower than 0.05 is considered significant in statistical analysis [59]. Table 4 compares the maximum accuracies of the different models and their precision, recall and f1-scores. The p-values in Figure 4(A) indicate that the accuracy of the LeViT model is significantly lower than that of the ViT and Swin transformer models. This result suggests that transformer models enhanced with CNNs have a lower accuracy than models built solely on transformers for our dataset. According to Table 4, the Swin transformer provides the highest maximum accuracy among the three image-classification models on the Kaggle dataset. The p-values in Figure 4(B) show that the accuracy of BERT is significantly lower than that of RoBERTa and ELECTRA, while the accuracies of RoBERTa and ELECTRA are not significantly different from each other. However, Table 4 shows that the maximum accuracy of RoBERTa is higher than that of ELECTRA for the Kaggle dataset. Moreover, according to Table 4, the maximum accuracy of BERT is lower than that of RoBERTa and ELECTRA.
Model type | Model | Class | Accuracy | Precision | Recall | F1-score
Image-classification models | ViT | Female | 76.87 | 76.05 | 79.11 | 77.54 |
Male | 77.44 | 76.82 | 77.13 | |||
Brand | 75.93 | 75.34 | 75.63 | |||
LeViT | Female | 72.8 | 76.82 | 75.12 | 75.96 | |
Male | 70.18 | 70.68 | 70.43 | |||
Brand | 75.69 | 74.22 | 74.94 | |||
Swin Transformer | Female | 78.86 | 79.92 | 75.32 | 77.55 | |
Male | 81.61 | 74.42 | 77.85 | |||
Brand | 74.12 | 79.09 | 76.52 | |||
Text-classification models | BERT | Female | 83.69 | 82.11 | 83.25 | 82.68 |
Male | 82.02 | 83.17 | 82.59 | |||
Brand | 85.54 | 81.93 | 83.7 | |||
RoBERTa | Female | 84.92 | 83.11 | 85 | 84.04 | |
Male | 83.09 | 85.21 | 84.14 | |||
Brand | 85.91 | 83.11 | 84.49 | |||
ELECTRA | Female | 84.66 | 82.64 | 85.78 | 84.18 | |
Male | 82.81 | 85.16 | 83.97 | |||
Brand | 84.81 | 82.04 | 86.35 |
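The significance comparison reported in Figure 4 can be reproduced in principle with SciPy, as in the brief sketch below; the accuracy lists here are illustrative placeholders, not the actual ten runs.

```python
from scipy.stats import mannwhitneyu

# Ten test accuracies per model (illustrative numbers, not the reported runs).
vit_acc   = [76.1, 76.5, 75.9, 76.8, 76.3, 75.7, 76.6, 76.2, 76.9, 76.0]
levit_acc = [72.4, 72.8, 71.9, 72.6, 72.1, 72.7, 72.3, 72.0, 72.5, 72.2]

stat, p_value = mannwhitneyu(vit_acc, levit_acc, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.4f}")           # p < 0.05 -> significant difference
```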
The three different vision models were combined with the three different NLP models to reach a higher accuracy through a multimodal approach. Table 5 compares the maximum accuracies and the precision, recall and f1-scores of the nine different combinations. Precision indicates the percentage of items assigned to a particular class that truly belong to it, and recall indicates the percentage of items from a particular class that were actually detected. The high precision and recall for all classes of the final combined model indicate that it can distinguish between all the classes well. According to Table 5, the highest accuracies are obtained when the result of the Swin transformer model is combined with the NLP models, and the best accuracy is obtained from the combination of the Swin transformer and BERT. The accuracy of each of the nine combined models is higher than that of its constituent image- and text-classification models. The accuracy of the Swin transformer-BERT multimodal model is 11.73 and 5.26% higher than the accuracy of the Swin transformer and BERT models, respectively.
Vision | NLP | Class | Accuracy | Precision | Recall | F1-score |
ViT | BERT | Female | 82.84 | 79.80 | 83.74 | 81.72 |
Male | 80.09 | 83.22 | 81.63 | |||
Brand | 86.13 | 80.13 | 83.02 | |||
RoBERTa | Female | 83.42 | 81.12 | 85.67 | 83.33 | |
Male | 80.79 | 85.21 | 82.94 | |||
Brand | 86.49 | 81.52 | 83.93 | |||
ELECTRA | Female | 83.11 | 80.02 | 86.04 | 82.92 | |
Male | 80.13 | 85.86 | 82.9 | |||
Brand | 86.17 | 80.42 | 83.2 | |||
LeViT | BERT | Female | 81.47 | 77.23 | 83.15 | 80.08 |
Male | 76.49 | 83.41 | 79.8 | |||
Brand | 84.33 | 78.12 | 81.11 | |||
RoBERTa | Female | 81.79 | 78.71 | 84.39 | 81.45 | |
Male | 78.48 | 84.84 | 81.53 | |||
Brand | 85.12 | 78.97 | 81.93 | |||
ELECTRA | Female | 81.56 | 76.81 | 85.93 | 81.11 | |
Male | 77.28 | 86.10 | 81.45 | |||
Brand | 86.33 | 78.51 | 82.23 | |||
Swin Transformer | BERT | Female | 88.11 | 85.81 | 90.42 | 88.05 |
Male | 86.14 | 89.2 | 87.64 | |||
Brand | 91.39 | 86.12 | 88.68 | |||
RoBERTa | Female | 85.32 | 82.23 | 88.11 | 85.07 | |
Male | 81.91 | 87.87 | 84.78 | |||
Brand | 89.19 | 82.18 | 85.54 | |||
ELECTRA | Female | 85.74 | 82.48 | 88.32 | 85.3 | |
Male | 82.14 | 88.48 | 85.19 | |||
Brand | 88.94 | 82.11 | 85.89 |
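For readers who want to reproduce the per-class metrics in Tables 4 and 5, scikit-learn's classification_report computes precision, recall and f1-score exactly as defined above; the labels below are illustrative placeholders only.

```python
from sklearn.metrics import classification_report

# Illustrative labels only; 0 = female, 1 = male, 2 = brand.
y_true = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
y_pred = [0, 1, 1, 1, 2, 0, 0, 1, 2, 2]

print(classification_report(y_true, y_pred,
                            target_names=["female", "male", "brand"], digits=2))
```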
Similar to the Kaggle dataset, each image- and text-classification model for the PAN-18 dataset was built and tested ten different times. Figure 5 evaluates the statistical significance of the (A) image- and (B) text-classification models using the Mann-Whitney U test. Moreover, Table 6 compares the maximum accuracies of the different vision and NLP models. Figure 5(A) shows that the accuracy of the Swin transformer model is significantly higher than that of the other two models and that the accuracy of ViT is significantly higher than that of the LeViT model. Additionally, according to Table 6, the maximum accuracy observed for the Swin transformer is higher than that of ViT and LeViT, and the maximum accuracy observed for ViT is higher than that of LeViT. Figure 5(B) shows that RoBERTa has a significantly higher accuracy than ELECTRA and BERT, while BERT and ELECTRA are not significantly different for the PAN-18 dataset. However, according to Table 6, the maximum accuracy of RoBERTa is higher than that of BERT and ELECTRA, and the maximum accuracy of ELECTRA is higher than that of BERT.
Model type | Model | Class | Accuracy | Precision | Recall | F1-score
Image-classification models | ViT | Female | 80.82 | 81.46 | 78.75 | 80.8 |
Male | 79.11 | 82.27 | 80.65 | |||
LeViT | Female | 74.22 | 73.76 | 75.53 | 74.63 | |
Male | 74.89 | 73.11 | 73.99 | |||
Swin Transformer | Female | 82.21 | 83.70 | 80 | 81.81 | |
Male | 80.84 | 84.42 | 82.59 | |||
Text-classification models | BERT | Female | 81.27 | 79.98 | 83.71 | 81.80 |
Male | 83.08 | 79.05 | 81.02 | |||
RoBERTa | Female | 81.89 | 80.54 | 84.11 | 82.29 | |
Male | 83.37 | 79.68 | 81.48 | |||
ELECTRA | Female | 81.42 | 79.24 | 85.16 | 82.09 | |
Male | 83.96 | 77.68 | 80.7 |
The maximum accuracies, together with the precision, recall and f1-scores, of the nine different multimodal methods for the PAN-18 dataset are compared in Table 7. Table 7 shows that the final model has high precision, recall and f1-score values for both the female and male classes. This means that the model performs well for both classes and is capable of distinguishing them from each other very well. In addition, the maximum accuracy of the models combined with the Swin transformer is consistently higher than that of the other models. Although RoBERTa had a significantly higher accuracy than the other NLP models (Figure 5(B)), the best accuracy was obtained when the Swin transformer and BERT were combined. The maximum accuracy of the Swin transformer-BERT combination is 8.55 and 9.8% higher than that of the Swin transformer and BERT models, respectively.
Vision | NLP | Class | Accuracy | Precision | Recall | F1-score |
ViT | BERT | Female | 86.79 | 87.01 | 86.92 | 86.97 |
Male | 85.63 | 87.12 | 86.37 | |||
RoBERTa | Female | 85.39 | 83.82 | 88.06 | 85.89 | |
Male | 87.34 | 83.24 | 85.24 | |||
ELECTRA | Female | 85.48 | 83.03 | 87.74 | 85.32 | |
Male | 88.31 | 82.96 | 85.55 | |||
LeViT | BERT | Female | 76.87 | 78.41 | 74.21 | 76.25 |
Male | 74.88 | 79.09 | 76.92 | |||
RoBERTa | Female | 79.42 | 82.41 | 78.12 | 80.21 | |
Male | 77.33 | 81.17 | 79.2 | |||
ELECTRA | Female | 78.91 | 80.89 | 77.64 | 79.23 | |
Male | 76.43 | 79.14 | 77.76 | |||
Swin Transformer | BERT | Female | 89.24 | 91.27 | 88.12 | 89.66 |
Male | 87.49 | 90.95 | 89.18 | |||
RoBERTa | Female | 88.36 | 90.13 | 86.97 | 88.52 | |
Male | 86.73 | 89.86 | 88.26 | |||
ELECTRA | Female | 88.22 | 89.93 | 86.92 | 88.4 | |
Male | 87.01 | 89.14 | 88.06 |
Table 8 compares the RoBERTa text-based model with the work in [56] for the Kaggle dataset, and the RoBERTa text-based model and the Swin transformer-BERT multimodal model with the works in [15,43,60], which took the first, second and third ranks in the 2018 PAN author profiling competition, for the PAN-18 dataset.
Dataset | Method | Text-based | Image-based | Overall
The Kaggle dataset | Text-based with RF [56] | 71.22% | - | - |
Text-based with SVM [56] | 69.14% | - | - | |
Our Model (RoBERTa) | 84.09% | - | - | |
The PAN-18 dataset | Multimodal [44] | 79.68% | 81.63% | 85.84% |
Text-based [60] | 82.21% | - | - | |
Multimodal [43] | 80.74% | 69.63% | 81.32% | |
Our Model (RoBERTa) | 81.89% | - | - | |
Our Model (Swin Transformer-BERT) | 81.27% | 82.21% | 89.24% |
The authors in [56] used the Kaggle dataset to build a text-based gender recognition model. After retrieving additional tweets for the female and male users, the tweets were cleaned and vectorized using a number of methods, namely, BoW, TF-IDF, Word2vec, global vectors for word representation (GloVe) and BERT tokenization. Different machine learning algorithms were used to build a gender recognition model. The best results were obtained using GloVe with random forest (RF) and GloVe with SVM. We applied the GloVe-RF and GloVe-SVM models to our dataset and compared them with the RoBERTa text-classification model for only the male and female classes (Table 8). The authors in [15] used a bidirectional GRU for the text classification part and a CNN based on VGG16 for the image classification part. The image and text classification parts are then combined using a fusion component that directly multiplies the text and image feature components. In [60], the authors classified Twitter users using only text. They applied TF-IDF and singular value decomposition (SVD) to the tweets to extract the semantics. Then they applied latent semantic analysis (LSA) to extract the semantic topics and fed them into an SVM with a linear kernel for gender classification. The authors in [43] proposed an approach for gender identification on the PAN-18 dataset using text and images. They applied TF-IDF and then SVD to extract the semantics and used them for gender classification with a linear SVM. For image classification, they stacked three different classification layers. The first layer, the low classifier, consisted of four different classifiers: object recognition, facial recognition, color histogram and local binary patterns. They all used linear SVC except for the color histogram, which used multinomial naïve Bayes (NB). The second classifier layer, the meta-classifier, used linear SVC to combine the results of the four classifiers of the previous layer. The third classifier layer, the aggregation classifier, combined the meta-classifier results of the ten different images of a user using multinomial NB. Finally, they combined their text and image classifiers using linear SVC. Table 8 shows that our multimodal method is superior to all the above models in terms of accuracy.
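For concreteness, the TF-IDF + SVD + linear-SVM text baseline described above can be sketched with scikit-learn as follows; the component count, toy corpus and SVM settings are placeholders of ours, not the competitors' exact configurations.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import LinearSVC

# TF-IDF vectorization, SVD for latent semantics, then a linear SVM classifier.
baseline = make_pipeline(TfidfVectorizer(),
                         TruncatedSVD(n_components=5),   # far larger in practice
                         LinearSVC())

texts = ["placeholder tweets of a user from class one",
         "placeholder tweets of a user from class two"] * 50
labels = ["female", "male"] * 50
baseline.fit(texts, labels)
print(baseline.predict(["placeholder tweets of an unseen user"]))
```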
Previously, some works have used the profile images of social media users and others have used the image content they posted on social media for gender recognition. In this work, we have implemented a transformer-based model for both methods. We extended a publicly available dataset for gender recognition with profile images and used the PAN-18 dataset for gender recognition with image content posted on social media. Our results show that using the image content posted by users on social media yields a higher accuracy.
To further improve the accuracy of our model, the image-classification model was combined with a text-classification model. Different transformer-based image classification models, namely, ViT, LeViT and the Swin transformer, and text classification models, i.e., BERT, RoBERTa and ELECTRA, were explored. The Swin transformer consistently performed better than the other vision models on both datasets. In contrast, LeViT had a lower accuracy than the other models on our datasets. This shows that models built solely on transformers have a higher accuracy than models enhanced with CNNs for our datasets. RoBERTa had a significantly higher accuracy than BERT on both datasets. However, BERT performed better when combined with the Swin transformer. BERT and the Swin transformer complemented each other very well and provided the best accuracies of 88.11 and 89.24% for the Kaggle and PAN-18 datasets, respectively.
One limitation of our work was the lack of a suitable dataset. To remove this barrier, we have completed the publicly available dataset on Kaggle and provided approximately 100 tweet IDs per user for the female, male and brand classes, which future studies can build on.
Demographics of social media users are beneficial for research and applications in health, socio-economic inequalities and gender vulnerability. However, such information is usually not freely available. During periods of upheaval, women are usually at greater risk from the adverse effects and potential losses incurred by these external stressors. They are also the slowest to recover from such emergencies. Integrating governance at widening levels and mitigating the limited economic options of women are two examples of systemic challenges which require attention for human futurity. However, in many cases, even the data required to document and understand these challenges is not available. This paper addresses these systemic imperatives by providing a framework that can help identify the elements of promising emergent governance frameworks to address local- and global-scale socio-economic challenges that disproportionately impact women.
In this work, we have designed a transformer-based model to detect the gender of Twitter users using both text and images. We have implemented and compared our multimodal method using several transformer models and found that the Swin transformer and BERT complement each other well, and their combination provides the best accuracy for our datasets.
Future studies could build on our work by using other user information such as descriptions, media posts, comments and likes. Moreover, recognizing other user demographics such as age and ethnicity using transformers could be further investigated. In addition, heuristic methods for identifying user demographics when images are blurry, of low quality or partially visible, or when people are wearing masks or sunglasses, could be studied for higher accuracy and better performance.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
This research is funded by Canada's International Development Research Centre (IDRC) and Swedish International Development Cooperation Agency (SIDA) (Grant No. 109559-001).
The authors declare there is no conflict of interest.
Model | Batch size | Dropout | Hidden layer size | Optimizer | Learning rate |
ViT | 16 | 0.5 | 8 | AdamW | 0.001 |
LeViT | 16 | 0.2 | 8 | AdamW | 0.0001 |
Swin Transformer | 16 | 0.2 | 5 | AdamW | 0.001 |
Model | Batch size | Dropout | Hidden layer size | Optimizer | Learning rate | |
The Kaggle dataset | BERT | 16 | 0.1 | 10 | Adam | 0.001 |
RoBERTa | 16 | 0.1 | 5 | Adam | 0.001 | |
ELECTRA | 16 | 0.2 | 10 | Adam | 0.001 | |
The PAN-18 dataset | BERT | 32 | 0.2 | 10 | Adam | 0.001 |
RoBERTa | 32 | 0.1 | 5 | Adam | 0.001 | |
ELECTRA | 32 | 0.2 | 10 | Adam | 0.001 |
| Dataset | Vision model | NLP model | Batch size | Dropout | Hidden layer size | Optimizer | Learning rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| The Kaggle dataset | ViT | BERT | 16 | 0.2 | 5 | AdamW | 10⁻⁵ |
| | | RoBERTa | 16 | 0.2 | 5 | AdamW | 10⁻⁵ |
| | | ELECTRA | 16 | 0.2 | 5 | AdamW | 10⁻⁵ |
| | LeViT | BERT | 16 | 0.2 | 5 | AdamW | 10⁻⁵ |
| | | RoBERTa | 16 | 0.2 | 5 | AdamW | 10⁻⁵ |
| | | ELECTRA | 16 | 0.2 | 5 | AdamW | 10⁻⁵ |
| | Swin Transformer | BERT | 16 | 0.5 | 5 | AdamW | 10⁻⁵ |
| | | RoBERTa | 16 | 0.5 | 5 | AdamW | 10⁻⁵ |
| | | ELECTRA | 16 | 0.5 | 5 | AdamW | 10⁻⁵ |
| The PAN-18 dataset | ViT | BERT | 8 | 0.5 | 5 | AdamW | 10⁻⁵ |
| | | RoBERTa | 8 | 0.5 | 5 | AdamW | 10⁻⁵ |
| | | ELECTRA | 8 | 0.5 | 5 | AdamW | 10⁻⁵ |
| | LeViT | BERT | 16 | 0.2 | 8 | AdamW | 10⁻⁵ |
| | | RoBERTa | 16 | 0.2 | 8 | AdamW | 10⁻⁵ |
| | | ELECTRA | 16 | 0.2 | 8 | AdamW | 10⁻⁵ |
| | Swin Transformer | BERT | 16 | 0.5 | 5 | AdamW | 10⁻⁵ |
| | | RoBERTa | 16 | 0.5 | 5 | AdamW | 10⁻⁵ |
| | | ELECTRA | 16 | 0.5 | 5 | AdamW | 10⁻⁵ |
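The multimodal rows pair one vision backbone with one language model under a shared set of training hyperparameters. The sketch below assumes a simple feature-concatenation fusion, using the Swin Transformer + BERT row for the PAN-18 dataset (dropout 0.5, hidden layer size 5, AdamW, learning rate 10⁻⁵); the fusion mechanism and the embedding widths are assumptions and may differ from the authors' design.

```python
# Minimal sketch (assumed fusion strategy): concatenate the image and text embeddings
# produced by the two branches and classify with a small joint head, following the
# Swin Transformer + BERT row for PAN-18 (dropout 0.5, hidden 5, AdamW, lr 1e-5).
import torch
import torch.nn as nn

class LateFusionHead(nn.Module):
    def __init__(self, img_dim, txt_dim, hidden_size=5, dropout=0.5, num_classes=2):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden_size),  # hidden layer size from the table
            nn.ReLU(),
            nn.Dropout(dropout),                        # dropout from the table
            nn.Linear(hidden_size, num_classes),        # assumed: Female / Male on PAN-18
        )

    def forward(self, img_emb, txt_emb):
        # Concatenation fusion is an illustrative assumption, not the confirmed method.
        return self.classifier(torch.cat([img_emb, txt_emb], dim=-1))

fusion = LateFusionHead(img_dim=1024, txt_dim=768)   # assumed Swin-B and BERT-base widths
optimizer = torch.optim.AdamW(fusion.parameters(), lr=1e-5)
logits = fusion(torch.randn(16, 1024), torch.randn(16, 768))  # one batch of 16 fused samples
```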
| Model type | Model | Class | Accuracy | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- | --- | --- |
| Image-classification models | ViT | Female | 76.87 | 76.05 | 79.11 | 77.54 |
| | | Male | | 77.44 | 76.82 | 77.13 |
| | | Brand | | 75.93 | 75.34 | 75.63 |
| | LeViT | Female | 72.8 | 76.82 | 75.12 | 75.96 |
| | | Male | | 70.18 | 70.68 | 70.43 |
| | | Brand | | 75.69 | 74.22 | 74.94 |
| | Swin Transformer | Female | 78.86 | 79.92 | 75.32 | 77.55 |
| | | Male | | 81.61 | 74.42 | 77.85 |
| | | Brand | | 74.12 | 79.09 | 76.52 |
| Text-classification models | BERT | Female | 83.69 | 82.11 | 83.25 | 82.68 |
| | | Male | | 82.02 | 83.17 | 82.59 |
| | | Brand | | 85.54 | 81.93 | 83.7 |
| | RoBERTa | Female | 84.92 | 83.11 | 85 | 84.04 |
| | | Male | | 83.09 | 85.21 | 84.14 |
| | | Brand | | 85.91 | 83.11 | 84.49 |
| | ELECTRA | Female | 84.66 | 82.64 | 85.78 | 84.18 |
| | | Male | | 82.81 | 85.16 | 83.97 |
| | | Brand | | 84.81 | 82.04 | 86.35 |
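The result tables report a single accuracy per model together with per-class precision, recall, and F1-score. As a reference for how such figures are typically obtained, the snippet below (with placeholder labels and predictions) computes them with scikit-learn; it is illustrative only and not the authors' evaluation code.

```python
# Illustration: one overall accuracy per model plus per-class precision / recall / F1,
# matching the layout of the result tables. Labels and predictions are dummies.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

classes = ["Female", "Male", "Brand"]
y_true = [0, 0, 1, 2, 1, 2, 0, 1]   # placeholder ground-truth class indices
y_pred = [0, 1, 1, 2, 1, 0, 0, 1]   # placeholder model predictions

acc = accuracy_score(y_true, y_pred)  # one accuracy per model, as in the tables
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, labels=[0, 1, 2], zero_division=0)
for name, p, r, f in zip(classes, prec, rec, f1):
    print(f"{name}: precision={100*p:.2f} recall={100*r:.2f} f1={100*f:.2f}")
print(f"accuracy={100*acc:.2f}")
```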
| Vision | NLP | Class | Accuracy | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- | --- | --- |
| ViT | BERT | Female | 82.84 | 79.80 | 83.74 | 81.72 |
| | | Male | | 80.09 | 83.22 | 81.63 |
| | | Brand | | 86.13 | 80.13 | 83.02 |
| | RoBERTa | Female | 83.42 | 81.12 | 85.67 | 83.33 |
| | | Male | | 80.79 | 85.21 | 82.94 |
| | | Brand | | 86.49 | 81.52 | 83.93 |
| | ELECTRA | Female | 83.11 | 80.02 | 86.04 | 82.92 |
| | | Male | | 80.13 | 85.86 | 82.9 |
| | | Brand | | 86.17 | 80.42 | 83.2 |
| LeViT | BERT | Female | 81.47 | 77.23 | 83.15 | 80.08 |
| | | Male | | 76.49 | 83.41 | 79.8 |
| | | Brand | | 84.33 | 78.12 | 81.11 |
| | RoBERTa | Female | 81.79 | 78.71 | 84.39 | 81.45 |
| | | Male | | 78.48 | 84.84 | 81.53 |
| | | Brand | | 85.12 | 78.97 | 81.93 |
| | ELECTRA | Female | 81.56 | 76.81 | 85.93 | 81.11 |
| | | Male | | 77.28 | 86.10 | 81.45 |
| | | Brand | | 86.33 | 78.51 | 82.23 |
| Swin Transformer | BERT | Female | 88.11 | 85.81 | 90.42 | 88.05 |
| | | Male | | 86.14 | 89.2 | 87.64 |
| | | Brand | | 91.39 | 86.12 | 88.68 |
| | RoBERTa | Female | 85.32 | 82.23 | 88.11 | 85.07 |
| | | Male | | 81.91 | 87.87 | 84.78 |
| | | Brand | | 89.19 | 82.18 | 85.54 |
| | ELECTRA | Female | 85.74 | 82.48 | 88.32 | 85.3 |
| | | Male | | 82.14 | 88.48 | 85.19 |
| | | Brand | | 88.94 | 82.11 | 85.89 |
| Model type | Model | Class | Accuracy | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- | --- | --- |
| Image-classification models | ViT | Female | 80.82 | 81.46 | 78.75 | 80.8 |
| | | Male | | 79.11 | 82.27 | 80.65 |
| | LeViT | Female | 74.22 | 73.76 | 75.53 | 74.63 |
| | | Male | | 74.89 | 73.11 | 73.99 |
| | Swin Transformer | Female | 82.21 | 83.70 | 80 | 81.81 |
| | | Male | | 80.84 | 84.42 | 82.59 |
| Text-classification models | BERT | Female | 81.27 | 79.98 | 83.71 | 81.80 |
| | | Male | | 83.08 | 79.05 | 81.02 |
| | RoBERTa | Female | 81.89 | 80.54 | 84.11 | 82.29 |
| | | Male | | 83.37 | 79.68 | 81.48 |
| | ELECTRA | Female | 81.42 | 79.24 | 85.16 | 82.09 |
| | | Male | | 83.96 | 77.68 | 80.7 |
| Vision | NLP | Class | Accuracy | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- | --- | --- |
| ViT | BERT | Female | 86.79 | 87.01 | 86.92 | 86.97 |
| | | Male | | 85.63 | 87.12 | 86.37 |
| | RoBERTa | Female | 85.39 | 83.82 | 88.06 | 85.89 |
| | | Male | | 87.34 | 83.24 | 85.24 |
| | ELECTRA | Female | 85.48 | 83.03 | 87.74 | 85.32 |
| | | Male | | 88.31 | 82.96 | 85.55 |
| LeViT | BERT | Female | 76.87 | 78.41 | 74.21 | 76.25 |
| | | Male | | 74.88 | 79.09 | 76.92 |
| | RoBERTa | Female | 79.42 | 82.41 | 78.12 | 80.21 |
| | | Male | | 77.33 | 81.17 | 79.2 |
| | ELECTRA | Female | 78.91 | 80.89 | 77.64 | 79.23 |
| | | Male | | 76.43 | 79.14 | 77.76 |
| Swin Transformer | BERT | Female | 89.24 | 91.27 | 88.12 | 89.66 |
| | | Male | | 87.49 | 90.95 | 89.18 |
| | RoBERTa | Female | 88.36 | 90.13 | 86.97 | 88.52 |
| | | Male | | 86.73 | 89.86 | 88.26 |
| | ELECTRA | Female | 88.22 | 89.93 | 86.92 | 88.4 |
| | | Male | | 87.01 | 89.14 | 88.06 |
| Dataset | Model | Text-based | Image-based | Overall |
| --- | --- | --- | --- | --- |
| The Kaggle dataset | Text-based with RF [56] | 71.22% | - | - |
| | Text-based with SVM [56] | 69.14% | - | - |
| | Our Model (RoBERTa) | 84.09% | - | - |
| The PAN-18 dataset | Multimodal [44] | 79.68% | 81.63% | 85.84% |
| | Text-based [60] | 82.21% | - | - |
| | Multimodal [43] | 80.74% | 69.63% | 81.32% |
| | Our Model (RoBERTa) | 81.89% | - | - |
| | Our Model (Swin Transformer-BERT) | 81.27% | 82.21% | 89.24% |