
Numerous studies have addressed privacy protection in computer communication [1,2,3]. Plaintext transmitted without encryption can be eavesdropped on and intercepted, so encrypted communication protocols are used to protect privacy by encrypting the transmitted data. In 2017, Gartner predicted that more than 80% of web traffic would be encrypted [4]. According to statistics reported as of August 2022, more than 80% of web pages were loaded over Hypertext Transfer Protocol Secure (HTTPS), which runs HTTP over the Transport Layer Security (TLS) protocol [5].
Encrypted communication protocols protect transmitted data using encryption algorithms. Before transmitting data, the client and server negotiate the encryption methods and share encryption keys; after this negotiation, the data are encrypted and transmitted. This keeps the data confidential, because a third party that eavesdrops on or intercepts the traffic cannot learn its content. However, encrypted communication protocols can also be used for malicious purposes, which raises cybersecurity issues. Cisco predicted that by 2021, more than 70% of web malware would use encrypted traffic and 60% of organizations would fail to detect such web malware traffic [6].
Attackers threaten cybersecurity by exploiting the fact that third parties cannot access the contents of data transmitted over encrypted communication protocols, as shown in Figure 1.
Network security equipment is generally either installed inline on the network path or inspects replicated copies of the communication packets. However, if the traffic is encrypted, the equipment cannot decrypt the transmitted data, so it cannot check the contents or take the necessary action based on its analysis. Attackers therefore exploit this limited visibility to conduct cyberattacks such as leaking internal information assets or sending triggers to malicious code in order to penetrate the internal network. As a result, it is difficult to determine whether an attack using encrypted traffic has been attempted and how far it extends, which poses a significant threat.
To manage the risk of abuse of encrypted communication protocols, traffic must be decrypted so that its contents become visible and can be inspected. However, decrypting data for inspection risks infringing on privacy and consumes additional computing resources and decryption time. In addition, decryption and inspection increase the delay between the client and server, which can degrade the user experience. To overcome these disadvantages, encrypted traffic must be classified and analyzed without decryption, and network fingerprinting is one of the most promising technologies for this purpose.
Traditional network fingerprinting techniques have been used to identify operating systems from header information in the TCP/IP stack or to identify network topology from IP address information [7]. These techniques depend on the configuration of the existing network environment, such as on-premises servers and network layouts. With the advent of modern cloud computing and software-defined networking, however, network boundaries are becoming blurred and more services are configured independently of traditional network addressing schemes, so such fingerprinting is expected to become less effective. In addition, traffic-type classifiers based on traditional network fingerprinting rely on TCP/IP packet lengths [8]. The padding added by the encryption algorithms of cryptographic communication protocols is expected to act as noise in the input to these classifiers, reducing their accuracy. TLS fingerprinting is emerging as a way to compensate for these shortcomings.
TLS fingerprinting uses the handshake and header information of the TLS stack, as well as the encrypted application data, to identify client applications and web servers and to identify and classify the type of content being sent. Instead of relying on IP addresses, it generates fingerprints by extracting characteristic information from the TLS handshake, such as the offered cipher suites and extensions. It also extracts features from encrypted packets to build classifiers for encrypted traffic. TLS fingerprinting can be used alone or combined with existing network fingerprinting techniques based on the TCP/IP stack to improve accuracy. A broad understanding of TLS fingerprinting is therefore needed for cybersecurity to keep pace with changing networks.
This study provides a broad overview of encrypted network traffic fingerprinting techniques, which analyze encrypted traffic without decrypting it. Fingerprinting is used to classify and identify client applications or servers. Fingerprints are generated from the unencrypted data in cryptographic negotiation (handshake) messages. They are then either compared against fingerprint databases, or machine learning, artificial intelligence (AI), and statistical techniques are applied to large datasets to generate identifiers and classifiers.
The contributions of this study fall into two categories. First, encrypted network traffic fingerprinting techniques are investigated, and several methods for generating fingerprints and improving their accuracy are described. Second, the investigation results are analyzed by organizing the techniques used to generate identifiers and classifiers, which are categorized and described according to the method and features used for identification and classification. For this purpose, journals, conference papers, research papers, and internet documents were surveyed, and the findings are classified and described using the taxonomy shown in Figure 2.
The manuscript consists of six sections. Section 2 explains the background information required to understand the fingerprinting techniques. Section 3 describes encrypted traffic fingerprinting techniques and the studies based on them. Section 4 discusses techniques for generating identifiers and classifiers. The last section concludes the paper by presenting the results and the future scope of this research.
Our work provides extensive information on TLS fingerprinting gained through investigation and analysis. Before examining the techniques in detail, we position them relative to existing surveys of techniques targeting encrypted traffic [9,10,11,12]. These surveys address classification [9,10], detection [11], and analysis [12] as problem domains, and Refs. [9,10,11] focus on machine-learning-based techniques. In addition, we refer to related studies [13,14,15,16] to explain the techniques involving privacy, graph networks, time series data, and so on that are discussed in this paper. The related surveys are summarized below, and Table 1 compares their problem domains, methods, and protocols.
Survey | Problem domains | Method | Protocols |
[9] | Classification | Focus on ML-based | Various |
[10] | Classification | ML-based | Various |
[11] | Detection | Focus on ML-based | TLS |
[12] | Analysis | Various | Various |
Present study | Fingerprinting | Various | TLS |
Velan et al. [9] discussed classification and analysis techniques for Internet Protocol Security (IPsec), TLS, Secure Shell (SSH), BitTorrent, and Skype traffic. They divided information extraction into the unencrypted initialization phase and the encrypted data transfer phase, and divided classification techniques into payload-based and feature-based methods. Payload-based methods include techniques that perform pattern matching on the payload, as well as techniques that operate on information that can be read from packets without further processing, such as the payload size, port numbers, and IP addresses. Feature-based methods extract features from encrypted communication patterns and use supervised, unsupervised, and hybrid machine learning as well as basic statistical analysis. Pacheco et al. [10] systematically described machine learning techniques for classifying encrypted network traffic. They discussed the machine learning basics required for traffic classification and the representative workflow of machine-learning-based traffic classification, and, following this workflow, reviewed data collection, feature engineering, algorithm selection, and model deployment methods by category.
Oh et al. [11] discussed methods available to a Security Operations Center for analyzing malicious network traffic encrypted with TLS, focusing mainly on machine-learning-based algorithms and on encrypted traffic analysis using middleboxes. They gave an overview of how TLS-encrypted traffic can be inspected either by TLS interception at a middlebox or passively, without the secret key, through a machine learning pipeline that sniffs traffic and extracts features for analysis or detects malware using TLS flow fingerprinting.
Papadogiannaki and Ioannidis [12] described applications, technologies, and countermeasures for encrypted network traffic analysis in four use cases: analytics, security, user privacy, and middleboxes. Analytics covers the identification of protocols and users; security, the detection of malicious traffic; user privacy, data leakage and fingerprinting; and middleboxes, deep packet inspection by means of man-in-the-middle interception. For each use case, the survey provides an overview of the applications, technologies, and countermeasures, and it also lists the datasets used in each study.
However, these surveys do not specifically address TLS fingerprinting, which can be used on its own or as an addition to the network fingerprinting technology stack. They provide only partial information on TLS fingerprinting as an encrypted traffic analysis technique, which is insufficient for a broad understanding of it. This paper aims to provide a broad perspective on TLS fingerprinting techniques without restricting the scope to a particular purpose, technique, or data type.
TLS is an encrypted communication protocol designed to prevent eavesdropping, tampering, and message forgery in client-server communication by providing end-to-end encryption. TLS 1.0, based on Secure Sockets Layer (SSL) 3.0, provided communication privacy over the internet [17]. TLS 1.3 was published in 2018, and in March 2021 the Internet Engineering Task Force (IETF) formally deprecated TLS 1.0 and TLS 1.1 [18]. TLS authenticates servers and clients using certificates and public key (asymmetric) algorithms, negotiates the encryption algorithms, and exchanges the keys used to encrypt the transmitted data; after this negotiation, the data are encrypted [19,20,21,22]. The TLS structure is illustrated in Figure 3. TLS protects the data passed from application layer protocols (e.g., HTTP and FTP) to the transport layer (the TCP or UDP payload). Traffic between the communicating peers is protected by the record protocol, and the handshake is used to share the cipher specifications and keys that protect the transmitted data.
A handshake consists of three phases, key exchange, server parameters, and authentication, as shown in Figure 4. The ClientHello and ServerHello messages sent during the key exchange phase are not encrypted. During this phase, the peers establish a handshake encryption key that protects the remaining handshake messages. This key is used until the application encryption keys are established in the server parameter and authentication phases. The application data are then encrypted with application encryption keys derived from the shared key material; these keys can be updated during the connection, so N successive application keys may be used. Because the ClientHello and ServerHello messages are unencrypted, the cipher suites and extensions they carry can be used to generate fingerprints.
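To make the unencrypted fields concrete, the following Python sketch parses one TLS record carrying a ClientHello and returns the version, cipher suites, compression methods, and extensions, following the ClientHello layout of RFC 8446. It is a minimal sketch under stated assumptions: the record holds one complete, unfragmented ClientHello, malformed input is not checked, and the function names `parse_client_hello` and `build_client_hello` are hypothetical (the builder exists only to produce sample input).

```python
import struct


def parse_client_hello(record: bytes) -> dict:
    """Extract the plaintext ClientHello fields used for fingerprinting.

    Assumes `record` is one TLS record containing one complete,
    unfragmented ClientHello; no bounds checking is done.
    """
    content_type, record_version, record_len = struct.unpack_from("!BHH", record, 0)
    if content_type != 22:  # 22 = handshake record
        raise ValueError("not a handshake record")
    pos = 5
    if record[pos] != 1:  # handshake type 1 = client_hello
        raise ValueError("not a ClientHello")
    pos += 4  # handshake type (1 byte) + 24-bit handshake length
    (version,) = struct.unpack_from("!H", record, pos)  # legacy_version
    pos += 2 + 32  # version + 32-byte random
    pos += 1 + record[pos]  # session_id with its 1-byte length prefix
    (cs_len,) = struct.unpack_from("!H", record, pos)
    pos += 2
    ciphers = list(struct.unpack_from(f"!{cs_len // 2}H", record, pos))
    pos += cs_len
    comp_len = record[pos]
    compression = list(record[pos + 1:pos + 1 + comp_len])
    pos += 1 + comp_len
    extensions = []
    if pos < 5 + record_len:  # the extensions block is optional before TLS 1.3
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos < end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            extensions.append((ext_type, record[pos:pos + ext_len]))
            pos += ext_len
    return {
        "record_version": record_version,
        "version": version,
        "cipher_suites": ciphers,
        "compression_methods": compression,
        "extensions": extensions,
    }


def build_client_hello(version: int, ciphers: list, extensions: list) -> bytes:
    """Hypothetical test helper: serialize a minimal ClientHello record."""
    body = struct.pack("!H", version) + bytes(32) + b"\x00"  # zero random, empty session_id
    body += struct.pack("!H", 2 * len(ciphers)) + struct.pack(f"!{len(ciphers)}H", *ciphers)
    body += b"\x01\x00"  # one compression method: null (0)
    ext = b"".join(struct.pack("!HH", t, len(d)) + d for t, d in extensions)
    body += struct.pack("!H", len(ext)) + ext
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return struct.pack("!BHH", 22, 0x0301, len(handshake)) + handshake


record = build_client_hello(0x0303, [0x1301, 0xC02F], [(0x0000, b""), (0x000A, b"\x00\x02\x00\x1d")])
hello = parse_client_hello(record)
print(hello["cipher_suites"], [t for t, _ in hello["extensions"]])  # [4865, 49199] [0, 10]
```

Everything returned here is readable without any key, which is why ClientHello-based fingerprints can be computed passively.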
A Markov chain is a stochastic model with the Markov property, in which the probability of each event depends only on the state attained in the previous event [23]. A stochastic process is a collection of random variables whose values change randomly over time; it can be viewed as a sequence of observations of the state of an object over time. Depending on when the state is observed, stochastic processes are classified as discrete-time or continuous-time, and the term Markov chain generally refers to a discrete-time Markov process [24]. If $ {X}_{n+1} $ denotes the next state and $ {X}_{n} $ the current state, the Markov property can be expressed as
$ P\left({X}_{n+1} = j \mid {X}_{n} = i, {X}_{n-1} = {i}_{n-1}, \dots , {X}_{0} = {i}_{0}\right) = P\left({X}_{n+1} = j \mid {X}_{n} = i\right) . $ | (1) |
If the transition probability $ P\left({X}_{n+1} = j \mid {X}_{n} = i\right) $ for the pair $ (i, j) $ is written as $ {p}_{ij} $, a Markov chain with $ m $ states can be encoded as in Eq (2). This two-dimensional matrix is the transition probability matrix, where $ i $, $ j $, and $ {p}_{ij} $ denote the row, the column, and the transition probability element, respectively. Because row $ i $ contains the probabilities of moving from state $ i $ to each possible next state, the elements of every row sum to 1, that is, $ \sum_{j=1}^{m} {p}_{ij} = 1 $.
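$ \left(\begin{array}{ccc} {p}_{11} & \cdots & {p}_{1m} \\ \vdots & \ddots & \vdots \\ {p}_{m1} & \cdots & {p}_{mm} \end{array}\right) $ | (2) |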
A Markov chain can also be represented by a state transition graph built from its set of states and its transition probability matrix. For example, Figure 5 illustrates a Markov chain with $ m = 2 $ states and transition probabilities $ {p}_{ij} $.
Markov processes and chains are mathematical techniques that analyze state changes and state characteristics by modeling complex stochastic processes under simple assumptions, using a set of states and a transition probability matrix [25,26]. The state transition graph of a Markov chain can be used to generate fingerprints from the sequence of message types and state changes in encrypted communication traffic.
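The transition probabilities of Eq (2) can be estimated from observed traffic by counting consecutive state pairs $ ({X}_{n}, {X}_{n+1}) $ and normalizing each row. The following Python sketch does this for hypothetical sequences of TLS message types; the state names and sessions are made up, and this is a simplified illustration rather than the exact state definition used in the studies cited below.

```python
def estimate_transition_matrix(sequences, states, alpha=0.0):
    """Maximum-likelihood estimate of p_ij from observed state sequences.

    alpha > 0 adds Laplace smoothing; with alpha = 0, a state that is never
    left in the data gets an all-zero row instead of a row summing to 1.
    """
    index = {s: k for k, s in enumerate(states)}
    m = len(states)
    counts = [[alpha] * m for _ in range(m)]
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):  # consecutive pairs (X_n, X_{n+1})
            counts[index[current]][index[nxt]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical message-type sequences observed in two TLS sessions.
states = ["ClientHello", "ServerHello", "Certificate", "AppData"]
sessions = [
    ["ClientHello", "ServerHello", "Certificate", "AppData", "AppData"],
    ["ClientHello", "ServerHello", "AppData", "AppData"],
]
P = estimate_transition_matrix(sessions, states)
for s, row in zip(states, P):
    print(s, row)
```

Here the ServerHello row is `[0.0, 0.0, 0.5, 0.5]`, because one session continued with a Certificate message and the other went straight to application data.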
AI is a field of study based on the conjecture that every aspect of learning and every other feature of intelligence can, in principle, be described precisely enough for a machine to simulate it [27]. AI, which aims to simulate intelligent human behavior, is widely used in fields such as data engineering and power demand forecasting [28]. In practice, it is often realized as machine learning systems that aim at accurate prediction.
Machine learning extracts features from large amounts of data and learns from them, for example by training artificial neural networks, to perform classification and regression on new inputs [29]. With the development of deep learning, which uses neural networks with many layers inspired by biological neural networks, research in various fields and use cases has emerged [30]. In encrypted traffic analysis, these techniques are used to extract features from encrypted communication traffic, classify benign and malicious software traffic, and identify client applications and services.
A technique based on fingerprint collection generates fingerprints for encrypted traffic, stores them in a database, and performs fingerprinting by exact or approximate matching. Fingerprints are generated from Client/ServerHello messages, from statistical characteristics, or from behavior such as the responses to a particular sequence of request messages. They are represented as strings, structured data formats (XML, JSON), state transition graphs, etc. Because the technique relies on stored fingerprint information, its accuracy is poor when fingerprints are forged, altered, or unregistered, and research on improving accuracy with statistical and AI techniques is under way.
This technique generates fingerprints by extracting values from the ClientHello/ServerHello messages sent and received during the key exchange phase. It is primarily used to identify client processes and the services provided by a server by comparing the generated fingerprints with a prebuilt fingerprint database.
Ristic [31] proposed a technique for distinguishing client processes that takes advantage of the fact that the list of supported encryption algorithms (cipher suites) that an SSL client sends to the server differs from client to client. Subsequently, several studies proposed fingerprint generation methods that use various ClientHello fields in addition to the cipher suites [32,33,34,35]. The fields used are presented in Table 2.
ClientHello field | [31] | [32] | [33] | [34] | [35] |
SSL/TLS record version | √ | √ | |||
SSL/TLS version | √ | √ | √ | √ | |
Cipher suites length | √ | ||||
Cipher suites | √ | √ | √ | √ | √ |
Extensions | √ | √ | √ | √ | |
Data of extensions | √ | ||||
Flag | √ | ||||
Compression length | √ | ||||
Compression | √ | ||||
Server name* | √ | ||||
Elliptic curves* | √ | ||||
Elliptic curve point formats* | √ |
Cipher suites are commonly used for fingerprinting, and p0f's SSL fingerprinting also uses extensions [32]. Brotherston [33] demonstrated the identification of real-world processes by adding various fields. Althouse [34] described the JA3 fingerprint, which limits the fields used, relying on the cipher suites, extensions, and elliptic curve information, to produce discriminative fingerprints. The fingerprint generation code and method were released as open source to broaden adoption. JA3 fingerprints are also used as pulse information in Open Threat Exchange, a threat intelligence sharing system. An example of a JA3 fingerprint obtained from a ClientHello message is shown in Figure 6. Joy, Cisco's package for capturing and analyzing network flow data, also uses TLS fingerprints [35]. Like JA3 fingerprints, these fingerprints do not require the contents of any specific extension, but they can include extension data in the fingerprint. Although this technique has the advantages of fast identification and ease of use, fingerprints are generated only from ClientHello messages. Consequently, several processes that send the same ClientHello message may map to a single fingerprint.
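As an illustration of how compact a JA3 fingerprint is, the following Python sketch builds the JA3 string and its MD5 hash, assuming the field order and separators of the public JA3 description: SSLVersion, Ciphers, Extensions, EllipticCurves, EllipticCurvePointFormats, written as decimal values joined by `-` within a field and `,` between fields, with GREASE values (RFC 8701) removed. The input values are made up; in practice they would come from a ClientHello parser.

```python
import hashlib


def is_grease(value: int) -> bool:
    """True for the GREASE values of RFC 8701 (0x0A0A, 0x1A1A, ..., 0xFAFA)."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Build the JA3 string: decimal values, '-' within a field, ',' between fields."""
    def field(values):
        return "-".join(str(v) for v in values if not is_grease(v))
    return ",".join([str(version), field(ciphers), field(extensions),
                     field(curves), field(point_formats)])


def ja3_hash(*fields) -> str:
    """JA3 fingerprint: the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3_string(*fields).encode("ascii")).hexdigest()


# Hypothetical ClientHello values (0x0A0A is a GREASE value and is dropped).
fields = (771, [0x0A0A, 4865, 4866, 49195], [0x0A0A, 0, 10, 11], [0x0A0A, 29, 23], [0])
print(ja3_string(*fields))  # 771,4865-4866-49195,0-10-11,29-23,0
print(ja3_hash(*fields))
```

Because only these five fields enter the hash, two processes that send identical values in them produce the same JA3 fingerprint, which is the limitation noted above.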
Accuracy can be improved with additional techniques, such as machine learning, statistical techniques, and the use of ServerHello information, to compensate for these shortcomings. This works because, even when a client sends the same ClientHello, the ServerHello returned by the server it connects to may differ between applications. JA3S generates server fingerprints from the ServerHello, while Joy uses the cipher suite selected by the server [35,36]. Combined with JA3 fingerprints, JA3S provides additional information on each connection, so connections that share the same ClientHello can be distinguished. Joy can also distinguish such connections, but it additionally uses other fingerprinting information, such as other protocols and operating system characteristics.
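The following Python sketch shows how a server-side fingerprint can separate connections that share a ClientHello. It assumes the JA3S string is the ServerHello version, the selected cipher suite, and the extension types in decimal, hashed with MD5; the fingerprint database, the placeholder JA3 value `"ja3-client"`, and the `identify` helper are hypothetical.

```python
import hashlib


def ja3s_string(version: int, cipher: int, extensions: list) -> str:
    """JA3S string: ServerHello version, selected cipher, extension types."""
    return f"{version},{cipher},{'-'.join(str(e) for e in extensions)}"


def ja3s_hash(version: int, cipher: int, extensions: list) -> str:
    return hashlib.md5(ja3s_string(version, cipher, extensions).encode("ascii")).hexdigest()


def identify(ja3: str, ja3s: str, known: dict) -> list:
    """Look up a (JA3, JA3S) pair in a hypothetical fingerprint database.

    An exact pair match yields one label; otherwise every label recorded
    for the same JA3 is returned as a candidate list.
    """
    if (ja3, ja3s) in known:
        return [known[(ja3, ja3s)]]
    return sorted({label for (c, _), label in known.items() if c == ja3})


# Two hypothetical applications share one client fingerprint but reach
# servers that answer with different ServerHellos.
db = {
    ("ja3-client", ja3s_hash(771, 4865, [43, 51])): "AppA",
    ("ja3-client", ja3s_hash(771, 49199, [65281, 11])): "AppB",
}
print(identify("ja3-client", ja3s_hash(771, 4865, [43, 51]), db))  # ['AppA']
print(identify("ja3-client", "unseen", db))                        # ['AppA', 'AppB']
```

The fallback to a candidate list reflects that a JA3 value alone cannot separate the two applications.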
To improve ClientHello/ServerHello-based techniques, Anderson and McGrew [37] combined knowledge bases with end-host and network data to identify trends in enterprise TLS applications and to better understand application behavior. Anderson and McGrew [38] presented a method that improves the identification and classification accuracy of ClientHello-based TLS fingerprints using the destination context and pre-collected knowledge bases. When no exact match for a fingerprint was found, a group of similar fingerprints was selected by approximate matching based on the Levenshtein distance. The matching probability of each candidate process was then computed with a weighted naive Bayes model trained on the destination context and knowledge bases, and the most likely process was reported.
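The approximate-matching step can be sketched in Python as follows: an exact lookup first, then the database fingerprints nearest to the query in Levenshtein distance. This is only a sketch under assumptions; the JA3-style fingerprint strings and labels are hypothetical, and the weighted naive Bayes scoring of [38] is not reproduced.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def nearest_fingerprints(query: str, database: dict, k: int = 3) -> list:
    """Return up to k (distance, fingerprint, label) tuples, closest first."""
    if query in database:  # exact match: no approximation needed
        return [(0, query, database[query])]
    scored = sorted((levenshtein(query, fp), fp, label) for fp, label in database.items())
    return scored[:k]


db = {
    "771,4865-4866-4867,0-10-11-13": "browser-1",
    "771,4865-4867,0-10-11": "browser-2",
    "769,47-53,0-10": "legacy-client",
}
query = "771,4865-4866-4867,0-10-11"  # unregistered fingerprint
print(nearest_fingerprints(query, db, k=2))
```

The query differs from the `browser-1` entry only by a missing extension (distance 3), so that entry ranks first; a probabilistic model would then choose among such candidates.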
Korczyński and Duda [39] collected all handshake connections that occurred when accessing particular services (PayPal, Twitter, Dropbox, etc.). They derived the probabilities of the TLS protocol versions and of the handshake message occurrences, and classified traffic using the resulting parameterized Markov chains. Liu et al. [40] generated features using message-length Markov models in addition to message-type Markov models [39] and performed classification with machine-learning-based classifiers. Liu et al. [41] proposed a method that improves the classification accuracy of application traffic by fingerprinting with the cipher suite distribution as an additional attribute alongside the length Markov models [40]. Its classification accuracy was compared with that of methods such as MaMPF [40], demonstrating an improvement. An example of a statistical fingerprint obtained using a Markov chain and a state transition graph is shown in Figure 7.
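Classification with per-service Markov chains amounts to computing the likelihood of an observed message sequence under each chain and picking the largest. The following Python sketch assumes first-order chains with an initial-state distribution and a small floor probability for transitions never seen during training; the abbreviated message types (`CH`, `SH`, `CERT`, `APP`) and the hand-written models are hypothetical stand-ins for trained parameters.

```python
import math

FLOOR = 1e-6  # probability assigned to events never seen in training (assumption)


def log_prob(p: float) -> float:
    return math.log(max(p, FLOOR))


def sequence_log_likelihood(seq, initial, transitions):
    """log P(seq) for a first-order Markov chain with initial distribution `initial`."""
    score = log_prob(initial.get(seq[0], 0.0))
    for current, nxt in zip(seq, seq[1:]):
        score += log_prob(transitions.get(current, {}).get(nxt, 0.0))
    return score


def classify(seq, models):
    """Return the application whose Markov chain gives `seq` the highest likelihood."""
    return max(models, key=lambda app: sequence_log_likelihood(seq, *models[app]))


models = {
    "AppA": ({"CH": 1.0}, {"CH": {"SH": 1.0}, "SH": {"CERT": 1.0}, "CERT": {"APP": 1.0}, "APP": {"APP": 1.0}}),
    "AppB": ({"CH": 1.0}, {"CH": {"SH": 1.0}, "SH": {"APP": 1.0}, "APP": {"APP": 1.0}}),
}
print(classify(["CH", "SH", "CERT", "APP", "APP"], models))  # AppA
print(classify(["CH", "SH", "APP", "APP"], models))          # AppB
```

Working with log-probabilities keeps long sessions from underflowing to zero.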
Chao [42] conducted a study to identify malicious encrypted traffic. The fingerprint was generated by adding the TCP three-way handshake, SSL/TLS message types, and the TCP four-way connection termination as fingerprint elements to the SSL/TLS handshake state-transition-based fingerprint [39]. Encrypted traffic was then identified by deriving features from a second-order Markov chain built on the generated fingerprint and using them for machine learning.
Zhao et al. [43] classified and identified encrypted traffic by replicating the traffic and analyzing its characteristics after pre-filtering it by header and handshake information. Fingerprints based on a hidden Markov model were extracted to classify and identify applications. Classification was performed on web application, Real-time Transport Protocol (RTP), voice over IP (VoIP), and audio/video streaming traffic.
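The details of the hidden Markov model in [43] are not given here, so the following Python sketch is only a generic illustration. It scores an observation sequence, for example discretized packet sizes, under an HMM using the scaled forward algorithm. An application could then be identified by choosing the per-application HMM with the highest likelihood. The state names, observation symbols, and parameters are hypothetical.

```python
import math


def hmm_log_likelihood(observations, start, trans, emit):
    """log P(observations) under an HMM, via the forward algorithm with scaling.

    start[s]: initial probability, trans[r][s]: P(s | r), emit[s][o]: P(o | s).
    Rescaling alpha at every step avoids numerical underflow on long sequences.
    """
    states = list(start)
    log_p = 0.0
    alpha = None
    for t, obs in enumerate(observations):
        if t == 0:
            alpha = {s: start[s] * emit[s].get(obs, 0.0) for s in states}
        else:
            alpha = {
                s: sum(alpha[r] * trans[r].get(s, 0.0) for r in states) * emit[s].get(obs, 0.0)
                for s in states
            }
        scale = sum(alpha.values())
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        alpha = {s: a / scale for s, a in alpha.items()}
        log_p += math.log(scale)
    return log_p


# Hypothetical model: hidden phases emit packet-size classes.
start = {"handshake": 1.0, "transfer": 0.0}
trans = {"handshake": {"handshake": 0.5, "transfer": 0.5}, "transfer": {"transfer": 1.0}}
emit = {"handshake": {"small": 1.0}, "transfer": {"large": 1.0}}
print(math.exp(hmm_log_likelihood(["small", "small", "large"], start, trans, emit)))
```

Only the path handshake, handshake, transfer can produce this sequence, so its probability is 0.5 × 0.5 = 0.25.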
In 2019, Garn et al. [44] identified the web browser acting as the client application by having the server send a test set, consisting of specific combinations of message sequences, during the TLS handshake. When the browser sends a ClientHello message to a server configured with the proposed framework, the server responds with a specific combination of messages, and the browser replies to them. The messages the browser sends during this handshake are recorded as a feature vector that serves as its fingerprint. In 2022, Garn et al. [45] proposed a two-step method in which ClientHello-based fingerprinting is performed first when the browser connects, and the earlier method [44] is used only when no unique match is found. An example of a behavioral fingerprint using a combination of sequences is shown in Figure 8.
AI-based techniques extract features from encrypted traffic and train machine learning models, such as artificial neural networks, to perform fingerprinting. Features are extracted using statistical, time series, graph, and hybrid (or miscellaneous) techniques. Statistical techniques apply statistics to data obtained from the traffic; time series techniques extract features from meaningful information in the traffic arranged in chronological order; graph techniques extract features from traffic-tracking graphs; and hybrid techniques combine several of these techniques or use other features. In particular, AI-based techniques can identify the transmitted content and the algorithm used to encrypt the transmitted data.
In 2017, Dubin et al. [46] proposed a machine learning method that extracts features from video streaming traffic to classify the titles of videos uploaded to video platforms. It uses the bits-per-peak feature, derived from the peaks in the download speed pattern during streaming, which allows a third party to classify the video content transmitted over encrypted traffic. Yang et al. [47] extended this method [46] by fingerprinting the fragments of encrypted video traffic with a Markov chain to determine which video was viewed. For each video title uploaded to YouTube, the state transitions of the fragment sequence during streaming were modeled as a Markov chain, and a machine learning model trained on the modeled information was then used to identify the title of the YouTube video being viewed.
Al-Naami et al. [48] extracted the sizes and durations of uplink and downlink bursts from packet headers and constructed a feature set called bi-directional dependent (BIND) fingerprinting, which captures the dependence between consecutive bursts in opposite directions. Machine learning models trained on these features were applied to fingerprinting websites and mobile applications.
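Burst features of this kind start from grouping consecutive packets that travel in the same direction. The following Python sketch is a simplified illustration of that step and of pairing each burst with the next, opposite-direction burst; it is not the BIND feature set itself, and the `(timestamp, direction, size)` packet tuples and direction constants are assumptions.

```python
from dataclasses import dataclass

UPLINK, DOWNLINK = 1, -1


@dataclass
class Burst:
    direction: int  # UPLINK (client to server) or DOWNLINK
    size: int       # total bytes in the burst
    start: float    # timestamp of the first packet
    end: float      # timestamp of the last packet

    @property
    def duration(self) -> float:
        return self.end - self.start


def extract_bursts(packets):
    """Group consecutive same-direction packets into bursts.

    packets: iterable of (timestamp, direction, size) in capture order.
    """
    bursts = []
    for ts, direction, size in packets:
        if bursts and bursts[-1].direction == direction:
            bursts[-1].size += size
            bursts[-1].end = ts
        else:
            bursts.append(Burst(direction, size, ts, ts))
    return bursts


def dependent_pairs(bursts):
    """(size, next size) for each pair of adjacent bursts.

    Adjacent bursts always alternate direction by construction, so each pair
    links an uplink burst with the downlink burst that follows it, or vice versa.
    """
    return [(a.size, b.size) for a, b in zip(bursts, bursts[1:])]


trace = [(0.00, UPLINK, 100), (0.10, UPLINK, 50), (0.20, DOWNLINK, 1500),
         (0.25, DOWNLINK, 1500), (0.40, UPLINK, 80)]
bursts = extract_bursts(trace)
print([(b.direction, b.size) for b in bursts])  # [(1, 150), (-1, 3000), (1, 80)]
print(dependent_pairs(bursts))                 # [(150, 3000), (3000, 80)]
```

Pairs such as (request burst size, response burst size) preserve the request-response relationship that single packet sizes lose.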
In 2021, Kanda and Hashimoto [49] proposed a technique for identifying the encryption algorithms and libraries from encrypted payloads, to counter attempts to evade TLS fingerprinting by randomizing or modifying handshake message parameters. Features were extracted based on the tests in NIST SP 800-22, "A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications," and were used to predict the TLS version, cipher suite, and encryption algorithm [50]. An example of AI-based behavioral fingerprinting using statistical features is shown in Figure 9.
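To show what an SP 800-22-based feature can look like, the following Python sketch implements one test of that suite, the frequency (monobit) test, over the bits of a payload. The exact features used in [49] are not specified here, so treating this P-value (or its test statistic) as a classifier feature is an assumption.

```python
import math


def monobit_p_value(data: bytes) -> float:
    """Frequency (monobit) test of NIST SP 800-22 applied to the bits of `data`.

    Returns the P-value; SP 800-22 treats P < 0.01 as evidence of non-randomness.
    """
    n = len(data) * 8
    if n == 0:
        raise ValueError("empty input")
    ones = sum(bin(byte).count("1") for byte in data)
    s_n = 2 * ones - n  # sum of +1 for every 1 bit and -1 for every 0 bit
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


# Well-encrypted payloads should look random; highly structured bytes should not.
print(monobit_p_value(bytes([0x55] * 16)))  # 1.0: exactly half of the bits are 1
print(monobit_p_value(bytes(16)) < 0.01)    # True: all bits are 0
```

Running the same statistics over many records yields a feature vector whose values may differ subtly between cipher implementations, which is the intuition behind such classifiers.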
Böttinger et al. [51] used the lengths of TLS records as features to train a machine learning model that classifies the format of the transmitted file. Classification was performed on executable (ELF), audio (MP3), document (PDF), image (JPG), and video (WEBM) files. In real scenarios, fragment offsets, the uncertain state of the compressor, and the padding of symmetric block ciphers were reported to act as noise.
Zhang et al. [52] proposed features for fingerprinting websites encrypted with SSL/TLS and applied them to a Deep Forest model. The proposed local request and response sequence features are derived by grouping the sizes and numbers of incoming and outgoing packets according to various criteria. The developed model was then compared with other models in terms of F1 score.
Lu et al. [53] used traffic-tracking graphs to fingerprint websites. The traffic-tracking graph was generated by dividing packets into flows, each a collection of packets with the same IP address, and by using the packet lengths, time intervals, direction sequence, and timestamp of the first packet. Website classification was then performed using the proposed graph-based machine learning model. A system overview of graph attention pooling network website fingerprinting (GAP-WF) is shown in Figure 10.
Richter et al. [54] presented four methods that exploit changes in HMAC size depending on the selected cipher suite, as well as TLS fragmentation. Using Bayesian classifiers, they classified five application layer protocols: Hypertext Transfer Protocol, Simple Mail Transfer Protocol, Internet Relay Chat, Post Office Protocol 3, and Internet Message Access Protocol. The features used, namely TLS fragmentation, the analyzed traffic structure, packet length, and inter-arrival time, are shown in Figure 11. The four methods differ in the features they extract.
Anderson et al. [55,56,57] attempted to increase the accuracy of malware classification using various features. In [55], a dataset was constructed with the TLS version, offered cipher suites, TLS extensions, selected cipher suite, client key length, and the sequence of record lengths, times, and types as features, and malware detection was performed on it. In [56], malicious traffic was classified using the sequence of packet lengths and times (SPLT), the byte distribution (BD), and TLS, HTTP, and DNS metadata as features. The metadata for each protocol were extracted without decryption from the TLS handshake messages, HTTP headers, and DNS responses, and malware detection was performed by training an L1-regularized logistic regression model. In [57], malicious traffic was classified using SPLT, BD, and TLS metadata together with a custom feature indicating whether the server certificate was self-signed.
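The SPLT and BD features can be computed from per-packet metadata and payload bytes, as in the following Python sketch. The packet limit `max_packets`, the millisecond unit, and the zero padding are assumptions rather than the exact definitions used in [55,56,57]. The entropy helper is an extra summary of the byte distribution, included only for illustration.

```python
import math


def splt(packets, max_packets=10):
    """Sequence of packet lengths and times (SPLT) for the first packets of a flow.

    packets: list of (timestamp_seconds, payload_length) in arrival order.
    Returns (lengths, inter-arrival times in ms), zero-padded to max_packets.
    """
    head = packets[:max_packets]
    lengths = [length for _, length in head]
    gaps = [0.0] + [(t2 - t1) * 1000.0 for (t1, _), (t2, _) in zip(head, head[1:])]
    pad = max_packets - len(head)
    return lengths + [0] * pad, gaps[:len(head)] + [0.0] * pad


def byte_distribution(payloads):
    """Relative frequency of each of the 256 byte values across the payloads."""
    counts = [0] * 256
    for payload in payloads:
        for byte in payload:
            counts[byte] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 256


def entropy_bits(distribution):
    """Shannon entropy (bits per byte) of a byte distribution."""
    return -sum(p * math.log2(p) for p in distribution if p > 0)


lengths, gaps = splt([(0.0, 517), (0.5, 1400), (0.502, 1400)], max_packets=4)
dist = byte_distribution([b"\x00\x00", b"\x01\x01"])
print(lengths)             # [517, 1400, 1400, 0]
print(entropy_bits(dist))  # 1.0
```

Encrypted payloads tend toward a near-uniform byte distribution, so departures from uniformity (for example, plaintext headers or unencrypted data) can be informative features.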
Based on the techniques discussed so far, a step-by-step TLS fingerprinting technique is presented in Figure 12.
Encrypted traffic fingerprinting analyzes the TLS packets exchanged between the client and server. Ordered in time, these packets can be divided into the key exchange, server parameter and authentication, and application data phases, termed Phases 1, 2, and 3, respectively, for convenience. Phase 1 can be understood as the connection establishment attempt, Phase 2 as connection establishment, and Phase 3 as the connected state.
In Phase 1, pre-connection analysis is performed based on Client/ServerHello fingerprints. In Phase 2, analysis is performed during connection setup (handshake) using the metadata of the entire handshake. In Phase 3, analysis is performed while the connection is active. Phase 1 allows quick analysis and pre-detection, but it is vulnerable to handshake tampering and is relatively less accurate because it uses limited information. Phase 2 uses more information and can therefore achieve higher detection accuracy, but it may still be vulnerable to handshake tampering, and it cannot detect content transmitted after the connection has been established (e.g., leakage of insider information). In Phase 3, information extracted from the application data is used, together with the connection information analyzed up to Phase 2, to detect the transmitted content.
More detailed fingerprinting can be performed by using more phases, but the more information is used, the more time and computing resources are consumed. Therefore, applying the phases step by step according to a security policy will make fingerprinting more efficient. For example, the phases applied may be configured differently depending on the importance and type of an asset. For systems where identifying who is connecting is important, clients should be analyzed using Phases 1 and 2, and for important assets whose leakage could cause serious damage, all three phases should be used to identify content and detect insider information leakage.
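A phase policy of this kind can be expressed as a small lookup from asset category to the phases to analyze, as in the following Python sketch. The asset category names, the mapping, and the default of Phase 1 only are hypothetical choices that mirror the example in the text.

```python
from enum import IntEnum


class Phase(IntEnum):
    KEY_EXCHANGE = 1      # Phase 1: Client/ServerHello fingerprints
    PARAMS_AND_AUTH = 2   # Phase 2: metadata of the whole handshake
    APPLICATION_DATA = 3  # Phase 3: features of the encrypted application data


# Hypothetical policy: asset category -> phases to analyze.
POLICY = {
    "access_controlled": (Phase.KEY_EXCHANGE, Phase.PARAMS_AND_AUTH),  # "who approaches"
    "critical_data": tuple(Phase),                                      # all three phases
}
DEFAULT_PHASES = (Phase.KEY_EXCHANGE,)  # cheapest analysis for everything else (assumption)


def phases_for(asset_category: str) -> tuple:
    return POLICY.get(asset_category, DEFAULT_PHASES)


def should_analyze(asset_category: str, phase: Phase) -> bool:
    """Decide whether traffic to this asset is analyzed in the given phase."""
    return phase in phases_for(asset_category)


print(should_analyze("critical_data", Phase.APPLICATION_DATA))      # True
print(should_analyze("access_controlled", Phase.APPLICATION_DATA))  # False
```

Keeping the policy as data rather than code lets operators change which assets receive costly Phase 3 analysis without touching the fingerprinting pipeline.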
In this study, encrypted traffic fingerprinting was analyzed by dividing it into fingerprint collection techniques and AI techniques, and the advantages and disadvantages of each were identified.
Fingerprint collection techniques are easy to build into a system when the fingerprint generation method is well defined. In addition, because traffic is identified and classified by comparison against a fingerprint database, they consume relatively little time and computing resources. Pre-detection is also possible, because identification and classification can proceed during the handshake. However, their accuracy is poor when a fingerprint is not registered in the database or when the client or server falsifies or tampers with the information used to generate the fingerprint.
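The lookup step of a fingerprint collection technique, and its main weakness, can be illustrated with a minimal in-memory database. The class name and the sample fingerprint and label are hypothetical placeholders; a deployed system would typically use a curated fingerprint feed and a persistent store.

```python
from typing import Dict, Optional


class FingerprintDatabase:
    """Minimal in-memory fingerprint database (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._labels: Dict[str, str] = {}

    def register(self, fingerprint: str, label: str) -> None:
        self._labels[fingerprint] = label

    def identify(self, fingerprint: str) -> Optional[str]:
        # A hash-table lookup takes constant time on average, which is why
        # matching is cheap and can run as soon as the handshake is observed.
        # An unregistered or tampered fingerprint simply yields None.
        return self._labels.get(fingerprint)


db = FingerprintDatabase()
db.register("fp-0001", "client-A")   # placeholder fingerprint and label
print(db.identify("fp-0001"))        # client-A
print(db.identify("fp-9999"))        # None
```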
AI techniques can combine many features and can infer the content being encrypted and transmitted, as well as the encryption algorithms used. This allows them to identify leaked content and report whether information was leaked. However, detection in advance is difficult, because these techniques typically target the application layer or must collect the traffic of the entire connection before extracting features and producing a result.
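The reason AI techniques usually cannot decide in advance is that their features summarize the whole connection. The toy sketch below computes a few flow-level statistics (mean and standard deviation of packet size, mean inter-arrival time, and upload/download byte ratio) and assigns a label with a nearest-centroid rule. The packet tuple layout, feature set, centroids, and labels are assumptions for illustration, not a method from the surveyed studies.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

# Assumed packet layout: (timestamp_s, size_bytes, direction),
# where direction is +1 for client->server and -1 for server->client.
Packet = Tuple[float, int, int]


def flow_features(packets: Sequence[Packet]) -> List[float]:
    """Summarize a whole connection; requires at least two packets."""
    sizes = [size for _, size, _ in packets]
    gaps = [b[0] - a[0] for a, b in zip(packets, packets[1:])]
    upstream = sum(size for _, size, d in packets if d > 0)
    downstream = sum(size for _, size, d in packets if d < 0)
    return [
        statistics.mean(sizes),
        statistics.pstdev(sizes),
        statistics.mean(gaps),
        upstream / max(downstream, 1),  # upload/download byte ratio
    ]


def nearest_centroid(features: List[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


pkts = [(0.0, 100, 1), (0.5, 300, -1), (1.0, 200, -1)]
centroids = {  # hypothetical, hand-picked centroids
    "bulk-download": [1200.0, 300.0, 0.01, 0.05],
    "interactive": [200.0, 100.0, 0.5, 0.2],
}
print(nearest_centroid(flow_features(pkts), centroids))  # interactive
```

In practice the features would be normalized and the centroids (or a more capable model) learned from labeled traffic; without normalization the packet-size features dominate the distance.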
As shown in Figure 12, future research should develop fingerprinting that proceeds step by step across the phases, so that the strengths of each technique offset the weaknesses of the other.
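One possible shape for such step-by-step fingerprinting is a cascade that runs the cheapest analysis first and stops as soon as a stage reaches a verdict, with the deepest phase capped by the asset policy. This is a hypothetical sketch and is not taken from Figure 12; the analyzers are placeholders for, e.g., a fingerprint database lookup in Phase 1 and an AI classifier in Phase 3.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# An analyzer returns a label, or None if it cannot decide with the data it has.
Analyzer = Callable[[], Optional[str]]


@dataclass
class Verdict:
    label: Optional[str]
    phase: int  # last phase that was actually run (0 if none)


def staged_fingerprint(stages: List[Tuple[int, Analyzer]], max_phase: int) -> Verdict:
    """Run analyzers in phase order and stop at the first confident answer."""
    last_phase = 0
    for phase, analyzer in sorted(stages, key=lambda s: s[0]):
        if phase > max_phase:  # the asset policy forbids deeper analysis
            break
        last_phase = phase
        label = analyzer()
        if label is not None:
            return Verdict(label, phase)  # early exit skips costlier phases
    return Verdict(None, last_phase)


def lookup() -> Optional[str]:
    return None          # Phase 1: fingerprint not in the database


def handshake_model() -> Optional[str]:
    return "client-X"    # Phase 2: placeholder verdict


def content_model() -> Optional[str]:
    return "content-Y"   # Phase 3: not reached in this example


stages = [(1, lookup), (2, handshake_model), (3, content_model)]
print(staged_fingerprint(stages, max_phase=3))  # Verdict(label='client-X', phase=2)
```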
This work was supported by an Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No.2022-0-01200, Training Key Talents in Industrial Convergence Security).
The authors declare no conflict of interest.
[1] |
Oparil S, Acelajado MC, Bakris GL, et al. (2018) Hypertension. Nat Rev Dis Primers 4: 18014. doi: 10.1038/nrdp.2018.14
![]() |
[2] |
Prieto I, Villarejo AB, Segarra AB, et al. (2014) Brain, heart and kidney correlate for the control of blood pressure and water balance: role of angiotensinases. Neuroendocrinology 100: 198–208. doi: 10.1159/000368835
![]() |
[3] |
Prieto I, Segarra AB, Martinez-Canamero M, et al. (2017) Bidirectional asymmetry in the neurovisceral communication for the cardiovascular control: New insights. Endocr Regul 51: 157–167. doi: 10.1515/enr-2017-0017
![]() |
[4] | Prieto I, Segarra AB, de Gasparo M, et al. (2018) Divergent profile between hypothalamic and plasmatic aminopeptidase activities in WKY and SHR. Influence of beta-adrenergic blockade. Life Sci 192: 9–17. |
[5] |
Parati G, Ochoa JE, Lombardi C, et al. (2013) Assessment and management of blood-pressure variability. Nat Rev Cardiol 10: 143–155. doi: 10.1038/nrcardio.2013.1
![]() |
[6] |
Banegas I, Prieto I, Vives F, et al. (2006) Brain aminopeptidases and hypertension. J Renin Angiotensin Aldosterone Syst 7: 129–134. doi: 10.3317/jraas.2006.021
![]() |
[7] |
Banegas I, Prieto I, Vives F, et al. (2009) Asymmetrical response of aminopeptidase A and nitric oxide in plasma of normotensive and hypertensive rats with experimental hemiparkinsonism. Neuropharmacology 56: 573–579. doi: 10.1016/j.neuropharm.2008.10.011
![]() |
[8] |
Banegas I, Prieto I, Segarra AB, et al. (2011) Blood pressure increased dramatically in hypertensive rats after left hemisphere lesions with 6-hydroxydopamine. Neurosci Lett 500: 148–150. doi: 10.1016/j.neulet.2011.06.025
![]() |
[9] |
Banegas I, Prieto I, Segarra AB, et al. (2017) Bilateral distribution of enkephalinase activity in the medial prefrontal cortex differs between WKY and SHR rats unilaterally lesioned with 6-hydroxydopamine. Prog Neuropsychopharmacol Biol Psychiatry 75: 213–218. doi: 10.1016/j.pnpbp.2017.02.015
![]() |
[10] |
Prieto I, Segarra AB, Villarejo AB, et al. (2019) Neuropeptidase activity in the frontal cortex of Wistar-Kyoto and spontaneously hypertensive rats treated with vasoactive drugs: a bilateral study. J Hypertens 37: 612–628. doi: 10.1097/HJH.0000000000001884
![]() |
[11] | Cuspidi C, Tadic M, Grassi G, et al. (2018) Mancia G. Treatment of hypertension: The ESH/ESC guidelines recommendations. Pharmacol Res 128: 315–321. |
[12] |
Ramírez-Sánchez M, Prieto I, Wangensteen R, et al. (2013) The renin-angiotensin system: new insight into old therapies. Curr Med Chem 20: 1313–1322. doi: 10.2174/0929867311320100008
![]() |
[13] |
Ferdinand KC, Balavoine F, Besse B, et al. (2019) Efficacy and safety of Firibastat, a first-in-class brain aminopeptidase A inhibitor, in hypertensive overweight patients of multiple ethnic origins: A Phase 2, open-label, multicenter, dose-titrating study. Circulation 140: 138–146. doi: 10.1161/CIRCULATIONAHA.119.040070
![]() |
[14] |
Gao J, Marc Y, Iturrioz X, et al. (2014) A new strategy for treating hypertension by blocking the activity of the brain renin-angiotensin system with aminopeptidase A inhibitors. Clin Sci (Lond) 127: 135–148. doi: 10.1042/CS20130396
![]() |
[15] |
Keck M, De Almeida H, Compère D, et al. (2019) NI956/QGC006, a potent orally active, brain-penetrating aminopeptidase a inhibitor for treating hypertension. Hypertension 73: 1300–1307. doi: 10.1161/HYPERTENSIONAHA.118.12499
![]() |
[16] |
Wright JW, Mizutani S, Murray CE, et al. (1990) Aminopeptidase-induced elevations and reductions in blood pressure in the spontaneously hypertensive rat. J Hypertens 8: 969–974. doi: 10.1097/00004872-199010000-00013
![]() |
[17] |
Wright JW, Jensen LL, Cushing LL, et al. (1989) Leucine aminopeptidase M-induced reductions in blood pressure in spontaneously hypertensive rats. Hypertension 13: 910–915. doi: 10.1161/01.HYP.13.6.910
![]() |
[18] |
Zini S, Masdehors P, Lenkei Z, et al. (1997) Aminopeptidase A: distribution in rat brain nuclei and increased activity in spontaneously hypertensive rats. Neuroscience 78: 1187–1193. doi: 10.1016/S0306-4522(96)00660-4
![]() |
[19] |
Banegas I, Prieto I, Segarra AB, et al. (2017) Study of the neuropeptide function in Parkinson's disease using the 6-Hydroxydopamine model of experimental Hemiparkinsonism. AIMS Neuroscience 4: 223–237. doi: 10.3934/Neuroscience.2017.4.223
![]() |
[20] |
Kondo K, Ebihara A, Suzuki H, et al. (1981) Role of dopamine in the regulation of blood pressure and the renin--angiotensin-aldosterone system in conscious rats. Clin Sci (Lond) 61: 235s–237s. doi: 10.1042/cs061235s
![]() |
[21] |
Zeng C, Jose PA (2011) Dopamine receptors: important antihypertensive counterbalance against hypertensive factors. Hypertension 57: 11–17. doi: 10.1161/HYPERTENSIONAHA.110.157727
![]() |
[22] |
Förstermann U, Sessa WC (2012) Nitric oxide synthases: regulation and function. Eur Heart J 33: 829–837. doi: 10.1093/eurheartj/ehr304
![]() |
[23] | Calver A, Collier J, Vallance P (1993) Nitric oxide and the control of human vascular tone in health and disease. Eur J Med 2: 48–53. |
[24] |
Venturelli M, Pedrinolla A, Boscolo Galazzo I, et al. (2018) Impact of nitric oxide bioavailability on the progressive cerebral and peripheral circulatory impairments during aging and Alzheimer's disease. Front Physiol 9: 169. doi: 10.3389/fphys.2018.00169
![]() |
[25] |
Steinert JR, Chernova T, Forsythe ID (2010) Nitric oxide signaling in brain function, dysfunction, and dementia. Neuroscientist 16: 435–452. doi: 10.1177/1073858410366481
![]() |
[26] | Maia-de-Oliveira JP, Trzesniak C, Oliveira IR, et al. (2012) Nitric oxide plasma/serum levels in patients with schizophrenia: a systematic review and meta-analysis. Braz J Psychiatry 34: S149–155. |
[27] |
Nakano Y, Yoshimura R, Nakano H, et al. (2010) Association between plasma nitric oxide metabolites levels and negative symptoms of schizophrenia: a pilot study. Hum Psychopharmacol 25: 139–144. doi: 10.1002/hup.1102
![]() |
[28] |
Shabeeh H, Khan S, Jiang B, et al. (2017) Blood pressure in healthy humans is regulated by neuronal no synthase. Hypertension 69: 970–976. doi: 10.1161/HYPERTENSIONAHA.116.08792
![]() |
[29] |
Chen ZQ, Mou RT, Feng DX, et al. (2017) The role of nitric oxide in stroke. Med Gas Res 7: 194–203. doi: 10.4103/2045-9912.215750
![]() |
[30] |
Zheng R, Qin L, Li S, et al. (2014) CT perfusion-derived mean transit time of cortical brain has a negative correlation with the plasma level of nitric oxide after subarachnoid hemorrhage. Acta Neurochir (Wien) 156: 527–533. doi: 10.1007/s00701-013-1968-6
![]() |
[31] |
Taffi R, Nanetti L, Mazzanti L, et al. (2008) Plasma levels of nitric oxide and stroke outcome. J Neurol 255: 94–98. doi: 10.1007/s00415-007-0700-y
![]() |
[32] |
Li S, Wang Y, Jiang Z, et al. (2018) Impaired cognitive performance in endothelial nitric oxide synthase knockout mice after ischemic stroke: A pilot study. Am J Phys Med Rehabil 97: 492–499. doi: 10.1097/PHM.0000000000000904
![]() |
[33] | Bernard C (1878) Etude sur la physiologie du coeur. La science experimentale: 316–366. Available from: https://gallica.bnf.fr/ark:/12148/bpt6k69992b/f316.image. |
[34] |
Thayer JF, Lane RD (2009) Claude Bernard and the heart-brain connection: further elaboration of a model of neurovisceral integration. Neurosci Biobehav Rev 33: 81–88. doi: 10.1016/j.neubiorev.2008.08.004
![]() |
[35] |
Segarra AB, Prieto I, Banegas I, et al. (2012) Asymmetrical effect of captopril on the angiotensinase activity in frontal cortex and plasma of the spontaneously hypertensive rats: expanding the model of neuroendocrine integration. Behav Brain Res 230: 423–427. doi: 10.1016/j.bbr.2012.02.039
![]() |
[36] | Segarra AB, Prieto I, Banegas I, et al. (2013) The brain-heart connection: frontal cortex and left ventricle angiotensinase activities in control and captopril-treated hypertensive rats-a bilateral study. Int J Hypertens 2013: 156179. |
[37] | Segarra AB, Banegas I, Prieto I, et al. (2016) Brain asymmetry and dopamine: beyond motor implications in Parkinson's disease and experimental hemiparkinsonism. Rev Neurol 63: 415–421. |
[38] |
de Jager L, Amorim EDT, Lucchetti BFC, et al. (2018) Nitric oxide alterations in cardiovascular system of rats with Parkinsonism induced by 6-OHDA and submitted to previous exercise. Life Sci 204: 78–86. doi: 10.1016/j.lfs.2018.05.017
![]() |
[39] |
Alexander N, Kaneda N, Ishii A, et al. (1990) Right-left asymmetry of tyrosine hydroxylase in rat median eminence: influence of arterial baroreflex nerves. Brain Res 523: 195–198. doi: 10.1016/0006-8993(90)91487-2
![]() |
[40] |
Hersh LB (1985) Characterization of membrane-bound aminopeptidases from rat brain:identification of the enkephalin-degrading aminopeptidase. J Neurochem 44: 1427–1435. doi: 10.1111/j.1471-4159.1985.tb08779.x
![]() |
[41] |
Wright JW, Harding JW (1997) Important roles for angiotensin III and IV in the brain renin-angiotensin system. Brain Res Rev 25: 96–124. doi: 10.1016/S0165-0173(97)00019-2
![]() |
| ClientHello field | [31] | [32] | [33] | [34] | [35] |
|---|---|---|---|---|---|
| SSL/TLS record version | √ | √ | | | |
| SSL/TLS version | √ | √ | √ | √ | |
| Cipher suites length | √ | | | | |
| Cipher suites | √ | √ | √ | √ | √ |
| Extensions | √ | √ | √ | √ | |
| Data of extensions | √ | | | | |
| Flag | √ | | | | |
| Compression length | √ | | | | |
| Compression | √ | | | | |
| Server name* | √ | | | | |
| Elliptic curves* | √ | | | | |
| Elliptic curve point formats* | √ | | | | |
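Most of the ClientHello fields compared in the table can be read directly from a raw record. The sketch below is a minimal parser under stated assumptions: a single, unfragmented, well-formed ClientHello record, with no bounds checking beyond what `struct` enforces. It does not cover the "Flag" row, whose exact meaning the table does not define, and the dictionary keys are illustrative names.

```python
import struct
from typing import Dict


def parse_client_hello(record: bytes) -> Dict[str, object]:
    """Extract fingerprinting fields from one unfragmented ClientHello record."""
    if len(record) < 9 or record[0] != 22 or record[5] != 1:
        raise ValueError("not a TLS handshake record carrying a ClientHello")
    fields: Dict[str, object] = {"record_version": struct.unpack_from("!H", record, 1)[0]}
    pos = 9  # skip the 5-byte record header and the 4-byte handshake header
    fields["version"] = struct.unpack_from("!H", record, pos)[0]
    pos += 2 + 32           # legacy_version + random
    pos += 1 + record[pos]  # session_id (1-byte length prefix)
    (cs_len,) = struct.unpack_from("!H", record, pos)
    pos += 2
    fields["cipher_suites_length"] = cs_len
    fields["cipher_suites"] = list(struct.unpack_from(f"!{cs_len // 2}H", record, pos))
    pos += cs_len
    comp_len = record[pos]
    fields["compression_length"] = comp_len
    fields["compression"] = list(record[pos + 1 : pos + 1 + comp_len])
    pos += 1 + comp_len
    extensions: Dict[int, bytes] = {}  # insertion order preserves wire order
    if pos + 2 <= len(record):         # extensions are optional in old versions
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            extensions[ext_type] = record[pos + 4 : pos + 4 + ext_len]
            pos += 4 + ext_len
    fields["extensions"] = list(extensions)
    fields["extension_data"] = extensions
    if 0 in extensions:  # server_name (RFC 6066): list_len(2) type(1) len(2) name
        data = extensions[0]
        (name_len,) = struct.unpack_from("!H", data, 3)
        fields["server_name"] = data[5 : 5 + name_len].decode("ascii")
    if 10 in extensions:  # supported_groups, named elliptic_curves in older specs
        data = extensions[10]
        (n,) = struct.unpack_from("!H", data, 0)
        fields["elliptic_curves"] = list(struct.unpack_from(f"!{n // 2}H", data, 2))
    if 11 in extensions:  # ec_point_formats: len(1) then one byte per format
        data = extensions[11]
        fields["ec_point_formats"] = list(data[1 : 1 + data[0]])
    return fields
```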
| Survey | Problem domains | Method | Protocols |
|---|---|---|---|
| [9] | Classification | Focus on ML-based | Various |
| [10] | Classification | ML-based | Various |
| [11] | Detection | Focus on ML-based | TLS |
| [12] | Analysis | Various | Various |
| Present study | Fingerprinting | Various | TLS |