Methods to Perform Encrypted Traffic Analysis (ETA)
January 4, 2023
Background
While network traffic encryption such as TLS has considerably improved security and user privacy, it has also made it harder for network managers to monitor their infrastructure for malicious traffic and leaks of critical data. In unencrypted traffic, an eavesdropper can easily read the contents of network packets, whether that eavesdropper is malicious (such as an attacker) or benign (such as a network administrator monitoring their own equipment). With traffic or packet encryption techniques such as TLS and IPsec, an eavesdropper can still record packets but can no longer interpret their contents or modify them covertly.
Because of this, many TLS users believe that their connection to a web server is safe from outside interpretation. This is only partially true, as current encrypted traffic analysis (ETA) can weaken these privacy gains.
On the other hand, network administrators can also benefit from encrypted traffic analysis. Standard rule-based monitoring and detection controls, such as Intrusion Detection and Prevention Systems (IDPS), application firewalls, and Data Loss Prevention (DLP), rely on payload analysis to identify malicious traffic such as connections to Command & Control (C&C) servers, the spread of viruses, or the exfiltration of sensitive data. Encryption hides that payload from these controls, and ETA offers a way to regain some of the lost visibility.
Use cases of ETA
- Application Identification: Identifying and categorizing traffic by ISO/OSI layer-7 application is valuable for business intelligence, network maintenance, IT security, and quality of service. It enables a business, for instance, to block unauthorized applications such as peer-to-peer file sharing through dynamic access control. Another use case for matching network traffic to applications is trend analysis, which makes it easier to estimate network demand.
- Encrypted Malware traffic: Detecting malware traffic is difficult even when the traffic is not encrypted. Having all the data available in the clear does not mean that every malicious behavior can be discovered in it. The reason is straightforward: unlike other machine learning tasks, where the target objects (such as pictures of people or English sentences) change little over time, attackers keep developing new attack vectors and malicious behavior keeps changing. From a machine learning perspective, malware traffic detection is therefore an extremely challenging problem, and encryption removes the payload that detection would otherwise rely on.
- Fingerprinting: Even when traffic is encrypted, there are techniques to determine which files, songs, or videos a user requested. By observing certain characteristics of the encrypted data, it is possible to build records that link those characteristics to the corresponding files or websites. This is known as “fingerprinting.” Given a database of such fingerprints, a file or website can be identified even over an encrypted connection by finding a match for the observed traffic in the database.
- DNS Tunneling: DNS tunneling is a method for transmitting data in both directions over the DNS protocol. It abuses DNS, which was designed to convert human-readable domain names into IP addresses. Once the client has established a connection, information can be transferred in both directions. Viruses or other malware inside a company network use a malicious DNS server to load additional malicious code or to receive commands. An attacker can easily register a domain and then has complete control over the responses to DNS requests for it.
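As a rough illustration of how DNS tunneling might be spotted, the sketch below flags query names whose leftmost label is unusually long or looks random. This heuristic is an assumption added for illustration and is not described in this article. The function names (`shannon_entropy`, `looks_like_tunnel`) and both thresholds are hypothetical and would need tuning against real traffic, since legitimate services such as CDNs can also produce long, random-looking names.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_tunnel(qname: str,
                      max_label_len: int = 40,
                      max_entropy: float = 3.8) -> bool:
    """Heuristically flag a DNS query name that may carry tunneled data.

    Both thresholds are illustrative assumptions, not established values.
    """
    labels = qname.rstrip(".").split(".")
    if len(labels) < 3:
        # No subdomain below the registered domain, so nothing to inspect.
        return False
    payload = labels[0].lower()
    # Tunneled data tends to show up as a long or high-entropy subdomain label.
    return len(payload) > max_label_len or shannon_entropy(payload) > max_entropy


if __name__ == "__main__":
    for name in ("www.example.com", "0123456789abcdef.t.example.com"):
        print(name, looks_like_tunnel(name))
```

A check like this only looks at single queries. In practice it would probably be combined with per-domain statistics, such as the number of distinct subdomains queried under one domain.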
Techniques used to perform ETA
- ML and Deep Learning: Most studies that use machine learning to test the feasibility of analyzing and classifying encrypted network traffic concentrate on supervised learning. These methods first train their models on ground-truth datasets and then assess the models’ performance. Many other studies concentrate on unsupervised learning, using algorithms that examine similarities in the data. The key to correctly classifying and characterizing traffic is feature selection. In addition, combining different views of the data, such as features describing how the application transmits data together with features that are representative of the application itself, can significantly improve the performance of typical machine learning algorithms for malware traffic detection.
- Mapping: Mapping is a technique for identifying meaningful relationships between network entities based on their geographic location, their position in the network, or shared features. Multiple malicious websites often reside at the same time on multi-tenant servers that share an IP address, and mapping can be used to detect such servers.
- Profiling: Profiling is a technique for observing how network entities’ behavior changes over time and comparing it to predefined baselines. Traffic at unusual hours of the day or night can indicate malicious activity. The emergence of new fingerprints can also indicate the presence of malware or other unwanted software on the network.
- Interception of encrypted traffic: TLS is intended to make eavesdropping harder by encrypting data and by protecting against man-in-the-middle attacks through certificate validation, which allows the client to verify the authenticity of the destination server and reject impostors. To get around this validation, local software injects a self-signed CA certificate into the client browser’s root store at installation time. The traffic can then be mirrored to a central IDS that is able to decrypt it and perform deep packet inspection, but without any privacy-preserving guarantees.
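The profiling technique above, which compares behavior against a baseline and watches for new fingerprints, could be sketched roughly as follows. This is a minimal illustration under assumptions, not a method taken from this article. It assumes the baseline is a per-hour mean and standard deviation of byte counts for one host, and that fingerprints are opaque strings. The function names and the z-score threshold of 3.0 are hypothetical.

```python
from statistics import mean, pstdev


def build_baseline(hourly_bytes: dict[int, list[float]]) -> dict[int, tuple[float, float]]:
    """Map each hour of day (0-23) to the (mean, std dev) of past byte counts."""
    return {hour: (mean(vals), pstdev(vals))
            for hour, vals in hourly_bytes.items() if vals}


def is_anomalous(baseline: dict[int, tuple[float, float]],
                 hour: int,
                 observed: float,
                 z_threshold: float = 3.0) -> bool:
    """Flag traffic that deviates strongly from the baseline for that hour."""
    if hour not in baseline:
        # No history for this hour at all: any traffic is unusual.
        return observed > 0
    mu, sigma = baseline[hour]
    if sigma == 0:
        # History was perfectly constant, so any change stands out.
        return observed != mu
    return abs(observed - mu) / sigma > z_threshold


def new_fingerprints(known: set[str], observed: set[str]) -> set[str]:
    """Return fingerprints seen now that were never seen during baselining."""
    return observed - known


if __name__ == "__main__":
    history = {9: [100.0, 110.0, 90.0, 105.0, 95.0], 3: [0.0, 0.0, 0.0]}
    baseline = build_baseline(history)
    print(is_anomalous(baseline, 9, 102.0))   # ordinary office-hours volume
    print(is_anomalous(baseline, 3, 5000.0))  # burst at an hour that is usually silent
    print(new_fingerprints({"fp-a", "fp-b"}, {"fp-b", "fp-c"}))
```

Keeping one baseline per hour of day is one simple way to capture the "odd hours" signal. A real deployment would likely also account for weekdays, holidays, and gradual drift in normal behavior.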