Identifying Malicious Encrypted Traffic Using Random Forest Algorithm

Discovery of Threats Hidden in Encrypted Traffic By Huawei Network Product Line

1.  Development Trend of Encrypted Traffic

As more and more applications are deployed on the cloud and accessed over the Internet, cyber security threats demand careful attention. Attacks against web applications are the most frequent challenge.

To ensure communication security and privacy, more and more websites use HTTPS rather than HTTP for transmission. HTTPS network connections protect information from snoopers, middlemen, and hackers who attempt to spoof trusted websites. That is, encryption prevents user information from being intercepted and ensures the integrity of information received and sent.

Currently, the proportion of HTTPS traffic on networks is increasing and can even approach 100%, as shown in the following figures.

[Figure: Firefox HTTPS statistics. Source: https://letsencrypt.org/stats/]
[Figure: Transparency Report]

However, HTTPS-encrypted communication is a double-edged sword for security protection. Traditional security protection functions need to analyze communication data to identify abnormal data. When attack traffic is carried inside HTTPS-encrypted traffic, detection engines cannot analyze it and therefore let it pass.

Gateway devices in the industry mainly use bidirectional proxy (also known as man-in-the-middle technology) to decrypt HTTPS-encrypted traffic to detect attacks. This approach has disadvantages:

First, low performance. Encryption and decryption consume a large amount of resources. To resolve this problem, vendors can deploy dedicated encryption and decryption chips to accelerate these tasks, but the cost is high.

Second, bidirectional proxy does not apply to all traffic. To protect communication data from man-in-the-middle attacks, some vendors use two-way certificate authentication. In this case, a bidirectional proxy no longer works.

2.  Opportunity

To cope with these problems, Huawei Wei Ran Lab has studied a large amount of malicious communication traffic and found that malicious HTTPS-encrypted communication traffic is significantly different from normal HTTPS access traffic. Based on this research, Wei Ran Lab has conducted big data analysis, trained on millions of samples, and selected the Random Forest algorithm to identify threats in encrypted communication traffic. This mature Machine Learning (ML) algorithm produces an identification accuracy higher than 99%.

The resulting Encrypted Communication Analytics (ECA) technology has advantages over traditional methods for malicious web traffic detection. In addition to requiring no decryption, ECA offers high performance, user privacy protection, and the ability to detect zero-day attacks without rule updates.

3.  Interaction Process of Encrypted Traffic

HTTPS is actually HTTP communication over Secure Sockets Layer (SSL) or Transport Layer Security (TLS).

[Figure: HTTPS interaction. Source: Huawei]

HTTPS communication begins with the TCP three-way handshake, followed by the TLS/SSL four-way handshake. The following figure shows the communication process of TLS 1.2:

[Figure: TLS 1.2 communication process. Source: Huawei]

Although the HTTP communication traffic is encrypted, the TLS/SSL negotiation that precedes it, beginning with the handshake, is not encrypted. The handshake process contains a message sequence for security parameter and cipher suite negotiation, identity authentication, and key exchange.
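
Because the ClientHello travels in the clear, a passive probe can read fields such as the offered cipher suites and extension types without decrypting anything. The following is a minimal sketch, assuming a single TLS record that holds one complete ClientHello. The function names parse_client_hello and build_sample_client_hello are hypothetical, and the sample bytes are hand-built for illustration, not captured traffic.

```python
import struct


def parse_client_hello(record: bytes) -> dict:
    """Read the plaintext fields of a TLS record that holds one ClientHello."""
    content_type, _record_version, _record_len = struct.unpack_from("!BHH", record, 0)
    if content_type != 22 or record[5] != 1:  # 22 = handshake, 1 = ClientHello
        raise ValueError("not a ClientHello handshake record")
    pos = 9  # 5-byte record header + 1-byte handshake type + 3-byte length
    (client_version,) = struct.unpack_from("!H", record, pos)
    pos += 2 + 32  # skip client_version and the 32-byte random
    pos += 1 + record[pos]  # skip session_id (1-byte length prefix)
    (suites_len,) = struct.unpack_from("!H", record, pos)
    pos += 2
    cipher_suites = list(struct.unpack_from(f"!{suites_len // 2}H", record, pos))
    pos += suites_len
    pos += 1 + record[pos]  # skip compression methods (1-byte length prefix)
    extension_types = []
    if pos + 2 <= len(record):  # the extensions block is optional
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            extension_types.append(ext_type)
            pos += 4 + ext_len
    return {
        "client_version": client_version,
        "cipher_suites": cipher_suites,
        "extension_types": extension_types,
    }


def build_sample_client_hello() -> bytes:
    """Hand-built ClientHello used only to exercise the parser (illustrative)."""
    suites = [0xC02F, 0xC030, 0x009C]
    supported_groups = struct.pack("!HHHH", 10, 4, 2, 0x001D)  # type, len, list len, x25519
    session_ticket = struct.pack("!HH", 35, 0)  # type 35 with an empty body
    extensions = supported_groups + session_ticket
    body = (
        struct.pack("!H", 0x0303)  # client_version: TLS 1.2
        + bytes(32)  # random (all zeros for the example)
        + b"\x00"  # empty session_id
        + struct.pack("!H", 2 * len(suites))
        + struct.pack(f"!{len(suites)}H", *suites)
        + b"\x01\x00"  # one compression method: null
        + struct.pack("!H", len(extensions))
        + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return struct.pack("!BHH", 22, 0x0301, len(handshake)) + handshake


print(parse_client_hello(build_sample_client_hello()))
```

A real probe would also need to handle TCP reassembly and ClientHello messages split across several records, which this sketch leaves out.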

4.  Encrypted Traffic Detection Technology

Many signatures can be extracted from the HTTPS (TLS) communication process. We can add some background traffic information (statistical signatures of TCP flows, TLS handshake information, DNS flow signatures associated with TLS flows, and HTTP flow signatures associated with TLS flows), encode some signatures, and then send them to a model for judgment. The model uses the Random Forest algorithm. The signatures do not have weights initially. Signature weights are automatically determined through training.
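
To make the idea concrete, the following toy example trains a small random forest on encoded signatures and lets training decide which signatures matter. It is a sketch under stated assumptions, not the ECA model: the four signature names are hypothetical, the flows are synthetic values chosen so that two columns separate the classes and two are noise, and the forest is a simplified pure-Python implementation (bootstrap sampling, random feature subsets, majority vote).

```python
import random
from collections import Counter

# Hypothetical signature names; columns 0-1 separate the toy classes, 2-3 are noise.
FEATURES = ["cipher_suites_offered", "extensions_offered", "mean_packet_len", "flow_duration_s"]


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Return (impurity, feature, threshold) for the best split, or None."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, rng, depth=0, max_depth=4):
    """Grow one tree; each split considers a random subset of the features."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    k = max(1, int(len(rows[0]) ** 0.5))
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), k))
    if split is None:
        return majority
    _, f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (
        f,
        t,
        build_tree([rows[i] for i in left], [labels[i] for i in left], rng, depth + 1, max_depth),
        build_tree([rows[i] for i in right], [labels[i] for i in right], rng, depth + 1, max_depth),
    )


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees; 0 = benign, 1 = malicious."""
    votes = Counter()
    for node in forest:
        while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
            f, t, left, right = node
            node = left if row[f] <= t else right
        votes[node] += 1
    return votes.most_common(1)[0][0]


def split_usage(forest):
    """Count how often each signature is used in a split (a rough importance proxy)."""
    counts, stack = Counter(), list(forest)
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            f, _, left, right = node
            counts[FEATURES[f]] += 1
            stack += [left, right]
    return counts


def synthetic_flows(n, seed=0):
    """Arbitrary toy data, not measurements of real benign or malicious traffic."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        label = i % 2
        head = [rng.randint(2, 6), rng.randint(1, 5)] if label else [rng.randint(12, 20), rng.randint(8, 14)]
        rows.append(head + [rng.uniform(200, 1400), rng.uniform(0.1, 30.0)])
        labels.append(label)
    return rows, labels


rows, labels = synthetic_flows(120, seed=1)
forest = train_forest(rows, labels, n_trees=25, seed=1)
print(predict_forest(forest, [2, 1, 800.0, 10.0]))
print(split_usage(forest).most_common())
```

No weights are supplied up front: the split counts show which signatures training actually relied on. In practice, a library implementation such as scikit-learn's RandomForestClassifier would replace this hand-written forest.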

We extract dozens of signatures from communication traffic and train on millions of samples to generate the ECA model for encrypted traffic detection. The ECA model has been deployed on the Huawei Cybersecurity Intelligence System (CIS) to improve the ability to identify malicious traffic to an accuracy greater than 95%.

TLS 1.3 was officially released in 2018. The overall communication process of TLS 1.3 is similar to that of TLS 1.2, except that compression algorithms and insecure algorithms such as MD5 have been abandoned, so the extracted signatures need to be adjusted accordingly.

Due to the wide variety of Huawei network products, we can deploy the traffic probe for TLS signature extraction on switches or other devices so that they can extract and send the metadata of communication traffic to the big data detection platform. The platform can use the ECA model to detect malicious traffic and protect customer networks.
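
The probe-to-platform hand-off can be pictured as a small metadata record that carries only handshake and flow statistics, never payload. The sketch below is an assumption for illustration: the FlowMetadata fields, the JSON encoding, and the function names are hypothetical and do not describe the actual Huawei probe format. The addresses come from documentation-only IP ranges.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FlowMetadata:
    """Hypothetical per-flow record emitted by a traffic probe."""
    src_ip: str
    dst_ip: str
    dst_port: int
    tls_version: int
    cipher_suites: list
    extension_types: list
    bytes_up: int
    bytes_down: int


def encode_for_platform(meta: FlowMetadata) -> bytes:
    """Serialize the record on the probe side (JSON chosen only for readability)."""
    return json.dumps(asdict(meta), sort_keys=True).encode("utf-8")


def decode_on_platform(payload: bytes) -> FlowMetadata:
    """Rebuild the record on the detection-platform side."""
    return FlowMetadata(**json.loads(payload.decode("utf-8")))


sample = FlowMetadata(
    src_ip="192.0.2.10",
    dst_ip="198.51.100.7",
    dst_port=443,
    tls_version=0x0303,
    cipher_suites=[0xC02F, 0xC030],
    extension_types=[10, 35],
    bytes_up=1840,
    bytes_down=52311,
)
print(decode_on_platform(encode_for_platform(sample)))
```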

Huawei places cyber security and privacy protection at the top of the company’s agenda. From 14:00 to 15:30 on October 11, the security summit “Build an Intelligent End-to-End Security Assurance System” will be held at HUAWEI CONNECT 2018. At this summit, Huawei will release innovative security solutions for 5G, IoT, SoftCOM, Safe City, and private cloud. Please join us at the conference and engage with Huawei on the future of intelligent security.

For more information, please visit https://www.huawei.com/en/press-events/events/huaweiconnect2018.