New Challenges in DPI Protocol Detection

In the early days of the Internet, each network protocol was designed for a specific purpose: SMTP for sending email, HTTP for the web, and so on. To make sure that implementations were compliant with the specification, there was an RFC describing each protocol. If a connection started with a given protocol, say SMTP, it remained an SMTP connection until it was closed: the protocol behind a connection was persistent for its whole duration. That was how things worked in the early days.

Unfortunately the modern Internet no longer respects this rule. The use of NAT and firewalls, which started in the late 90s, created several issues for Internet communications, and some companies decided that ease of use was better than being standards compliant. VoIP is a typical application that is not firewall-friendly, as SIP/RTP/H.323/STUN are complex to operate compared with double-clicking on the Skype icon. In order to operate through the firewall, Skype impersonated protocols likely to pass through it (for instance HTTP) and, once the communication was established, used the open connection to exchange data with a different protocol (the proprietary Skype protocol, no longer HTTP). This is where the mess started: an HTTP connection turned into a different protocol.

Today this practice has exploded with protocols such as Facebook WhatsApp/Messenger and Google Hangout/Duo/Meet. These protocols start with STUN and then become something else after a few packets. For example, in a capture of a WhatsApp call, the flow started as STUN and then at packet 13 became something else that Wireshark was unable to decode.

There are many other protocols that behave like that: for instance the popular Signal messenger application starts as STUN and then changes to DTLS.
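To make the "starts as STUN, then becomes something else" behaviour concrete, here is a minimal Python sketch that checks whether a UDP payload has a plausible STUN header (the magic cookie 0x2112A442 and the two zero leading bits defined by RFC 5389) and reports the first packet of a flow that no longer does. The function names are made up for this illustration and this is not how nDPI implements its dissector; classic STUN (RFC 3489) has no magic cookie, so a real dissector needs more checks than this.

```python
import struct

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389
STUN_HEADER_LEN = 20            # type (2) + length (2) + cookie (4) + transaction id (12)


def looks_like_stun(payload: bytes) -> bool:
    """Return True if the payload starts with a plausible RFC 5389 STUN header."""
    if len(payload) < STUN_HEADER_LEN:
        return False
    msg_type, msg_len, cookie = struct.unpack("!HHI", payload[:8])
    if msg_type & 0xC000:  # the two most significant bits must be zero
        return False
    if cookie != STUN_MAGIC_COOKIE:
        return False
    # The length field excludes the header and is always a multiple of 4.
    return msg_len % 4 == 0 and STUN_HEADER_LEN + msg_len <= len(payload)


def first_non_stun_packet(payloads):
    """Return the 1-based index of the first non-STUN payload, or None."""
    for index, payload in enumerate(payloads, start=1):
        if not looks_like_stun(payload):
            return index
    return None
```

Fed with the payloads of a flow like the WhatsApp call above, first_non_stun_packet() would return the packet number at which the flow stops being STUN, which is the point where a DPI engine has to decide what the flow really is.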

Another problem with DPI and apps is that companies like Google and Facebook have several applications that overlap in terms of functionality and are therefore almost alike from the DPI standpoint. Furthermore, because they are so similar, such apps share services. For instance WhatsApp, Instagram and Messenger chat are all based on HTTPS services provided by edge-mqtt.facebook.com. So should traffic from/to edge-mqtt.facebook.com be classified as WhatsApp, Instagram or Messenger? You might consider this an unimportant question, but it matters a lot when using nDPI to monitor inline traffic (e.g. in ntopng Edge), because you might want to block Instagram but not WhatsApp.

To make a long story short, we have started to enhance nDPI to support these changes in Internet protocols. The latest nDPI versions implement a STUN cache to handle protocols based on it, such as WhatsApp. This cache is used to detect as WhatsApp both the main connection and the sub-connection, which would otherwise be marked as STUN rather than WhatsApp.
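The idea of such a cache can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the nDPI code: the class name EndpointCache, the classify() helper, the (ip, port) key and the LRU eviction policy are all hypothetical. The sketch remembers which application an endpoint was seen using, and reuses that label for later flows that a dissector could only identify as plain STUN.

```python
from collections import OrderedDict


class EndpointCache:
    """Hypothetical bounded cache mapping (ip, port) to an application label."""

    def __init__(self, max_entries=1024):
        self._entries = OrderedDict()
        self._max_entries = max_entries

    def remember(self, ip, port, app):
        key = (ip, port)
        self._entries[key] = app
        self._entries.move_to_end(key)         # mark as most recently used
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the oldest entry

    def lookup(self, ip, port):
        return self._entries.get((ip, port))


def classify(cache, src, dst, detected):
    """Refine a generic "STUN" label using the cache.

    src and dst are (ip, port) tuples; detected is the label produced
    by the (not shown) packet dissector.
    """
    if detected != "STUN":
        return detected
    return cache.lookup(*src) or cache.lookup(*dst) or detected
```

For example, after cache.remember("10.0.0.5", 50000, "WhatsApp"), a later STUN-only flow from ("10.0.0.5", 50000) would be reported as WhatsApp, while flows between endpoints unknown to the cache stay labelled as STUN.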

For the other problem (i.e. the same service shared by multiple protocols) we are implementing another cache so that if, for instance, an Instagram user accesses edge-mqtt.facebook.com, that connection is marked as Instagram instead of Messenger as it is today. As soon as we have finalised this implementation we will merge the code into nDPI.
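As a rough sketch of how this could work (again an illustration under our own assumptions, not the code that will land in nDPI), the cache below remembers the last application unambiguously seen for each client and uses it to label traffic towards a shared hostname. The names ClientAppCache, note_app() and classify_host() are hypothetical, and keying by client IP is an assumption; the default label for edge-mqtt.facebook.com is Messenger, as it is today.

```python
# Hostnames shared by several applications, with the label used when
# nothing else is known about the client.
SHARED_HOSTS = {"edge-mqtt.facebook.com": "Messenger"}


class ClientAppCache:
    """Hypothetical cache of the last application seen per client IP."""

    def __init__(self):
        self._last_app = {}

    def note_app(self, client_ip, app):
        """Record that client_ip was unambiguously seen using app."""
        self._last_app[client_ip] = app

    def classify_host(self, client_ip, hostname):
        """Return the label for a shared hostname, or None if it is not shared."""
        default = SHARED_HOSTS.get(hostname)
        if default is None:
            return None
        return self._last_app.get(client_ip, default)
```

A client previously recorded with note_app("10.0.0.5", "Instagram") gets its edge-mqtt.facebook.com traffic labelled as Instagram, so an inline policy that blocks Instagram but not WhatsApp can act on it. A real implementation would also need entries to expire, since the same user can switch from one app to another.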

Bottom line: DPI is still relevant today, but new protocols (that don’t follow standards but jeopardise them) are creating new challenges. nDPI is tackling them, but over time things are getting more complicated due to encryption and these bad practices. At ntop we like challenges, so we’re implementing solutions; however, it is a pity that Internet protocols are becoming so messy and completely non-standard. I really miss the early Internet days!

Enjoy!