Fixing Packet Deduplication: Introducing nDedup


When monitoring a busy network, monitoring tools can become bogged down or, even worse, produce misleading information for your analysis because of a hidden culprit: duplicate packets. Imagine a firehose of data streaming across your network. Much of this data can be redundant, with identical packets being sent multiple times due to retransmissions or mirroring configurations. For example, when a SPAN (Switched Port Analyzer) port is used to mirror both the ingress and egress directions of switch ports, up to 50% of the resulting mirrored traffic might be duplicates. These duplicates eat up bandwidth and processing power on your monitoring tools, not to mention disk space when dumping traffic to disk.

This is where packet deduplication tools come to the rescue.

A dedicated packet deduplication tool sits between the network source and the monitoring tools, acting as a filter. Some network packet brokers can remediate this by finding and removing duplicate packets before they reach the analytics tools. However, the higher their capacity for filtering out duplicate packets, the higher the price usually is. If you look at the specs, the time window over which they can detect duplicate packets is limited by the buffering capacity of the equipment, and in many cases they can only detect duplicates that arrive back to back.

For this reason we decided to develop a new utility, ndedup, which detects and eliminates duplicate packets in software, on a server with a couple of network interfaces, acting as a bridge. This tool has no hard limit on the buffering capacity and time window used for detecting duplicates, as these mainly depend on the amount of RAM available on the system (and modern systems have plenty of RAM). Furthermore, it leverages our kernel-bypass zero-copy drivers, the PF_RING ZC library, and optimised data structures to implement fast duplicate detection and packet forwarding, achieving high speed even with a large window size. It can run on top of any adapter (Intel, Mellanox/NVIDIA, Napatech) and natively supports RSS to scale performance up to 100 Gbps.
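To give a feel for why RAM is rarely the bottleneck, here is a rough back-of-the-envelope estimate of how many packets a detection window has to remember. This is only a sketch under stated assumptions: it takes the worst case of minimum-size 64-byte Ethernet frames at line rate, and the 24 bytes per tracked packet (a 16-byte hash plus an 8-byte timestamp) is a hypothetical figure, not the actual layout used by ndedup.

```python
def window_entries(link_gbps: float, window_msec: float, frame_bytes: int = 64) -> int:
    """Worst-case number of packets arriving within one detection window."""
    # Each Ethernet frame also occupies 20 extra bytes on the wire:
    # 7 bytes preamble + 1 byte start-of-frame delimiter + 12 bytes inter-frame gap.
    wire_bits = (frame_bytes + 20) * 8
    packets_per_sec = link_gbps * 1e9 / wire_bits
    return int(packets_per_sec * window_msec / 1000)


def window_memory_bytes(link_gbps: float, window_msec: float,
                        entry_bytes: int = 24, frame_bytes: int = 64) -> int:
    """Approximate memory needed to remember one window of packets.

    entry_bytes is a hypothetical per-packet cost (16-byte hash + 8-byte
    timestamp); the real data structures in ndedup may differ.
    """
    return window_entries(link_gbps, window_msec, frame_bytes) * entry_bytes


if __name__ == "__main__":
    for gbps, msec in [(10, 10), (100, 50)]:
        entries = window_entries(gbps, msec)
        mib = window_memory_bytes(gbps, msec) / (1024 * 1024)
        print(f"{gbps} Gbps, {msec} msec window: {entries} packets, ~{mib:.1f} MiB")
```

Under these assumptions, even a 50 msec window on a fully loaded 100 Gbps link needs memory in the order of a few hundred megabytes, which is small compared to the RAM of a typical server.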

How does it work under the hood?

When a packet is received by the tool, a one-way hash function turns the packet into a strongly collision-resistant hash value. This hash is compared to the hashes of all packets received within a pre-configured time window (for example the last 50 msec). If no match is found, the packet is forwarded to the twin interface to be delivered to the monitoring tools; otherwise it is discarded (duplicate!).
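The logic described above can be sketched in a few lines of Python. This is an illustration of the general technique only, not the ndedup implementation (which is built on PF_RING ZC and optimised data structures). The class name WindowDeduplicator, the choice of BLAKE2b as the hash and the use of an ordered dictionary are all assumptions made for the example, and timestamps are assumed to be non-decreasing.

```python
import hashlib
from collections import OrderedDict


class WindowDeduplicator:
    """Drop packets whose hash was already seen within the last window_sec seconds."""

    def __init__(self, window_sec: float = 0.050):
        self.window_sec = window_sec
        # hash -> timestamp of the first sighting, kept in arrival order (oldest first)
        self._seen: "OrderedDict[bytes, float]" = OrderedDict()

    def _expire(self, now: float) -> None:
        # Forget hashes that have fallen out of the time window.
        while self._seen:
            _, oldest_ts = next(iter(self._seen.items()))
            if now - oldest_ts <= self.window_sec:
                break
            self._seen.popitem(last=False)

    def should_forward(self, packet: bytes, now: float) -> bool:
        """Return True to forward the packet, False if it is a duplicate."""
        self._expire(now)
        # Hypothetical choice of hash: a 128-bit BLAKE2b digest of the whole packet.
        digest = hashlib.blake2b(packet, digest_size=16).digest()
        if digest in self._seen:
            return False  # duplicate within the window: discard
        self._seen[digest] = now
        return True


if __name__ == "__main__":
    dedup = WindowDeduplicator(window_sec=0.050)
    arrivals = [(b"pkt-A", 0.000), (b"pkt-A", 0.010), (b"pkt-B", 0.020), (b"pkt-A", 0.060)]
    for payload, ts in arrivals:
        action = "forward" if dedup.should_forward(payload, ts) else "drop"
        print(f"t={ts:.3f}s {payload!r}: {action}")
```

In this sketch the second copy of pkt-A arrives 10 msec after the first and is dropped, while the copy arriving 60 msec later falls outside the 50 msec window and is forwarded again. Because the lookup is a hash table access, its cost does not grow with the window size; only the memory footprint does.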

This makes the tool deployable as a transparent bridge in front of any appliance running network monitoring software, including n2disk, nProbe or ntopng.

Example – bridge interfaces eth1 and eth2 – window size 10 msec – link speed 10 Gbps:

ndedup -i zc:eth1 -o zc:eth2 -d 10 -s 10 -S 0 -g 1 -B

Example – bridge interfaces eth1 and eth2 – use 4 RSS queues – window size 10 msec – link speed 10 Gbps:

ndedup -i zc:eth1@[0-3] -o zc:eth2@[0-3] -d 10 -s 10 -S 0 -g 1:2:3:4 -B

Please check the user’s guide for the full list of options. Also check the PF_RING ZC user’s guide for configuring ZC drivers and RSS.

The ndedup tool is part of (and installed with) the n2disk package and it does not require an additional license to operate.

Enjoy!