How To Implement Packet and Flow Deduplication


Depending on the network topology and configuration, your monitoring tools can receive the same traffic multiple times. This problem is called data duplication, and it can happen at the packet or at the flow level:

  • Packet duplication
    The same packet is received multiple times (usually twice), either back to back or within a short amount of time. Note that this has nothing to do with TCP data retransmission, which is a completely different scenario.
  • Flow duplication
    Two or more flow exporters observe the same traffic and emit the same flow at the same time.

In both cases, the goal is to keep the first packet/flow and discard the duplicates. The main difference between the two use cases is the deduplication time window: for packets it is measured in milliseconds, for flows in seconds.
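To make the "keep the first copy, drop copies inside a time window" idea concrete, here is a minimal Python sketch. It is not how nProbe or nDedup are implemented: the class name `WindowDeduplicator`, hashing the whole payload with SHA-1, and timestamps expressed in seconds are all assumptions made for illustration. The only thing that distinguishes a packet-style deduplicator from a flow-style one here is the `window` value.

```python
import hashlib
from collections import deque


class WindowDeduplicator:
    """Keep the first copy of an item; drop copies seen within `window` seconds.

    Hypothetical helper for illustration, not part of nProbe or nDedup.
    """

    def __init__(self, window: float) -> None:
        self.window = window
        self._first_seen: dict[bytes, float] = {}  # digest -> time of first copy
        self._order: deque = deque()  # (time, digest) in arrival order, for expiry

    def _expire(self, now: float) -> None:
        # Forget digests whose first copy is older than the window.
        while self._order and now - self._order[0][0] > self.window:
            _, digest = self._order.popleft()
            del self._first_seen[digest]

    def accept(self, data: bytes, now: float) -> bool:
        """Return True if `data` should be kept, False if it is a duplicate."""
        self._expire(now)
        digest = hashlib.sha1(data).digest()
        if digest in self._first_seen:
            return False  # a copy was already kept inside the window
        self._first_seen[digest] = now
        self._order.append((now, digest))
        return True


if __name__ == "__main__":
    pkt = b"\x45\x00\x00\x54example"
    packets = WindowDeduplicator(window=0.010)  # 10 ms, packet-style window
    print(packets.accept(pkt, 0.000))  # True: first copy kept
    print(packets.accept(pkt, 0.004))  # False: copy 4 ms later is dropped
    print(packets.accept(pkt, 0.020))  # True: window expired, treated as new
```

Hashing the full bytes assumes that copies are bit-identical. If a duplicate can differ in some field (for example an IP TTL decremented between two capture points), a real deduplicator would have to hash only the invariant part of the packet.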

Packet Deduplication

For packets, there are a few options:

  • If you have a few Gbit/s of traffic, nProbe features packet deduplication via the --enable-ipv4-deduplication command line flag, which discards consecutive IPv4 packet copies. You can also use nProbe to feed ntopng, so that ntopng receives deduplicated traffic.
  • If you have more traffic, or if your duplicated packets do not arrive one after the other, you need nDedup, a tool that is part of the n2disk package and basically buffers packets in memory and discards duplicates.
    You need to deploy nDedup as a bridge that discards duplicated packets and forwards the “clean” traffic to an interface to which you attach your favorite monitoring tools. Note that the faster the network, the more memory is needed to buffer packets, so avoid large windows (50 ms or more), as they can require a lot of memory at high rates.
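The two bullets above differ in what counts as a duplicate. The following Python sketch (hypothetical helper names, not nProbe or nDedup code) shows why a consecutive-copy check, like the one the nProbe flag is described as performing, misses duplicates that are interleaved with other packets. It also gives a back-of-the-envelope estimate of the buffer a windowed tool needs, under the simplifying assumption that memory is roughly traffic rate × window, ignoring per-packet metadata overhead.

```python
from typing import List, Optional


def dedup_consecutive(packets: List[bytes]) -> List[bytes]:
    """Drop a packet only if it is identical to the packet right before it."""
    kept: List[bytes] = []
    previous: Optional[bytes] = None
    for pkt in packets:
        if pkt != previous:
            kept.append(pkt)
        previous = pkt
    return kept


def buffer_bytes(rate_bps: float, window_s: float) -> float:
    """Rough lower bound on the bytes needed to hold `window_s` seconds of traffic."""
    return rate_bps * window_s / 8  # bits -> bytes


if __name__ == "__main__":
    a, b = b"packet-A", b"packet-B"
    print(len(dedup_consecutive([a, a, b, b])))  # 2: back-to-back copies removed
    print(len(dedup_consecutive([a, b, a, b])))  # 4: interleaved copies survive
    print(buffer_bytes(10e9, 0.050) / 1e6)       # 62.5 (MB at 10 Gbit/s, 50 ms)
```

Under this model, a 50 ms window at 10 Gbit/s already means about 62.5 MB of packet data, and ten times that (625 MB) at 100 Gbit/s, which is why large windows become expensive quickly.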

Flow Deduplication

As with packet deduplication, nProbe can be used to deduplicate flows when acting as a flow collector. In this case nProbe keeps flows in cache for some time, and if a collected flow has the same key as a flow already in memory, the new flow is discarded. As explained in this post, --flow-deduplication <interval (sec)> specifies the duration of the sliding window during which the duplication check is applied. Since flow timeouts can differ across exporters, you can use windows of 30 seconds, compared to the milliseconds used for packets. Also in this case, thanks to nProbe, your ntopng installation can benefit from flow deduplication when used in flow collection mode.
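The cache-based check described above can be sketched in Python as follows. The flow key (the 5-tuple without the exporter address), the `FlowRecord` fields and the `FlowDeduplicator` class are hypothetical simplifications, not nProbe's actual data structures; the 30-second default mirrors the window suggested above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical, simplified flow record; real NetFlow/IPFIX records carry more fields.
    exporter: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    packets: int
    octets: int


class FlowDeduplicator:
    """Discard flows whose key was already seen within a sliding window."""

    def __init__(self, window_s: float = 30.0) -> None:
        self.window_s = window_s
        self._cache: dict[tuple, float] = {}  # flow key -> time it was first kept

    @staticmethod
    def key(flow: FlowRecord) -> tuple:
        # Assumption: exporter address and counters are excluded, so the same
        # flow reported by two probes maps to the same key.
        return (flow.src_ip, flow.dst_ip, flow.src_port, flow.dst_port, flow.protocol)

    def accept(self, flow: FlowRecord, now: float) -> bool:
        """Return True to keep the flow, False if it is a duplicate."""
        k = self.key(flow)
        first_seen = self._cache.get(k)
        if first_seen is not None and now - first_seen <= self.window_s:
            return False  # same key already kept inside the window
        self._cache[k] = now
        return True

    def purge(self, now: float) -> None:
        """Drop cache entries older than the window (meant to be called periodically)."""
        self._cache = {k: t for k, t in self._cache.items() if now - t <= self.window_s}

    def __len__(self) -> int:
        return len(self._cache)


if __name__ == "__main__":
    a = FlowRecord("10.0.0.1", "192.168.1.10", "8.8.8.8", 53000, 53, 17, 1, 80)
    b = FlowRecord("10.0.0.2", "192.168.1.10", "8.8.8.8", 53000, 53, 17, 1, 80)
    dedup = FlowDeduplicator(window_s=30.0)
    print(dedup.accept(a, now=0.0))  # True: first copy kept
    print(dedup.accept(b, now=1.5))  # False: same flow from a second exporter
```

Packet and octet counters are left out of the key on the assumption that two exporters may report slightly different values for the same flow. A side effect of keying on the 5-tuple alone is that a legitimate new export of the same 5-tuple inside the window is also dropped, which is one reason to keep the window no larger than needed.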

Enjoy!