10 Gbit (Line Rate) NetFlow Traffic Analysis using nProbe and DNA


In the past couple of years, 10 Gbit networks have gradually been replacing multi-1 Gbit links. Traffic analysis is also increasingly demanding, as “legacy” NetFlow v5 flows are no longer enough for network administrators who want to know much more about their network than simple packets/bytes accounting. In order to satisfy these needs, we have added many new features to the latest nProbe 6.9.x releases, including:

  • Flow application detection (via nDPI)
  • Network/application latency
  • Support for encapsulations such as GTP/Mobile IP/GRE
  • Various metrics for computing network user experience
  • Extensions to plugins that provide even more information for selected protocols such as HTTP

You might ask yourself how nProbe performance has been affected by all these extensions. Obviously, the more information nProbe provides, the more CPU cycles are necessary. Nested encapsulations (e.g. Mobile IP encapsulated in GTP, in turn encapsulated in VLANs, which are pretty common on mobile operators' networks) require more time than “plain old” IP over Ethernet. Today, with a low-end Xeon (we use the Intel E31230), we can handle from 1 Mpps/core (encapsulated GTP traffic with plugins [VoIP, HTTP] enabled) to over 3 Mpps/core with standard NetFlow v5/9. We have also implemented a new command line option called --quick-mode that further speeds up operations a bit (this option can only be used when nProbe runs with no plugins enabled).
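
For instance, a single-queue invocation exporting NetFlow v9 might look like the following sketch. The collector address 192.168.0.1:2055 is a placeholder, and the -n (collector) and -V (export format) flags reflect common nProbe usage; check them against your nProbe version.

    # Minimal single-queue sketch (assumptions: collector at 192.168.0.1:2055,
    # -n sets the collector address and -V the NetFlow/IPFIX export version).
    # Note: --quick-mode is only valid when no plugins are enabled.
    nprobe -i dna0@0 -w 512000 --cpu-affinity 0 \
           -V 9 -n 192.168.0.1:2055 \
           --quick-mode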

Now, if you really want to handle 10 Gbit line rate (14.88 Mpps), the only solution is to distribute the traffic across cores. The latest PF_RING DNA release can distribute symmetric flows (i.e. both directions, sender->receiver and receiver->sender) to the same RX queue (and thus to the same core, as explained below) in hardware, and thus without wasting CPU cycles, contrary to what standard drivers do. In essence, the DNA drivers make sure that your traffic is balanced across cores, so that you can multiply the traffic analysis performance. For example, on a 4-core + HT system (8 logical cores in total, such as the Intel E31230), with DNA you have 8 RX queues (dna0@0 ... dna0@7) to which you can bind nProbe. So you bind one nProbe instance per RX queue/core (8 instances in total), as follows (see the launch-script sketch after this list):
  • nprobe -i dna0@0 -w 512000 --cpu-affinity 0
  • ....
  • nprobe -i dna0@7 -w 512000 --cpu-affinity 7
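
Rather than starting each instance by hand, a small shell loop along these lines (a sketch: adjust the queue count, and add the collector/export options shown above as needed) launches one nProbe per DNA queue:

    #!/bin/sh
    # Launch one nProbe instance per DNA RX queue, pinning instance N to core N.
    for q in 0 1 2 3 4 5 6 7; do
      nprobe -i dna0@${q} -w 512000 --cpu-affinity ${q} &
    done
    wait   # keep the script alive while the instances run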

This way each core is essentially a closed system, and scalability is almost linear, as each nProbe instance is bound to a core with no cross-core interference. This setup works if your traffic can be balanced across queues, which is usually the case in real networks. Doing some simple math (14.88 Mpps spread over 8 queues is roughly 1.9 Mpps per core, well within the 3 Mpps/core figure quoted above), you can immediately see that with a low-end Xeon you can handle 10 Gbit line rate with nProbe when emitting standard v5/v9/IPFIX flows. As said earlier, your mileage varies according to traffic encapsulation and the number of plugins enabled, not to mention other parameters such as the number of concurrent active flows.

Using the above setup you can build a cheap box for 10 Gbit line rate traffic analysis using commodity hardware and ntop software. Commercial solutions featuring the same performance will probably cost at least two orders of magnitude more.