Since the introduction of PF_RING ZC drivers for Mellanox/NVIDIA adapters, and of the new family of Intel E810 adapters, selecting the most cost-effective adapter for a given use case and performance target has become more complicated. Let's try to shed some light on this.
Most commodity adapters, including Intel and Mellanox, are based on ASIC chipsets, which are cheap and provide simple RX/TX operations, with no (or limited) programmability.
Those adapters have been designed for general-purpose connectivity and are not really optimized for moving traffic at high throughput (especially with worst-case traffic). However, by using PF_RING ZC drivers in combination with hardware offload capabilities like RSS (Receive Side Scaling), it is possible to scale the performance and process traffic up to 100 Gbps.
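To illustrate how RSS spreads traffic across cores, the sketch below implements the Toeplitz hash commonly used by RSS-capable NICs and maps each flow tuple to an RX queue. The key, the flow tuples and the queue count are made-up illustrative values, not taken from any specific driver; real hardware usually indexes an indirection table with the low bits of the hash rather than taking a modulo.

```python
# Sketch of RSS-style load balancing: the NIC hashes the flow tuple with a
# Toeplitz hash and uses the result to pick an RX queue. Key, flows and
# queue count below are illustrative, not from a real driver configuration.

def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Compute the 32-bit Toeplitz hash of 'data' using 'key'."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):  # for each set bit of the input...
                shift = key_bits - 32 - (i * 8 + b)
                # ...XOR in the 32-bit key window aligned with that bit
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result

KEY = bytes(range(1, 41))  # 40-byte key (real drivers use a configurable key)
NUM_QUEUES = 4

def rx_queue(src_ip: bytes, dst_ip: bytes, sport: int, dport: int) -> int:
    """Map an IPv4 flow tuple to an RX queue (modulo for simplicity)."""
    data = src_ip + dst_ip + sport.to_bytes(2, "big") + dport.to_bytes(2, "big")
    return toeplitz_hash(KEY, data) % NUM_QUEUES

if __name__ == "__main__":
    flows = [
        (bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]), 1234, 80),
        (bytes([10, 0, 0, 3]), bytes([10, 0, 0, 4]), 5678, 443),
        (bytes([192, 168, 1, 9]), bytes([172, 16, 0, 7]), 40000, 53),
    ]
    for f in flows:
        print("flow", f, "-> queue", rx_queue(*f))
```

Because the hash is computed per flow, all packets of a flow always land on the same queue (and hence the same core), which is what makes per-core flow state possible in the analysis application.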
There are a few families of Intel adapters, based on the chipset architecture:
- e1000e (Intel 8254x/8256x/8257x/8258x) and igb (Intel 82575/82576/82580/I350) supporting 1 Gbit links
- ixgbe (Intel 82599/X520/X540/X550) supporting up to 10 Gbit links
- i40e (Intel X710/XL710) supporting up to 40 Gbit links
- ice (Intel E810) supporting up to 100 Gbit links
Intel adapters are good for traffic analysis and can scale up to 100 Gbps by load-balancing traffic to multiple CPU cores with RSS. They can be used in combination with nProbe Pro/Enterprise to process up to 10-20 Gbps by spawning multiple processes, or nProbe Cento to process up to 100 Gbps (on Intel E810).
Since Intel adapters implement simple RX operations and can only work per packet (there is one PCIe transaction for every packet received), they can be used for traffic recording with n2disk up to 10 Gbps (or a bit more, depending on the hardware configuration). The reason is that RSS cannot be used in this case: a single stream is required to move data from the wire to disk in order to preserve packet order (traffic traces that have been manipulated in any way are not useful as evidence of a network or cybersecurity issue, or even for troubleshooting), and working per packet does not provide enough performance to scale above 10-20 Gbps.
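To put the per-packet constraint in numbers, a quick back-of-the-envelope calculation shows how many packets per second (and hence, on ASIC adapters, PCIe transactions per second) a link carries in the worst case of minimum-size 64-byte Ethernet frames:

```python
# Worst-case packet rate on Ethernet: each 64-byte frame also costs
# 8 bytes of preamble + 12 bytes of inter-frame gap = 84 bytes on the wire.
FRAME = 64
OVERHEAD = 8 + 12                          # preamble + inter-frame gap
BITS_PER_FRAME = (FRAME + OVERHEAD) * 8    # 672 bits per minimum-size frame

def packets_per_second(link_bps: float) -> int:
    return int(link_bps / BITS_PER_FRAME)

for gbps in (10, 40, 100):
    pps = packets_per_second(gbps * 1e9)
    # on a per-packet adapter this is also the PCIe transaction rate
    print(f"{gbps} Gbps -> {pps:,} packets/s")
```

At 10 Gbps this is already ~14.88 million packets (and PCIe transactions) per second, which explains why the per-packet model stops scaling well above that rate.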
Please note that access to the Network adapter with ZC drivers is mutually exclusive, as the userspace library running inside the application fully controls the adapter. This means you cannot run multiple applications on the same interface (unless you use traffic duplication in software, running an application such as zbalance_ipc, which adds CPU overhead and affects performance).
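As a CLI sketch of software duplication, zbalance_ipc can take over the interface and fan traffic out to a ZC cluster that multiple consumers attach to. The flag values below are illustrative assumptions (check zbalance_ipc -h for the actual options on your installation):

```shell
# Illustrative example (flag values are assumptions, verify with zbalance_ipc -h):
# capture from eth1 in zero-copy, create ZC cluster 99,
# hash traffic to 2 egress queues, bind to core 1
zbalance_ipc -i zc:eth1 -c 99 -n 2 -m 1 -g 1

# consumer applications then attach to the cluster queues, e.g.:
# nprobe -i zc:99@0 ...
# nprobe -i zc:99@1 ...
```

The trade-off is that the balancer consumes one (or more) CPU cores of its own, which is why hardware duplication, where available, is preferable.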
Mellanox adapters supported by PF_RING ZC, including ConnectX-5 and ConnectX-6, are also based on ASIC chipsets, and they share with Intel similar performance limitations when it comes to dumping traffic to disk with n2disk (they can be used up to 10-20 Gbps).
They are very good for traffic analysis: in fact, it is possible to process up to 100 Gbps at wire rate when using nProbe Cento. However, they cannot be used in combination with nProbe Pro/Enterprise when RSS is required, because the RSS implementation on Mellanox adapters supports multithreaded applications (e.g. nProbe Cento), but it cannot be used with multi-process applications (e.g. nProbe Pro/Enterprise).
On the other hand, Mellanox adapters deliver interesting features, as introduced in a previous post, including hardware packet timestamping (with nanosecond resolution), hardware packet filtering (which is flexible and supports many rules), and traffic duplication.
Traffic duplication means that it is possible to capture the same traffic from multiple applications (e.g. n2disk for traffic recording, and nProbe Cento for traffic analysis), which is not possible with Intel. It is also possible to use a different load-balancing configuration per application (e.g. send all traffic using a single data stream to n2disk, while load-balancing traffic using multiple RSS queues to nProbe Cento).
FPGA-based Network adapters are specialized, programmable adapters able to deliver high performance and enhanced features. Adapters from several brands, including Napatech and Silicom Denmark (formerly Fiberblaze), fall into this category. Those adapters are definitely suitable for traffic analysis at high speed; however, they are usually not required with nProbe, for instance, as an ASIC-based card (Intel or Mellanox) is already able to scale up to 100 Gbps by leveraging RSS load-balancing, and at a much lower price.
An interesting feature available in most FPGA adapters is the ability to aggregate traffic from multiple interfaces in hardware. This is useful when using a Network TAP to monitor a link: in this case the two directions of the full-duplex link need to be aggregated (and possibly load-balanced to multiple CPU cores) in order to provide the whole traffic to the traffic analysis application. This can be done in software up to a few Gbps, but it must be done in hardware on fully loaded 10 Gbps links.
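In software, aggregating the two directions of a tapped link essentially means merging two timestamp-ordered packet streams into a single ordered stream. A minimal sketch, using made-up (timestamp, direction, payload) records rather than real captured packets:

```python
import heapq

# Each direction of the tapped full-duplex link yields packets ordered by
# capture timestamp; merging by timestamp rebuilds the whole conversation.
# Records are illustrative (timestamp, direction, payload) tuples.
dir_a = [(1.000, "A->B", b"SYN"), (1.002, "A->B", b"ACK"), (1.003, "A->B", b"data")]
dir_b = [(1.001, "B->A", b"SYN+ACK"), (1.004, "B->A", b"ACK")]

# heapq.merge assumes each input stream is already sorted, which holds here
# since each direction is captured in order
merged = list(heapq.merge(dir_a, dir_b))

for ts, direction, payload in merged:
    print(f"{ts:.3f} {direction} {payload!r}")
```

Doing this comparison and copy for every packet is what becomes too expensive in software at 10 Gbps and above, hence the need for in-hardware aggregation.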
For traffic recording with n2disk, instead, those adapters make a difference. FPGA adapters usually include huge buffers, and some of them are able to move packets to host memory in blocks (clustering packets together into blocks of up to several Megabytes) rather than as individual packets, as with ASIC adapters. This dramatically reduces the pressure on the PCIe bus and the memory subsystem, delivering packet capture performance up to 50 Gbps using a single data stream with n2disk. This capture mode, combined with hardware timestamping support with nanosecond resolution, allows us to dump to disk up to 100 Gbps. This is the case with Napatech and Fiberblaze adapters, for instance.
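The effect of block-based DMA on the PCIe bus can be quantified with simple arithmetic. The block size and framing figures below are illustrative assumptions, not vendor specifications:

```python
# Compare host-bound DMA transactions/s: one per packet (ASIC adapters)
# versus one per multi-megabyte block (FPGA adapters). Illustrative figures.
LINK_BPS = 100e9
WIRE_BYTES = 64 + 20                      # worst case: 64B frame + preamble/IFG
pkt_per_s = LINK_BPS / (WIRE_BYTES * 8)   # per-packet DMA transaction rate

BLOCK = 4 * 1024 * 1024                   # assumed 4 MB DMA block
blocks_per_s = LINK_BPS / 8 / BLOCK       # upper bound: all wire bits reach the host

print(f"per-packet: {pkt_per_s:,.0f} transactions/s")
print(f"per-block : {blocks_per_s:,.0f} transactions/s")
print(f"reduction : ~{pkt_per_s / blocks_per_s:,.0f}x")
```

Going from ~150 million transactions per second to a few thousand is what makes a single capture stream sustainable at these rates.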
Please note that in most cases access to the Network adapter with FPGA adapters is mutually exclusive, similar to ZC drivers for Intel, especially when you need to use different load-balancing policies per application. The only exception is Fiberblaze, which allows you to create different load-balancing groups, similar to Mellanox.
If your system is equipped with a network adapter which is not supported by PF_RING ZC or FPGA drivers, please expect (low) capture performance ranging from 100 Mbps to 2-3 Gbps per core, according to the adapter model and system specs. In fact, any adapter is supported by PF_RING using standard drivers, with performance limited by the standard kernel mechanisms. On the other hand, in this case you can run packet capture from multiple applications using the same Network interface, as interface access is not mutually exclusive (the kernel module takes care of delivering traffic, sending a copy to each application).
Below we summarise the results of our survey. The tables report the best adapter in terms of price/performance for each use case.
Up to 10 Gbit
|Application|Best Price/Performance Adapter|
|---|---|
|ntopng|Any PF_RING ZC supported adapter|
|nProbe|Any PF_RING ZC supported adapter|
|nProbe Cento|Any PF_RING ZC supported adapter|
|n2disk|Any PF_RING ZC supported adapter|
|nScrub|Any PF_RING ZC supported adapter|
|Application|Best Price/Performance Adapter|
|---|---|
|ntopng|We advise preprocessing traffic with nProbe Cento|
|nProbe|Intel E810 with PF_RING ZC, with RSS and multiple processes (performance is limited by the nProbe configuration and can range from 10 to 20 Gbps)|