Benchmarking PF_RING DNA

Posted · Add Comment

For years networking companies have used the buzzword zero-copy to qualify those hardware/software solutions that allow applications to play with packets without the need to copy them at all. Zero-copy needs DMA (Direct Memory Access) for operating so that applications do not get a (shallow) copy of packets but they actually get the pointer to the packet. As you probably know, PF_RING DNA allows applications to access packets in zero-copy so that in the pfring_recv() call you get a pointer to the packet just receive. Whereas in traditional PF_RING you always get a copy of a packet portion (limited to the snaplen) sitting on a memory ring living on the kernel that is accessed in DMA.

In DNA, zero-copy applies not just to packet capture but also to packet transmission, and bouncing (i.e. you receive a packet on one interface and forward it to another interface). All those who have tested DNA have realized that an old Core2Duo 1.86 GHz is enough for handling TX line rate at 10G, so in general if you own an adequate server, DNA can solve all your RX and TX needs.

Over the past 6 months we have made many changes to DNA, in particular for making it more flexible not just for packet capture but also for packet processing. We have then decided to run some new performance tests in order to position DNA with similar solution such as netmap, and  Intel DPDK that is probably the reference software in terms of packet processing on commodity hardware.

For our tests we used a entry-level server Supermicro X9SCL powered by a Xeon E31230 (4 cores + HyperThreading) fitted with two dual 10 Gbit NICs. For traffic generator we used a Spirent (4 x 10 Gbit ports) kindly provided by Silicom. For packet capture we used pfcount, and for packet receive+forwarding we have used pfdnabounce with a pre-release version of the libzero that will be releasing soon and that allows to operate in zero-copy across network interfaces.

Test 1: Packet Capture

We have connected all 4 ports of the Spirent to the four 10 Gbit ports. We injected 64 bytes packets on all ports. pfcount’s were started as follows

# ~/PF_RING/userland/examples/pfcount -i dna0 -a -g 0
# ~/PF_RING/userland/examples/pfcount -i dna1 -a -g 1
# ~/PF_RING/userland/examples/pfcount -i dna2 -a -g 2
# ~/PF_RING/userland/examples/pfcount -i dna3 -a -g 3

where each pfcount is bound to a different core. The ixgbe DNA driver has been loaded with a single RX queue (as you will read later on this article, one queue is enough).

rmmod ixgbe
insmod ./PF_RING/drivers/DNA/ixgbe-3.7.17-DNA/src/ixgbe.ko RSS=0,0,0,0
ifconfig dna0 up
ifconfig dna1 up
ifconfig dna2 up
ifconfig dna3 up
echo "1" > /proc/irq/52/smp_affinity
echo "2" > /proc/irq/52/smp_affinity
echo "4" > /proc/irq/54/smp_affinity
echo "8" > /proc/irq/56/smp_affinity

Each pfcount has reported minimal packet drop, happened when the Spirent started to inject traffic.A generic pfcount stat is the following:

Absolute Stats: [1041759799 pkts rcvd][14323 pkts dropped]  Total Pkts=1041774122/Dropped=0.0 %
1'041'759'799 pkts - 87'507'823'116 bytes [15'096'354.76 pkt/sec - 10'144.75 Mbit/sec]
Actual Stats: 14882462 pkts [1'000.10 ms][14'880'973.90 pps/10.00 Gbps]

So we basically have lost (at startup) about 14k packets over a billion packets received. This while 4 pfcount instances were running simultaneously for a total packet capture rate of 4 x 14.880 Mpps, with each pfcount instance loading the CPU at about 90%.

Test 2: Packet Forwarding

We have connected the Spirent as traffic generator, received and then forwarded packets back to the Spirent using pfdnabounce, and then compared the number of received packets with the original number of packets sent. This is a setup similar to RFC 2544.

As you can read we have lost 139 packets out of 1 billion, while forwarding 64 bytes packets at line rate. Also on this case the loss happened when we started the test.

Final Remarks

So while we’re still trying to improve DNA so that we can see zero loss also on these extreme conditions, we are pretty happy of the tests outcome. First of all because when in 2005 we started the PF_RING project, we would have never imagined that it would have been feasible to capture almost 60 million/pps on a machine that costs about 500 Euro (plus NICs). Second because DNA isn’t second to DPDK  according to some benchmarks you can find googling a bit. We believe that this is a great result for our open source project, although we are working to continuously improve DNA.