Not All Servers Are Alike (With DNA)


PF_RING DNA is a great success for us, as we see the user community grow every day. At the same time, we sometimes receive complaints from people who say that they cannot reach the performance we observed in our laboratory (i.e. 1/10 Gbit RX and TX wire rate with any packet size).

Today, thanks to Donald Skidmore of Intel, we have found a way to measure whether a given server is adequate (from the hardware point of view) for wire rate, in particular with small packets. The problem is apparently due to the memory bandwidth, which on some systems is surprisingly limited. It is worth remarking that memory type/speed matters, but so does the way this memory is connected to the rest of the system.

[Figure: block diagram of the Supermicro X9SCL system architecture]

The above figure depicts the architecture of the system (Supermicro X9SCL) we use in our labs. As you can see, the DDR3 memory is connected directly to the CPU: this means that the CPU can access it directly (good). Unfortunately, not all the PCIe x8 slots used for fitting 10 Gbit cards are alike. If you look at the figure you will see that:

  • The two top-left PCIe x8 slots are attached directly to the CPU over a full x8 link.
  • The bottom PCIe x8 slot (the one attached to the Cougar Point PCH) is physically an x8 slot (i.e. it looks like an x8 slot), but internally it is connected through a 5.0 Gbit/s bus shared with many other components (e.g. SATA and the 1 Gbit Ethernet ports).

This means that when you plug your 10 Gbit NIC into a PCIe x8 slot, you should not simply pick any free slot: you must choose the slot that offers the best performance. Just to give you an idea, on our X9SCL system, installing the NIC into the bottom-left x8 slot caps performance at 11 Mpps, whereas in the top-left slots we can achieve wire rate.
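
One practical way to check how a NIC is actually wired (not part of our original tests, but handy) is to inspect its negotiated PCIe link with lspci from the pciutils package; the bus address below is just an example and will differ on your system:

  lspci | grep -i ethernet              # find the NIC's PCI address, e.g. 04:00.0
  sudo lspci -s 04:00.0 -vv | grep -i lnk   # LnkCap/LnkSta report the PCIe speed and lane width

If LnkSta reports a narrower width or lower speed than the card supports, the NIC is probably sitting in one of the "slow" slots.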

The tool Donald suggested is numademo, which is part of the numactl package and is included in some Linux distributions (e.g. RedHat Linux 6.x). For our tests we used the latest version available for download at the time of writing (numactl-2.0.8-rc2). We compiled the package and ran the command numademo 128M memcpy on some systems we have access to, all equipped with 82599-based 10 Gbit NICs and a recent Linux distribution (e.g. RedHat 6.1 or Ubuntu Server 11.04). We have put the results we collected in the table below, along with the result of pfsend -i dna:ethX -l 60 -n 0. Some of them are very surprising.
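
For reference, these are the two commands behind the table below (ethX stands for whichever interface your DNA driver exposes):

  numademo 128M memcpy            # memory copy bandwidth measured over a 128 MB buffer
  pfsend -i dna:ethX -l 60 -n 0   # transmit 60-byte packets on a DNA interface (-n 0 = keep sending)

The average memcpy bandwidth reported by numademo is the figure that appears in the table.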


Server | Chipset | Avg Memory Bandwidth (as measured by numademo) | 10 Gbit DNA Max TX Rate
------ | ------- | ---------------------------------------------- | -----------------------
Dell R810 / 4 x Xeon L7555 | Intel E7510 | 6102.86 MB/s | No (< 2 Mpps)
Dell R710 (a) | Intel 5520 | 17839.09 MB/s | Almost (14.23 Mpps)
Dell R710 / Xeon E5620 (b) | Intel 5520 | 7779.38 MB/s | No (way)
Dell R710 / Xeon X5677 (c) | Intel 5520 | 12100.95 MB/s | (Not tested yet)
Dell R710 / Xeon X5550 (d) | Intel 5520 | 28355.46 MB/s | Wire rate (14.88 Mpps)
Dell R610 / 2 x Xeon E5620 | Intel 5520 | 13826.47 MB/s | No (11.75 Mpps)
Dell R510 / Xeon X3430 | Intel 5520 | 9089.40 MB/s | Yes (@ 1 Gbit, not tested at 10 Gbit)
Dell R310 / Xeon X3430 | Intel 3420 | 22374.00 MB/s | No (12.7 Mpps)
Supermicro X8SIL / Xeon X3440 | Intel 3420 | 22105.83 MB/s | Wire rate (14.88 Mpps)
Supermicro X9SCL / Xeon E3-1230 | Intel C202 PCH | 34888.03 MB/s | Wire rate (14.88 Mpps)
IBM Blade / 2 x Xeon E5645 @ 2.40 GHz | Intel 5520 | 8518.62 MB/s | No (11.04 Mpps)
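
As a side note on the "wire rate" entries above: 14.88 Mpps is simply the theoretical packet rate of 10 Gbit Ethernet with minimum-size frames (a 60-byte packet as sent by pfsend plus the 4-byte FCS), as a quick back-of-envelope check shows:

  64 bytes (minimum frame) + 8 bytes (preamble/SFD) + 12 bytes (inter-frame gap) = 84 bytes = 672 bits on the wire
  10,000,000,000 bit/s ÷ 672 bit ≈ 14,880,952 packets/s ≈ 14.88 Mpps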


After performing the tests and comparing the results, our conclusions are:

  • DNA pushes the hardware to its limits. If you do not need 10 Gbit wire rate and have never tested DNA, you have probably never noticed these issues, although a system with limited memory bandwidth is not desirable in any case.
  • With a memory bandwidth of at least 16 GB/s we can reasonably expect to achieve wire rate at the minimum packet size.
  • Better numademo results do not necessarily mean better packet performance. For instance, our X8SIL-based server can both send and receive at wire rate on two ports of the same NIC, whereas the X9SCL-based server, under the same test conditions, cannot exceed 10 Mpps.
  • We need to understand why two servers belonging to the same product family (Dell R710) perform so differently. Unfortunately we do not own them, so we do not have the full specs of those machines (yet).
  • The CPU is important, but the chipset type and memory bandwidth are much more important.
  • If you need 10 Gbit wire rate, an adequate server is a prerequisite.

Finally, many thanks to Andrew Lehane of Agilent for his persistence in testing until we found an explanation for the odd performance we observed.
Continue reading part 2 of this post.