BYO10GPR: Build Your Own 10 Gbit Packet Recorder

Packet recorder appliances are among the last network components that still carry insane prices. Years ago this was justified by the fact that capturing traffic at high speed required costly custom packet capture cards and often custom-designed hardware. With the advent of multi-10 Gbit packet capture technologies on commodity hardware such as PF_RING DNA, and the availability of high-performance computers such as those based on the Intel Sandy Bridge chipset, the game has changed. Modern 10K RPM 6 Gb/s SATA disks make it possible, with 8 disks in RAID-0, to build an inexpensive storage system able to write 10 Gbit of traffic to disk. Of course you can use fewer disks if you plan to use SSD drives, as SSD endurance issues seem to have been mostly solved in the latest drive generations.
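As a sanity check of the storage claim above, here is a back-of-the-envelope sketch (the per-disk sustained write figure it implies is an assumption about typical 10K RPM drives, not a measurement):

```python
# Back-of-the-envelope sizing for the RAID-0 storage system described above.
# Assumption (not a measurement): a 10K RPM SATA disk sustains roughly
# 150-200 MB/s of sequential writes.
LINK_BPS = 10_000_000_000         # 10 Gbit/s link
bytes_per_sec = LINK_BPS / 8      # ~1.25 GB/s that must reach the disks
disks = 8                         # RAID-0 members
per_disk = bytes_per_sec / disks  # sustained write rate each disk must deliver
print(f"aggregate: {bytes_per_sec / 1e9:.2f} GB/s, per disk: {per_disk / 1e6:.2f} MB/s")
```

Each RAID-0 member must sustain roughly 156 MB/s of sequential writes, which is within reach of such drives when streaming sequentially; this is why 8 disks are enough.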

However, the hardware is just part of the game, as putting speedy components into the same box does not by itself make a fast packet recorder. The reasons are many:

  • Modern CPUs are designed to be energy efficient, so the CPU clock changes according to the load. Users must take care to configure the system so that the CPU frequency stays constant (tools like cpufreq allow you to enforce this, and i7z lets you see what actually happens); otherwise you may see packet loss during traffic peaks, as the CPU is not quick enough to raise its clock when network traffic changes.
  • The latest Intel CPUs, such as the E5 series used in high-end (uniprocessor and multiprocessor NUMA) servers, typically have low clock speeds (1.8-2.0 GHz) in the entry/mid-range; if you need clocks above 2.4 GHz, be prepared to spend a significant budget.
  • 10 Gbit NICs must be attached to the same node (of a NUMA system) where your n2disk application is running. As people often use dual-port 10 Gbit NICs, do not make the mistake of running one n2disk instance per port, each on a different node, in order to balance the system. The dual-port 10 Gbit NIC is physically attached to a single PCIe slot, and thus to a single node, so the second n2disk instance running on the second NUMA node will not access packets directly but via the QPI bus, which degrades its performance. In this case it is far better to use two single-port 10 Gbit NICs, connecting each card to a PCIe slot attached to the node where its n2disk instance is active, thus avoiding any crossing of the QPI bus. Note that we do not claim that monitoring 2 x 10 Gbit links (or a single 10 Gbit link monitored via a network tap that splits the two traffic directions across two ports) requires a two-node NUMA system, as one uniprocessor system might be enough; rather, we want you to realise that simple equations such as one node = one 10 Gbit port can be more complicated than expected.
  • At 10 Gbit, in particular if you want to index packets during capture (for quick packet search without sequentially scanning your multi-Terabyte archive), as n2disk does, you need a hardware system able to deliver enough horsepower to carry out the packet capture job. In other words, the CPU must provide n2disk enough cycles for its task. Typically 2 GHz is barely sufficient (i.e. it works, but you are very close to the physical limit on the number of cycles available per packet), so at least 2.4 GHz is better to stay on the safe side.
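The cycle budget behind this last point can be made concrete with a little arithmetic. A minimal sketch, assuming minimum-size Ethernet frames (60+4 bytes) plus the 8-byte preamble/SFD and 12-byte inter-frame gap that each frame occupies on the wire:

```python
# Per-packet CPU cycle budget at 10 Gbit line rate with minimum-size frames.
LINK_BPS = 10_000_000_000
FRAME = 64               # 60 bytes + 4-byte CRC
WIRE_OVERHEAD = 8 + 12   # preamble/SFD + inter-frame gap
pps = LINK_BPS / ((FRAME + WIRE_OVERHEAD) * 8)   # ~14.88 Mpps
for ghz in (2.0, 2.4, 3.2):
    print(f"{ghz} GHz -> {ghz * 1e9 / pps:.0f} cycles per packet")
```

At 2.0 GHz you have only ~134 clock cycles per packet for capture plus indexing, which is why the text calls it barely sufficient; 2.4 GHz gives ~161, and the 3.2 GHz E3-1230 used below gives ~215.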

With our reference box based on a Supermicro X9SCL powered by a Xeon E3-1230 (for storage we use an LSI SATA RAID controller), we can capture to disk on dna1 while indexing packets in real time with no loss, while injecting 10 Gbit (14.88 Mpps with 60+4 byte packets) on dna0. The commands we used are:

  •  n2disk10g -i dna1 -o /tmp -p 1000 -b 2000 -q 0 -C 256 -S 0 -w 1 -c 2 -s 64 -R 3,4,5 -I
    .....
    23/Nov/2012 11:09:18 [n2disk.c:1196] [writer] Creating index file /tmp/35.pcap.idx
    23/Nov/2012 11:09:18 [n2disk.c:409] [PF_RING] Partial stats: 13760489 pkts rcvd/13760489 pkts filtered/0 pkts dropped [0.0%]
    23/Nov/2012 11:09:18 [n2disk.c:1101] [writer] Creating pcap file /tmp/36.pcap
    23/Nov/2012 11:09:19 [n2disk.c:1196] [writer] Creating index file /tmp/36.pcap.idx
    23/Nov/2012 11:09:19 [n2disk.c:409] [PF_RING] Partial stats: 13759959 pkts rcvd/13759959 pkts filtered/0 pkts dropped [0.0%]
    .....
  • pfsend -i dna0 -g 0
    .....
    TX rate: [current 14'880'821.24 pps/10.00 Gbps][average 14'877'984.82 pps/10.00 Gbps][total 5'977'466'918.00 pkts]
    .....
During this test, i7z was reporting:
Cpu speed from cpuinfo 3192.00Mhz
cpuinfo might be wrong if cpufreq is enabled. To guess correctly try estimating via tsc
Linux's inbuilt cpu_khz code emulated now
True Frequency (without accounting Turbo) 3192 MHz
 CPU Multiplier 32x || Bus clock frequency (BCLK) 99.75 MHz

Socket [0] - [physical cores=4, logical cores=8, max online cores ever=4]
 TURBO ENABLED on 4 Cores, Hyper Threading ON
 Max Frequency without considering Turbo 3291.75 MHz (99.75 x [33])
 Max TURBO Multiplier (if Enabled) with 1/2/3/4 Cores is  36x/35x/34x/33x
 Real Current Frequency 3291.96 MHz [99.75 x 33.00] (Max of below)
       Core [core-id]  :Actual Freq (Mult.)      C0%   Halt(C1)%  C3 %   C6 %   C7 %  Temp
       Core 1 [0]:       3291.75 (33.00x)       100       0       0       0       0    45
       Core 2 [1]:       3291.96 (33.00x)      21.6    77.8       0       0       0    41
       Core 3 [2]:       3291.86 (33.00x)      50.3    48.1       0       0       0    39
       Core 4 [3]:       3291.89 (33.00x)      39.5    59.2       0       0       0    38

C0 = Processor running without halting
C1 = Processor running with halts (States >C0 are power saver)
C3 = Cores running with PLL turned off and core cache turned off
C6 = Everything in C3 + core state saved to last level cache
 Above values in table are in percentage over the last 1 sec
[core-id] refers to core-id number in /proc/cpuinfo
'Garbage Values' message printed when garbage values are read


In essence, if you have fast storage, 20 Gbit to disk with 60+4 byte packets is no longer a dream on commodity hardware.
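To see what this means for the storage system, here is a rough sketch of the disk bandwidth involved, assuming the standard 16-byte pcap per-record header and ignoring file headers and index overhead:

```python
# Disk bandwidth needed to record minimum-size frames at 10 Gbit line rate.
LINK_BPS = 10_000_000_000
pps = LINK_BPS / ((64 + 20) * 8)   # ~14.88 Mpps (64B frame + 20B wire overhead)
PCAP_RECORD_HEADER = 16            # per-packet header in the pcap file format
per_port = pps * (64 + PCAP_RECORD_HEADER)   # bytes/s written per 10 Gbit port
print(f"per port: {per_port / 1e9:.2f} GB/s, two ports: {2 * per_port / 1e9:.2f} GB/s")
```

Each fully loaded 10 Gbit port produces about 1.19 GB/s of pcap data, so 20 Gbit means roughly 2.4 GB/s of sustained sequential writes; note that with minimum-size packets the pcap headers alone add 25% on top of the raw frame bytes.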

To make a long story short, in order to BYO10GPR you need:

  • A fast (but not too fast) storage system.
  • Properly formatted and mounted disks (see the n2disk user’s guide, which explains how to do this), as fast storage treated as a plain disk won’t deliver the performance you need.
  • A CPU of at least 2.0 GHz, although we suggest 2.4 GHz or better. If you want to save money and still be on the safe side, consider the E3 CPU series (we use the E3-1230, as explained above), which delivers 3.2 GHz for about $200. If you want an E5 processor running at 2.7 GHz, expect to spend almost an order of magnitude more.
  • A wise n2disk configuration, so that threads are properly allocated to cores.
  • DNA drivers on the interfaces used for packet capture.

Those who do not want to deal with all these low-level details can of course use one of the nBox recorders we pre-build for you.