One of the great consequences of the DNA design is that user-space applications can now transmit and receive packets without going through the kernel TCP/IP stack at all. Bypassing the stack can be profitably used to reduce network latency, and judging by the number of user-space stacks that have been developed in the past years (e.g. OpenOnload), low latency is becoming increasingly important these days. In particular, there are specific markets such as finance and trading where all operators need to have the same chance to trade, and thus details that are minor for the rest of us, such as fibre length (which has to be equal for all operators), are very important.
Before speaking of latency, let’s consider a simple test scenario where a traffic generator sends minimum-size packets to a bounce server that, as soon as it receives a packet, sends it back to the sender unmodified. The servers are Linux-based, use a 10 Gbit NIC based on the Intel 82599 and are connected using optical fibre.
The latency is measured using a NIC with hardware timestamps. The bounce server is implemented using pfdnabounce (part of the PF_RING example code), started as "pfdnabounce -i dna:ethX -a", where ethX is the bounce interface on which the probe packet is received and sent back to the sender.
Let’s now review some basic concepts. Suppose we use 64-byte probe packets without VLAN (14 bytes of header, 46 bytes of payload and 4 bytes of CRC). To this we need to add the preamble (8 bytes) and the inter-packet gap (12 bytes). In total we have 14+46+4+8+12 = 84 bytes, that is 84 x 8 = 672 bits. At 10 Gbit, serializing those bits takes 67.2 nsec. Doing that twice in a round trip with zero cable length (i.e., equipment loopback) takes 2 x 67.2 = 134.4 nsec. This is the theoretical minimum latency that you can expect for the shortest packet on the wire, so the minimum round-trip latency is ~0.13 usec. If your cable is longer than a few metres, add its propagation delay to that, taking into account the speed of light in fibre. Similarly, the longest standard packet is 1518 bytes plus preamble plus inter-packet gap, and in this case the minimum theoretical round-trip latency is ~2.5 usec. These figures are the minimal amount of time needed just to move the data back and forth. On top of these minima, we need to add the processing layers (including the 10 Gbit controller/MAC, packet polling, memory latency, packet transmission, and data copy between RX and TX memory areas), which add significant latency to those numbers.
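The arithmetic above can be reproduced with a few lines of Python. This is only a sketch for checking the numbers: the function and constant names are ours, and the fibre propagation delay of about 5 nsec per metre is an approximation we assume here (it does not come from the measurements in this post).

```python
# Theoretical minimum wire time for an Ethernet frame (no processing cost).
PREAMBLE_BYTES = 8            # preamble, as counted in the text
INTER_PACKET_GAP_BYTES = 12   # inter-packet gap
LINK_SPEED_BPS = 10_000_000_000  # 10 Gbit/s
FIBRE_NS_PER_M = 5.0          # ASSUMPTION: rough propagation delay in fibre


def serialization_ns(frame_bytes, link_bps=LINK_SPEED_BPS):
    """Time in nsec to put one frame (header + payload + CRC) on the wire."""
    wire_bits = (frame_bytes + PREAMBLE_BYTES + INTER_PACKET_GAP_BYTES) * 8
    return wire_bits * 1e9 / link_bps


def min_round_trip_ns(frame_bytes, cable_m=0.0, link_bps=LINK_SPEED_BPS):
    """Serialize the frame twice and cross the cable twice."""
    one_way = serialization_ns(frame_bytes, link_bps) + cable_m * FIBRE_NS_PER_M
    return 2 * one_way


if __name__ == "__main__":
    print(f"64-byte frame:   {min_round_trip_ns(64):.1f} nsec")    # 134.4 nsec
    print(f"1518-byte frame: {min_round_trip_ns(1518):.1f} nsec")  # 2460.8 nsec
```

Passing a non-zero cable_m (for example min_round_trip_ns(64, cable_m=10)) shows how quickly a few metres of fibre become comparable to the serialization time of a small packet.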
With DNA and small probe packets, we have measured a round-trip latency of < 5 usec. We believe that this number is very good considering that:
- the 82599 chipset is used in commodity adapters, and it does not feature (nor claim to have) any hardware acceleration for low latency.
- DNA reduces latency to the minimum, as the user-space application we used in our tests (pfdnabounce) speaks directly with the NIC, completely bypassing the stack.
- judging from what specialized companies advertise in terms of latency, DNA is definitely on track with the state of the art.