Hardware-based Symmetric Flow Balancing in DNA


Years ago, Microsoft defined RSS (Receive-Side Scaling) with the goal of improving packet processing by enabling multiple cores to process packets concurrently. Today RSS is implemented in modern 1-10 Gbit network adapters as a way to distribute packets across RX queues. When incoming packets are received, network adapters (in hardware) decode the packet and hash the main packet header fields (e.g. IP address and port). The hash result is used to identify into which ingress RX queue the packet will be queued.

In order to balance the traffic evenly across the RX queues, RSS implements an asymmetric hash. This means that the two directions of a TCP connection between hosts A and B generally end up in different queues: A-to-B goes to queue X and B-to-A goes to queue Y, where X usually differs from Y. This mechanism spreads the traffic as much as possible over all available queues, but it has a drawback: applications that need to analyze bi-directional traffic (e.g. network monitoring and security applications) must read packets from all queues simultaneously in order to receive both traffic directions. Asymmetric RSS therefore limits application scalability. You cannot simply start one application per RX queue (so that the more queues you have, the more apps you can start), because every application would still have to read from all queues to see both directions. In a scalable system, instead, applications must be able to operate independently so that each of them is a self-contained system, as depicted in the figure below.

In PF_RING DNA (starting with version 5.4.3) we have added the ability to reconfigure the RSS mechanism via software, so that DNA/libzero applications can decide what RSS type they need (non-DNA applications cannot reconfigure RSS yet). In general, asymmetric RSS is enough for apps that operate per-packet (e.g. a network bridge), whereas symmetric RSS is the ideal solution for apps such as IDS/IPS and network monitoring applications that instead need full flow visibility.
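To make the difference between the two RSS types concrete, here is a minimal Python sketch of the Toeplitz hash commonly used for RSS. It compares a typical asymmetric key with a key built by repeating a 16-bit pattern, which is the approach described in [1] to make the hash symmetric. Several things here are assumptions for illustration: the asymmetric key is the sample key from Microsoft's RSS verification notes, the final modulo is a simplification (real adapters index an indirection table with the low-order hash bits), and nothing in this sketch is meant to describe how DNA actually programs the adapter.

```python
import socket
import struct

# Assumption: sample key from Microsoft's RSS verification notes (asymmetric).
MS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)
# Symmetric variant: the same 16-bit pattern repeated over all 40 bytes.
SYM_KEY = bytes.fromhex("6d5a") * 20


def toeplitz_hash(key, data):
    """XOR together the 32-bit key windows selected by the set bits of data."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                # The window for input bit n starts at key bit n (MSB first).
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def rss_input(src_ip, dst_ip, src_port, dst_port):
    """IPv4/TCP hash input: source IP, destination IP, source port, destination port."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))


def rx_queue(key, src_ip, dst_ip, src_port, dst_port, num_queues=8):
    """Simplified queue selection (hypothetical helper): hash modulo queue count."""
    h = toeplitz_hash(key, rss_input(src_ip, dst_ip, src_port, dst_port))
    return h % num_queues


if __name__ == "__main__":
    a_to_b = ("192.168.1.10", "10.0.0.5", 40000, 80)
    b_to_a = ("10.0.0.5", "192.168.1.10", 80, 40000)
    for name, key in (("asymmetric", MS_KEY), ("symmetric", SYM_KEY)):
        print(name, "A-to-B queue:", rx_queue(key, *a_to_b),
              "B-to-A queue:", rx_queue(key, *b_to_a))
```

With the repeated 16-bit pattern, swapping source and destination moves every input bit by a multiple of 16 positions, so each bit selects the same key window as before and both directions produce the same hash. With an arbitrary key no such guarantee exists.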

The advantage of symmetric RSS is that it is now possible to achieve scalability by binding applications to individual queues. For example, suppose you have an 8-queue DNA-aware network adapter: you can start 8 snort instances, binding each of them to a different queue (i.e. dna0@0, dna0@1, …, dna0@7) and core. Each instance is then independent of the others, and it can operate properly because it sees both directions of the traffic.
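As a small illustration of the one-instance-per-queue layout, the following Python sketch builds the list of per-queue device names and pairs each with a core. The helper names are hypothetical, and the one-to-one queue-to-core pairing is an assumption. How each instance is actually launched and pinned depends on the application and is left out.

```python
def queue_devices(interface, num_queues):
    """Return the per-queue device names, e.g. dna0@0 ... dna0@7."""
    return ["{}@{}".format(interface, q) for q in range(num_queues)]


def plan_instances(interface, num_queues):
    """Pair each queue device with a core id (assumption: queue N runs on core N)."""
    return [(dev, core) for core, dev in enumerate(queue_devices(interface, num_queues))]


if __name__ == "__main__":
    for device, core in plan_instances("dna0", 8):
        print("instance on", device, "bound to core", core)
```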

For those who need advanced traffic balancing that is not based on packet headers (e.g. you want to balance VoIP calls based on the telephone number of the caller), you can take advantage of libzero. In the PF_RING demo applications we have created a couple of examples based on libzero (pfdnacluster_master and pfdnacluster_multithread) that demonstrate how you can implement flexible packet balancing (see the -m command line option of both applications).
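The idea behind such a custom balancer can be sketched in a few lines of Python. This is a conceptual illustration only, not the libzero API: the function names are hypothetical, the packet is assumed to be a SIP message whose From header carries the caller's number, and CRC32 is just a convenient stable hash. A real balancer would also have to steer the related media packets, which this sketch does not attempt.

```python
import re
import zlib

# Hypothetical helper: match the user part of the SIP URI in a From header
# (long form "From:" or compact form "f:").
_FROM_RE = re.compile(
    rb"^(?:From|f)\s*:[^\r\n]*?sips?:([^@;>\s]+)@",
    re.IGNORECASE | re.MULTILINE,
)


def caller_id(sip_payload):
    """Return the caller's number as bytes, or None if no From header is found."""
    match = _FROM_RE.search(sip_payload)
    return match.group(1) if match else None


def pick_consumer(sip_payload, num_consumers):
    """Choose the consumer index from the caller's number, not from IP/port."""
    caller = caller_id(sip_payload)
    if caller is None:
        return 0  # assumption: unparsable packets all go to the first consumer
    return zlib.crc32(caller) % num_consumers


if __name__ == "__main__":
    invite = (b"INVITE sip:bob@example.com SIP/2.0\r\n"
              b"From: \"Alice\" <sip:+15551234@example.com>;tag=1\r\n"
              b"To: <sip:bob@example.com>\r\n\r\n")
    print("caller:", caller_id(invite), "consumer:", pick_consumer(invite, 4))
```

Because the decision depends only on the caller's number, all signaling from the same caller reaches the same consumer regardless of the IP addresses and ports involved.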

Reference
[1] Scalable TCP Session Monitoring with Symmetric Receive-side Scaling