Accelerating packet capture and processing is a constant race. New hardware, modern computing architectures, and improvements in packet capture (e.g. PF_RING) allow applications to reduce the time (both CPU time and wall-clock time) they need to process packets. But the main question remains: how much time do I have to process each packet?
A common misconception in this field is that hardware-accelerated cards will work magic and solve all problems. They will not. Technologies such as PF_RING and DNA, as well as those cards, reduce the time you need to capture packets to (potentially) zero, and they reduce the number of CPU cycles your system spends bringing packets from the wire up to the application. But how much time do you have? Let’s compute it:
- 1 Gbit
1.488 Mpps => 1 sec / 1’488’000 => 0.672 usec / packet
- 10 Gbit
14.88 Mpps => 1 sec / 14’880’000 => 0.067 usec / packet
So in the worst case (minimum-size 64-byte frames), at 1 Gbit you have 0.672 usec per packet. If your application can process every packet, whatever it contains, within this time bound, then it can handle wire rate at any packet size. At 10 Gbit you have 1/10 of the time you have at 1 Gbit. This is the total time available, and within it you have to both capture and process the packet. This is why packet capture matters: if capture alone exceeds the time available, then even with zero processing time you have no chance of handling wire-rate traffic.
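The arithmetic above can be reproduced with a few lines of C. This is a minimal sketch, not part of any capture library: the function names are made up for illustration, and it assumes standard Ethernet framing, where every frame costs an extra 20 bytes on the wire (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap).

```c
/* Bytes each frame occupies on the wire in addition to the frame itself:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
 * Assumes standard Ethernet framing. */
#define ETH_WIRE_OVERHEAD_BYTES 20.0

/* Packets per second at full line rate.
 * link_bps:    link speed in bits per second (1e9 for 1 Gbit)
 * frame_bytes: frame size including the 4-byte FCS (64 is the minimum) */
double packets_per_second(double link_bps, double frame_bytes) {
    return link_bps / ((frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8.0);
}

/* Time available per packet, in microseconds, to both capture and process it. */
double usec_per_packet(double link_bps, double frame_bytes) {
    return 1e6 / packets_per_second(link_bps, frame_bytes);
}
```

Calling usec_per_packet(1e9, 64) gives about 0.672 usec and usec_per_packet(10e9, 64) about 0.067 usec, matching the figures above. With 512-byte frames, packets_per_second(1e9, 512) is roughly 235 Kpps.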
In general the situation is not that bad. Your application usually has more time available, because link utilization is typically no more than 50-60% and packet sizes average around 512 bytes. Even at full utilization, 512-byte packets amount to about 235 Kpps at 1 Gbit and 2.35 Mpps at 10 Gbit, which leaves your application some room to relax. Note that this is your average target, not your peak target: network traffic is not constant but has peaks and spikes, and your application must be prepared to handle them properly.
Buffering can help of course, but only for a few packets. Here you have to think in terms of time, not (as many people do) in terms of how many packets you can store in memory. Think of a buffer as a way to temporarily absorb peaks while your application handles traffic, or in other words as a way to stretch the time available and avoid dropping packets. Unfortunately buffers are pretty small, and making them larger is not a good idea (contrary to what many people think), because their memory will not stay in cache and the system will spend time handling the buffer. A buffer is therefore a good solution for handling a spike, but it does not increase your packet processing speed.
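To make "think in terms of time" concrete, the sketch below converts a buffer size into the duration of the spike it can absorb. The function name and the 4096-slot ring used in the note are illustrative assumptions, not values taken from any particular driver or library.

```c
/* Time, in microseconds, until a buffer of `slots` packets overflows when
 * packets arrive at arrival_pps and the application drains them at
 * service_pps. Returns -1.0 if the application keeps up (no overflow). */
double usec_until_overflow(double slots, double arrival_pps, double service_pps) {
    if (service_pps >= arrival_pps)
        return -1.0;                      /* the buffer never fills */
    return slots / (arrival_pps - service_pps) * 1e6;
}
```

For example, a hypothetical 4096-slot ring facing 14.88 Mpps while the application is stalled (service_pps of 0) fills in about 275 usec. The buffer buys time measured in microseconds, and once the spike lasts longer than that, packets are dropped regardless of how many slots there were.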
At 10 Gbit (and on a smaller scale also at 1 Gbit) there’s another important aspect to consider. You can expect to process all packets if and only if your application is faster than (or at least as fast as) the network you want to analyze. If this is the case, from time to time the application will not have enough packets to process and will need to wait for them. To avoid wasting CPU cycles, your application can call poll(), select(), or anything else that lets it wait for packets. These functions are all system calls. If you measure the cost of a system call (you can do this easily by calling a dummy system call in a loop and counting how many calls you can issue per second), you will see that it costs around 1 usec. This is the cost of a dummy system call, just to cross the boundary from user space to the kernel; the cost increases significantly if the call has a buffer associated with it, for instance to exchange information.
If you go back to the top of this post, you will see that 1 usec is enough time to receive about 1.5 packets at 1 Gbit or 15 packets at 10 Gbit. This means that calling poll() to wait for packets at 10 Gbit costs too much in terms of latency: this system call is not a dummy at all, so on top of the cost of crossing the kernel boundary it does work that further increases its duration. The consequence is that if your application is much faster than the network it may call poll() often, and by the time the call returns control the buffer may be mostly full because of the poll() latency. So it’s probably better to do an active wait and call usleep(1), so that you have the (partial) guarantee that every usec your application can see what has happened.
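Both ideas from this discussion, measuring the cost of a dummy system call and waiting actively instead of blocking in poll(), can be sketched in C. This is a rough illustration under stated assumptions: it targets Linux and uses syscall(SYS_getppid) as the dummy call, and the packets_ready_fn hook and ready_after_countdown() helper are hypothetical stand-ins for whatever check your capture library offers on its ring.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Monotonic timestamp in nanoseconds. */
uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Average cost, in usec, of one trivial system call.
 * syscall(SYS_getppid) serves as the dummy call (Linux assumption): it does
 * almost no work in the kernel, so the time is mostly the boundary crossing. */
double syscall_cost_usec(unsigned long iterations) {
    uint64_t start = now_ns();
    for (unsigned long i = 0; i < iterations; i++)
        syscall(SYS_getppid);
    return (double)(now_ns() - start) / 1000.0 / (double)iterations;
}

/* Hypothetical hook: returns nonzero when the capture ring holds packets.
 * A real application would check its capture library's ring here. */
typedef int (*packets_ready_fn)(void *ctx);

/* Active wait: check the ring, sleep briefly, check again.
 * Returns the number of checks performed, or -1 after max_checks attempts.
 * usleep() may sleep longer than the requested 1 usec, depending on the
 * system's timer resolution, hence only a partial guarantee. */
long active_wait(packets_ready_fn ready, void *ctx, long max_checks) {
    for (long checks = 1; checks <= max_checks; checks++) {
        if (ready(ctx))
            return checks;
        usleep(1);
    }
    return -1;
}

/* Demo stand-in for a ring check: reports "ready" once the counter is 0. */
int ready_after_countdown(void *ctx) {
    int *remaining = ctx;
    if (*remaining == 0)
        return 1;
    (*remaining)--;
    return 0;
}
```

Running syscall_cost_usec() with a large iteration count (say one million) on your own machine shows how many packets' worth of time a single kernel crossing consumes. The result depends heavily on the CPU and kernel, so treat the 1 usec figure as an order of magnitude rather than a constant.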
To summarize:
- Think in terms of time and not packets
- Be prepared to handle traffic spikes
- Large buffers do not increase packet processing speed
- Set an upper bound on the time your application spends processing each packet
- Packet capture is important, but it is not the only ingredient for handling wire-rate traffic