Many ntop products, such as ntopng, nProbe, and PF_RING FT to name just a few, are based on network flows. However, not all our users know in detail what a network flow is. This blog post explains what flows are and how they work in practice.
What is a network flow?
A network flow is a set of packets with common properties. Flows are often identified by a 5-tuple key, meaning that all packets of a given flow share the same source and destination IP, source and destination port, and layer-4 protocol (e.g. TCP). In practice the flow key also includes at least the VLAN Id and possibly other attributes, such as the tunnel Id for encapsulated traffic. A flow is a way to classify traffic by clustering packets with a common key, similar to the connections you see on a computer when you run commands such as netstat -na. Each flow has various counters that keep track of flow packets/bytes, plus other attributes such as the flow timers (time of the first and last flow packet), statistics (retransmissions, out-of-order packets, …) and security attributes (e.g. the flow risk).
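To make the idea concrete, here is a minimal Python sketch of a flow key and its per-flow state. The names FlowKey, FlowRecord and account are hypothetical and do not come from ntopng or nProbe; real implementations track many more attributes, such as retransmissions or the flow risk.

```python
from dataclasses import dataclass
from typing import NamedTuple


class FlowKey(NamedTuple):
    """5-tuple plus VLAN Id (real tools may add more, e.g. a tunnel Id)."""
    vlan_id: int
    proto: int      # IP protocol number: 6 = TCP, 17 = UDP
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


@dataclass
class FlowRecord:
    """A small subset of the per-flow state: counters and timers."""
    key: FlowKey
    first_seen: float   # timestamp of the first flow packet
    last_seen: float    # timestamp of the last flow packet
    packets: int = 0
    octets: int = 0     # bytes

    def account(self, length: int, ts: float) -> None:
        """Add one packet of `length` bytes observed at time `ts`."""
        self.packets += 1
        self.octets += length
        self.last_seen = ts


# Two packets carrying the same key are clustered into the same flow.
key = FlowKey(vlan_id=0, proto=6, src_ip="10.0.0.1", src_port=40000,
              dst_ip="10.0.0.2", dst_port=22)
flow = FlowRecord(key, first_seen=1.0, last_seen=1.0)
flow.account(60, 1.0)
flow.account(1500, 2.5)
```

Because FlowKey is an immutable tuple, it is hashable and can be used directly as a lookup key in a flow cache.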
How flows are stored in memory?
Network flows are kept in a data structure named the flow cache (often implemented as a hash table) that is constantly fed with incoming packets. The flow cache stores the active flows in memory, i.e. those for which packets are still being received. Below you can see how ntopng displays the live flow cache along with the 5-tuple key of each flow.
When does a network flow start?
A network flow starts as soon as its first packet is observed. At startup the flow cache is empty, and it fills up as packets are received. Each incoming packet is decoded and its flow key computed. The key is then looked up in the flow cache: if it is not found, a new entry is added to the cache; otherwise the existing entry is updated, i.e. its packet/byte counters and timers are refreshed.
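The lookup-or-create logic above can be sketched with a Python dict, which is itself a hash table. This is an illustrative sketch, not ntop code: FlowCache, find and process_packet are hypothetical names, and matching a reply packet by also trying the mirrored key is an assumption based on the bidirectional client <-> server notation used later in this post.

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple, Optional


class FlowKey(NamedTuple):
    vlan_id: int
    proto: int          # IP protocol number: 6 = TCP, 17 = UDP
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

    def swapped(self) -> "FlowKey":
        """The same key seen from the opposite direction."""
        return FlowKey(self.vlan_id, self.proto,
                       self.dst_ip, self.dst_port, self.src_ip, self.src_port)


@dataclass
class Flow:
    key: FlowKey        # direction of the first packet observed
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


class FlowCache:
    """Minimal flow cache backed by a dict, i.e. a hash table (sketch)."""

    def __init__(self) -> None:
        self.flows: Dict[FlowKey, Flow] = {}

    def find(self, key: FlowKey) -> Optional[Flow]:
        # Assumption: flows are bidirectional, so also try the mirrored key.
        flow = self.flows.get(key)
        if flow is None:
            flow = self.flows.get(key.swapped())
        return flow

    def process_packet(self, key: FlowKey, length: int, ts: float) -> Flow:
        flow = self.find(key)
        if flow is None:
            # Key not found: this packet starts a new flow.
            flow = Flow(key, first_seen=ts, last_seen=ts)
            self.flows[key] = flow
        # Existing (or just created) entry: update counters and timers.
        flow.packets += 1
        flow.octets += length
        flow.last_seen = ts
        return flow


cache = FlowCache()
c2s = FlowKey(0, 6, "10.0.0.1", 40000, "10.0.0.2", 22)
cache.process_packet(c2s, 74, 0.0)            # first packet: new entry
cache.process_packet(c2s.swapped(), 66, 0.1)  # reply: same entry updated
```

Note that the stored key keeps the direction of whichever packet created the entry, which matters for the direction issues discussed below.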
When does a network flow end?
Each flow has two aging timers: the idle timer, which tracks how much time has passed since the last flow packet was received, and the duration timer, which tracks how long the flow has lasted. A flow ends when either timer expires, namely when the flow has been idle for too long (i.e. no packets have been received for a while) or when it has been stored in the flow cache for too long. In nProbe and PF_RING FT, when a flow expires it is removed from the flow cache and sent to the collector. In ntopng, instead, a flow is removed from the flow cache only when it becomes idle; long-lasting flows are not removed. The reason is that a flow probe such as nProbe needs to periodically report information about the monitored traffic to the collector (e.g. ntopng), so long-lasting flows are “cut” and sent to the collector. ntopng, instead, has no collector to inform, so flows stay in memory as long as necessary, according to the configured preferences.
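A minimal sketch of the two aging timers follows. The function name expire_flows, the probe_mode switch and the timeout values in the example are hypothetical: probe_mode=True imitates a probe that also cuts long-lasting flows for export, while probe_mode=False removes only idle flows, as described above for ntopng.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Simplified key for this sketch: (src_ip, src_port, dst_ip, dst_port, proto)
FlowKey = Tuple[str, int, str, int, int]


@dataclass
class Flow:
    first_seen: float
    last_seen: float


def expire_flows(cache: Dict[FlowKey, Flow], now: float,
                 idle_timeout: float, lifetime_timeout: float,
                 probe_mode: bool) -> List[FlowKey]:
    """Remove expired flows from `cache` and return their keys."""
    expired = []
    for key, flow in list(cache.items()):
        idle = now - flow.last_seen >= idle_timeout
        too_long = now - flow.first_seen >= lifetime_timeout
        # Idle flows always expire; long-lasting ones only in probe mode.
        if idle or (probe_mode and too_long):
            expired.append(key)
            del cache[key]  # a probe would export the flow at this point
    return expired


cache = {
    ("10.0.0.1", 40000, "10.0.0.2", 22, 6): Flow(0.0, 10.0),    # idle
    ("10.0.0.3", 50000, "10.0.0.4", 443, 6): Flow(0.0, 99.0),   # long-lasting
}
expired = expire_flows(cache, now=100.0, idle_timeout=30.0,
                       lifetime_timeout=60.0, probe_mode=False)
```

With probe_mode=False only the idle flow is removed; running the same call with probe_mode=True would also cut the long-lasting one.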
Flow Keys and Directions
If flows are created when the first flow packet is received, we can expect the flow client to be the real network client. For instance, if a client on host 18.104.22.168 connects via SSH to host 22.214.171.124, the flow for this communication will be 18.104.22.168:X <-> 22.214.171.124:22 (assuming that SSH is running on port 22). Looks good, right? Well, sometimes the flow cache reports this flow as 22.214.171.124:22 <-> 18.104.22.168:X. Why? There are various possible reasons:
- The application (e.g. ntopng) started after the flow began, so the first packet it observed was 22.214.171.124:22 -> 18.104.22.168:X instead of 18.104.22.168:X -> 22.214.171.124:22.
- The flow was stored in the cache with the correct key, but no packet was exchanged for a while (e.g. 2 minutes), so the application declared the flow expired and removed it from the flow cache. If a new packet is then observed, it may travel in the server-to-client direction (e.g. 22.214.171.124:22 -> 18.104.22.168:X), for instance a keep-alive sent by the server. In this case the flow is added to the cache with the reversed (and thus wrong) direction.
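The second scenario can be reproduced with a tiny simulation. It is a sketch under simplifying assumptions: TinyCache is a hypothetical name, the cache keeps only a last-seen timestamp per flow, flows expire only for idleness, and the 120-second timeout and client port 50000 are illustrative values, not product defaults.

```python
from typing import Dict, Optional, Tuple

# (src_ip, src_port, dst_ip, dst_port): the first packet seen fixes the direction
Key = Tuple[str, int, str, int]

IDLE_TIMEOUT = 120.0  # illustrative value (2 minutes), not a product default


class TinyCache:
    def __init__(self) -> None:
        self.last_seen: Dict[Key, float] = {}

    def _find(self, key: Key) -> Optional[Key]:
        # Match the packet against the stored key in either direction.
        rev = (key[2], key[3], key[0], key[1])
        for k in (key, rev):
            if k in self.last_seen:
                return k
        return None

    def packet(self, key: Key, ts: float) -> Key:
        """Account a packet and return the key its flow is stored under."""
        # Idle expiry: drop flows whose last packet is older than the timeout.
        for k, seen in list(self.last_seen.items()):
            if ts - seen >= IDLE_TIMEOUT:
                del self.last_seen[k]
        stored = self._find(key)
        if stored is None:
            stored = key  # new flow: its direction is that of this packet
        self.last_seen[stored] = ts
        return stored


client, server = ("18.104.22.168", 50000), ("22.214.171.124", 22)
cache = TinyCache()
print(cache.packet(client + server, 0.0))    # client -> server: correct key
print(cache.packet(server + client, 1.0))    # reply matches the same flow
# No traffic for longer than IDLE_TIMEOUT, then a server keep-alive arrives:
print(cache.packet(server + client, 200.0))  # new flow keyed server -> client
```

The last call creates a fresh entry whose "client" is actually the SSH server, which is exactly the reversed key described above.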
Flow timeouts can be configured in ntopng (via preferences) and in nProbe (with the -t and -d command-line options), which mitigates these issues without fully solving them. However, tuning timeouts is not enough, in particular for UDP flows: unlike TCP, UDP has no flags that can be used to infer the real flow direction. For this reason, ntopng implements heuristics to swap flow directions, but these heuristics cannot be too aggressive, or they would report invalid information.
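As an illustration of why such heuristics must be conservative, the sketch below swaps a flow only when the evidence is one-sided. It is hypothetical and is NOT the heuristic used by ntopng: FirstPacket and should_swap are invented names, TCP flags are trusted when a SYN is seen, and otherwise (UDP, or TCP caught mid-stream) the flow is swapped only when a well-known source port (below 1024) talks to an ephemeral destination port.

```python
from dataclasses import dataclass
from typing import Optional

TCP_SYN = 0x02
TCP_ACK = 0x10


@dataclass
class FirstPacket:
    proto: str                      # "tcp" or "udp" in this sketch
    src_port: int
    dst_port: int
    tcp_flags: Optional[int] = None


def should_swap(pkt: FirstPacket) -> bool:
    """Hypothetical heuristic: True if the first packet seen most likely
    travelled server -> client, so the flow key should be reversed."""
    if pkt.proto == "tcp" and pkt.tcp_flags is not None:
        if pkt.tcp_flags & TCP_SYN:
            # A pure SYN comes from the client; SYN+ACK comes from the server.
            return bool(pkt.tcp_flags & TCP_ACK)
    # No usable flags: swap only for well-known source -> ephemeral destination.
    return pkt.src_port < 1024 <= pkt.dst_port
```

Ambiguous cases, such as two hosts both using port 53, are left untouched: keeping the observed direction is safer than guessing wrong.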
We hope this post has clarified how flow-based network traffic analysis works and why some “unexpected” behaviour is sometimes observed, not because of a bug, but because of the nature of these measurements.