Why nProbe+JSON+ZMQ instead of native sFlow/NetFlow support in ntopng?

sFlow and NetFlow/IPFIX are the two leading network monitoring protocols in use today. Both are binary protocols carried over UDP, with data flowing in one direction only, from the probe (usually a physical network device or a software probe such as nProbe) to the collector (a PC that receives the traffic and either handles it or dumps it to a database). This architecture has been in use for decades. It still makes sense from the device's point of view, but not from the application developer's point of view, for many reasons:

  1. The transport in NetFlow/sFlow was designed from the point of view of the device (probe), which has to send flows to all configured collectors. This means that every collector receives every flow, regardless of its nature. Example: it is not possible to send only HTTP flows to collector A and only VoIP flows to collector B. The probe will send everything to everyone. All the time. Imagine having a TV that is not tuned to your favourite channel at a given time, but that shows all the channels simultaneously.
  2. UDP has limitations with MTUs. VPNs (with an MTU smaller than 1500 bytes) are relatively common, so probes have to deal with this problem. Another problem is that it is not possible to deliver more than 1400 bytes or so in a single UDP packet without IP fragmentation. This means that a large HTTP cookie won’t fit into a UDP packet, and thus you have to truncate your information at a specific upper bound. Not nice, in particular if the information must be received uncut, as is the case for URLs.
  3. NetFlow v9/IPFIX (and in part sFlow too) are built around the concept of templates, so the collector must store and understand the flow template in order to decode data. This means complications, errors, retransmission of templates, and so on.
  4. Due to the need to keep NetFlow templates small, sending a flow that contains an email header (Subject, From, To, CC, Bcc) can become a nightmare, as this flow must be divided into sub-flows all linked by a unique key. Example: <MessageId, To Address 1>, <MessageId, To Address 2>, … Not so nice.
  5. The collector has to handle the probes' idiosyncrasies, with the result that flows coming from different probes might not necessarily have the same format (or flow template, if you wish).
  6. Adding new fields (e.g. the Mime-Type to an HTTP flow) to existing templates might require extra effort on the collector side.
  7. The probe cannot easily send partial flows or periodic updates (e.g. VoIP monitoring metrics sent every second) unless further specific templates are defined.
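
To make the template problem of points 3 and 5 concrete, here is a minimal Python sketch. It is not the nProbe or ntopng code: the template layout and the record are made up for illustration, and only the field IDs (8, 12, 7, 11, 4) follow the NetFlow v9 numbering. It shows that a data record is just opaque bytes until the collector holds the matching template.

```python
import struct

# Hypothetical template: ordered (field_id, length_in_bytes) pairs, the kind
# of description a NetFlow v9/IPFIX template announces to the collector.
TEMPLATE = [(8, 4), (12, 4), (7, 2), (11, 2), (4, 1)]


def decode_record(record: bytes, template):
    """Decode one data record using the template that describes it."""
    fields, offset = {}, 0
    for field_id, length in template:
        raw = record[offset:offset + length]
        if len(raw) != length:
            raise ValueError("record is shorter than the template describes")
        fields[field_id] = int.from_bytes(raw, "big")
        offset += length
    return fields


# A made-up 13-byte record: src IP, dst IP, src port, dst port, protocol.
record = struct.pack("!4s4sHHB",
                     bytes([192, 168, 0, 200]),
                     bytes([64, 243, 24, 160]),
                     50141, 80, 6)

decoded = decode_record(record, TEMPLATE)

# The same bytes read with a different (wrong) template give wrong values.
wrong = decode_record(record, [(7, 2), (11, 2), (8, 4), (12, 4), (4, 1)])
```

With the right template the source port decodes to 50141; with the wrong one the first two bytes of the IP address are read as a port. Every collector has to carry this bookkeeping for every probe it talks to.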

All the above facts were enough to make us move to a different way of shipping monitoring data from the device to the collector. The application that uses monitoring data, and the transport that feeds it, must meet the following requirements:

  1. Receive data ready to be used. Handling templates is old-fashioned and must be confined to a place near the probe; this complexity should not pollute all the actors that plan to use monitoring data.
  2. The data transport should sit on top of TCP, so that a probe can send arbitrarily long data without having to cut it or care about MTUs.
  3. The TCP-based transport must behave as if it were connectionless: if the probe or the collector dies or disconnects, the transport handles the problem, and it transparently handles a future reconnection as well. In essence we want the freedom of a connectionless protocol on top of a connection-oriented transport.
  4. Monitoring data should be divided into channels/topics, so that the app that has monitoring data publishes the available channels, and the data consumers subscribe to one or more channels and thus receive only the information they are interested in.
  5. The data format can change over time, and fields can be added or removed as needed. For instance, if nProbe monitors a YouTube video it should send the VideoID in the flow, while a non-YouTube flow can be emitted without that field. In NetFlow, doing that means creating as many templates as there are field combinations, or sending templates that contain fields with empty values (which still take space at the network transport level).
  6. Receive data in a format that is plain and easy. For instance, in NetFlow the flow start time (FIRST_SWITCHED) is the “sysUptime in msec at which the first packet of this Flow was switched”. So the application is limited to millisecond precision, and in order to know this time we must first know the sysUpTime, do some math, and compute it. Odd and naïve, I believe. If there is a field, its value must be immediately available, and not something to be computed from other fields, which complicates the application logic.
  7. Interpret the fields it handles, and discard those it cannot handle because they are unknown. This lets applications evolve over time: legacy apps handle only the old fields and continue to work unmodified, while new apps can also handle the new fields.
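
The "some math" mentioned in point 6 can be sketched as follows. This is only an illustration in Python: it assumes the collector has the two values that the NetFlow v9 export packet header carries (the device sysUptime in milliseconds and the export time in UNIX seconds), and the function name and the sample numbers are made up.

```python
UPTIME_WRAP = 2 ** 32  # sysUptime is a 32-bit millisecond counter


def first_switched_to_epoch_ms(unix_secs, sys_uptime_ms, first_switched_ms):
    """Turn a FIRST_SWITCHED value (ms since device boot) into epoch ms.

    unix_secs and sys_uptime_ms both come from the header of the export
    packet that carried the flow, so they describe the same instant.
    """
    # How long before the export the flow started, tolerating counter wrap.
    age_ms = (sys_uptime_ms - first_switched_ms) % UPTIME_WRAP
    return unix_secs * 1000 - age_ms


# Hypothetical header: exported at 1379457360, device up for one hour.
start_ms = first_switched_to_epoch_ms(
    unix_secs=1379457360,
    sys_uptime_ms=3_600_000,
    first_switched_ms=3_589_000,
)
```

Here the flow started 11 seconds before the export, so `start_ms` is 1379457349000. An application that simply receives the value 1379457349 as the flow start time needs none of this.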

ZMQ

In order to implement all this we made some design choices:

  1. Data is shipped in JSON format. Everyone can read and understand it, web browsers in particular. The format is human-friendly and easy to read, but in the near future we might move to compressed or binary (or both) formats for efficiency reasons. Each flow field is identified by a number as specified in the NetFlow RFC (FIRST_SWITCHED is mapped to 22), and the field value is printed in string format. For instance {8:"192.168.0.200",12:"64.243.24.160",15:"0.0.0.0",10:0,14:0,2:13,1:987,22:1379457349,21:1379457349,7:50141,11:80,6:27,4:6,5:0,16:0,17:3561,9:0,13:0} represents the flow [tcp] 64.243.24.160:80 -> 192.168.0.200:50142 [12 pkt/11693 bytes].
  2. nProbe can be used as a pure probe (i.e. it converts packets into flows) or as a proxy (i.e. it acts as an sFlow/NetFlow collector on behalf of ntopng). In no case will ntopng receive raw flows; it receives them only as JSON.
  3. ZMQ is a great transport that allows ntopng to connect to nProbe and fetch data via ZMQ only for the topics it is interested in. Currently ntopng subscribes to the “flows” topic, but in the future this will change and become configurable, as more topics become available for subscription. So in the picture above the arrow from nProbe to ntopng depicts the information flow, but physically it is ntopng that connects (as client) to nProbe, which instead acts as the data source. If nProbe or ntopng is restarted, the transport takes care of all these issues, so the apps neither see any degradation nor have to implement reconnections explicitly.
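
As an illustration of how little a consumer needs to do with this format, here is a minimal Python sketch that maps the numeric field IDs to names and silently drops the fields it does not know. It is not ntopng code. The names come from the NetFlow v9 field numbering, the function name is made up, the keys are quoted because a strict JSON parser requires it (the example above prints them unquoted), and field 99999 is a made-up unknown field.

```python
import json

# Subset of the NetFlow v9 field type numbers.
FIELD_NAMES = {
    1: "IN_BYTES",
    2: "IN_PKTS",
    4: "PROTOCOL",
    7: "L4_SRC_PORT",
    8: "IPV4_SRC_ADDR",
    11: "L4_DST_PORT",
    12: "IPV4_DST_ADDR",
    21: "LAST_SWITCHED",
    22: "FIRST_SWITCHED",
}


def decode_flow(payload: str) -> dict:
    """Return the known fields of a JSON flow, keyed by name."""
    flow = {}
    for key, value in json.loads(payload).items():
        # Skip keys that are not numeric IDs or that this app does not know.
        name = FIELD_NAMES.get(int(key)) if key.isdigit() else None
        if name is not None:
            flow[name] = value
    return flow


SAMPLE = ('{"8":"192.168.0.200","12":"64.243.24.160","7":50141,"11":80,'
          '"4":6,"22":1379457349,"21":1379457349,"99999":"not known here"}')

flow = decode_flow(SAMPLE)
```

A legacy app built with this table keeps working when the probe starts emitting new fields, and a newer app only has to extend `FIELD_NAMES`.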

As explained in the ntopng README file, nProbe and ntopng must be started as follows:

  1. Flow collection/generation (nProbe)
    nprobe --zmq "tcp://*:5556" -i eth1 -n none (probe mode)
    nprobe --zmq "tcp://*:5556" -i none -n none --collector-port 2055 (sFlow/NetFlow collector mode)
  2. Data Collector (ntopng)
    ntopng -i tcp://127.0.0.1:5556

This means that nProbe creates a TCP endpoint on port 5556, available on all active interfaces (* stands for all). ntopng instead is instructed to connect via TCP to that endpoint as a client (in essence, the opposite of NetFlow/sFlow). Multiple consumers, such as several ntopng instances or even a custom ZMQ listener application, can connect to the same nProbe endpoint.
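
A custom ZMQ listener of this kind could look roughly like the Python sketch below, which uses the pyzmq library. Treat it as a sketch under assumptions rather than a description of the nProbe wire format: we assume that each message arrives as a multipart message whose first frame starts with the topic name (possibly NUL-padded) and whose last frame is a JSON document with quoted keys. The topic name is taken from the text above; the function names are made up.

```python
import json


def parse_message(frames):
    """Split an (assumed) [topic/header, JSON payload] message."""
    if len(frames) < 2:
        raise ValueError("expected a topic frame and a payload frame")
    # Keep only the text before the first NUL byte of the first frame.
    topic = frames[0].split(b"\0", 1)[0].decode("ascii", "replace")
    return topic, json.loads(frames[-1].decode("utf-8"))


def subscribe(endpoint="tcp://127.0.0.1:5556", topic=b"flows"):
    """Connect to the probe as a client and yield (topic, flow) pairs."""
    import zmq  # pyzmq, imported here so parse_message works without it

    sock = zmq.Context.instance().socket(zmq.SUB)
    sock.connect(endpoint)                  # the consumer connects to the probe
    sock.setsockopt(zmq.SUBSCRIBE, topic)   # prefix filter on the first frame
    while True:
        yield parse_message(sock.recv_multipart())


# Offline check of the parsing step with a hand-made message.
example = parse_message([b"flows\0\0\0", b'{"7": 50141, "11": 80}'])
```

To print live flows one would loop with `for topic, flow in subscribe(): print(topic, flow)`. There is no reconnection logic in the sketch because the ZMQ socket re-establishes the connection by itself when the other side restarts, which is the behaviour described above.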

As said before, this is just the beginning. With the above solution we can create new apps that would be much more complicated to develop by relying on sFlow/NetFlow alone.

NOTE:

  1. In order to use this solution you MUST have a recent copy of nProbe built with ZMQ support. If unsure, please check this first (nprobe -h|grep zmq).
  2. This is an interesting thread about the use of JSON in network monitoring.
  3. We (at ntop) do not plan to discontinue sFlow/NetFlow/IPFIX support in our products. We just want to say that their complexity cannot be propagated to all apps, most of which live in web browsers or are coded in modern languages whose developers like to focus on the problem (network monitoring) rather than on how data is exchanged across monitoring apps. In a way, think of sFlow/NetFlow/IPFIX as an old serial port, and of JSON as a USB port. You can use a serial-to-USB converter, but serial ports on PCs are now legacy. nProbe is our serial-to-USB converter, and ntopng is a USB-only app.