nProbe with FastBit database: an innovative flows storage solution

nProbe (short for NetFlow probe) is an open-source probe that supports both NetFlow and sFlow collection. It is designed to keep up with Gigabit speeds on commodity hardware and, using PF_RING, it can capture packets and analyze networks at full speed with little or no packet loss.

Each captured packet is analyzed and associated with a flow record; periodically, the expired flows are exported to the specified collectors. nProbe is fully interoperable with commercial collectors and with open-source tools such as ntop.

The new version of nProbe (to be released soon) adds a storage system designed primarily to answer queries efficiently.

The new storage system

When nProbe is used as a probe and collector, it supports flow collection and storage, both on raw files and in relational databases such as MySQL and SQLite.

Support for relational databases has always been controversial: nProbe users appreciate being able to query flow records using SQL, but dumping flows to a database can lead to lost flow records because of the database-processing overhead. Conversely, the speed advantage of dumping flow records in raw format is paid for at every search operation, in the amount of data that has to be read. Furthermore, the query language that can be used on raw files is limited compared with SQL.

To overcome the limitations of existing flow-management systems, an extension of nProbe has been developed. The new version of nProbe stores flow records on disk using FastBit, an innovative column-oriented database with efficient compressed bitmap indexing.

New nProbe flow record collection and export architecture

Conceptually, FastBit is a database that stores its content by column rather than by row (a structure known as “vertical organization”). Data is represented as tables with rows and columns. A large table may be split into many data partitions, each stored in a distinct directory, with each column stored as a separate file in raw binary form. Users can configure the partition duration (in minutes) at runtime; when a partition reaches its maximum duration, a new one is created automatically.
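The column-per-file layout described above can be illustrated with a minimal Python sketch. It is not FastBit's actual on-disk format: the helper names (write_partition, read_column), the partition name, and the choice of little-endian 32-bit unsigned integers are all assumptions made for illustration.

```python
import struct
import tempfile
from pathlib import Path


def write_partition(base_dir, partition_name, columns):
    """Write each column to its own raw binary file inside one partition directory.

    Hypothetical layout: values are packed as little-endian 32-bit unsigned ints.
    """
    part_dir = Path(base_dir) / partition_name
    part_dir.mkdir(parents=True, exist_ok=True)
    for name, values in columns.items():
        # One file per column: values are packed back to back, with no row framing.
        (part_dir / name).write_bytes(struct.pack(f"<{len(values)}I", *values))
    return part_dir


def read_column(part_dir, name):
    """Read a single column back without touching the other columns' files."""
    raw = (Path(part_dir) / name).read_bytes()
    return list(struct.unpack(f"<{len(raw) // 4}I", raw))


# Illustrative flow data; the column names are borrowed from nProbe template fields.
flows = {
    "L4_DST_PORT": [80, 443, 80, 53],
    "IN_BYTES": [1200, 560, 98000, 74],
}

with tempfile.TemporaryDirectory() as base:
    part = write_partition(base, "partition_0000", flows)
    ports = read_column(part, "L4_DST_PORT")
    files = sorted(p.name for p in part.iterdir())

print(files)
print(ports)
```

The point of the layout is that a query on destination ports only reads the L4_DST_PORT file; the bytes column is never loaded.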

Furthermore, bitmap indexes perform extremely well for tasks that demand the fastest possible query processing. This is because the intersection of the search results on each variable is a simple AND operation over the resulting bitmaps. As a consequence of this major speed improvement, it is now possible to query data in real time.
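To make the AND operation concrete, here is a small Python sketch of an equality bitmap index. It uses plain Python integers as uncompressed bitmaps and does not attempt the compression that FastBit applies; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return index


def rows_of(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Two columns of the same five flow records (illustrative values).
protocol = [6, 17, 6, 6, 17]
dst_port = [80, 53, 443, 80, 53]

proto_idx = build_bitmap_index(protocol)
port_idx = build_bitmap_index(dst_port)

# "protocol == 6 AND dst_port == 80" is a single bitwise AND of two bitmaps.
matches = proto_idx[6] & port_idx[80]
matching_rows = rows_of(matches)
match_count = bin(matches).count("1")

print(matching_rows, match_count)
```

Counting the matching records only requires counting the set bits of the result, which is why cardinality queries can be answered without reading the records themselves.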

Additional details

The extended nProbe creates FastBit partitions according to the flow template that is configured (in probe mode) or read from incoming flows (in collector mode). In the simple example below, nProbe is configured to dump flow records into a temporary directory with a rotation period of 10 minutes:

nprobe -n none -i eth0 --fastbit /tmp/fastbit/ --fastbit-rotation 10 --fastbit-template "%IPV4_SRC_ADDR %IPV4_DST_ADDR %IN_PKTS %IN_BYTES %OUT_PKTS %OUT_BYTES %FIRST_SWITCHED %LAST_SWITCHED %L4_SRC_PORT %L4_DST_PORT %TCP_FLAGS %PROTOCOL"

Flow records can be dumped at full speed with no index-build overhead. Thus, leaving aside the overhead of receiving and decoding flows, it is possible to save more than one million flow records per second on a standard Serial ATA (SATA) disk.

Additional advantages of this technology are listed below:

  • Ability to save flow records on disk with minimal overhead, allowing loss-free, on-the-fly flow-to-disk storage, as is the case with tools based on raw files.
  • Compact data storage that limits disk usage, so users can store months of flow records on a cheap hard disk without resorting to expensive storage systems.
  • Simple data archive structure that makes it easy to move old data to offline storage systems, without having to use complex data partitioning solutions.
  • On tens of millions of records: sub-second search time for cardinality searches (e.g. counting the number of records that satisfy a certain criterion) and sub-minute search time when extracting the records that match a certain criterion (e.g. top X hosts and their total traffic on TCP port Y).
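The two kinds of search mentioned in the last item can be sketched in Python over in-memory columns. This is a plain scan for illustration only, not how FastBit or nProbe executes queries; the column contents and the function names (count_matching, top_hosts) are hypothetical.

```python
from collections import Counter

# Hypothetical flow columns, one list per column, aligned by row index.
src_addr = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.2"]
protocol = [6, 6, 17, 6, 6]
dst_port = [80, 80, 53, 443, 80]
in_bytes = [1500, 300, 90, 7000, 200]


def count_matching(proto, port):
    """Cardinality search: count the flows that match, without extracting them."""
    return sum(1 for p, d in zip(protocol, dst_port) if p == proto and d == port)


def top_hosts(proto, port, limit):
    """Extraction search: total bytes per source host for the matching flows."""
    totals = Counter()
    for host, p, d, size in zip(src_addr, protocol, dst_port, in_bytes):
        if p == proto and d == port:
            totals[host] += size
    return totals.most_common(limit)


tcp_http_flows = count_matching(6, 80)  # protocol 6 is TCP
top = top_hosts(6, 80, 2)

print(tcp_http_flows)
print(top)
```

The extraction search has to touch the address and byte columns for every matching row, which is consistent with it being the slower of the two query types.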

If you want to know more about this topic or see the results of the comparative tests that were performed, you can read the research paper titled “Collection and Exploration of Large Data Monitoring Sets Using Bitmap Databases” (Proceedings of TMA 2010, Zurich – April 2010).

To learn about the new parameters in the next nProbe release that store flow records in the FastBit database, and to see some usage examples, you can read this manual.

If you are interested in nProbe, follow this link to find out how to get it!