.. _ClickHouseTimeseriesAdvanced:

ClickHouse Timeseries
#####################

This page describes the ClickHouse timeseries driver in detail. For a high-level overview of all timeseries drivers, see :ref:`BasicConceptsTimeseries`.

.. note::

   ClickHouse timeseries support requires an **Enterprise M** or better license and the :code:`HAVE_CLICKHOUSE` compile-time flag. It is not available on Windows.

Overview
--------

The ClickHouse timeseries driver stores all traffic metrics produced by ntopng (interface traffic, per-host statistics, application protocol breakdowns, etc.) in a ClickHouse columnar database. This provides high write throughput, flexible time-range queries, automatic data retention via TTL, and optional cluster/cloud deployment.

All timeseries data lands in a single table (``timeseries``) whose rows carry a schema name, a timestamp, a tag map (key-value pairs used for filtering), and a metric map (the actual numeric measurements). The driver is a drop-in replacement for the RRD and InfluxDB drivers: the same ntopng charts and API calls work regardless of which driver is active.

Configuration
-------------

Deployment Modes
~~~~~~~~~~~~~~~~

The ClickHouse timeseries driver is selected via the ``-F`` option, which also configures the connection parameters. Please check the **Flows Dump** section for instructions on configuring ClickHouse.

.. note::

   The same ``-F`` flag is used for ClickHouse **flow dump** (historical flows) and ClickHouse **timeseries**. When ``-F clickhouse`` (or a variant) is set, both feature sets share the same ClickHouse server connection and database.

Timeseries Driver Preference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The active timeseries driver can also be changed at runtime from the ntopng preferences UI under *Timeseries* → *Driver*. Setting it to ``clickhouse`` activates this driver for all new timeseries writes.

Data Retention
~~~~~~~~~~~~~~

Retention is controlled by the *Timeseries Data Retention* preference (in days, default 365).
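Lowering the preference to, say, 180 days is conceptually equivalent to the following DDL change (a sketch only; ntopng generates and issues the actual statement itself, and its exact form may differ):

.. code-block:: sql

   -- Sketch: shrink the retention window to 180 days.
   -- ClickHouse expires rows older than the TTL during
   -- background merges; no manual purge is needed.
   ALTER TABLE `ntopng`.`timeseries`
       MODIFY TTL tstamp + toIntervalDay(180);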
The value is applied as a ClickHouse ``TTL`` clause on the ``timeseries`` table. When the retention changes, the table DDL is re-issued with the new TTL and ClickHouse handles background cleanup automatically; no manual purge is required.

Database Schema
---------------

Table Definition
~~~~~~~~~~~~~~~~

On first startup (or when the retention period changes), ntopng creates the following table in the configured ClickHouse database (default: ``ntopng``):

.. code-block:: sql

   CREATE TABLE IF NOT EXISTS `ntopng`.`timeseries` (
       `schema_name` LowCardinality(String),
       `tstamp` DateTime CODEC(Delta, ZSTD),
       `tags` Map(LowCardinality(String), String),
       `metrics` Map(LowCardinality(String), Float64)
   )
   ENGINE = MergeTree()
   PARTITION BY toYYYYMM(tstamp)
   ORDER BY (schema_name, tstamp)
   TTL tstamp + toIntervalDay(365)

The interval passed to ``toIntervalDay`` is the configured retention period in days (365 by default, per the *Timeseries Data Retention* preference).

Column Details
~~~~~~~~~~~~~~

``schema_name`` (``LowCardinality(String)``)
   Identifies the type of measurement, e.g. ``iface:traffic``, ``host:traffic``, ``interface:ndpi``. Using ``LowCardinality`` avoids redundant string storage for the small set of distinct schema identifiers.

``tstamp`` (``DateTime CODEC(Delta, ZSTD)``)
   Unix timestamp of the data point. The Delta codec stores differences between consecutive timestamps (reducing entropy for regular time series), then ZSTD compresses the result.

``tags`` (``Map(LowCardinality(String), String)``)
   Key-value pairs used to filter and group data. Keys are bounded-cardinality identifiers such as ``ifid``, ``host``, or ``protocol``; values are the corresponding entity identifiers. Example: ``{'ifid': '0', 'host': '192.168.1.1'}``.

``metrics`` (``Map(LowCardinality(String), Float64)``)
   The numeric measurements. Keys are metric names (``bytes``, ``packets``, ``duration_ms``, etc.); values are ``Float64`` measurements. Example: ``{'bytes': 1048576.0, 'packets': 1024.0}``.
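Given the map columns above, queries access tags and metrics with ClickHouse's map subscript syntax. A hypothetical example using the tag and metric names shown above (the host, interface and time range are illustrative):

.. code-block:: sql

   -- Per-minute byte totals for one host on interface 0
   -- over the last hour
   SELECT
       toStartOfMinute(tstamp) AS minute,
       sum(metrics['bytes'])   AS bytes
   FROM `ntopng`.`timeseries`
   WHERE schema_name = 'host:traffic'
     AND tags['ifid'] = '0'
     AND tags['host'] = '192.168.1.1'
     AND tstamp >= now() - INTERVAL 1 HOUR
   GROUP BY minute
   ORDER BY minute;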
Partitioning and Ordering
~~~~~~~~~~~~~~~~~~~~~~~~~

Data is partitioned monthly (``toYYYYMM(tstamp)``), which allows ClickHouse to prune entire partitions when querying a specific time range. Within each partition, rows are sorted by ``(schema_name, tstamp)``, optimising the most common query pattern: filter by schema, then scan a time range.

Query Behaviour
---------------

Counter vs. Gauge Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~

Each timeseries schema declares whether its metrics are *counters* (monotonically increasing totals, e.g. cumulative bytes) or *gauges* (instantaneous values, e.g. CPU load percentage).

- **Counter** metrics: the driver fetches one extra time bucket before the requested range, computes per-bucket deltas using a ``lag()`` window function, and divides by the bucket width to produce rates (e.g. bytes/second).
- **Gauge** metrics: the driver aggregates (``avg``, ``max``, or ``min``) directly within each time bucket.

Comparison with Other Drivers
-----------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 25 25 30

   * - Aspect
     - RRD
     - InfluxDB
     - ClickHouse
   * - Storage
     - Per-entity ``.rrd`` files
     - InfluxDB server (line protocol)
     - Centralised columnar table
   * - Write model
     - Synchronous file write per entity
     - HTTP line-protocol POST
     - Async in-memory queue → batch INSERT (TCP)
   * - Flush interval
     - Each periodic activity
     - Each periodic activity
     - Every 5 seconds (up to 2,000 rows per batch)
   * - Retention model
     - Fixed RRA archives (1 year total)
     - Retention policies
     - TTL clause, configurable in days
   * - Top-K queries
     - Scan many files; slow at scale
     - InfluxDB continuous queries
     - Single SQL query; fast at scale
   * - Cluster/HA
     - None (local filesystem)
     - InfluxDB Enterprise / OSS cluster
     - ClickHouse native cluster replication
   * - License required
     - Community+
     - Community+
     - Enterprise M+
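The counter handling described under *Counter vs. Gauge Metrics* can be sketched as a ClickHouse query. This is an illustration, not the driver's actual SQL; it uses ``lagInFrame`` (ClickHouse's window-function form of ``lag()``), assumes 60-second buckets, and reuses the ``iface:traffic`` schema and ``ifid`` tag from the examples above:

.. code-block:: sql

   -- Turn a cumulative byte counter into a bytes/second rate:
   -- bucket the raw points, take the delta between consecutive
   -- buckets, then divide by the bucket width (60 s). The extra
   -- minute fetched before the range seeds the first delta; the
   -- first row itself would be discarded.
   SELECT
       bucket,
       (bytes - lagInFrame(bytes) OVER (ORDER BY bucket)) / 60 AS bytes_per_sec
   FROM (
       SELECT
           toStartOfInterval(tstamp, INTERVAL 60 SECOND) AS bucket,
           max(metrics['bytes'])                         AS bytes
       FROM `ntopng`.`timeseries`
       WHERE schema_name = 'iface:traffic'
         AND tags['ifid'] = '0'
         AND tstamp >= now() - INTERVAL 61 MINUTE
       GROUP BY bucket
   )
   ORDER BY bucket;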