
Optimize Redis node walking

Closes #2141

The issue

  1. The initialization phase of walking a DataNode takes a long time because of the existing keys (#2141). This affects Flint, the writer and ScanDataListener (F5).

  2. The writer shows elevated CPU usage (30% on my machine), even when scanning with save=False. This is because the session listener subscribes to all streams, even though it only needs the scan events.

    The solution below causes the CPU usage to drop to 0% when scanning with save=False.

The solution

The DataNode.walk method (and, likewise, all the other walk methods) now accepts additional filtering arguments:

  • include_filter: only yield nodes/events for these nodes (identical to the old filter argument)
  • exclude_children: no events from children of these nodes (recursive)
  • exclude_existing_children: no events from existing children of these nodes (recursive). Defaults to exclude_children.
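To make the interplay of the three arguments concrete, here is a minimal in-memory sketch. It is not the Bliss implementation: the `Node` class, its `existing` flag (meaning "already in Redis when the walk started") and the standalone `walk` function are hypothetical, and the filters match on node type. It assumes that a skipped child takes its whole subtree with it (the "recursive" part), that `include_filter` only affects what is yielded and not where the walk descends, and that `exclude_existing_children` falls back to `exclude_children`, as described above.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional, Sequence


@dataclass
class Node:
    """Toy stand-in for a DataNode (hypothetical, in-memory only)."""

    name: str
    type: str
    existing: bool = True  # True if the node was already in Redis when the walk started
    children: list = field(default_factory=list)


def walk(
    node: Node,
    include_filter: Optional[Sequence[str]] = None,
    exclude_children: Optional[Sequence[str]] = None,
    exclude_existing_children: Optional[Sequence[str]] = None,
) -> Iterator[Node]:
    """Yield the descendants of `node` depth-first, honouring the three filters."""
    if exclude_existing_children is None:
        # Described default: exclude_existing_children falls back to exclude_children
        exclude_existing_children = exclude_children
    exclude_children = tuple(exclude_children or ())
    exclude_existing_children = tuple(exclude_existing_children or ())

    for child in node.children:
        # Skipping a child also skips its whole subtree (the "recursive" part)
        if child.existing and node.type in exclude_existing_children:
            continue
        if not child.existing and node.type in exclude_children:
            continue
        # include_filter only controls what is yielded, not where the walk descends
        if include_filter is None or child.type in include_filter:
            yield child
        yield from walk(child, include_filter, exclude_children, exclude_existing_children)


scan_types = ("scan", "scan_group")
session = Node("session", "session", children=[
    Node("scan1", "scan", children=[Node("axis", "channel"), Node("diode", "channel")]),
    Node("scan2", "scan", existing=False, children=[Node("diode2", "channel", existing=False)]),
])

# Flint-style scan history: scans only, never their children
print([n.name for n in walk(session, include_filter=scan_types, exclude_children=scan_types)])
# → ['scan1', 'scan2']

# Skip the backlog of existing scan children, but still receive new ones
print([n.name for n in walk(session, exclude_existing_children=scan_types)])
# → ['scan1', 'scan2', 'diode2']
```

The second call shows why a separate exclude_existing_children is useful: the channels of the already-existing scan1 are never visited, while data from the newly created scan2 still arrives.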

In addition, when filtering on the node type, the filter is applied before DataNode creation (fetching only what is strictly necessary from Redis) instead of afterwards.
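The benefit of filtering before creation can be illustrated with a small counting sketch. `HypotheticalDataNode`, `filter_after` and `filter_before` are illustrative names, not Bliss code, and the sketch assumes the node types are available cheaply (for example from a single pipelined Redis round trip) while building a full node object is the expensive part.

```python
from typing import Dict, List, Sequence


class HypotheticalDataNode:
    """Stand-in for an object whose construction costs extra Redis round trips."""

    created = 0  # class-level counter of constructed instances

    def __init__(self, db_name: str, node_type: str):
        HypotheticalDataNode.created += 1
        self.db_name = db_name
        self.type = node_type


def filter_after(node_types: Dict[str, str], include: Sequence[str]) -> List[HypotheticalDataNode]:
    """Old approach: build every node, then drop those of the wrong type."""
    nodes = [HypotheticalDataNode(name, t) for name, t in node_types.items()]
    return [n for n in nodes if n.type in include]


def filter_before(node_types: Dict[str, str], include: Sequence[str]) -> List[HypotheticalDataNode]:
    """New approach: check the cheap type string first, build only the matches."""
    return [HypotheticalDataNode(name, t) for name, t in node_types.items() if t in include]


# Assumed input: node name -> node type, already fetched in one cheap step
node_types = {
    "session:scan1": "scan",
    "session:scan1:axis": "channel",
    "session:scan1:diode": "channel",
}

filter_after(node_types, ("scan",))
print(HypotheticalDataNode.created)  # → 3

HypotheticalDataNode.created = 0
filter_before(node_types, ("scan",))
print(HypotheticalDataNode.created)  # → 1
```

Both functions return the same single scan node; only the number of objects built along the way differs, and that difference grows with the number of channels per scan.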

The old filter argument is still supported, but it emits a deprecation warning in favour of include_filter.
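A common way to keep a renamed argument working is a small shim built on the standard `warnings` module. The helper below, `resolve_include_filter`, is a hypothetical sketch and not the actual walk code; raising when both names are given is a choice made for this sketch only.

```python
import warnings
from typing import Optional, Sequence


def resolve_include_filter(
    filter: Optional[Sequence[str]] = None,  # deprecated name; shadows the builtin only here
    include_filter: Optional[Sequence[str]] = None,
) -> Optional[Sequence[str]]:
    """Map the deprecated `filter` argument onto `include_filter`."""
    if filter is None:
        return include_filter
    if include_filter is not None:
        raise ValueError("pass either 'filter' or 'include_filter', not both")
    warnings.warn(
        "'filter' is deprecated, use 'include_filter' instead",
        DeprecationWarning,
        stacklevel=3,  # point at the caller of walk(), assuming walk() calls this helper
    )
    return filter


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # DeprecationWarning is hidden by default outside __main__
    print(resolve_include_filter(filter=("scan",)))  # → ('scan',)
print(caught[0].category.__name__)  # → DeprecationWarning
```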

Usage

  • watch_session_scans has a new argument exclude_existing_scans (disabled by default to preserve the old behaviour), which uses exclude_existing_children=("scan", "scan_group") to skip children of existing scans.
  • Flint uses watch_session_scans(..., exclude_existing_scans=True). This should solve #2141.
  • Flint uses session_node.walk(..., include_filter=scan_types, exclude_children=scan_types) to get the scan history.
  • ScanDataListener (F5) uses watch_session_scans(..., exclude_existing_scans=True).
  • NexusSessionWriter uses exclude_children=("scan", "scan_group") for the session listener.
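The consumer choices listed above can be summarised as filter keywords. This is a hypothetical sketch: `watch_session_scans_filters` and `CONSUMER_FILTERS` are illustrative names that only mirror the behaviour described in this list, not the real bodies of watch_session_scans, Flint, ScanDataListener or NexusSessionWriter.

```python
from typing import Dict, Tuple

# Node types treated as scans in this MR
SCAN_TYPES: Tuple[str, ...] = ("scan", "scan_group")


def watch_session_scans_filters(exclude_existing_scans: bool = False) -> Dict[str, Tuple[str, ...]]:
    """Filter keywords a watch_session_scans-like helper would forward to walk (hypothetical)."""
    if exclude_existing_scans:
        # Skip the (possibly huge) backlog of children of scans already in Redis
        return {"exclude_existing_children": SCAN_TYPES}
    return {}  # default: old behaviour, every existing child is walked


# Per-consumer filter choices, as listed above
CONSUMER_FILTERS: Dict[str, Dict[str, Tuple[str, ...]]] = {
    "Flint live scans": watch_session_scans_filters(exclude_existing_scans=True),
    "Flint scan history": {"include_filter": SCAN_TYPES, "exclude_children": SCAN_TYPES},
    "ScanDataListener (F5)": watch_session_scans_filters(exclude_existing_scans=True),
    "NexusSessionWriter session listener": {"exclude_children": SCAN_TYPES},
}

for consumer, kwargs in CONSUMER_FILTERS.items():
    print(f"{consumer}: {kwargs}")
```

The NexusSessionWriter entry is the one behind the CPU drop: with exclude_children on scan types, the session listener sees the scan nodes themselves but none of their data streams.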
Edited by Wout De Nolf
