Optimize Redis node walking
Closes #2141 (closed)
The issue
- The initialization phase of walking a `DataNode` takes a long time because of existing keys (#2141 (closed)). This affects Flint, the writer and `ScanDataListener` (F5).
- The writer shows elevated CPU usage (30% on my machine), even when scanning with `save=False`. This is because the session listener subscribes to all streams, even though it only needs the scan events. The solution below causes the CPU usage to drop to 0% when scanning with `save=False`.
The solution
The `DataNode.walk` method now has additional arguments for filtering (this applies to all the walk methods):
- `include_filter`: only yield nodes/events for these nodes (identical to the old `filter` argument)
- `exclude_children`: no events from children of these nodes (recursive)
- `exclude_existing_children`: no events from existing children of these nodes (recursive). Defaults to `exclude_children`.
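To illustrate how `include_filter` and `exclude_children` interact, here is a minimal sketch on a toy tree. It is not the actual `DataNode.walk` implementation: `ToyNode` and `toy_walk` are hypothetical names, the walk is assumed to yield only descendants of the starting node, and `exclude_existing_children` is not modelled because it depends on whether a node existed before the walk started.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Sequence


@dataclass
class ToyNode:
    """Hypothetical stand-in for a DataNode: a name, a type and children."""

    name: str
    type: str
    children: List["ToyNode"] = field(default_factory=list)


def toy_walk(
    node: ToyNode,
    include_filter: Optional[Sequence[str]] = None,
    exclude_children: Sequence[str] = (),
) -> Iterator[ToyNode]:
    """Depth-first walk over the descendants of `node` (toy version)."""
    for child in node.children:
        # include_filter only decides which nodes are yielded
        if include_filter is None or child.type in include_filter:
            yield child
        # exclude_children prunes the whole subtree below matching nodes
        if child.type not in exclude_children:
            yield from toy_walk(child, include_filter, exclude_children)


# Toy session: one plain scan and one scan group that contains a scan
session = ToyNode(
    "session",
    "session",
    [
        ToyNode("scan_1", "scan", [ToyNode("channel_a", "channel")]),
        ToyNode(
            "group_1",
            "scan_group",
            [ToyNode("scan_2", "scan", [ToyNode("channel_b", "channel")])],
        ),
    ],
)

scan_types = ("scan", "scan_group")
top_level = [
    n.name
    for n in toy_walk(session, include_filter=scan_types, exclude_children=scan_types)
]
everything = [n.name for n in toy_walk(session)]
```

In this toy model, `top_level` contains only `scan_1` and `group_1`: the scan inside the group is never visited because the group's subtree is pruned, whereas `everything` lists all five descendants.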
In addition, when filtering on the node type, filtering is performed before `DataNode` creation (only getting the absolutely necessary data from Redis) instead of afterwards.
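The difference between filtering before and after node creation can be sketched as follows. This is only an illustration under assumptions: the records are plain dictionaries standing in for what would be read from Redis, and `ToyNode`, `filter_after_creation` and `filter_before_creation` are hypothetical names, not part of the code base.

```python
from typing import Dict, Iterable, Iterator, List, Sequence

created: List[str] = []  # records which toy nodes were actually constructed


class ToyNode:
    """Hypothetical node whose construction stands in for extra Redis reads."""

    def __init__(self, name: str, node_type: str) -> None:
        created.append(name)
        self.name = name
        self.type = node_type


def filter_after_creation(
    records: Iterable[Dict[str, str]], include_filter: Sequence[str]
) -> Iterator[ToyNode]:
    """Old order: build every node, then discard the ones that do not match."""
    for record in records:
        node = ToyNode(record["name"], record["type"])
        if node.type in include_filter:
            yield node


def filter_before_creation(
    records: Iterable[Dict[str, str]], include_filter: Sequence[str]
) -> Iterator[ToyNode]:
    """New order: check the type first, build only the matching nodes."""
    for record in records:
        if record["type"] in include_filter:
            yield ToyNode(record["name"], record["type"])


records = [
    {"name": "scan_1", "type": "scan"},
    {"name": "channel_a", "type": "channel"},
    {"name": "channel_b", "type": "channel"},
    {"name": "scan_2", "type": "scan"},
]

created.clear()
nodes_after = list(filter_after_creation(records, ("scan",)))
count_after = len(created)  # every record was turned into a node

created.clear()
nodes_before = list(filter_before_creation(records, ("scan",)))
count_before = len(created)  # only the matching records were turned into nodes
```

Both variants yield the same nodes, but the second one constructs fewer objects, which is the saving this change aims for when many non-matching keys exist.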
The old `filter` argument is still there, with a deprecation warning in favour of `include_filter`.
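A common way to keep a renamed keyword argument working is sketched below. The helper `resolve_include_filter` is hypothetical and the warning text is an assumption; the actual implementation in this merge request may differ.

```python
import warnings
from typing import Optional, Sequence


def resolve_include_filter(
    include_filter: Optional[Sequence[str]] = None,
    filter: Optional[Sequence[str]] = None,  # shadows the builtin on purpose: old argument name
) -> Optional[Sequence[str]]:
    """Hypothetical helper mapping the deprecated `filter` onto `include_filter`."""
    if filter is not None:
        warnings.warn(
            "'filter' is deprecated, use 'include_filter' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # An explicitly given include_filter takes precedence over the old argument
        if include_filter is None:
            include_filter = filter
    return include_filter
```

Callers that still pass `filter=...` keep working and see a `DeprecationWarning`, while callers that pass `include_filter=...` are unaffected.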
Usage
- `watch_session_scans` has an argument `exclude_existing_scans` (disabled by default to preserve old behaviour) which uses `exclude_existing_children=("scan", "scan_group")` to skip children of existing scans.
- Flint uses `watch_session_scans(..., exclude_existing_scans=True)`. This should solve #2141 (closed).
- Flint uses `session_node.walk(..., include_filter=scan_types, exclude_children=scan_types)` to get the scan history.
- `ScanDataListener` (F5) uses `watch_session_scans(..., exclude_existing_scans=True)`.
- `NexusSessionWriter` uses `exclude_children=("scan", "scan_group")` for the session listener.