Processing source node: spin-lock on the FIFO output consumes CPU cycles in low frame-rate regimes
In low frame-rate regimes, one CPU thread is completely consumed by reading the processing FIFO. Such an approach is not (very) OS-friendly.
The boost::lockfree::spsc_queue requires memory fences on both the producer and the consumer. A more OS-friendly implementation could add a std::mutex + std::condition_variable pair on top of the lockfree::spsc_queue: the producer would use the pair systematically on every push, while the consumer would only use it when pop fails. Such a solution ensures a deterministic latency in both operations while not over-consuming CPU cycles in low-speed regimes.
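A rough sketch of the mutex + condition_variable idea follows, only to make the proposal concrete. It is not the project's code: SpscRing is a hypothetical, minimal stand-in for boost::lockfree::spsc_queue (so the example compiles with the standard library alone), and BlockingSpsc, close() and run_demo() are names invented for illustration. The producer takes the mutex and notifies on every push. The consumer first tries the lock-free pop and only sleeps on the condition variable when the queue is empty.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical stand-in for boost::lockfree::spsc_queue: a fixed-size
// single-producer/single-consumer ring buffer (holds at most N - 1 items).
template <typename T, std::size_t N>
class SpscRing {
public:
    bool push(const T& v) {  // producer thread only
        const std::size_t h = head_.load(std::memory_order_relaxed);
        const std::size_t next = (h + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[h] = v;
        head_.store(next, std::memory_order_release);
        return true;
    }
    bool pop(T& out) {  // consumer thread only
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire)) return false;  // empty
        out = buf_[t];
        tail_.store((t + 1) % N, std::memory_order_release);
        return true;
    }
private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0}, tail_{0};
};

// Hypothetical wrapper: lock-free fast path, blocking slow path.
template <typename T, std::size_t N>
class BlockingSpsc {
public:
    bool push(const T& v) {
        if (!q_.push(v)) return false;
        // Taking the mutex after the push guarantees that a consumer which
        // just saw an empty queue is either already waiting (and gets the
        // notification) or will re-check the queue and find the item.
        { std::lock_guard<std::mutex> lk(m_); }
        cv_.notify_one();
        return true;
    }
    // Blocks while the queue is empty; returns false once closed and drained.
    bool pop(T& out) {
        if (q_.pop(out)) return true;  // fast path: no lock taken
        std::unique_lock<std::mutex> lk(m_);
        bool got = false;
        cv_.wait(lk, [&] { got = q_.pop(out); return got || closed_; });
        return got;
    }
    void close() {  // called by the producer when no more items will come
        { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
        cv_.notify_one();
    }
private:
    SpscRing<T, N> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool closed_ = false;
};

// Illustrative usage: push 1..count from this thread, sum them in a consumer.
long long run_demo(int count) {
    BlockingSpsc<int, 64> q;
    long long sum = 0;
    std::thread consumer([&] { int v; while (q.pop(v)) sum += v; });
    for (int i = 1; i <= count; ++i)
        while (!q.push(i)) std::this_thread::yield();  // queue full: retry
    q.close();
    consumer.join();
    return sum;
}
```

In this sketch the consumer costs no CPU while the queue stays empty, at the price of one uncontended lock/unlock and one notify per push on the producer side. Whether that overhead is acceptable at high frame rates would need to be measured.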
Another possibility is to use Intel oneTBB's concurrent_bounded_queue. It seems to follow a multi-producer/multi-consumer approach, with the corresponding additional complexity.