Processing source node spin-lock on the FIFO output consumes CPU cycles in low frame-rate regimes

In low frame-rate regimes, one CPU thread is completely consumed by polling the processing FIFO for output, even though frames arrive only rarely. Such an approach is not (very) OS-friendly.

The boost::lockfree::spsc_queue already requires memory fences on both the producer and the consumer side. A more OS-friendly implementation could add a std::mutex + std::condition_variable pair on top of the lockfree::spsc_queue. The producer would use the pair systematically (lock, then notify after every push), while the consumer would use it only when pop fails, i.e. when the queue is empty and it has to wait. Such a solution should keep the latency of both operations deterministic while not wasting CPU cycles in low-speed regimes.
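
The idea above could look roughly like the following. This is only a minimal sketch, not the project's code: `SpscRing` is a small hand-written ring buffer standing in for boost::lockfree::spsc_queue so that the example has no dependencies, and `BlockingSpsc` and `run_demo` are hypothetical names made up for illustration.

```cpp
#include <array>
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

// Stand-in for boost::lockfree::spsc_queue (assumption: same push/pop
// semantics, single producer thread, single consumer thread).
// Holds at most N - 1 elements.
template <typename T, std::size_t N>
class SpscRing {
public:
    bool push(const T& v) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (t + 1) % N;
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        buf_[t] = v;
        tail_.store(next, std::memory_order_release);
        return true;
    }
    bool pop(T& out) {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return false;  // empty
        out = buf_[h];
        head_.store((h + 1) % N, std::memory_order_release);
        return true;
    }
private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

// Hypothetical wrapper: lock-free fast path, blocking slow path.
template <typename T, std::size_t N>
class BlockingSpsc {
public:
    // Producer: push, then systematically take the mutex and notify.
    // Taking the mutex orders the push against a consumer that has just
    // found the queue empty and is about to wait (no lost wake-up).
    bool push(const T& v) {
        if (!ring_.push(v)) return false;
        { std::lock_guard<std::mutex> lk(m_); }
        cv_.notify_one();
        return true;
    }
    // Consumer: touch the mutex only when the lock-free pop fails.
    void pop(T& out) {
        if (ring_.pop(out)) return;
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return ring_.pop(out); });
    }
private:
    SpscRing<T, N> ring_;
    std::mutex m_;
    std::condition_variable cv_;
};

// Hypothetical demo: a slow producer (one frame every 2 ms); the consumer
// sleeps in pop() between frames instead of spinning. Returns the sum of
// the received values.
inline long run_demo(int frames) {
    BlockingSpsc<int, 8> q;
    std::thread producer([&] {
        for (int i = 1; i <= frames; ++i) {
            std::this_thread::sleep_for(std::chrono::milliseconds(2));
            while (!q.push(i)) std::this_thread::yield();
        }
    });
    long sum = 0;
    for (int i = 0; i < frames; ++i) {
        int v = 0;
        q.pop(v);
        sum += v;
    }
    producer.join();
    return sum;
}
```

In this sketch the consumer pays for the mutex only when the queue is empty, so the high frame-rate path stays lock-free on the consumer side. The producer pays an uncontended lock/unlock plus a notify on every push; whether that cost is acceptable would need to be measured on the real pipeline.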

Another possibility is to use Intel oneTBB's concurrent_bounded_queue. It seems to be designed for multiple producers and multiple consumers, with the corresponding additional complexity.