Refactor getting data from nodes
There is currently no reasonable way to read DataStream slices in a pipeline other than reading all available data (see also !3172 (merged)). Utility functions like get_data are also not very useful in the new streaming age (see #2342 (closed)).
I want to move the stream reading/indexing part (__getitem__, get and get_as_array) out of ChannelDataNode into a new class derived from the current DataStream. Unlike DataStream, where the stream index is a b"timestamp-number", the index of the new class would be b"start-end", which basically represents the slice contained in the event. Right now this is just "start", which is why we need to decode the event to get "end", or get the length of the ChannelDataNode. The API of this new class would be as close as possible to that of a Python list and would hide all the Redis-stream-specific details. The only difference from a Python list is that the first part could be missing. In addition, the usage in pipelines needs to be expanded to getting any slice, not just the available data.
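To make the proposal concrete, here is a minimal sketch of a list-like view indexed by b"start-end". It is not the actual implementation: SliceView, encode_index, decode_index and append_event are hypothetical names, there is no Redis involved, and "end" is assumed to be exclusive.

```python
from typing import List, Sequence, Tuple


def encode_index(start: int, end: int) -> bytes:
    """Build a hypothetical b"start-end" index (end assumed exclusive)."""
    return b"%d-%d" % (start, end)


def decode_index(index: bytes) -> Tuple[int, int]:
    """Recover (start, end) from the index alone, without decoding the event."""
    start, end = index.split(b"-")
    return int(start), int(end)


class SliceView:
    """Hypothetical list-like view; the first part of the data may be missing."""

    def __init__(self) -> None:
        self._events: List[Tuple[int, int, Sequence]] = []

    def append_event(self, index: bytes, data: Sequence) -> None:
        start, end = decode_index(index)
        if end - start != len(data):
            raise ValueError("index does not match the event length")
        self._events.append((start, end, data))

    def __len__(self) -> int:
        # The length counts from position 0, even if early data is gone.
        return self._events[-1][1] if self._events else 0

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self))
            return [self[i] for i in range(start, stop, step)]
        if key < 0:
            key += len(self)
        if not 0 <= key < len(self):
            raise IndexError("index out of range")
        for start, end, data in self._events:
            if start <= key < end:
                return data[key - start]
        # Within the length, but not covered by any stored event.
        raise IndexError("data at this index is not available")
```

With events b"3-5" and b"5-8" stored, view[4:7] returns the three matching points, while view[0] raises IndexError because the first part is missing. Unlike a Python list, a slice that touches missing data also raises in this sketch; whether that is the desired behaviour is part of the API question.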
Long story short: this is the moment to really think about what the API for getting data from ChannelDataNode and LimaDataNode should be. I already added __getitem__ in the past, which raises IndexError whenever appropriate. However, I did not change the behaviour of get and get_as_array, which return None or an empty array when data is missing. I think there is no reason to preserve that behaviour.
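The difference between the two behaviours can be illustrated with a small sketch. LenientReader and StrictReader are hypothetical stand-ins, not the real node classes; they only mimic "return None when data is missing" versus "raise IndexError".

```python
class LenientReader:
    """Stand-in for the current behaviour: missing data yields None."""

    def __init__(self, data):
        self._data = list(data)

    def get(self, index):
        try:
            return self._data[index]
        except IndexError:
            return None  # the caller cannot tell "missing" from a stored None


class StrictReader(LenientReader):
    """Stand-in for the proposed behaviour: missing data raises IndexError."""

    def get(self, index):
        return self._data[index]  # list indexing raises IndexError itself
```

With the strict variant, get and __getitem__ report missing data in the same way, so callers need only one error-handling path.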