Resolve "nexus writer: chunking and compression"
Closes #2362 (closed)
Needs !3184 (merged)
Proper HDF5 chunking and compression:

- The chunk shape is calculated with h5py's `guess_chunk`.
- It can handle variable-length dimensions marked by a zero (variable-length scans or variable detector dimensions, like a sampling diode in SAMPLES mode).
- The border case of scalar datasets (0D detector of a `ct` scan) cannot be handled by `guess_chunk`, but we don't need chunking anyway in this case.
- `gzip` compression is used when the total dataset size is > 10 KB.
- Variable-length scans like a timescan always use `gzip` compression.
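The rules above can be sketched as a small decision function. This is an illustration only, not the writer's real code: the names (`dataset_options`, `CHUNK_DIM_FALLBACK`) are hypothetical, and where this sketch just substitutes a fallback chunk size, the actual writer delegates the chunk-shape guess to h5py's `guess_chunk` after replacing the zero-marked dimensions.

```python
import math

CHUNK_DIM_FALLBACK = 1024   # hypothetical stand-in for a variable-length dim
GZIP_THRESHOLD = 10 * 1024  # compress when the dataset exceeds 10 KB

def dataset_options(shape, itemsize):
    """Decide chunking/compression for an HDF5 dataset (sketch of the rules).

    Returns (chunks, compression). chunks is None for scalar (0D) datasets;
    compression is "gzip" for variable-length or large datasets.
    """
    if len(shape) == 0:
        # Scalar border case: guess_chunk cannot handle it, and we
        # don't need chunking or compression here anyway.
        return None, None
    variable = any(n == 0 for n in shape)  # zero marks a variable-length dim
    # Replace variable-length dims so a chunk shape can be guessed;
    # the real writer uses h5py's guess_chunk on this shape.
    guess_shape = tuple(n if n else CHUNK_DIM_FALLBACK for n in shape)
    nbytes = math.prod(guess_shape) * itemsize
    compression = "gzip" if variable or nbytes > GZIP_THRESHOLD else None
    chunks = guess_shape  # placeholder for the guess_chunk result
    return chunks, compression
```

For example, a fixed-length 1D dataset of 100 float64 values (800 bytes) gets neither compression nor a variable-length marker, while a timescan dataset with a zero first dimension always comes back with `"gzip"`.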
The writer will buffer data and save it in multiples of the chunk shape (so datasets are also resized in multiples of the chunk shape). When data arrives too slowly (it takes longer than 3 seconds to accumulate one chunk of data), it will save data not aligned to the chunks. This reduces write performance, but since it only happens at slow data rates, it shouldn't be a problem.
In addition, buffered data is flushed (ignoring chunk-aligned writing) as part of finalization and error handling.
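A minimal sketch of that buffering policy, assuming a hypothetical `ChunkedBuffer` class (the names and API are illustrative, not the writer's actual interface; the 3-second value is the threshold from this MR):

```python
import time

CHUNK_TIMEOUT = 3.0  # seconds to wait for a full chunk before giving up

class ChunkedBuffer:
    """Buffer rows and flush in multiples of the chunk length (sketch).

    flush_size() returns how many buffered rows should be written:
    a chunk-aligned count when enough data has accumulated, everything
    once the timeout has expired, otherwise zero.
    """

    def __init__(self, chunk_rows):
        self.chunk_rows = chunk_rows
        self.rows = []
        self.last_flush = time.monotonic()

    def add(self, row):
        self.rows.append(row)

    def flush_size(self, now=None):
        now = time.monotonic() if now is None else now
        n = len(self.rows)
        aligned = (n // self.chunk_rows) * self.chunk_rows
        if aligned:
            return aligned  # chunk-aligned write: best HDF5 performance
        if n and now - self.last_flush > CHUNK_TIMEOUT:
            return n  # data arriving too slowly: accept an unaligned write
        return 0

    def flush(self, finalize=False, now=None):
        # finalize=True ignores chunk alignment (finalization/error handling)
        n = len(self.rows) if finalize else self.flush_size(now)
        out, self.rows = self.rows[:n], self.rows[n:]
        if n:
            self.last_flush = time.monotonic() if now is None else now
        return out
```

With a chunk length of 4, five buffered rows produce one aligned write of 4 rows; the remaining row is only written once the 3-second timeout expires, or unconditionally when `finalize=True`.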
Note that none of this affects the lima data. This is for 0D detectors (diodes) and 1D detectors (MCAs).
@wright @sole @jerome.kieffer What do you think? The 10 KB and 3 second thresholds are currently uneducated guesses.
Edit: the rules for chunking and compression have changed (see discussion below).