Decompress data on GPU
The data is compressed with bitshuffle/LZ4.
Currently, the processing sequence is roughly:
1. load raw data (usually from GPFS)
2. decompress on CPU (bitshuffle/LZ4 implementation from hdf5plugin)
3. send the full decompressed frame to the GPU
4. perform azimuthal integration (AI)
Now, if a GPU decompressor is available, the processing sequence becomes:
1. load raw data (usually from GPFS)
2. send the raw (still compressed) data to the GPU
3. decompress on GPU
4. perform azimuthal integration (AI)
We benefit both from sending raw data to the GPU (roughly 10X less data to transfer) and, hopefully, from decompressing on the GPU (perhaps 5X faster?).
There are, however, two difficulties.
The first difficulty is reading the raw compressed chunks (instead of the data transparently decompressed by hdf5plugin). With h5py, this can be done as follows:
```python
for i in range(ds.id.get_num_chunks()):
    info = ds.id.get_chunk_info(i)
    filter_mask, chunk = ds.id.read_direct_chunk(info.chunk_offset)
```
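A minimal end-to-end sketch of this direct chunk read, assuming h5py >= 3.0 (which provides `get_num_chunks` and `get_chunk_info`). The dataset here is chunked but uncompressed, so the raw chunk bytes can be checked directly; with bitshuffle/LZ4 the same calls would return the compressed bytes instead.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "frames.h5")
data = np.arange(8 * 4, dtype=np.uint32).reshape(8, 4)

with h5py.File(path, "w") as f:
    # One chunk per row: 8 chunks of shape (1, 4)
    f.create_dataset("data", data=data, chunks=(1, 4))

chunks_by_offset = {}
with h5py.File(path, "r") as f:
    ds = f["data"]
    for i in range(ds.id.get_num_chunks()):
        info = ds.id.get_chunk_info(i)
        filter_mask, chunk = ds.id.read_direct_chunk(info.chunk_offset)
        chunks_by_offset[info.chunk_offset] = chunk

# Without filters, the chunk at offset (0, 0) is just row 0's bytes
assert np.array_equal(
    np.frombuffer(chunks_by_offset[(0, 0)], dtype=np.uint32), data[0]
)
```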
The second difficulty is that the above code won't work for virtual datasets. So the distribution of integration tasks has to be done directly on the underlying `scan????/pilatus_????.h5` files.
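The underlying files can be discovered from the virtual dataset itself via h5py's `Dataset.virtual_sources()`. A sketch, with illustrative file and dataset names (the real layout of the scan files may differ):

```python
import os
import tempfile

import h5py
import numpy as np

tmp = tempfile.mkdtemp()
src_paths = [os.path.join(tmp, f"pilatus_{i:04d}.h5") for i in range(2)]

# Create two small source files standing in for the detector files
for i, p in enumerate(src_paths):
    with h5py.File(p, "w") as f:
        f.create_dataset("data", data=np.full((3, 4), i, dtype=np.uint16))

# Stack them into one virtual dataset, as a master file would
layout = h5py.VirtualLayout(shape=(2, 3, 4), dtype=np.uint16)
for i, p in enumerate(src_paths):
    layout[i] = h5py.VirtualSource(p, "data", shape=(3, 4))

vds_path = os.path.join(tmp, "master.h5")
with h5py.File(vds_path, "w") as f:
    f.create_virtual_dataset("data", layout)

# Each entry is a VDSmap named tuple: (vspace, file_name, dset_name, src_space)
with h5py.File(vds_path, "r") as f:
    sources = f["data"].virtual_sources()

for s in sources:
    print(s.file_name, s.dset_name)
```

Each source file can then be opened individually and its chunks read directly with `read_direct_chunk`, as in the snippet above.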
But `StackIntegrator` is designed to process data from `start_idx` to `end_idx`, which is incompatible with direct chunk reads. Only entry points like `process_full_dataset()` should be used.
To do:
- Add an init parameter: `decompress_on_GPU`
- Browse the current dataset: list the virtual sources and, for each chunk, get the path to the actual data.
- Add a `load_data_raw` method (see the snippet above).
- Add a `decompress_lz4` method (using the GPU decompressor).
- Modify `process_full_dataset`.

Note that the original `process_stack()` will be incompatible with `decompress_on_GPU=True`.
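Whichever GPU decompressor ends up backing `decompress_lz4`, it will need the layout of a bitshuffle/LZ4 chunk. To the best of my understanding, each chunk starts with a 12-byte header: the total uncompressed size (big-endian uint64) followed by the bitshuffle block size in bytes (big-endian uint32); the LZ4 blocks follow, each prefixed by its compressed size (big-endian uint32). A sketch of parsing that header (the helper name is mine):

```python
import struct


def parse_bslz4_header(chunk: bytes) -> tuple[int, int]:
    """Return (uncompressed_nbytes, block_nbytes) from a raw bitshuffle/LZ4 chunk."""
    return struct.unpack(">QI", chunk[:12])


# Fake 12-byte header for demonstration: 4096 uncompressed bytes, 8192-byte blocks
fake_chunk = struct.pack(">QI", 4096, 8192)
assert parse_bslz4_header(fake_chunk) == (4096, 8192)
```

These two sizes are what the GPU kernel needs to split the chunk into per-block decompression tasks.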