Decompress data on GPU
The data is compressed with bitshuffle/LZ4.
Currently, the processing sequence is roughly:
1. load raw data (usually from GPFS)
2. decompress on CPU (bitshuffle/LZ4 implementation from hdf5plugin)
3. send the full decompressed frame to the GPU
4. perform azimuthal integration (AI)
Now, if a GPU decompressor is available, the processing sequence becomes:
1. load raw data (usually from GPFS)
2. send the raw (still compressed) data to the GPU
3. decompress on GPU
4. perform azimuthal integration (AI)
We benefit both from sending raw data to the GPU (roughly 10X less data to transfer) and, hopefully, from decompressing on the GPU (perhaps 5X faster?).
There are, however, two difficulties.
The first difficulty is reading the raw compressed chunks (instead of the data transparently decompressed by hdf5plugin). With h5py, this can be done as follows:
```python
for i in range(ds.id.get_num_chunks()):
    info = ds.id.get_chunk_info(i)
    filter_mask, chunk = ds.id.read_direct_chunk(info.chunk_offset)
```
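A minimal end-to-end sketch of this direct chunk read, assuming h5py >= 3.0 (which provides `get_num_chunks` and `get_chunk_info`). The dataset here is chunked but uncompressed, so the raw chunk bytes can be checked directly; with bitshuffle/LZ4 the same calls would return the compressed bytes instead.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "frames.h5")
data = np.arange(8 * 4, dtype=np.uint32).reshape(8, 4)

with h5py.File(path, "w") as f:
    # One chunk per row: 8 chunks of shape (1, 4)
    f.create_dataset("data", data=data, chunks=(1, 4))

chunks_by_offset = {}
with h5py.File(path, "r") as f:
    ds = f["data"]
    for i in range(ds.id.get_num_chunks()):
        info = ds.id.get_chunk_info(i)
        filter_mask, chunk = ds.id.read_direct_chunk(info.chunk_offset)
        chunks_by_offset[info.chunk_offset] = chunk

# Without filters, the chunk at offset (0, 0) is just row 0's bytes
assert np.array_equal(
    np.frombuffer(chunks_by_offset[(0, 0)], dtype=np.uint32), data[0]
)
```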
The second difficulty is that the above code won't work for virtual datasets. So the distribution of integration tasks has to be done directly on the underlying `scan????/pilatus_????.h5` files.
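The underlying files can be discovered from the virtual dataset itself via h5py's `Dataset.virtual_sources()`. A sketch, with illustrative file and dataset names (the real layout of the scan files may differ):

```python
import os
import tempfile

import h5py
import numpy as np

tmp = tempfile.mkdtemp()
src_paths = [os.path.join(tmp, f"pilatus_{i:04d}.h5") for i in range(2)]

# Create two small source files standing in for the detector files
for i, p in enumerate(src_paths):
    with h5py.File(p, "w") as f:
        f.create_dataset("data", data=np.full((3, 4), i, dtype=np.uint16))

# Stack them into one virtual dataset, as a master file would
layout = h5py.VirtualLayout(shape=(2, 3, 4), dtype=np.uint16)
for i, p in enumerate(src_paths):
    layout[i] = h5py.VirtualSource(p, "data", shape=(3, 4))

vds_path = os.path.join(tmp, "master.h5")
with h5py.File(vds_path, "w") as f:
    f.create_virtual_dataset("data", layout)

# Each entry is a VDSmap named tuple: (vspace, file_name, dset_name, src_space)
with h5py.File(vds_path, "r") as f:
    sources = f["data"].virtual_sources()

for s in sources:
    print(s.file_name, s.dset_name)
```

Each source file can then be opened individually and its chunks read directly with `read_direct_chunk`, as in the snippet above.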
But `StackIntegrator` is designed to process data from `start_idx` to `end_idx`, which is incompatible with direct chunk reads. Only entry points like `process_full_dataset()` should be used.
To do:
- Add an init parameter: `decompress_on_GPU`
- Browse the current dataset: list the virtual sources and, for each chunk, get the path to the actual data.
- Add a `load_data_raw` method (see the snippet above).
- Add a `decompress_lz4` method (using the GPU decompressor).
- Modify `process_full_dataset`.

Note that the original `process_stack()` will be incompatible with `decompress_on_GPU=True`.
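Whichever GPU decompressor ends up backing `decompress_lz4`, it will need the layout of a bitshuffle/LZ4 chunk. To the best of my understanding, each chunk starts with a 12-byte header: the total uncompressed size (big-endian uint64) followed by the bitshuffle block size in bytes (big-endian uint32); the LZ4 blocks follow, each prefixed by its compressed size (big-endian uint32). A sketch of parsing that header (the helper name is mine):

```python
import struct


def parse_bslz4_header(chunk: bytes) -> tuple[int, int]:
    """Return (uncompressed_nbytes, block_nbytes) from a raw bitshuffle/LZ4 chunk."""
    return struct.unpack(">QI", chunk[:12])


# Fake 12-byte header for demonstration: 4096 uncompressed bytes, 8192-byte blocks
fake_chunk = struct.pack(">QI", 4096, 8192)
assert parse_bslz4_header(fake_chunk) == (4096, 8192)
```

These two sizes are what the GPU kernel needs to split the chunk into per-block decompression tasks.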