Speed up binning
Binning is supposed to speed up the process; this is especially important for the BM18 project, where volumes have to be reconstructed as fast as possible.
However, in practice:
- Full radios must still be loaded (the subsampling is done after averaging if we want to do things correctly), so the I/O cost is the same
- The binning operation is slow: even the "clever" numpy approaches are single-threaded and take several ms per frame
So binning actually makes the data loading slower: we have `I/O + binning` instead of `I/O` alone.
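For reference, the single-threaded numpy binning mentioned above is a reshape-and-mean trick along these lines (a sketch only; the actual nabu `binning()` implementation may differ):

```python
import numpy as np

def binning_2x2(img):
    # 2x2 binning by averaging: crop to even dimensions, then reshape so that
    # each 2x2 block gets its own pair of axes, and average over those axes.
    ny, nx = img.shape
    cropped = img[:ny - ny % 2, :nx - nx % 2]
    return cropped.reshape(ny // 2, 2, nx // 2, 2).mean(axis=(1, 3))
```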
```python
# No binning, 4600 projs
C0 = ChunkReader(proc.dataset_info.projections, sub_region=(None, None, 0, 510), convert_float=True)
%time C0.load_data()  # 5.88 s for 12 GB => 2.04 GB/s

# Horizontal binning
C1 = ChunkReader(proc.dataset_info.projections, sub_region=(None, None, 0, 510), convert_float=True, binning=(2, 1))
%time C1.load_data()  # 12 s for 6 "final GB"

# (2, 2) binning
C1 = ChunkReader(proc.dataset_info.projections, sub_region=(None, None, 0, 510), convert_float=True, binning=2)
%time C1.load_data()  # 11.2 s for 3 "final GB"
```
The only way to actually speed up I/O is to use rough subsampling at the HDF5 level; it is not clear whether this is acceptable, as it may introduce aliasing.
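To make the idea concrete, "rough subsampling at the HDF5 level" would be a strided read directly on the dataset, which skips pixels instead of averaging them (file name and dataset path below are placeholders):

```python
import h5py

# Strided hyperslab read: keep every other row/column of each projection.
# No averaging is done, hence the possible aliasing.
with h5py.File("scan.h5", "r") as f:                      # placeholder file name
    projs = f["/entry_0000/data/data"][:, ::2, ::2]       # placeholder dataset path
```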
Therefore, `binning` has to be sped up. I tried:
- Numba, `@numba.njit(parallel=True)` (multi-threading `binning()` on each image): no real speed-up, and it does not work on power9 (needs `llvmlite`)
- Cython (multi-threading `binning()` on each image): good, 5.87 s with 16 threads
- ThreadPool (distribute binning on threads): simple and efficient, 4.25 s with 32 threads (see the sketch after this list)
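A minimal sketch of the ThreadPool approach (function and parameter names are illustrative, not the actual nabu code):

```python
import numpy as np
from multiprocessing.pool import ThreadPool

def binning_2x2(img):
    # Same reshape-and-mean trick as above
    ny, nx = img.shape
    return img[:ny - ny % 2, :nx - nx % 2].reshape(ny // 2, 2, nx // 2, 2).mean(axis=(1, 3))

def bin_stack(frames, n_threads=32):
    # Distribute the per-frame binning over a thread pool; numpy releases the
    # GIL in its inner loops, so several frames are effectively binned in parallel.
    with ThreadPool(n_threads) as pool:
        return np.array(pool.map(binning_2x2, frames))
```

Each thread works on a different frame, so no per-frame synchronization is needed; the 4.25 s figure above was obtained with 32 threads.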