MPI / OpenCL issue
Hello, while working on the hermes ptycho code, I found that the OpenCL MPI support does not work out of the box.
I identified the issue in pynx/processing_unit/__init__.py:
@@ -425,12 +425,13 @@ class ProcessingUnit(object):
r = mpi.scatter(local_ranks, root=0)
# Before assigning the GPU, sort the found devices by their address, because the device
# order can be different on two process on the same node (is that also true for opencl?)
- benchmark_results = [(r[0], r[1], r[0].int_ptr) for r in benchmark_results]
+ import pyopencl
+ benchmark_results = [(r[0], r[1], r[0].get_info(pyopencl.device_info.PCI_BUS_ID_NV)) for r in benchmark_results]
benchmark_results = list(sorted(benchmark_results, key=lambda t: t[2]))
if verbose:
print("select_gpu using MPI: node=%s mpi_rank=%d, using GPU #%d/%d ptr:" %
(platform.node(), mpi.Get_rank(), r % nb, nb), benchmark_results[r % nb][0].int_ptr)
- return self.set_device(benchmark_results[gpu_rank][r % nb], test_fft=False, verbose=verbose)
+ return self.set_device(benchmark_results[r % nb][0], test_fft=False, verbose=verbose)
if ranking in ['fft', 'bandwidth']:
benchmark_results = sorted(benchmark_results, key=lambda t: -t[1])
There are two issues here:
- the int_ptr value is not unique across processes, so two processes on the same node can end up selecting the same GPU card (see the reproducer sketch after this list).
- benchmark_results is not indexed the right way. We can see that the CUDA code uses the correct form (the one marked with + in the diff above).
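To make (1) easy to check, here is a minimal reproducer sketch (hypothetical, not part of PyNX), assuming mpi4py and pyopencl are installed and the script is launched with e.g. mpirun -n 2 python repro.py: each rank prints the int_ptr of every device it sees, so one can verify that these raw pointers do not provide a consistent device ordering across ranks.

# Hypothetical reproducer for issue (1); run with: mpirun -n 2 python repro.py
from mpi4py import MPI
import pyopencl as cl

rank = MPI.COMM_WORLD.Get_rank()
for plat in cl.get_platforms():
    for dev in plat.get_devices():
        # int_ptr is the raw cl_device_id value as seen by this process;
        # if it differs between ranks, sorting on it cannot give a
        # consistent device order across processes.
        print("rank=%d platform=%s device=%s int_ptr=0x%x"
              % (rank, plat.name, dev.name, dev.int_ptr))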
In this version of the code, I did a quick fix for (1), but it is not great: if we have an AMD card, PCI_BUS_ID_NV is presumably not available. So we should find a way to extract a unique identifier for each processing unit, one that does not depend on the process.
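As a possible direction, here is a hedged sketch of such a helper (hypothetical, untested): it tries the vendor-specific PCI queries exposed through the cl_nv_device_attribute_query and cl_amd_device_topology OpenCL extensions, and falls back on platform/device names otherwise. The AMD branch in particular is an assumption: how pyopencl wraps the cl_device_topology_amd struct should be checked on the target version.

import pyopencl as cl

def unique_device_key(dev):
    """Return a string key for dev that should be identical in every
    process on the node, unlike dev.int_ptr."""
    if "cl_nv_device_attribute_query" in dev.extensions:
        # NVIDIA: the PCI bus/slot identify the card within the node
        return "nv:%04d:%04d" % (dev.get_info(cl.device_info.PCI_BUS_ID_NV),
                                 dev.get_info(cl.device_info.PCI_SLOT_ID_NV))
    if "cl_amd_device_topology" in dev.extensions:
        # AMD: cl_device_topology_amd carries the PCIe bus/device/function.
        # NOTE: assumed wrapping; verify what pyopencl returns for this
        # query before relying on it.
        return "amd:%s" % dev.get_info(cl.device_info.TOPOLOGY_AMD)
    # Weak fallback: names are not unique when a node has several identical
    # cards, and the enumeration order is precisely what we cannot trust.
    return "name:%s|%s" % (dev.platform.name, dev.name)

benchmark_results could then be built as [(r[0], r[1], unique_device_key(r[0])) for r in benchmark_results] before sorting, which keeps the behaviour of the quick fix on NVIDIA hardware while not breaking on AMD cards.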
Cheers