select_gpu is not always called with the gpu_name
Hi again Vincent, I have a machine like this...
In my case I use two of the four cards, which I assume are #0 and #1 (see the quick pycuda check after the nvidia-smi table below).
picca@re-grades-01:~/src/gitlab.synchrotron-soleil.fr/hermes-beamline/ptychohermesscripts$ nvidia-smi
Tue Oct 24 15:46:42 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:84:00.0 Off |                  N/A |
| 30%   31C    P8    16W / 350W |      8MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:85:00.0 Off |                    0 |
| N/A   30C    P8     9W /  70W |      9MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:86:00.0 Off |                  N/A |
| 30%   30C    P8    20W / 350W |      8MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:87:00.0 Off |                  N/A |
| 30%   28C    P8    23W / 350W |      8MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3059      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      3059      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A      3059      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A      3059      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
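As a side note, here is the quick check of how pycuda enumerates these cards (a minimal sketch, not part of my script). Note that with CUDA's default device ordering ("fastest first") the indices may not match the nvidia-smi numbering above unless CUDA_DEVICE_ORDER=PCI_BUS_ID is set:

# Minimal device-enumeration check (illustrative, not part of pynx-ptycho-hermes).
import pycuda.driver as cuda

cuda.init()
for i in range(cuda.Device.count()):
    d = cuda.Device(i)
    print(f"GPU #{i}: {d.name()}, {d.total_memory() // 1024**2} MiB, PCI {d.pci_bus_id()}")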
The output of pynx in MPI mode gives me this:
###############
Ptycho runner: preparing processing unit
Computing speed for available CUDA GPU [ranking by global memory bandwidth]:
Ptycho runner: preparing processing unit
Computing speed for available CUDA GPU [ranking by global memory bandwidth]:
NVIDIA GeForce RTX 3090: 23 Gb, 365 Gbytes/s
NVIDIA GeForce RTX 3090: 23 Gb, 372 Gbytes/s
NVIDIA GeForce RTX 3090: 23 Gb, 364 Gbytes/s
NVIDIA GeForce RTX 3090: 23 Gb, 371 Gbytes/s
NVIDIA GeForce RTX 3090: 23 Gb, 362 Gbytes/s
NVIDIA GeForce RTX 3090: 23 Gb, 368 Gbytes/s
Tesla T4: 14 Gb, 107 Gbytes/s
Tesla T4: 14 Gb, 106 Gbytes/s
select_gpu using MPI: node=re-grades-01 mpi_rank=1, using GPU #1/4 PCI: 0000:85:00.0
select_gpu using MPI: node=re-grades-01 mpi_rank=0, using GPU #0/4 PCI: 0000:84:00.0
Using CUDA GPU: NVIDIA GeForce RTX 3090
Using CUDA GPU=> setting large stack size (613) (override with stack_size=N)
Using CUDA GPU: Tesla T4
Using CUDA GPU=> setting large stack size (613) (override with stack_size=N)
So it does pick #0 and #1.
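The select_gpu lines above look like a plain rank-to-index mapping, which would explain why the ranking by memory bandwidth printed just before is not actually applied. A minimal sketch of that pattern (my guess, not PyNX's actual select_gpu code):

# Guessed selection pattern from the log ("mpi_rank=N, using GPU #N/4"):
# the device index seems to come from the MPI rank alone, ignoring the
# bandwidth ranking printed above. Illustrative only.
from mpi4py import MPI
import pycuda.driver as cuda

cuda.init()
rank = MPI.COMM_WORLD.Get_rank()
n_gpu = cuda.Device.count()
dev = cuda.Device(rank % n_gpu)  # rank 0 -> GPU #0 (3090), rank 1 -> GPU #1 (T4)
print(f"mpi_rank={rank}, using GPU #{rank % n_gpu}/{n_gpu}: {dev.name()}")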
From what I understand of the following lines,
mpi multiscan
MPI # 1 analysing scans: (2,)
###############
Processing nrj number 2
###############
MPI # 0 analysing scans: (1, 3)
###############
Processing nrj number 1
###############
it will process nrj 2 on the second card, i.e. the T4, and nrj 1 and 3 on card #0, i.e. the 3090.
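That split is consistent with a simple round-robin of the scan list over the MPI ranks; a toy reproduction (illustrative only):

# Toy reproduction of the split shown above: a round-robin over the scan
# list gives exactly (1, 3) for rank 0 and (2,) for rank 1.
scans, n_rank = [1, 2, 3], 2
for rank in range(n_rank):
    print(f"MPI # {rank} analysing scans:", tuple(scans[rank::n_rank]))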
I can indeed see the processing happening on the 3090,
but I think this error message comes from the T4:
Please give the number of GPUs to be used (if nrj_points < 12, give nrj_points, otherwise 12): 2
Traceback (most recent call last):
File "/mnt/home-re-grades-02/experiences/instrumentation/picca/src/gitlab.synchrotron-soleil.fr/hermes-beamline/ptychohermesscripts/./pynx-ptycho-hermes", line 44, in <module>
main()
File "/mnt/home-re-grades-02/experiences/instrumentation/picca/src/gitlab.synchrotron-soleil.fr/hermes-beamline/ptychohermesscripts/./pynx-ptycho-hermes", line 30, in main
w.process_scans()
File "/home/experiences/instrumentation/picca/src/gitlab.esrf.fr/picca/PyNX/pynx/ptycho/runner/runner.py", line 3041, in process_scans
self.ws.run(reuse_ptycho=reuse_ptycho)
File "/home/experiences/instrumentation/picca/src/gitlab.esrf.fr/picca/PyNX/pynx/ptycho/runner/runner.py", line 1689, in run
self.p = ScaleObjProbe(verbose=True) * self.p
~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
File "/home/experiences/instrumentation/picca/src/gitlab.esrf.fr/picca/PyNX/pynx/operator/__init__.py", line 61, in __mul__
self.apply_ops_mul(w)
File "/home/experiences/instrumentation/picca/src/gitlab.esrf.fr/picca/PyNX/pynx/ptycho/cu_operator.py", line 812, in apply_ops_mul
return super(CUOperatorPtycho, self).apply_ops_mul(pty)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/experiences/instrumentation/picca/src/gitlab.esrf.fr/picca/PyNX/pynx/operator/__init__.py", line 177, in apply_ops_mul
o.prepare_data(w)
File "/home/experiences/instrumentation/picca/src/gitlab.esrf.fr/picca/PyNX/pynx/ptycho/cu_operator.py", line 871, in prepare_data
p._cu_psi = cua.empty(shape=(len(p._obj), len(p._probe), self.processing_unit.cu_stack_size, ny, nx),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/pycuda/gpuarray.py", line 268, in __init__
self.gpudata = self.allocator(self.size * self.dtype.itemsize)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pycuda._driver.MemoryError: memory_pool::allocate failed: out of memory - failed to free memory for allocation
invalid command name "140499303860160delayed_destroy"
while executing
"140499303860160delayed_destroy"
("after" script)
invalid command name "140499305030912delayed_destroy"
while executing
"140499305030912delayed_destroy"
("after" script)
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[9394,1],1]
Exit code: 1
--------------------------------------------------------------------------
The memory available on the T4 is not sufficient for this allocation.
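A back-of-the-envelope check of the failing allocation (the p._cu_psi array in the traceback, shape (n_obj, n_probe, stack_size, ny, nx) in complex64); only stack_size=613 is taken from the log, the other values are assumptions for illustration:

import numpy as np

# Rough size of p._cu_psi; complex64 = 8 bytes per element.
n_obj, n_probe, stack_size = 1, 1, 613   # stack_size from the log above
ny = nx = 2048                           # assumed detector frame size
size_gib = (n_obj * n_probe * stack_size * ny * nx
            * np.dtype(np.complex64).itemsize) / 1024**3
print(f"psi stack: {size_gib:.1f} GiB")  # ~19.2 GiB: fits a 3090 (24 GiB), not a T4 (15 GiB)

If that is the cause, a possible workaround until select_gpu honours the gpu_name under MPI would be to hide the T4 with CUDA_VISIBLE_DEVICES so that both ranks land on a 3090, or to reduce the stack size with the stack_size=N override mentioned in the log.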