Vincent Favre-Nicolin / PyNX / Issues / #173

Runner re-organisation (CDI): separate and parallelise tasks

The runners have become hard to maintain because of their complexity: they mix loading, saving, plotting, MPI, etc. They need to be reorganised for maintainability and performance.

Tasks:

use argparse instead of the non-standard approach to parameters
test 0MQ for communication:
- message format to pass messages? (JSON or dictionary)
- throughput for (large) numpy arrays
Separate tasks with classes:
- Input: load data and specialise the parser. This would be derived for each instrument-specific version (e.g. id01, id10, etc.)
- Preprocess
- Phasing / Phaser / Reconstructor / Processor / CDIProcessor: run the algorithms
- Output: save/export data, including (but not limited to) saving plots; it would also be able to serve data
- separate plotting functions so they can be used by different classes
Implement parallel process with 0MQ for the different tasks:
- Worker: base class 0MQ-aware and able to communicate with the master and other workers
- Master: launching the different workers (derived from Worker)
- Input: derived from Input and Worker, loads data and transmits it
- Phasing: Phasing + Worker
- Output: Output + Worker
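
A minimal sketch of how the "Input + Worker" composition above could look, assuming a two-frame message (a JSON header followed by a raw binary payload). Everything here is illustrative and not existing PyNX code: `pack_message`, `unpack_message`, `InputWorker`, the header fields and the `scan_0001` name are hypothetical, and a `queue.Queue` stands in for the 0MQ socket so the example runs with the standard library only. With pyzmq, the same list of frames could presumably be passed to `socket.send_multipart()` and read back with `socket.recv_multipart()`, and the payload could be the buffer of a numpy array described by `dtype` and `shape` in the header.

```python
import json
import queue


def pack_message(kind, meta=None, payload=b""):
    """Build a two-frame message: a JSON header and a raw binary payload."""
    header = {"kind": kind, "meta": meta or {}, "nbytes": len(payload)}
    return [json.dumps(header).encode("utf-8"), bytes(payload)]


def unpack_message(frames):
    """Inverse of pack_message: return (kind, meta, payload)."""
    header = json.loads(frames[0].decode("utf-8"))
    payload = frames[1]
    if len(payload) != header["nbytes"]:
        raise ValueError("payload size does not match header")
    return header["kind"], header["meta"], payload


class Worker:
    """Base class that only knows how to exchange framed messages.

    Assumption: inbox/outbox are queue.Queue objects standing in for
    0MQ sockets.
    """

    def __init__(self, name, inbox=None, outbox=None, **kwargs):
        super().__init__(**kwargs)
        self.name = name
        self.inbox = inbox
        self.outbox = outbox

    def send(self, kind, meta=None, payload=b""):
        self.outbox.put(pack_message(kind, meta, payload))

    def receive(self, timeout=1.0):
        return unpack_message(self.inbox.get(timeout=timeout))


class Input:
    """Transport-agnostic loader; instrument-specific classes override load()."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def load(self, source):
        # Placeholder: return (metadata, raw bytes) instead of reading a file.
        data = bytes(range(8))
        return {"source": source, "dtype": "uint8", "shape": [2, 4]}, data


class InputWorker(Input, Worker):
    """Input + Worker: loads data and transmits it."""

    def load_and_send(self, source):
        meta, data = self.load(source)
        self.send("data", meta, data)


if __name__ == "__main__":
    channel = queue.Queue()
    loader = InputWorker(name="loader", outbox=channel)
    phaser = Worker(name="phaser", inbox=channel)
    loader.load_and_send("scan_0001")
    print(phaser.receive())
```

Keeping `Input` free of any transport code means it can still be used directly in a single-process runner, while the cooperative `super().__init__(**kwargs)` calls let the same class be combined with `Worker` without duplicating the loading logic.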

Proposed hierarchy:

pynx.cdi.runner
- input
- phasing
- output
- utils: could also be in pynx.cdi.plot if it's just plotting functions
- workers or zmq ?
  - worker
  - master
  - input
  - output
- scripts
  - pynx_cdi_master: base master class (abstract, not implemented as a real script)
  - pynx_cdi_id01: same functionality as existing id01 runner
  - pynx_cdi_id10: same functionality as existing id10 runner
  - pynx_cdi_worker: depending on the option used (--master, --loader, --writer, --phaser), will launch a specific worker, with only a few options including the port to use and, for the workers, the URL (host:port) of the master
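
The command line described for pynx_cdi_worker could be sketched with argparse roughly as follows. The role flags come from the list above; the exact option names `--port` and `--master-url`, the default port 5555 and the function names are assumptions made for illustration only.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="pynx_cdi_worker",
        description="Launch one worker of the CDI runner (sketch).",
    )
    # Exactly one role must be chosen; all four flags write to args.role.
    role = parser.add_mutually_exclusive_group(required=True)
    for name in ("master", "loader", "writer", "phaser"):
        role.add_argument(
            f"--{name}",
            dest="role",
            action="store_const",
            const=name,
            help=f"run as {name}",
        )
    # Hypothetical option names and default value.
    parser.add_argument("--port", type=int, default=5555,
                        help="port this process listens on")
    parser.add_argument("--master-url", metavar="HOST:PORT", default=None,
                        help="address of the master (workers only)")
    return parser


def parse_worker_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Only the master can run without knowing where the master is.
    if args.role != "master" and args.master_url is None:
        parser.error("--master-url is required for --loader, --writer and --phaser")
    return args


if __name__ == "__main__":
    print(parse_worker_args())
```

A loader would then be started with something like `pynx_cdi_worker --loader --port 6000 --master-url localhost:5555`, and combining two role flags is rejected by argparse itself.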

Notes:

this will likely be an incompatible change for those relying on the runner API. Not sure if we can keep backward compatibility
some difficulty may arise from MPI (the Phasing process needs it)
benefits:
- main performance from parallelisation of input/processing/output
- ability to add UIs as clients, sending requests to the master
- in principle could also scale up by spawning Slurm jobs as clients, but this would be better orchestrated by ewoks

Edited Aug 23, 2023 by Vincent Favre-Nicolin
