Commit a9fb767d authored by Pierre Paleo's avatar Pierre Paleo
Browse files

Add doc on distribution method

parent c1d34ac9
Pipeline #20322 passed with stage
in 1 minute and 27 seconds
......@@ -6,11 +6,39 @@ An essential feature of Nabu is to distribute the computations on the local mach
Nabu is designed to process large volumes of data acquired in synchrotrons. Usually, these facilities have an on-site computing cluster to process the acquired data.
Nabu makes advantage of the parallel-beam geometry of synchrotron beam. The data is divided in [Chunks](definitions.md#radios-chunks) ; and each chunk gives a series of slices after the reconstruction. Critically, the parallel-beam geometry allows to dispatch work done on chunks in a completely independent way (see [computations distribution](nabu_tasks.md#computations-distribution)). This means that each chunk of data can be processed on a separate computing node.
Thanks to the parallel-beam geometry of synchrotron beams, the computations distribution is fairly simple. The data is divided in [Chunks](definitions.md#radios-chunks) ; and each chunk gives a series of slices after the reconstruction. The chunks are processed in a completely independent way (see [Limitations#computations distribution](nabu_tasks.md#computations-distribution)). This means that each chunk of data can be processed on a separate computing node.
## How to distribute the computations ?
Nabu makes a clear distinction between "what to do" (processing steps) and "how to do it" (computations distribution). This means that you can modify the processing steps without modifying the tasks distribution configuration, and conversely.
Nabu makes a clear distinction between "what to do" (processing steps) and "how to do it" (computations distribution). This means that
- on the user side, the processing steps can be modified without modifying the tasks distribution configuration, and conversely ;
- on the developer side, parts related to computations distribution are decoupled from the tasks definitions and processing components ;
The computations distribution is specified from the [configuration file](nabu_config_file.md) (or alternatively the [ProcessConfig](apidoc/nabu.resources.processconfig) class), in the section `[resources]`.
## In details
Distributing the computations means mapping the tasks to be done, to computing resources. In the following, we use the following terminology:
- A *worker* is a computing resource. Each worker is defined by its available resources (memory, GPU, CPU cores) and the address it can be reached with. There are *Nw* workers, each possibly having different resources.
- A *task* is the description of the work to be done. In our case, the high-level task is "process the chunk of data number *k*". There are *Nt* tasks in total.
A computation distribution is a mapping between the set of *Nt* tasks to the set of *Nw* workers.
There are two main approaches:
1. The mapping "tasks <-> workers" is known in advance. A given worker will execute specific tasks, depending for example on its resources.
2. The mapping "tasks <-> workers" is not known in advance.
Approach (1) is well-suited for heterogeneous computing, where workers have different resources. Knowing each worker resources enables to do an efficient tasks distribution. However, it entails to program this cumbersome distribution logic, while off-the-shelf software like `dask.distributed` provide a scheduler with tasks dispatching. If the scheduling is not carefully done, workers might become idle if they finish their work earlier than other workers, because of they have different resources.
Approach (2) is typical of homogeneous computing. It is assumed that computations are distributed on a computing cluster made of similar machines.
In Nabu, approach (2) was chosen. The rationale is to delegate the scheduling work to the readily available `dask.distributed` software, and focus the development efforts on tomography. Tasks are distributed to workers by a scheduler, ensuring that no worker stays idle. If a worker cannot handle a given task (ex. chunk too big), it can either
- Give the task back to the scheduler (so the task becomes available for other workers)
- Cut the task into sub-tasks, if possible
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment