Commit 174d5d5e authored by Pierre Paleo

Remove obsolete doc

parent 889d98e4
# Computations distribution
An essential feature of Nabu is to distribute the computations on the local machine or on a computing cluster.
## Rationale
Nabu is designed to process large volumes of data acquired in synchrotrons. Usually, these facilities have an on-site computing cluster to process the acquired data.
Thanks to the parallel-beam geometry of synchrotron beams, distributing the computations is fairly simple. The data is divided into [chunks](definitions.md#radios-chunks), and each chunk yields a series of slices after reconstruction. The chunks are processed completely independently (see [Limitations#computations distribution](nabu_tasks.md#computations-distribution)). This means that each chunk of data can be processed on a separate computing node.
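
The division into chunks can be illustrated with a minimal sketch. It assumes that a chunk is a contiguous range of detector rows; the function name `split_into_chunks` and its parameters are hypothetical and are not part of the Nabu API.

```python
def split_into_chunks(n_rows, chunk_size):
    """Return (start, end) row ranges covering [0, n_rows), with end exclusive.

    Hypothetical helper: in parallel-beam geometry each detector row can be
    reconstructed on its own, so each range is an independent unit of work.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, n_rows))
        for start in range(0, n_rows, chunk_size)
    ]


# 2048 detector rows split into chunks of at most 500 rows
chunks = split_into_chunks(n_rows=2048, chunk_size=500)
for k, (start, end) in enumerate(chunks):
    print(f"chunk {k}: rows {start} to {end - 1}")
```

The last chunk is simply shorter when the number of rows is not a multiple of the chunk size.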
## How to distribute the computations?
Nabu makes a clear distinction between "what to do" (processing steps) and "how to do it" (computations distribution). This means that:
- on the user side, the processing steps can be modified without modifying the tasks distribution configuration, and conversely;
- on the developer side, the parts related to computations distribution are decoupled from the tasks definitions and processing components.
The computations distribution is specified in the [configuration file](nabu_config_file.md) (or alternatively through the [ProcessConfig](apidoc/nabu.resources.processconfig) class), in the section `[resources]`.
## In detail
Distributing the computations means mapping the tasks to be done to computing resources. In the following, we use this terminology:
- A *worker* is a computing resource. Each worker is defined by its available resources (memory, GPU, CPU cores) and the address at which it can be reached. There are *Nw* workers, each possibly having different resources.
- A *task* is the description of the work to be done. In our case, the high-level task is "process the chunk of data number *k*". There are *Nt* tasks in total.
A computation distribution is a mapping from the set of *Nt* tasks to the set of *Nw* workers.
There are two main approaches:
1. The mapping "tasks <-> workers" is known in advance. A given worker will execute specific tasks, depending for example on its resources.
2. The mapping "tasks <-> workers" is not known in advance.
Approach (1) is well-suited for heterogeneous computing, where workers have different resources. Knowing the resources of each worker makes it possible to distribute the tasks efficiently. However, it requires programming this cumbersome distribution logic, while off-the-shelf software like `dask.distributed` provides a scheduler with task dispatching. If the scheduling is not done carefully, workers that finish earlier than the others (because they have different resources) might become idle.
Approach (2) is typical of homogeneous computing. It is assumed that computations are distributed on a computing cluster made of similar machines.
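
The difference between the two approaches can be sketched as follows. This is only an illustration under stated assumptions: the worker names, the weights and `process_chunk` are hypothetical, and the standard-library `ThreadPoolExecutor` stands in for a real scheduler such as the one provided by `dask.distributed`.

```python
from concurrent.futures import ThreadPoolExecutor


def static_mapping(n_tasks, worker_weights):
    """Approach (1): decide in advance which worker runs which tasks.

    Tasks are assigned in contiguous blocks, proportionally to a
    (hypothetical) weight describing the resources of each worker.
    """
    total = sum(worker_weights.values())
    names = list(worker_weights)
    mapping = {}
    next_task = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            count = n_tasks - next_task  # the last worker takes the remainder
        else:
            count = round(n_tasks * worker_weights[name] / total)
            count = min(count, n_tasks - next_task)
        mapping[name] = list(range(next_task, next_task + count))
        next_task += count
    return mapping


def process_chunk(k):
    """Placeholder for "process the chunk of data number k"."""
    return k * k


def dynamic_mapping(n_tasks, n_workers):
    """Approach (2): the mapping is not known in advance.

    Each pool worker takes the next pending task as soon as it is free.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(process_chunk, range(n_tasks)))


mapping = static_mapping(8, {"gpu_node": 3, "cpu_node": 1})
results = dynamic_mapping(8, n_workers=2)
print(mapping)
print(results)
```

In the static case the quality of the distribution depends entirely on how well the weights reflect the actual speed of each worker. In the dynamic case no such knowledge is needed, at the cost of relying on a scheduler.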
In Nabu, approach (2) was chosen. The rationale is to delegate the scheduling work to the readily available `dask.distributed` software, and to focus the development efforts on tomography. Tasks are distributed to workers by a scheduler, which ensures that no worker stays idle. If a worker cannot handle a given task (e.g. the chunk is too big), it can either:
- give the task back to the scheduler (so that the task becomes available for other workers), or
- cut the task into sub-tasks, if possible.
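
The two fallback options can be sketched with a small sequential simulation. It is not how Nabu or `dask.distributed` implement this: the function `run_scheduler`, the notion of a per-worker capacity in rows, and the policy (hand back if another worker is large enough, split otherwise) are all assumptions made for illustration.

```python
from collections import deque


def run_scheduler(tasks, capacities):
    """Simulate workers taking (start, end) row ranges from a shared queue.

    capacities[w] is the (hypothetical) largest number of rows worker w can
    handle. Returns a list of (worker_index, task) in completion order.
    """
    largest = max(capacities)
    if largest < 1:
        raise ValueError("at least one worker must handle one row")
    queue = deque(tasks)
    done = []
    turn = 0
    while queue:
        start, end = queue.popleft()
        size = end - start
        worker = turn % len(capacities)  # workers take turns
        turn += 1
        if size <= capacities[worker]:
            done.append((worker, (start, end)))
        elif size <= largest:
            # Too big for this worker: give the task back, so that the
            # next worker in turn can try it.
            queue.appendleft((start, end))
        else:
            # Too big for every worker: cut the task into two sub-tasks.
            middle = start + size // 2
            queue.appendleft((middle, end))
            queue.appendleft((start, middle))
    return done


done = run_scheduler([(0, 100), (100, 400)], capacities=[120, 200])
for worker, (start, end) in done:
    print(f"worker {worker} processed rows {start} to {end - 1}")
```

Here the 300-row task is too large for both workers, so it is cut into two 150-row sub-tasks, which the smaller worker hands back and the larger worker then processes.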
# The Nabu processing pipeline
This page explains how to use `nabu` as a tomography processing pipeline.
## 1. Introduction
Nabu is a library for tomography processing. As such, it can be used in various ways:
1. Calling the individual components of the library via the [Python API](apidoc/nabu). This is typically done when there are several specific tasks to perform, or when you are building a program for a specific purpose.
2. Defining a processing pipeline through the [Nabu internal processing steps representation](nabu_tasks.md). This is not recommended, as many checks will not be done.
3. Defining a processing pipeline through the [configuration file](nabu_config_file.md) or the [ProcessConfig class](apidoc/nabu.resources.processconfig).
This page covers approaches (2) and (3).
## 2. From configuration file to internal representation
This section explains how `nabu` digests a configuration file to build its final internal representation of processing steps.
A configuration file is first created by the user, either manually or through a tool like `tomwer` or `nabu-config`.
From this configuration file, a [ProcessConfig object](apidoc/nabu.resources.processconfig) is built. This object has two main fields:
- `nabu_config` : the Python dictionary equivalent of the configuration file content, after some [validation steps](validators.md).
- `dataset_infos` : the result of the analysis of the dataset given in the configuration file. This structure gives information on how to access the data, along with some metadata (energy, distance, ...). Building this structure also entails some checks (like removing unused radios).
The `ProcessConfig` object contains all the necessary information to perform the tomography processing steps. However, it is not structured in a way that can be directly used to distribute computations. This object is therefore converted to an internal representation that describes the processing steps more formally.
This conversion is done by the [build_processing_steps()](apidoc/nabu.resources.tasks.rst#nabu.resources.tasks.build_processing_steps) function, resulting in two structures described in [processing steps representation](nabu_tasks.md). These structures are the final internal representation of the Nabu processing pipeline.
## 3. Processing steps
In the current version, nabu defines the following processing steps:
- Read the data
- Flat-field normalization
- Radios-based rings artefacts correction
- CCD corrections
- Phase retrieval
- Sinogram-based rings artefacts correction
- Reconstruction
Each step can be enabled/disabled, but the order cannot be changed.
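
The "enable/disable, fixed order" rule can be sketched as follows. The step identifiers and the `select_steps` function are hypothetical names chosen for this illustration; they are not taken from the Nabu code base.

```python
# Fixed order of the processing steps (identifiers are hypothetical)
PIPELINE_ORDER = [
    "read_data",
    "flatfield",
    "radios_rings_correction",
    "ccd_correction",
    "phase_retrieval",
    "sino_rings_correction",
    "reconstruction",
]


def select_steps(enabled):
    """Return the enabled steps, always in pipeline order.

    `enabled` maps a step name to True/False; steps that are absent are
    treated as disabled. The order of the keys in `enabled` is ignored.
    """
    unknown = set(enabled) - set(PIPELINE_ORDER)
    if unknown:
        raise ValueError(f"unknown processing steps: {sorted(unknown)}")
    return [step for step in PIPELINE_ORDER if enabled.get(step, False)]


steps = select_steps(
    {
        "reconstruction": True,
        "phase_retrieval": False,
        "read_data": True,
        "flatfield": True,
    }
)
print(steps)
```

Whatever the order in which the user lists the steps, the resulting list follows the pipeline order.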