# workflow_concepts

This project is meant to find the optimal workflow eco-system to be maintained by the ESRF DAU.

## Getting started

Developer install of all projects in this workflow eco-system:
```bash
./devinstall.sh
```

Run the tests

```bash
pytest
```

Jupyter notebook examples

```bash
python -m jupyter notebook examples/
```

Run a script example

```bash
python examples/running_taskgraphs.py --plot
```

## Eco-system

### Core projects

Common runtime and persistent representation of tasks and task graphs
* *esrftaskgraph*: based on *networkx* with task and data management (currently a proof-of-concept with JSON files)
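
As a rough illustration of what a persistent JSON representation of a task graph could look like, the sketch below round-trips a small graph through JSON using only the standard library. The schema (`nodes`, `links`, `arguments`) and the task names `tasklib.load` and `tasklib.fft` are assumptions made for this example, not the actual *esrftaskgraph* file format.

```python
import json

# Hypothetical schema for illustration only (not the esrftaskgraph format).
graph = {
    "nodes": [
        {"id": "load", "task": "tasklib.load", "inputs": {"filename": "scan.h5"}},
        {"id": "fft", "task": "tasklib.fft", "inputs": {}},
    ],
    "links": [
        # Pass the "data" output of "load" to the "signal" input of "fft".
        {"source": "load", "target": "fft", "arguments": {"signal": "data"}},
    ],
}


def dumps_graph(graph: dict) -> str:
    """Serialize the graph to a JSON string."""
    return json.dumps(graph, indent=2, sort_keys=True)


def loads_graph(text: str) -> dict:
    """Deserialize a graph and check that every link refers to known nodes."""
    graph = json.loads(text)
    node_ids = {node["id"] for node in graph["nodes"]}
    for link in graph["links"]:
        if link["source"] not in node_ids or link["target"] not in node_ids:
            raise ValueError(f"link refers to an unknown node: {link}")
    return graph


restored = loads_graph(dumps_graph(graph))
```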

Bindings
* *esrf2pypushflow*: multiprocessing task scheduler
* *esrf2dask*: local and centralized task schedulers (DAGs only)
* *esrf2orange3*: graph design and execution GUI (DAGs only)

Bindings we will probably not use
* *esrf2luigi*: local and centralized task scheduler (DAGs only)
* *esrf2multiprocessing*: multiprocessing task scheduler (DAGs only)
* *esrf2paradag*: multithreading task scheduler (DAGs only)

### Technique specific projects
Only dependent on *esrftaskgraph*
* *tasklib*: example library of task implementations based on the *esrftaskgraph* abstraction.
* *graphlib*: example library with graphs of *tasklib* tasks. These are beamline/proposal specific graphs.

Using the bindings
* *orange3widgetlib*: example library of Orange widgets for *tasklib* tasks.

Some task libraries could become part of the core projects (PCA, FFT, SIFT, ...).

## Nomenclature

### Graphs
* *Graph* (a.k.a. Task Graph, Workflow): list of Tasks and Links
* *Graph instance*: a *Graph* with fixed static inputs
* *Task* (a.k.a. Node, Process): node in a task *Graph*
* *Task instance*: a *Task* with fixed inputs
* *Link* (a.k.a. Edge): edge of a *Graph*
* *Variable*: task input and output
* *Pipeline*: a linear task *Graph*
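
The terms above can be sketched as plain data classes. This is only one possible reading of the nomenclature: the class and field names (`Task`, `Link`, `Graph`, `static_inputs`, `argument`) are assumptions for illustration and do not reflect the actual *esrftaskgraph* API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Task:
    """Node of a graph; fixing `static_inputs` gives a task instance."""
    name: str
    func: Callable[..., Any]
    static_inputs: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Link:
    """Directed edge: the output of `source` feeds input `argument` of `target`."""
    source: str
    target: str
    argument: str


@dataclass
class Graph:
    tasks: dict  # task name -> Task
    links: list  # list of Link

    def is_pipeline(self) -> bool:
        """A pipeline is linear: a single chain that visits every task once."""
        if len(self.links) != len(self.tasks) - 1:
            return False
        successor, targets = {}, set()
        for link in self.links:
            if link.source in successor or link.target in targets:
                return False  # a task with two successors or two predecessors
            successor[link.source] = link.target
            targets.add(link.target)
        starts = [name for name in self.tasks if name not in targets]
        if len(starts) != 1:
            return False
        # Walk the chain from its start and count the tasks reached.
        node, seen = starts[0], 1
        while node in successor:
            node = successor[node]
            seen += 1
        return seen == len(self.tasks)


graph = Graph(
    tasks={
        "a": Task("a", lambda: 1),
        "b": Task("b", lambda x: x + 1),
        "c": Task("c", lambda x: 2 * x),
    },
    links=[Link("a", "b", "x"), Link("b", "c", "x")],
)
```

Here `graph.is_pipeline()` holds because the three tasks form one chain. Adding a second link out of `a` would turn the graph into a branching (non-linear) one.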

#### Types of graphs:
* Directed graphs: edges have orientations (uni/bi-directional)
  * Simple directed graphs: no self-loops (self-connecting nodes) and no duplicate edges
    * Oriented graphs: no bidirectional edges
      * Directed acyclic graphs (DAGs): no directed cycles

We aim at supporting *Directed graphs*.
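
To make the hierarchy of graph types concrete, the following sketch returns the most specific type that a set of directed edges satisfies. The function names `has_cycle` and `classify` are hypothetical helpers written for this example. It relies on the standard-library `graphlib` module (Python 3.9 or later) for cycle detection.

```python
from graphlib import CycleError, TopologicalSorter


def has_cycle(nodes, edges):
    """Return True when the directed graph contains a directed cycle."""
    sorter = TopologicalSorter({node: set() for node in nodes})
    for source, target in edges:
        sorter.add(target, source)  # `source` is a predecessor of `target`
    try:
        sorter.prepare()
    except CycleError:
        return True
    return False


def classify(nodes, edges):
    """Return the most specific graph type for (source, target) edge pairs."""
    edges = set(edges)
    if any(source == target for source, target in edges):
        return "directed graph"  # has a self-connecting node
    if any((target, source) in edges for source, target in edges):
        return "simple directed graph"  # has a bidirectional edge
    if has_cycle(nodes, edges):
        return "oriented graph"  # has a directed cycle
    return "directed acyclic graph"


kind = classify(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
```

In this example `kind` is `"oriented graph"`: there are no self-loops or bidirectional edges, but the three edges form a directed cycle.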

#### Graph design:
Apart from persistent representations (JSON, YAML, ...), graphs can be designed with GUIs. Often these GUIs also allow starting and monitoring a task scheduler.

Examples: Orange3, Rabix, ...

### Schedulers
Scheduler (a.k.a. Engine, Executor)

#### Task Scheduler
Execute/distribute tasks from a *graph instance*

Examples: Luigi, Airflow, dask, Orange3, pypushflow, Rabix, ...
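
A minimal sketch of what a task scheduler does, assuming a DAG and sequential execution in a single process: each task runs once, after all the tasks it depends on. The `execute` function and the `(source, target, argument)` link format are assumptions for illustration. The schedulers listed above additionally distribute tasks over threads, processes or machines.

```python
from graphlib import TopologicalSorter


def execute(tasks, links):
    """Run every task once, in dependency order (DAGs only).

    tasks: {name: callable accepting keyword arguments}
    links: [(source, target, argument)], where the result of `source`
           is passed to `target` as the keyword `argument`.
    """
    sorter = TopologicalSorter({name: set() for name in tasks})
    for source, target, _ in links:
        sorter.add(target, source)
    results = {}
    for name in sorter.static_order():
        # Collect the results of all upstream tasks linked to this task.
        kwargs = {arg: results[src] for src, dst, arg in links if dst == name}
        results[name] = tasks[name](**kwargs)
    return results


tasks = {
    "one": lambda: 1,
    "two": lambda: 2,
    "add": lambda a, b: a + b,
    "double": lambda x: 2 * x,
}
links = [("one", "add", "a"), ("two", "add", "b"), ("add", "double", "x")]
results = execute(tasks, links)
```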

#### Job Scheduler
Execute/distribute jobs

A *job* is one execution of a *graph instance* that ends in completion, an error or an interrupt. A job could, however, also be something unrelated to task graphs.

Examples: Zocalo
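
The three ways a job can end can be sketched as follows. The job is modelled as an arbitrary callable, since a job does not have to be a task graph. `JobStatus` and `run_job` are hypothetical names for this example and are unrelated to Zocalo's API.

```python
import enum


class JobStatus(enum.Enum):
    COMPLETED = "completed"
    ERROR = "error"
    INTERRUPTED = "interrupted"


def run_job(func, *args, **kwargs):
    """Run `func` once and report how the job ended, with its result or error."""
    try:
        result = func(*args, **kwargs)
    except KeyboardInterrupt:
        # KeyboardInterrupt is not an Exception subclass, so it needs its own clause.
        return JobStatus.INTERRUPTED, None
    except Exception as exc:
        return JobStatus.ERROR, exc
    return JobStatus.COMPLETED, result


status, value = run_job(sum, [1, 2, 3])
```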