Commit 6b1fc9ec authored by Matias Guijarro

Merge branch '1341-data-saving-policy-structure-documentation' into 'master'

Resolve "Data saving/policy/structure documentation"

Closes #1341

See merge request !2009
Detectors publish their own metadata by default. Here we describe how to add user metadata. A more flexible and persistent way to add metadata is described [here](dev_data_metadata.md).
## Electronic logbook
Send a user message to the [electronic logbook](https://data.esrf.fr)
```
DEMO [1]: lprint("user message in electronic logbook")
```
## Scan comments
Add comments to your scans
```python
DEMO [1]: s = loopscan(10,0.1,run=False)
DEMO [2]: s.add_comment("This is a comment")
DEMO [3]: s.add_comment("This is another comment")
DEMO [4]: s.add_comment("And another one")
DEMO [5]: s.run()
```
Currently Bliss supports only one data format: [Nexus compliant](https://www.nexusformat.org/) HDF5 files written by the [Nexus writer](dev_data_nexus_server.md). Here we describe the logic of this Nexus structure.
In the example below we show the file of [one dataset](data_policy.md) which contains the data of three scans:
1. `ascan(samy, 0, 9, 9, 0.1, diode1, basler1, xmap1)` where diode1 is a diode, basler1 is a camera with one ROI defined and xmap1 is an MCA controller with one channel
2. an unspecified scan
3. a scan with two independent subscans (for example, one subscan can be a temperature monitor scan)
```
sample_dataset.h5
├ 1.1 # first scan
| ├ instrument
| | ├ samy(@NXpositioner)
| | | └ value (10) # motor positions during scan
| | ├ diode1(@NXdetector)
| | | └ data (10)
| | ├ basler1(@NXdetector)
| | | ├ data (10, 2048, 2048)
| | | ├ acq_parameters # camera metadata
| | | | └ ...
| | | └ ctrl_parameters # camera metadata
| | | └ ...
| | ├ basler1_roi1(@NXdetector)
| | | ├ data (10)
| | | ├ avg (10)
| | | ├ std (10)
| | | ├ min (10)
| | | ├ max (10)
| | | └ selection # ROI metadata
| | | └ ...
| | ├ xmap1_det0(@NXdetector)
| | | ├ data (10, 2048)
| | | ├ elapsed_time (10)
| | | ├ live_time (10)
| | | ├ dead_time (10)
| | | ├ input_counts (10)
| | | ├ input_rate (10)
| | | ├ output_counts (10)
| | | └ output_rate (10)
| | ├ positioners
| | | ├ samx (1) # motor position at start
| | | ├ samy (10) # motor positions during scan
| | | └ samz (1) # motor position at start
| | ├ start_positioners
| | | ├ samx (1) # motor position at start
| | | ├ samy (1) # motor position at start
| | | └ samz (1) # motor position at start
| └ measurement
| ├ samy (10)
| ├ diode1 (10)
| ├ basler1 (10, 2048, 2048)
| ├ basler1_roi1 (10)
| ├ basler1_roi1_avg (10)
| ├ basler1_roi1_std (10)
| ├ basler1_roi1_min (10)
| ├ basler1_roi1_max (10)
| ├ xmap1_det0 (10, 2048)
| ├ xmap1_det0_elapsed_time (10)
| ├ xmap1_det0_live_time (10)
| ├ xmap1_det0_dead_time (10)
| ├ xmap1_det0_input_counts (10)
| ├ xmap1_det0_input_rate (10)
| ├ xmap1_det0_output_counts (10)
| └ xmap1_det0_output_rate (10)
├ 2.1 # second scan
├ 3.1 # third scan
└ 3.2 # also third scan
```
So each scan contains two groups (plots and application definitions are not shown)
* *instrument*:
    * all motors moving during the scan (*NXpositioner*: distance, time, energy, ...)
    * all detectors enabled for the scan (*NXdetector*)
    * *start_positioners*: snapshot of all motors before the scan
    * *positioners*: like *start_positioners*, except that motors moving during the scan contain all their positions
* *measurement*: flat list of all NXpositioner and NXdetector data
Note that each *NXdetector* contains one primary value called *data* and each *NXpositioner* contains one primary value called *value*. Additional datasets and groups represent secondary detector/positioner data or metadata such as detector settings.
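For illustration, the structure above can be read back with [h5py](https://www.h5py.org/) once the scans are finished (a minimal sketch; file, group and dataset names are taken from the example tree above):
```python
import h5py

# Read back scan 1.1 from the example file above (after acquisition;
# see the warnings on concurrent reading elsewhere in this document).
with h5py.File("sample_dataset.h5", "r") as f:
    diode = f["1.1/measurement/diode1"][()]            # shape (10,)
    first_frame = f["1.1/instrument/basler1/data"][0]  # shape (2048, 2048)
    samy = f["1.1/instrument/positioners/samy"][()]    # positions during scan
print(diode.shape, first_frame.shape, samy.shape)
```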
A data policy determines the data structure (file format and directory layout) and the registration of data collection with external services. BLISS comes with two data policies
1. The [ESRF data policy](#esrf-data-policy) which allows users to access their data and electronic logbook at https://data.esrf.fr. The data is written in [Nexus compliant](https://www.nexusformat.org/) HDF5 files in a specific directory structure.
2. The [basic data policy](#basic-data-policy) does not impose a data directory structure or register data with any external service. Data can (but does not have to) be written in [Nexus compliant](https://www.nexusformat.org/) HDF5 files.
Installation and configuration of the [ESRF](dev_data_policy_esrf.md) and [basic](dev_data_policy_basic.md) data policies in a BLISS session are documented elsewhere, as is the creation of [custom](dev_data_policy_custom.md) data policies. Below we describe how to use the data policies.
## ESRF data policy
This data policy requires the user to specify *proposal*, *sample* and *dataset*. This will completely define how data is organized.
### Change proposal
```
DEMO [1]: newproposal("blc123")
Proposal set to 'blc123'
Data path: /data/id00/inhouse/blc123/id00/sample/sample_0001
```
When no proposal name is given, the default is the inhouse proposal `{beamline}{yymm}`. For example at ID21 in January 2020 the default proposal name is `id212001`.
The data root directory is derived from the proposal name
* no name given: `/data/{beamline}/inhouse/`
* name starts with `ih` or `blc`: `/data/{beamline}/inhouse/`
* name starts with `test`, `tmp` or `temp`: `/data/{beamline}/tmp/`
* all other names: `/data/visitor/`
These are the defaults; the root paths can be [configured](dev_data_policy_esrf.md#configuration).
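These rules can be summarized in a short sketch (illustrative only; this is not the actual BLISS implementation):
```python
def default_root_path(proposal: str, beamline: str) -> str:
    """Default data root directory for a proposal name (rules above)."""
    if not proposal or proposal.startswith(("ih", "blc")):
        return f"/data/{beamline}/inhouse/"
    if proposal.startswith(("test", "tmp", "temp")):
        return f"/data/{beamline}/tmp/"
    return "/data/visitor/"

assert default_root_path("blc123", "id00") == "/data/id00/inhouse/"
assert default_root_path("hg123", "id21") == "/data/visitor/"
```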
### Change sample
```
DEMO [2]: newsample("sample1")
Sample set to 'sample1'
Data path: /data/id00/inhouse/blc123/id00/sample1/sample1_0001
```
When no sample name is given, the default sample name "sample" is used. Note that you can always come back to an existing sample.
### Change dataset
#### Named datasets
```
DEMO [3]: newdataset("area1")
Dataset set to 'area1'
Data path: /data/id00/inhouse/blc123/id00/sample1/sample1_area1
```
When the dataset already exists, the name is automatically incremented ("area1_0002", "area1_0003", ...). Note that you can never return to a dataset after switching to another one.
#### Unnamed datasets
```
DEMO [4]: newdataset()
Dataset set to '0002'
Data path: /data/id00/inhouse/blc123/id00/sample1/sample1_0002
```
Datasets are named automatically "0001", "0002", ... The dataset number is incremented independently for each sample. Note that you can never return to a dataset after switching to another one.
### Policy state
To get an overview of the current state of the data policy
```
DEMO [5]: SCAN_SAVING
Out [5]: Parameters (default) -
.user_name = 'denolf'
.images_path_template = 'scan{scan_number}'
.images_prefix = '{img_acq_device}_'
.date_format = '%Y%m%d'
.scan_number_format = '%04d'
.dataset_number_format = '%04d'
.technique = ''
.session = 'demo'
.date = '20200208'
.scan_name = '{scan_name}'
.scan_number = '{scan_number}'
.img_acq_device = '<images_* only> acquisition device name'
.writer = 'nexus'
.data_policy = 'ESRF'
.template = '{proposal}/{beamline}/{sample}/{sample}_{dataset}'
.beamline = 'id00'
.proposal = 'blc123'
.proposal_type = 'inhouse'
.base_path = '/data/id00/inhouse'
.sample = 'sample1'
.dataset = '0001'
.data_filename = '{sample}_{dataset}'
.images_path_relative = True
.creation_date = '2020-02-08-12:09'
.last_accessed = '2020-02-08-12:12'
-------------- --------- -------------------------------------------------------------------
exists filename /data/id00/inhouse/blc123/id00/sample1/sample1_0001/sample1_0001.h5
exists directory /data/id00/inhouse/blc123/id00/sample1/sample1_0001
Metadata RUNNING Dataset is running
-------------- --------- -------------------------------------------------------------------
```
## Basic data policy
This data policy requires the user to use the [`SCAN_SAVING`](dev_data_policy_basic.md#scan_saving) object directly to define where the data will be saved. The data location is completely determined by specifying *base_path*, *template*, *data_filename* and *writer*
```
DEMO [1]: SCAN_SAVING.base_path = "/tmp/data"
DEMO [2]: SCAN_SAVING.writer = "nexus"
DEMO [3]: SCAN_SAVING.template = "{date}/{session}/{mysubdir}"
DEMO [4]: SCAN_SAVING.date_format = "%y%b"
DEMO [5]: SCAN_SAVING.add("mysubdir", "sample1")
DEMO [6]: SCAN_SAVING.data_filename = "scan{scan_number}"
DEMO [7]: SCAN_SAVING.filename
Out [7]: '/tmp/data/20Feb/demo/sample1/scan{scan_number}.h5'
```
Note that each attribute can be a template string to be filled with other attributes from the [`SCAN_SAVING`](dev_data_policy_basic.md#scan_saving) object.
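For example, `data_filename` itself can reference the custom `mysubdir` attribute (a hypothetical continuation of the session above; `{scan_number}` is only resolved when a scan actually runs):
```
DEMO [1]: SCAN_SAVING.data_filename = "{mysubdir}_scan{scan_number}"
DEMO [2]: SCAN_SAVING.filename
Out [2]: '/tmp/data/20Feb/demo/sample1/sample1_scan{scan_number}.h5'
DEMO [3]: SCAN_SAVING.data_filename = "scan{scan_number}"  # restore
```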
### Policy state
To get an overview of the current state of the data policy
```
DEMO [8]: SCAN_SAVING
Out [8]: Parameters (default) -
.base_path = '/tmp/data'
.data_filename = 'scan{scan_number}'
.user_name = 'denolf'
.template = '{date}/{session}/{mysubdir}'
.images_path_relative = True
.images_path_template = 'scan{scan_number}'
.images_prefix = '{img_acq_device}_'
.date_format = '%y%b'
.scan_number_format = '%04d'
.mysubdir = 'sample1'
.session = 'demo'
.date = '20Feb'
.scan_name = '{scan_name}'
.scan_number = '{scan_number}'
.img_acq_device = '<images_* only> acquisition device name'
.writer = 'nexus'
.data_policy = 'None'
.creation_date = '2020-02-08-12:04'
.last_accessed = '2020-02-08-12:05'
-------------- --------- -----------------------------------------------------------------
exists filename /tmp/data/20Feb/demo/sample1/scan{scan_number}.h5
exists directory /tmp/data/20Feb/demo/sample1
-------------- --------- -----------------------------------------------------------------
```
Open the [Nexus compliant](https://www.nexusformat.org/) HDF5 file written by the [Nexus writer](dev_data_nexus_server.md) with [pymca](http://pymca.sourceforge.net/)
```bash
pymca /data/visitor/hg123/id21/sample/sample_0001/sample_0001.h5
```
Open the [Nexus compliant](https://www.nexusformat.org/) HDF5 file written by the [Nexus writer](dev_data_nexus_server.md) with [silx](http://www.silx.org/)
```bash
silx view /data/visitor/hg123/id21/sample/sample_0001/sample_0001.h5
```
!!! warning
    Do not use a silx version older than 0.12.0
If you want to look at the HDF5 file written by the [Nexus writer](dev_data_nexus_server.md) during a scan, use [silx](data_vis_silx.md) or [pymca](data_vis_pymca.md). Do not use third-party tools or custom scripts that are not approved by the ESRF Data Analysis Unit. More details can be found [here](dev_data_nexus_server.md#concurrent-reading).
!!! warning
    A reader should never open the HDF5 file in append mode (which is the default in `h5py`). Even when only performing read operations, this will result in a corrupted file!
!!! warning
    A reader which locks the HDF5 file (this happens by default, even in read-only mode) will prevent the Nexus writer from accessing the file, and scans in BLISS will be prevented from starting!
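For custom read-only tools, a minimal [h5py](https://www.h5py.org/) sketch that respects both warnings could look as follows (the `locking` keyword needs h5py >= 3.5; with older versions, export the `HDF5_USE_FILE_LOCKING=FALSE` environment variable instead):
```python
import h5py

# Open read-only ("r", never h5py's default append mode) and without
# file locking, so the Nexus writer is not blocked by the reader.
with h5py.File("sample_0001.h5", "r", locking=False) as f:
    print(list(f.keys()))  # e.g. ['1.1', '2.1', ...]
```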
![Screenshot](img/scan_data_flow_path.svg)
Data produced by BLISS is published into [Redis](https://redis.io/) (RAM storage). In Redis, data is stored for a limited period of time (1 day by default) and up to a limited amount (1GB by default).
Two primary [Redis](https://redis.io/) subscribers are provided by BLISS
1. The [Nexus writer](dev_data_nexus_server.md) for writing [Nexus compliant](https://www.nexusformat.org/) HDF5 files.
2. [Flint](flint_scan_plotting.md) for online data visualization
[Custom subscribers](dev_data_subscribing.md) can be created for other types of data processing.
The following example adds the position label of a Multiple Position object under the 'Instrument' category to the metadata of each scan:
```python
from bliss.scanning import scan_meta

scan_meta_obj = scan_meta.get_user_scan_meta()

# mp is a BLISS Multiple Position object
scan_meta_obj.instrument.set(mp, lambda _: {"position_label": mp.position})
```
The function receives the scan object as an argument. In the example above, this argument is ignored.
Each subsequent scan will have an 'instrument' section filled with the metadata:
![Screenshot](img/scan_meta.png)
### Examples
Refer to the [Nexus standard](https://manual.nexusformat.org) when adding metadata.
#### Devices
Choose an appropriate [device definition](https://manual.nexusformat.org/classes/base_classes/NXinstrument.html#nxinstrument) from the Nexus standard. For example, an attenuator can be added as follows:
```python
scan_meta_obj.instrument.set("myattenuator", {"myattenuator":
{"@NX_class":"NXattenuator",
"status": "in",
"type": "aluminium",
"thickness":{"@data":20., "@units": "um"}}})
```
# Basic data policy
## Architecture
This policy is meant for testing only. It does not enforce a file format or directory structure, nor does it register data with any external service.
![Screenshot](img/scan_data_flow_path.svg)
## Summary
To enable the basic data policy
1. Install and run the [Nexus writer](dev_data_nexus_server.md) to write the data in Nexus format (optional)
2. Specify the data directory and file name using the [SCAN_SAVING](#scan_saving) object in the BLISS session
## SCAN_SAVING
Calling `SCAN_SAVING.get()` returns a dictionary, whose key `root_path` is the final path to scan files:
- `root_path`: `base_path` + interpolated template
- `data_path`: fullpath for the *data file* without the extension.
- `images_path`: path where image device should save (Lima)
- `db_path_items`: used to create parent node for publishing data via Redis
- `writer`: Data file writer object.
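A hypothetical inspection of these keys in the BLISS shell (values follow the basic data policy example elsewhere in this document):
```
DEMO [1]: info = SCAN_SAVING.get()
DEMO [2]: info["root_path"]
Out [2]: '/tmp/data/20Feb/demo/sample1'
```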
#### SCAN_SAVING writer
`.writer` is a special member of `SCAN_SAVING`; it indicates which writer to use for saving data. BLISS supports `"hdf5"` (the internal BLISS writer), `"nexus"` (the [Nexus writer](dev_data_nexus_server.md)) and `"null"` (writing disabled).
### Configuration example
```
DEMO [13]: SCAN_SAVING.user_name='toto'
DEMO [14]: SCAN_SAVING.get_path()
Out [14]: '/data/visitor/unknown/lysozyme'
```
### Programmers note
SCAN_SAVING is a `ParametersWardrobe`.
from `bliss/common/session.py`:
```python
class Session:
    [...]

    def setup(self, env_dict=None, verbose=False):
        [...]
        env_dict["SCAN_SAVING"] = ScanSaving(self.name)
```
from `bliss/scanning/scan.py`:
```python
class ScanSaving(ParametersWardrobe):
    SLOTS = []
    WRITER_MODULE_PATH = "bliss.scanning.writer"
    [...]

    def __init__(self, name=None):
        [...]
        _default_values = {
            "base_path": "/tmp/scans",
            "data_filename": "data",
            [...]
        }

    def get(self):
        try:
            # calculate all parameters
            ...
        except KeyError as keyname:
            raise RuntimeError("Missing %s attribute in ScanSaving" % keyname)
```
# Custom data policy
SCAN_SAVING is a `ParametersWardrobe` which defines the data policy in the BLISS session. The active data policy is selected in the session object (see `bliss/common/session.py`):
```python
class Session:
    def _set_scan_saving_class(self, scan_saving_class):
        scan_saving.set_scan_saving_class(scan_saving_class)
        self.scan_saving = scan_saving.ScanSaving(self.name)
        if is_bliss_shell():
            self.env_dict["SCAN_SAVING"] = self.scan_saving
```
Creating a custom data policy means deriving a class from `bliss.scanning.scan_saving.BaseScanSaving`:
```python
class CustomScanSaving(BaseScanSaving):
    DEFAULT_VALUES = {
        # default and not removable values
        "technique": "",
        ...
        # saved properties in Redis:
        "_proposal": "",
        ...
    }
    # read-only attributes implemented with python properties
    PROPERTY_ATTRIBUTES = [
        "proposal",
        ...
    ]
    REDIS_SETTING_PREFIX = "custom_scan_saving"
    SLOTS = ["_custom_attr"]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._custom_attr = None

    def get(self):
        try:
            # calculate all parameters
            ...
        except KeyError as keyname:
            raise RuntimeError("Missing %s attribute in CustomScanSaving" % keyname)
```
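To activate the custom policy, the session's scan-saving class has to be set, for example through the hook shown above (a sketch; `session` is assumed to be the current BLISS session object):
```python
# Assumes the CustomScanSaving class defined above and a BLISS
# session object `session`.
session._set_scan_saving_class(CustomScanSaving)
```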
# ESRF data policy
The ESRF data policy allows users to access their data and electronic logbook at https://data.esrf.fr. Data is registered with [ICAT](https://data.esrf.fr) and the data written in [Nexus compliant](https://www.nexusformat.org/) HDF5 files in a specific directory structure.
## Summary
To enable the ESRF data policy
1. Install and run the [Nexus writer](dev_data_nexus_server.md) to write the data in Nexus format
2. Install and run the [ICAT servers](dev_data_policy_servers.md) to communicate with ICAT
3. Enable the ESRF data policy in the BLISS session to configure the data directory structure. This is done in the beamline configuration which will contain a mixture of [data policy configuration](#configuration) and [ICAT server configuration](dev_data_policy_servers.md#enable-in-bliss):
```yaml
scan_saving:
class: ESRFScanSaving
beamline: id00
metadata_manager_tango_device: id00/metadata/test
metadata_experiment_tango_device: id00/metaexp/test
tmp_data_root: /data/{beamline}/tmp
visitor_data_root: /data/visitor
inhouse_data_root: /data/{beamline}/inhouse
```
4. Use the [data policy commands in BLISS](data_policy.md)
## Configuration
Define in the beamline configuration
* beamline name
* root directories for inhouse, visitor and tmp proposals
```yaml
scan_saving:
class: ESRFScanSaving
beamline: id00
tmp_data_root: /data/{beamline}/tmp
visitor_data_root: /data/visitor
inhouse_data_root: /data/{beamline}/inhouse
```
The [ESRF data policy](dev_data_policy_esrf.md) allows users to access their data and electronic logbook at https://data.esrf.fr. Two TANGO devices need to be installed, running and enabled for this.
## Summary
To install and use the ICAT servers
1. [Register](#installation) two TANGO devices with the TANGO database
2. [Run](#running) the two TANGO devices
3. [Enable](#enable-in-bliss) the ICAT servers in the BLISS session
## Installation
Two TANGO devices need to be registered with the TANGO database. The `MetaExperiment` server handles the proposal and the sample. The `MetadataManager` server handles the dataset. These are referred to as the ICAT servers. They will inform the ICAT database about the collected datasets during an experiment and they allow BLISS to communicate with the electronic logbook.
The registration can be done by defining server and device properties in the beamline configuration:
```yaml
- class: MetaExperiment
  properties:
    queueName: ...
    queueURLs: ...
- class: MetadataManager
  properties:
    queueName: ...
    queueURLs: ...
    API_KEY: ...
    icatplus_server: ...
- server: MetadataManager
  personal_name: icatservers
  device:
  - tango_name: id00/metadata/test
    class: MetadataManager
    properties:
      beamlineID: id00
      dataFolderPattern: "{dataRoot}"
      metaExperimentDevice: "id00/metaexp/test"
- server: MetaExperiment
  personal_name: icatservers
  device:
  - tango_name: id00/metaexp/test
    class: MetaExperiment
    properties:
      beamlineID: id00
```
The properties `queueName` and `queueURLs` are used to register [datasets](data_policy.md#change-dataset). The properties `icatplus_server` and `API_KEY` are used to send messages to the [electronic logbook](data_metadata.md#electronic-logbook).
## Running
The two ICAT servers can be started inside the BLISS conda environment as follows
```bash
MetaExperiment icatservers
MetadataManager icatservers
```
Note that `MetaExperiment` must be started before `MetadataManager`. At the beamline there can be multiple `MetadataManager` servers, each serving a specific technique that needs a specific set of metadata parameters to be registered with the ICAT database.
## Enable in BLISS
Add the TANGO URIs of the ICAT devices to the beamline configuration
```yaml
scan_saving:
class: ESRFScanSaving
metadata_manager_tango_device: id00/metadata/test
metadata_experiment_tango_device: id00/metaexp/test
```
## MetadataManager state
The state of the MetadataManager device can be
* OFF: No experiment ongoing
* STANDBY: Experiment started, sample or dataset not specified
* ON: No dataset running
* RUNNING: Dataset is running
* FAULT: Device is not functioning correctly
Every time a scan is started, BLISS verifies that the dataset as specified in the session's `SCAN_SAVING` object is *RUNNING*. If this is not the case, BLISS will close the previous running dataset (if any) and start the new dataset.
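The state can also be inspected directly with PyTango (a sketch, reusing the example device name from the configuration above):
```python
import tango

# MetadataManager device registered as id00/metadata/test above
dev = tango.DeviceProxy("id00/metadata/test")
print(dev.state())  # e.g. RUNNING while a dataset is open
```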
Data produced in BLISS sessions is published into [Redis](https://redis.io/) (RAM storage).
In Redis, data is stored for a limited period of time (1 day by default) and for
a limited amount (1GB by default).
As two different samples will be scanned, one sub-directory per sample will be created. To do that, the [SCAN_SAVING](dev_data_policy_basic.md#scan_saving) object has to be used. The data saving path is customized by adding a new parameter '*s_name*' usable in the template of the path.
# External data processing
BLISS offers the possibility to have a separate process (on the system level) for retrieving the acquired data in order to save or process it.
## Example 1: Save data in HDF5
!!! note
    BLISS already comes with an [HDF5 writer](dev_data_nexus_server.md). This is just an example.
The example script discussed here is provided in the Bliss repository at [scripts/external_saving_example/external_saving_example.py](https://gitlab.esrf.fr/bliss/bliss/blob/master/scripts/external_saving_example/external_saving_example.py). To set up a minimal working Bliss environment, have a look at the [installation notes](installation.md#installation-outside-esrf).
When running the script

```bash
python scripts/external_saving_example/external_saving_example.py
```

it listens for new scans in the Bliss *test_session*:
```python
listen_to_session_wait_for_scans("test_session")
```
In the example script, a new instance of the class `HDF5_Writer` is created for each scan that is started. Following the initialisation, a *gevent greenlet* is spawned to run the actual listener in a non-blocking way. Inside `def run(self)` a second iterator is started, walking through all events emitted by the scan
[(see data structure section)](dev_data_publishing.md#experiment-and-redis-data-structure):
```python
for event_type, node in self.scan_node.iterator.walk_events():
```
- [Overview scan engine](scan_engine.md)
- [Bliss data nodes](scan_data_node.md)
- [Data structure of published data](dev_data_publishing.md)
Once an event is received it can be categorized by the event type:
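A minimal sketch of such a dispatch (the event names `NEW_NODE` and `NEW_DATA_IN_CHANNEL` are assumptions based on the BLISS data-node API of this era; verify them against your installation):
```python
# Hedged sketch: react to events yielded by walk_events().
for event_type, node in scan_node.iterator.walk_events():
    if event_type.name == "NEW_NODE":
        print("new node:", node.name)       # e.g. a channel was created
    elif event_type.name == "NEW_DATA_IN_CHANNEL":
        print("new data on:", node.name)    # fetch values with node.get()
```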
# Flint Scan Plotting
Bliss plotting is done through a [silx](data_vis_silx.md)-based application called **flint**.