Commit b956a2dd authored by Wout De Nolf's avatar Wout De Nolf

[docs] add data policy

parent b2f29622
Detectors publish their own metadata by default. Here we describe how to add user metadata. A more flexible and persistent way to add metadata is described [here](data_metadata_dev.md).
## E-logbook
Send a user message to the [e-logbook](https://data.esrf.fr):
```
DEMO [1]: lprint("user message in electronic logbook")
```
## Scan comments
Add comments to your scans
```python
DEMO [1]: s = loopscan(10,0.1,run=False)
DEMO [2]: s.add_comment("This is a comment")
DEMO [3]: s.add_comment("This is another comment")
DEMO [4]: s.add_comment("And another one")
DEMO [5]: s.run()
```
Currently Bliss supports only one data format: [Nexus compliant](https://www.nexusformat.org/) HDF5 files written by the [Nexus writer](data_nexus_server.md). Here we describe the logic of this Nexus structure.
In the example below we show the file of [one dataset](data_policy.md) which contains the data of three scans:
1. `ascan(samy, 0, 9, 9, 0.1, diode1, basler1, xmap1)` where diode1 is a diode, basler1 is a camera with one ROI defined and xmap1 an MCA controller with one channel
2. an unspecified scan
3. a scan with two independent subscans (for example one subscan can be a temperature monitor scan)
```
sample_dataset.h5
├ 1.1 # first scan
| ├ instrument
| | ├ samy(@NXpositioner)
| | | └ value (10) # motor positions during scan
| | ├ diode1(@NXdetector)
| | | └ data (10)
| | ├ basler1(@NXdetector)
| | | ├ data (10, 2048, 2048)
| | | ├ acq_parameters # camera metadata
| | | | └ ...
| | | └ ctrl_parameters # camera metadata
| | | └ ...
| | ├ basler1_roi1(@NXdetector)
| | | ├ data (10)
| | | ├ avg (10)
| | | ├ std (10)
| | | ├ min (10)
| | | ├ max (10)
| | | └ selection # ROI metadata
| | | └ ...
| | ├ xmap1_det0(@NXdetector)
| | | ├ data (10, 2048)
| | | ├ elapsed_time (10)
| | | ├ live_time (10)
| | | ├ dead_time (10)
| | | ├ input_counts (10)
| | | ├ input_rate (10)
| | | ├ output_counts (10)
| | | └ output_rate (10)
| | ├ positioners
| | | ├ samx (1) # motor position at start
| | | ├ samy (10) # motor positions during scan
| | | └ samz (1) # motor position at start
| | ├ start_positioners
| | | ├ samx (1) # motor position at start
| | | ├ samy (1)   # motor position at start
| | | └ samz (1) # motor position at start
| └ measurement
| ├ samy (10)
| ├ diode1 (10)
| ├ basler1 (10, 2048, 2048)
| ├ basler1_roi1 (10)
| ├ basler1_roi1_avg (10)
| ├ basler1_roi1_std (10)
| ├ basler1_roi1_min (10)
| ├ basler1_roi1_max (10)
| ├ xmap1_det0 (10, 2048)
| ├ xmap1_det0_elapsed_time (10)
| ├ xmap1_det0_live_time (10)
| ├ xmap1_det0_dead_time (10)
| ├ xmap1_det0_input_counts (10)
| ├ xmap1_det0_input_rate (10)
| ├ xmap1_det0_output_counts (10)
| └ xmap1_det0_output_rate (10)
├ 2.1 # second scan
├ 3.1 # third scan
└ 3.2 # also third scan
```
So each scan contains two groups (plots and application definitions are not shown):
* *instrument*:
* all motors moving during the scan (*NXpositioner*: distance, time, energy, ...)
* all detectors enabled for the scan (*NXdetector*)
* *start_positioners*: snapshot of all motors before the scan
* *positioners*: like *start_positioners*, but motors moving during the scan contain their positions during the scan
* *measurement*: flat list of all NXpositioner and NXdetector data
Note that each *NXdetector* contains one primary value called *data* and each *NXpositioner* contains one primary value called *value*. Additional datasets and groups represent secondary detector/positioner data or metadata such as detector settings.
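As a sketch of how this structure can be navigated, the primary values can be read with `h5py` (assumed installed; the helper names below are ours, not part of BLISS):

```python
import h5py

def detector_data(filename, scan, detector):
    """Read the primary value ("data") of an NXdetector in a scan entry."""
    with h5py.File(filename, "r") as f:
        return f[f"{scan}/instrument/{detector}/data"][()]

def positioner_value(filename, scan, positioner):
    """Read the primary value ("value") of an NXpositioner in a scan entry."""
    with h5py.File(filename, "r") as f:
        return f[f"{scan}/instrument/{positioner}/value"][()]
```

For example `detector_data("sample_dataset.h5", "1.1", "diode1")` would return the 10 diode readings of the first scan above.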
# NeXus compliant external writer
The ESRF data policy requires data to be saved in HDF5 files compliant with the [Nexus format](https://www.nexusformat.org/). The Nexus writer is a TANGO device maintained by the ESRF Data Analysis Unit (DAU) to ensure seamless integration with data analysis tools provided by the DAU.
## Summary
To install and use the Nexus writer:
1. [Register](#installation) a Nexus writer for each BLISS session (*test_session* in this example) with the TANGO database:
```bash
RegisterNexusWriter test_session --domain id00 --instance nexuswriters
```
2. [Run](#running) the Nexus writer server:
```bash
NexusWriterService nexuswriters --log=info
```
3. [Enable](#enable-in-bliss) the Nexus writer in the BLISS session:
```python
TEST_SESSION [1]: SCAN_SAVING.writer = "nexus"
```
## Installation
The data writing of one BLISS session is handled by one Nexus writer TANGO device. The device has one MANDATORY property called *session* which must be equal to the BLISS session name. To register the device with the TANGO database you need to specify:
| | example | comment |
|----------------------|----------------|--------------------------------------------------------|
| server name | NexusWriter | you can choose but this is recommended |
| server instance name | nexuswriters | you can choose |
| server class | NexusWriter | MANDATORY!!! |
| device domain name | id00 | you can choose but typically this is the beamline name |
| device family name | bliss_nxwriter | you can choose but this is recommended |
| device member name | test_session | you can choose but typically this is the session name |
Here are three ways to register this TANGO device:
1. Installation script
In this example we registered a writer for BLISS session __test_session__ which runs under domain __id00__ in TANGO server instance __nexuswriters__. By default the device family is __bliss_nxwriter__ and the device member name is equal to the session name. Running multiple session writers in one TANGO server instance (i.e. one process) is allowed, but not recommended when the associated BLISS sessions may produce lots of data simultaneously.
```bash
RegisterNexusWriter test_session --domain id00 --instance nexuswriters
```
2. Jive
![Register Nexus writer](img/register_nxwriter1.png)
In this example we registered three Nexus writers with the same server. Specify the *session* property for each Nexus writer:
![Nexus writer properties](img/register_nxwriter2.png)
3. Beacon configuration files
```yaml
server: NexusWriter
personal_name: nexuswriters
device:
- tango_name: id00/bliss_nxwriter/test_session
  class: NexusWriter
  properties:
    session: test_session
```
## Running
A Nexus writer TANGO server (which may serve several BLISS sessions) can be started inside the BLISS conda environment as follows:
```bash
NexusWriterService nexuswriters --log=info
```
You need to specify the instance name of the TANGO server, so *nexuswriters* in the example.
## Enable in BLISS
Select the Nexus writer in the BLISS session
```python
TEST_SESSION [1]: SCAN_SAVING.writer = "nexus"
```
BLISS will discover the Nexus writer automatically. The scan will stop when the writer throws an exception.
## Session writer state
The state of the TANGO device serving a BLISS session can be
* INIT: initializing (not accepting scans)
* ON: accepting scans (without active scan writers)
When the server stays in the INIT state you can try calling the TANGO device's "init" method. This can happen when the connection to beacon fails in the initialization stage. When in the OFF state, use the TANGO device's "start" method. To stop accepting new scans, use the TANGO device's "stop" method.
## Scan writer state
Each session writer launches a separate scan writer which saves the data of a particular scan (subscans are handled by the same scan writer). The scan writer state can be
* INIT: initializing (not accepting data yet)
* ON: accepting data
The final state will always be OFF (finished successfully) or FAULT (finished unsuccessfully).
When the state is ON while the scan is finished, the writer did not receive the "END_SCAN" event. You can stop the writer with the TANGO device's "stop_scan" method. This gracefully finalizes the writing. As a last resort you can invoke the "kill_scan" method, which might result in incomplete or even corrupt data (if the writer is killed during a write operation).
## Concurrent writing
Scans run in parallel and multi-top-master scans will cause the writer to create and modify multiple NXentry groups in the same HDF5 file concurrently.
To protect against multiple writers listening to the same session (and therefore writing the same data), BLISS verifies that only one writer is listening to the current BLISS session before starting a scan. If multiple writers are active nevertheless, each writer checks whether the NXentry exists before trying to create it at the start of the scan. If it exists, the writer goes into the FAULT state and will not try to write the data of the (sub)scan associated with this NXentry. This check relies on `h5py.File.create_group`, which is not an atomic operation, so it is not bulletproof.
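The check can be sketched as follows (a simplification for illustration, not the writer's actual code; `h5py` assumed):

```python
import h5py

def try_claim_scan(filename, nxentry):
    """Try to create the NXentry group for a (sub)scan.

    Returns True when this writer created the group and may write the scan,
    False when the group already exists (another writer claimed it first).
    Note: h5py.File.create_group is not atomic across processes.
    """
    with h5py.File(filename, "a") as f:
        try:
            f.create_group(nxentry)
            return True
        except ValueError:
            # group already exists: go to FAULT, do not write this (sub)scan
            return False
```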
Each scan writer holds the HDF5 file open in append mode for the duration of the scan. The HDF5 file is [locked](https://support.hdfgroup.org/HDF5/docNewFeatures/SWMR/Design-HDF5-FileLocking.pdf) which means that
* The HDF5 file cannot be accessed during the scan unless you [bypass](#concurrent-reading) the file lock.
* If the HDF5 file is opened and locked by other software, new data cannot be written to this file which will prevent scans from starting: you will get a "file locked" exception in BLISS.
Flushing is done regularly so [readers](#concurrent-reading) can see the latest changes. Data from scans running in parallel and multi-top-master scans will be written concurrently.
## Concurrent reading
To read the HDF5 file during a scan, open it in read-only mode while bypassing the file lock.
!!! warning
    A reader should never open the HDF5 file in append mode (which is the default in `h5py`). Even when only performing read operations, this will result in a corrupted file!
!!! warning
    A reader which locks the HDF5 file (this happens by default, even in read-only mode) will prevent the Nexus writer from accessing the file and scans in BLISS will be prevented from starting!
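A reader can bypass the lock by disabling HDF5 file locking before opening the file. A minimal sketch (recent `h5py` versions also accept a `locking=False` argument to `h5py.File`; the helper name below is ours):

```python
import os

# Must be set before the HDF5 library is first loaded by h5py,
# otherwise it has no effect in this process.
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

import h5py

def read_during_scan(filename, path):
    """Read a dataset in read-only mode ("r", never "a"!) without locking."""
    with h5py.File(filename, "r") as f:
        return f[path][()]
```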
## File permissions
The HDF5 file and parent directories are created by the TANGO server and are therefore owned by the user under which the server process is running. Subdirectories are created by the BLISS session (e.g. directories for lima data) and are therefore owned by the user under which the BLISS session is running. Files in those subdirectories are created by the device servers and are therefore owned by their associated users.
A data policy determines the data structure (file format and directory structure) and the registration of data collection with external services. BLISS comes with two data policies:
1. The [ESRF data policy](#esrf-data-policy) which allows users to access their data and electronic logbook at https://data.esrf.fr. The data is written in [Nexus compliant](https://www.nexusformat.org/) HDF5 files in a specific directory structure.
2. The [basic data policy](#basic-data-policy) does not impose a data directory structure or register data with any external service. Data can (but does not have to) be written in [Nexus compliant](https://www.nexusformat.org/) HDF5 files.
Installation and configuration of the [ESRF](data_policy_dev_esrf.md) and [basic](data_policy_dev_basic.md) data policies, as well as the creation of [custom](data_policy_dev_custom.md) data policies, are described elsewhere. Below we describe how to use the data policies.
## ESRF data policy
This data policy requires the user to specify *proposal*, *sample* and *dataset*. This will completely define how data is organized.
### Change proposal
```
DEMO [1]: newproposal("blc123")
Proposal set to 'blc123'
Data path: /data/id00/inhouse/blc123/id00/sample/sample_0001
```
When no proposal name is given, the default is the inhouse proposal `{beamline}{yymm}`. For example at ID21 in January 2020 the default proposal name is `id212001`.
The data root directory is derived from the proposal name:
* no name given: `/data/{beamline}/inhouse/`
* *ih** and *blc**: `/data/{beamline}/inhouse/`
* *test**, *tmp** or *temp**: `/data/{beamline}/tmp/`
* all other names: `/data/visitor/`
These root paths can be [configured](data_policy_dev_esrf.md#configuration) but the above are the defaults.
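The rules above can be summarized in a short sketch (an illustration of the defaults, not the actual BLISS implementation):

```python
def default_data_root(proposal, beamline="id00"):
    """Default data root directory derived from the proposal name."""
    if not proposal or proposal.startswith(("ih", "blc")):
        # no name given, or an inhouse proposal
        return f"/data/{beamline}/inhouse/"
    if proposal.startswith(("test", "tmp", "temp")):
        return f"/data/{beamline}/tmp/"
    return "/data/visitor/"
```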
### Change sample
```
DEMO [2]: newsample("sample1")
Sample set to 'sample1'
Data path: /data/id00/inhouse/blc123/id00/sample1/sample1_0001
```
When no sample name is given, the default sample name "sample" is used. Note that you can always come back to an existing sample.
### Change dataset
#### Named datasets
```
DEMO [3]: newdataset("area1")
Dataset set to 'area1'
Data path: /data/id00/inhouse/blc123/id00/sample1/sample1_area1
```
When the dataset already exists, the name will be automatically incremented ("area1_0002", "area1_0003", ...). Note that you can never come back to the same dataset after changing to another dataset.
#### Unnamed datasets
```
DEMO [4]: newdataset()
Dataset set to '0002'
Data path: /data/id00/inhouse/blc123/id00/sample1/sample1_0002
```
The dataset will be named automatically "0001", "0002", ... The dataset number is independent for each sample. Note that you can never come back to the same dataset after changing to another dataset.
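The naming rules for named and unnamed datasets can be sketched as follows (illustration only, not BLISS's actual code):

```python
def next_dataset_name(existing, name=None):
    """Next dataset name, given the sample's existing dataset names."""
    if name is None:
        # unnamed datasets: "0001", "0002", ... numbered per sample
        numbers = [int(d) for d in existing if d.isdigit()]
        return "%04d" % (max(numbers, default=0) + 1)
    if name not in existing:
        return name
    # named dataset already exists: increment with a numeric suffix
    number = 2
    while "%s_%04d" % (name, number) in existing:
        number += 1
    return "%s_%04d" % (name, number)
```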
### Policy state
To get an overview of the current state of the data policy
```
DEMO [5]: SCAN_SAVING
Out [5]: Parameters (default) -
.user_name = 'denolf'
.images_path_template = 'scan{scan_number}'
.images_prefix = '{img_acq_device}_'
.date_format = '%Y%m%d'
.scan_number_format = '%04d'
.dataset_number_format = '%04d'
.technique = ''
.session = 'demo'
.date = '20200208'
.scan_name = '{scan_name}'
.scan_number = '{scan_number}'
.img_acq_device = '<images_* only> acquisition device name'
.writer = 'nexus'
.data_policy = 'ESRF'
.template = '{proposal}/{beamline}/{sample}/{sample}_{dataset}'
.beamline = 'id00'
.proposal = 'blc123'
.proposal_type = 'inhouse'
.base_path = '/data/id00/inhouse'
.sample = 'sample1'
.dataset = '0001'
.data_filename = '{sample}_{dataset}'
.images_path_relative = True
.creation_date = '2020-02-08-12:09'
.last_accessed = '2020-02-08-12:12'
-------------- --------- -------------------------------------------------------------------
exists filename /data/id00/inhouse/blc123/id00/sample1/sample1_0001/sample1_0001.h5
exists directory /data/id00/inhouse/blc123/id00/sample1/sample1_0001
Metadata RUNNING Dataset is running
-------------- --------- -------------------------------------------------------------------
```
## Basic data policy
This data policy requires the user to use the [`SCAN_SAVING`](data_policy_dev_basic.md#scan_saving) object directly to define where the data will be saved. The data location is completely determined by specifying *base_path*, *template*, *data_filename* and *writer*
```
DEMO [1]: SCAN_SAVING.base_path = "/tmp/data"
DEMO [2]: SCAN_SAVING.writer = "nexus"
DEMO [3]: SCAN_SAVING.template = "{date}/{session}/{mysubdir}"
DEMO [4]: SCAN_SAVING.date_format = "%y%b"
DEMO [5]: SCAN_SAVING.add("mysubdir", "sample1")
DEMO [6]: SCAN_SAVING.data_filename = "scan{scan_number}"
DEMO [7]: SCAN_SAVING.filename
Out [7]: '/tmp/data/20Feb/demo/sample1/scan{scan_number}.h5'
```
Note that each attribute can be a template string to be filled with other attributes from the [`SCAN_SAVING`](data_policy_dev_basic.md#scan_saving) object.
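The interpolation can be sketched with plain `str.format` (a sketch of the mechanism using the values from the example above, not the actual `ParametersWardrobe` code):

```python
# SCAN_SAVING-style attributes from the example above
attrs = {
    "base_path": "/tmp/data",
    "template": "{date}/{session}/{mysubdir}",
    "date": "20Feb",
    "session": "demo",
    "mysubdir": "sample1",
    "data_filename": "scan{scan_number}",
}

# interpolate the template with the other attributes
root_path = attrs["base_path"] + "/" + attrs["template"].format(**attrs)
# data_filename keeps its {scan_number} placeholder: it is filled per scan
filename = root_path + "/" + attrs["data_filename"] + ".h5"
# filename == '/tmp/data/20Feb/demo/sample1/scan{scan_number}.h5'
```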
### Policy state
To get an overview of the current state of the data policy
```
DEMO [8]: SCAN_SAVING
Out [8]: Parameters (default) -
.base_path = '/tmp/data'
.data_filename = 'scan{scan_number}'
.user_name = 'denolf'
.template = '{date}/{session}/{mysubdir}'
.images_path_relative = True
.images_path_template = 'scan{scan_number}'
.images_prefix = '{img_acq_device}_'
.date_format = '%y%b'
.scan_number_format = '%04d'
.mysubdir = 'sample1'
.session = 'demo'
.date = '20Feb'
.scan_name = '{scan_name}'
.scan_number = '{scan_number}'
.img_acq_device = '<images_* only> acquisition device name'
.writer = 'nexus'
.data_policy = 'None'
.creation_date = '2020-02-08-12:04'
.last_accessed = '2020-02-08-12:05'
-------------- --------- -----------------------------------------------------------------
exists filename /tmp/data/20Feb/demo/sample1/scan{scan_number}.h5
exists directory /tmp/data/20Feb/demo/sample1
-------------- --------- -----------------------------------------------------------------
```
# Basic data policy
This policy is meant for testing only. It does not enforce a data structure (file format and directory structure).
## Architecture
![Screenshot](img/scan_data_flow_path.svg)
## Summary
To enable the basic data policy
1. Install and run the [Nexus writer](data_nexus_server.md) to write the data in Nexus format (optional)
2. Specify file directory and name using the [SCAN_SAVING](#scan_saving) object in the BLISS session
## SCAN_SAVING
`SCAN_SAVING.get()` returns a dictionary, whose key `root_path` is the final path to scan files:
- `root_path`: `base_path` + interpolated template
- `data_path`: fullpath for the *data file* without the extension.
- `images_path`: path where image device should save (Lima)
- `parent`: parent node for publishing data via Redis
- `db_path_items`: used to create parent node for publishing data via Redis
- `writer`: Data file writer object.
#### SCAN_SAVING writer
`.writer` is a special member of `SCAN_SAVING`; it indicates which writer to use for saving data. BLISS supports `"hdf5"` (internal writer in BLISS), `"nexus"` (the [Nexus writer](data_nexus_server.md)) and `"null"` (writing disabled).
### Configuration example
```
DEMO [13]: SCAN_SAVING.user_name='toto'
DEMO [14]: SCAN_SAVING.get_path()
Out [14]: '/data/visitor/unknown/lysozyme'
```
### Programmer's note
SCAN_SAVING is a `ParametersWardrobe`.
from `bliss/common/session.py`:
```python
class Session:
    [...]
    def setup(self, env_dict=None, verbose=False):
        [...]
        env_dict["SCAN_SAVING"] = ScanSaving(self.name)
```
from `bliss/scanning/scan.py`:
```python
class ScanSaving(ParametersWardrobe):
    SLOTS = []
    WRITER_MODULE_PATH = "bliss.scanning.writer"
    [...]

    def __init__(self, name=None):
        [...]
        _default_values = {
            "base_path": "/tmp/scans",
            "data_filename": "data",
            [...]
        }

    def get(self):
        try:
            # calculate all parameters
            ...
        except KeyError as keyname:
            raise RuntimeError("Missing %s attribute in ScanSaving" % keyname)
```
# Custom data policy
SCAN_SAVING is a `ParametersWardrobe` which defines the data policy in the BLISS session. The active data policy is selected in the session object (see `bliss/common/session.py`):
```python
class Session:
    def _set_scan_saving_class(self, scan_saving_class):
        scan_saving.set_scan_saving_class(scan_saving_class)
        self.scan_saving = scan_saving.ScanSaving(self.name)
        if is_bliss_shell():
            self.env_dict["SCAN_SAVING"] = self.scan_saving
```
Creating a custom data policy means deriving a class from `bliss.scanning.scan_saving.BaseScanSaving`:
```python
class CustomScanSaving(BaseScanSaving):
    DEFAULT_VALUES = {
        # default and not removable values
        "technique": "",
        ...
        # saved properties in Redis:
        "_proposal": "",
        ...
    }
    # read-only attributes implemented with python properties
    PROPERTY_ATTRIBUTES = [
        "proposal",
        ...
    ]
    REDIS_SETTING_PREFIX = "custom_scan_saving"
    SLOTS = ["_custom_attr"]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._custom_attr = None

    def get(self):
        try:
            # calculate all parameters
            ...
        except KeyError as keyname:
            raise RuntimeError("Missing %s attribute in CustomScanSaving" % keyname)
```
# ESRF data policy
The ESRF data policy allows users to access their data and electronic logbook at https://data.esrf.fr. Data is registered with [ICAT](https://data.esrf.fr) and written in [Nexus compliant](https://www.nexusformat.org/) HDF5 files in a specific directory structure.
## Summary
To enable the ESRF data policy
1. Install and run the [Nexus writer](data_nexus_server.md) to write the data in Nexus format
2. Install and run the [ICAT servers](data_policy_servers.md) to communicate with ICAT
3. Enable the ESRF data policy in the BLISS session to configure the data directory structure. This is done in the beamline configuration which will contain a mixture of [data policy configuration](#configuration) and [ICAT server configuration](data_policy_servers.md#enable-in-bliss):
```yaml
scan_saving:
  class: ESRFScanSaving
  beamline: id00
  metadata_manager_tango_device: id00/metadata/test
  metadata_experiment_tango_device: id00/metaexp/test
  tmp_data_root: /data/{beamline}/tmp
  visitor_data_root: /data/visitor
  inhouse_data_root: /data/{beamline}/inhouse
```
4. Use the [data policy commands in BLISS](data_policy.md)
## Configuration
Define in the beamline configuration
* beamline name
* root directories for inhouse, visitor and tmp proposals
```yaml
scan_saving:
  class: ESRFScanSaving
  beamline: id00
  tmp_data_root: /data/{beamline}/tmp
  visitor_data_root: /data/visitor
  inhouse_data_root: /data/{beamline}/inhouse
```
The [ESRF data policy](data_policy_dev_esrf.md) allows users to access their data and electronic logbook at https://data.esrf.fr. Two TANGO devices need to be installed, running and enabled for this.
## Summary
To install and use the ICAT servers
1. [Register](#installation) two TANGO devices with the TANGO database
2. [Run](#running) the two TANGO devices
3.