Data Management Support
Introduction
At the ESRF, experiments are usually carried out in the scope of a proposal. There are different kinds of proposals: peer-reviewed, industrial, in-house, commissioning, etc.
A proposal has one or several sessions. A session is a time slot allocated on a beamline; in some cases an A-form has to be submitted to SMIS defining samples and users. A session can also be defined by the tuple: proposal, beamline and date.
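The session identity described above can be modelled as a simple tuple. A minimal sketch, assuming the (proposal, beamline, date) triple from the text; the `Session` type and the field values are illustrative, not an actual BLISS or SMIS structure:

```python
from collections import namedtuple
from datetime import date

# A session is uniquely identified by (proposal, beamline, date).
Session = namedtuple("Session", ["proposal", "beamline", "date"])

# Hypothetical example values for illustration only.
session = Session(proposal="XXYYYY", beamline="id21", date=date(2016, 5, 3))
```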
Users perform their experiments in a predefined location, usually /data/visitor/{proposal}/{beamline}, or /data/inhouse in the case of in-house research.
It is up to them to organize their data within that folder the way they consider convenient.
Today's Data Management Plan
For /data/visitor, data is kept on disk for 50 days and then backed up for 2 years. Beyond that period the data is no longer kept by the ESRF. There is no common plan for how to organize data within a proposal, and in most cases there is no automatic mechanism for capturing a description of the dataset and/or the associated metadata.
For /data/inhouse, data is backed up for 6 months only, BUT it is not automatically removed. It is kept until someone removes it or the disk crashes.
Data Policy
The ESRF endorsed a data policy in November 2015, and the situation described above is no longer valid.
With the data policy the ESRF becomes the custodian of the data and is responsible for their curation and preservation. Besides, non-proprietary research data will be made public after 3 years. This includes both data and metadata.
The ESRF is therefore committed to curating and preserving data and metadata, as well as to developing the mechanisms to make them public at some point.
In order to do so, the ESRF has developed several tools that allow storing the metadata in ICAT and sending data to the ESRF's archive system, which today is based on tapes. Two Tango devices, called MetadataManager and MetaExperiment, have been developed; they are the link between the software that performs the experiment, such as BLISS, and ICAT.
Goal
The main aim is for BLISS to be capable of managing data and metadata in the scope of a proposal in a coherent, seamless and efficient way across the whole ESRF. This means allowing users to set up some configuration parameters for their experiment and then automatically capturing a predefined set of metadata values as well as the data.
Requirements
BLISS should allow working both with and without data management. When data management is enabled, some parameters become mandatory in order to identify the proposal.
Parameters
BLISS should store the parameters as well as provide an API to resolve and read them.
Mandatory
BLISS should manage, by offering an API, the following mandatory parameters:
proposal: name of the active proposal
sample: name of the sample that is currently mounted
sample description: description of the sample
dataset: name of the dataset that is currently being collected
technique: name of the technique. For instance: EXAFS, SAXS, etc.
location: a folder where data is collected.
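A minimal sketch of the parameter API these requirements imply. The `ScanParameters` class and its method names are hypothetical, not an actual BLISS API; the sketch only shows how the mandatory parameters could be stored, read, and checked for completeness:

```python
class ScanParameters:
    """Holds the mandatory data-management parameters and resolves them."""

    # The mandatory parameters listed in the requirements.
    MANDATORY = ("proposal", "sample", "sample_description",
                 "dataset", "technique", "location")

    def __init__(self, **values):
        self._values = dict(values)

    def set(self, key, value):
        self._values[key] = value

    def get(self, key):
        return self._values[key]

    def missing(self):
        """Return the mandatory parameters that are not yet set."""
        return [k for k in self.MANDATORY if k not in self._values]


# Usage: parameters can be given up front or filled in later.
params = ScanParameters(proposal="XXYYYY", technique="SAXS")
params.set("sample", "sampleName")
```

With such an API, BLISS could refuse to start a dataset while `missing()` is non-empty when data management is enabled.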
Location
Location and dataset should have a 1-to-1 relation.
Location could formally be composed by replacing values in a given pattern, for instance:
pattern = {dataRoot}/{proposal}/{beamlineName}/{sample}/{dataset}
location = /data/visitor/XXYYYY/[ID|BM][00..99]/sampleName/datasetName
In this context, dataRoot could be defined as the root of the proposal. For instance: [/data/visitor | /data/inhouse]
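The pattern substitution above maps directly onto Python string formatting. A minimal sketch, using the pattern and placeholder names from the document; the concrete proposal, beamline and sample values are illustrative:

```python
# Pattern as defined in the document.
pattern = "{dataRoot}/{proposal}/{beamlineName}/{sample}/{dataset}"

# Replace each placeholder to obtain the dataset location.
location = pattern.format(
    dataRoot="/data/visitor",
    proposal="XXYYYY",
    beamlineName="id21",
    sample="sampleName",
    dataset="datasetName",
)
# location is now "/data/visitor/XXYYYY/id21/sampleName/datasetName"
```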
Optional Parameters
Additionally, a list of metadata describing a dataset can be provided in a configuration resource. This list contains key-value pairs.
The key is a reserved keyword that is NeXus-like compliant and defined in the scope of this project.
The value may be of one of the following types:
- String. For instance: "16 bunch"
- Link to a tango device. For instance: orion:10000/fe/id/21/SR_Filling_Mode
- Empty, if the value will be pushed dynamically during a data collection
This is an example of a valid configuration file:
InstrumentDetector01Positioners_name = istopy istopz
InstrumentSource_mode = orion:10000/fe/id/21/SR_Filling_Mode
InstrumentDetector01Positioners_value
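One way to distinguish the three value types when reading such a configuration file. This is a minimal sketch; the classification heuristic (a Tango link is recognised by its host:port/device/attribute shape) is an assumption, not the actual BLISS parser:

```python
def classify(line):
    """Return (key, kind, value) for one configuration line.

    kind is "dynamic" when no value is given (pushed during collection),
    "tango" when the value looks like a host:port/device/attribute link,
    and "string" otherwise.
    """
    if "=" not in line:
        return line.strip(), "dynamic", None
    key, value = (part.strip() for part in line.split("=", 1))
    # Heuristic: a Tango link contains host:port followed by a device path.
    if ":" in value and value.count("/") >= 3:
        return key, "tango", value
    return key, "string", value


# The three lines from the example configuration file above.
config = [
    "InstrumentDetector01Positioners_name = istopy istopz",
    "InstrumentSource_mode = orion:10000/fe/id/21/SR_Filling_Mode",
    "InstrumentDetector01Positioners_value",
]
parsed = [classify(line) for line in config]
```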
Events
BLISS should also implement triggers for the following events:
On Proposal Changed
On Sample Changed
On Dataset started
On Dataset finished
On Dataset aborted
Ideally, a mechanism would be put in place to listen only to these events and not to the full Redis tree.
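The triggers above could be exposed as a small callback registry. A minimal sketch; the event names follow the list above, but the dispatcher functions (`on`, `fire`) are hypothetical, not an actual BLISS mechanism:

```python
# The events named in the requirements.
EVENTS = ("proposal_changed", "sample_changed",
          "dataset_started", "dataset_finished", "dataset_aborted")

_callbacks = {event: [] for event in EVENTS}


def on(event, callback):
    """Register a callback for one of the data-management events."""
    if event not in _callbacks:
        raise ValueError("unknown event: %s" % event)
    _callbacks[event].append(callback)


def fire(event, **payload):
    """Invoke every callback registered for the event."""
    for callback in _callbacks[event]:
        callback(**payload)


# Usage: react to a dataset being started.
seen = []
on("dataset_started", lambda **kw: seen.append(kw))
fire("dataset_started", dataset="datasetName", proposal="XXYYYY")
```

Such a registry would let clients subscribe to just these five events rather than watching the full Redis tree.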
Example Use Case
The next diagram represents how the user interacts with the system. It might not be the only allowed flow: