Location of processed data

BM29, discussion with @jerome.kieffer @demariaa

Soon we will be able to send processed data to ICAT. Bliss does not save or send processed data, but BM29 (and potentially other beamlines) wants to keep raw and processed data close together without ingesting the processed data twice, which is what currently happens.

Summary of the current BLISS path template

Currently Bliss has an ESRF-wide template for raw datasets (remember: one dataset can contain many scans, but at BM29 there is one scan per dataset):

{base_path}/{proposal_dirname}/{beamline}/{proposal_session_name}/{collection_name}/{collection_name}_{dataset_name}

For example

base_path = "/data/visitor"
proposal_dirname = "mx2398"
beamline = "bm29"
proposal_session_name = "20220914"
collection_name = "220914water_calib_sc"  # which can be equal to the sample name but not necessarily
dataset_name = "buffer_after_water"

which gives the following path for the "raw dataset"

/data/visitor/mx2398/bm29/20220914/220914water_calib_sc/220914water_calib_sc_buffer_after_water
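The template expansion above can be reproduced with plain Python string formatting. This is only a sketch for illustration: the names `RAW_DATASET_TEMPLATE` and `raw_dataset_path` are hypothetical and are not part of the Bliss API.

```python
# Hypothetical helper, not Bliss code: expands the ESRF-wide template
# for raw datasets using the field names from the template above.
RAW_DATASET_TEMPLATE = (
    "{base_path}/{proposal_dirname}/{beamline}/{proposal_session_name}"
    "/{collection_name}/{collection_name}_{dataset_name}"
)


def raw_dataset_path(**fields):
    """Return the raw dataset directory for the given template fields."""
    return RAW_DATASET_TEMPLATE.format(**fields)


raw_path = raw_dataset_path(
    base_path="/data/visitor",
    proposal_dirname="mx2398",
    beamline="bm29",
    proposal_session_name="20220914",
    collection_name="220914water_calib_sc",
    dataset_name="buffer_after_water",
)
print(raw_path)
```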

Location of processed data

At BM29 we currently have datasets that look like this (1 loopscan with a lima camera in this example)

./220914water_calib_sc_buffer_after_water
 ├── 220914water_calib_sc_buffer_after_water.h5
 ├── processed
 │   └── integrate
 │       └── buffer_after_water0000-integrate.h5
 └── scan0001
     └── buffer_after_water0000.h5

The processed directory is created by data processing software that is independent of Bliss.

The path of the "processed dataset" is

/data/visitor/mx2398/bm29/20220914/220914water_calib_sc/220914water_calib_sc_buffer_after_water/processed/integrate

There will be other processed datasets in the future, so we will have "integrate", "hplc", and so on.

The problem

In the end we want 2 datasets in ICAT:

the raw dataset:

/data/visitor/mx2398/bm29/20220914/220914water_calib_sc/220914water_calib_sc_buffer_after_water

the processed dataset:

/data/visitor/mx2398/bm29/20220914/220914water_calib_sc/220914water_calib_sc_buffer_after_water/processed/integrate

The problem is that the folder of the processed dataset is INSIDE the folder of the raw dataset. This means the processed data will be ingested TWICE: once as part of the raw dataset and once as the processed dataset.
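The nesting can be checked mechanically. The sketch below uses only the standard library; the function name `is_nested` is hypothetical and the check says nothing about how ICAT ingestion itself walks directories.

```python
from pathlib import PurePosixPath


def is_nested(child, parent):
    """Return True if `child` lies strictly inside the directory `parent`."""
    return PurePosixPath(parent) in PurePosixPath(child).parents


raw = (
    "/data/visitor/mx2398/bm29/20220914/220914water_calib_sc"
    "/220914water_calib_sc_buffer_after_water"
)
# Processed dataset as currently written at BM29: <raw>/processed/<process name>
processed = raw + "/processed/integrate"

# The processed dataset sits inside the raw dataset, so anything that
# ingests the raw folder recursively also picks up the processed files.
print(is_nested(processed, raw))
```

Any alternative location for processed data would need this check to return False for every raw/processed pair.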

Edited Sep 16, 2022 by Wout De Nolf