Location of processed data
BM29, discussion with @jerome.kieffer @demariaa
Soon we will be able to send processed data to ICAT. Bliss does not save or send processed data. However, BM29 and possibly other beamlines want raw and processed data stored close together, without the processed data being ingested twice (which is what currently happens).
Summary of the current BLISS path template
Bliss currently uses an ESRF-wide path template for raw datasets (remember: one dataset can contain many scans, but at BM29 there is one scan per dataset):
{base_path}/{proposal_dirname}/{beamline}/{proposal_session_name}/{collection_name}/{collection_name}_{dataset_name}
For example:
base_path = "/data/visitor"
proposal_dirname = "mx2398"
beamline = "bm29"
proposal_session_name = "20220914"
collection_name = "220914water_calib_sc" # which can be equal to the sample name but not necessarily
dataset_name = "buffer_after_water"
which gives the following path for the "raw dataset":
/data/visitor/mx2398/bm29/20220914/220914water_calib_sc/220914water_calib_sc_buffer_after_water
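As a quick check, the template can be filled with plain Python `str.format` using the values above. This is only a sketch that reproduces the resulting string. It is not how Bliss builds the path internally, and the name `RAW_DATASET_TEMPLATE` is hypothetical.

```python
# Sketch: fill the ESRF-wide raw-dataset template with str.format.
# Reproduces the path above; NOT the actual Bliss implementation.
RAW_DATASET_TEMPLATE = (
    "{base_path}/{proposal_dirname}/{beamline}/{proposal_session_name}"
    "/{collection_name}/{collection_name}_{dataset_name}"
)

params = {
    "base_path": "/data/visitor",
    "proposal_dirname": "mx2398",
    "beamline": "bm29",
    "proposal_session_name": "20220914",
    "collection_name": "220914water_calib_sc",  # may or may not equal the sample name
    "dataset_name": "buffer_after_water",
}

# collection_name appears twice in the template; str.format fills both places.
raw_dataset_path = RAW_DATASET_TEMPLATE.format(**params)
print(raw_dataset_path)
# /data/visitor/mx2398/bm29/20220914/220914water_calib_sc/220914water_calib_sc_buffer_after_water
```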
Location of processed data
At BM29 our datasets currently look like this (one loopscan with a Lima camera in this example):
./220914water_calib_sc_buffer_after_water
├── 220914water_calib_sc_buffer_after_water.h5
├── processed
│   └── integrate
│       └── buffer_after_water0000-integrate.h5
└── scan0001
    └── buffer_after_water0000.h5
The processed directory is created by data processing software that is independent of Bliss.
The path of the "processed dataset" is:
/data/visitor/mx2398/bm29/20220914/220914water_calib_sc/220914water_calib_sc_buffer_after_water/processed/integrate
More kinds of processed datasets will follow, so besides "integrate" there will be "hplc" and others.
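The current layout can be summarised as "raw dataset folder + `processed/` + processing name". The helper below is a hypothetical sketch of that rule, inferred only from the example above. It is not an API of Bliss or of the processing software.

```python
from pathlib import PurePosixPath


def processed_dataset_path(raw_dataset_path: str, processing_name: str) -> str:
    """Hypothetical helper: <raw dataset folder>/processed/<processing_name>.

    Mirrors the layout currently written by the BM29 processing software,
    e.g. processing_name="integrate" (later also "hplc", ...).
    """
    return str(PurePosixPath(raw_dataset_path) / "processed" / processing_name)


raw = (
    "/data/visitor/mx2398/bm29/20220914/220914water_calib_sc/"
    "220914water_calib_sc_buffer_after_water"
)
for name in ("integrate", "hplc"):
    print(processed_dataset_path(raw, name))
```

Every processed dataset built this way ends up below the raw dataset folder, whatever the processing name is.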
The problem
In the end we want 2 datasets in ICAT:
- the raw dataset:
  /data/visitor/mx2398/bm29/20220914/220914water_calib_sc/220914water_calib_sc_buffer_after_water
- the processed dataset:
  /data/visitor/mx2398/bm29/20220914/220914water_calib_sc/220914water_calib_sc_buffer_after_water/processed/integrate
The problem is that the folder of the processed dataset is INSIDE the folder of the raw dataset. Its files are therefore ingested TWICE: once as part of the raw dataset and once as the processed dataset.
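The double ingestion can be illustrated with the files from the example tree. This sketch assumes that ingestion registers every file below a dataset's root folder. That ingestion rule is an assumption for illustration, not ICAT's documented behaviour, and `files_under` is a hypothetical name.

```python
from pathlib import PurePosixPath

# Raw dataset folder and the files of the BM29 example tree.
RAW = PurePosixPath(
    "/data/visitor/mx2398/bm29/20220914/220914water_calib_sc/"
    "220914water_calib_sc_buffer_after_water"
)
PROCESSED = RAW / "processed" / "integrate"
FILES = [
    RAW / "220914water_calib_sc_buffer_after_water.h5",
    PROCESSED / "buffer_after_water0000-integrate.h5",
    RAW / "scan0001" / "buffer_after_water0000.h5",
]


def files_under(root, files):
    """Assumed ingestion rule: a dataset owns every file below its root folder."""
    return {f for f in files if root in f.parents}


raw_files = files_under(RAW, FILES)
processed_files = files_under(PROCESSED, FILES)

# Files claimed by both datasets would be ingested twice.
duplicates = raw_files & processed_files
print(sorted(str(f) for f in duplicates))
```

Under this assumption, the integrated HDF5 file belongs to both the raw and the processed dataset, which is the duplication described above.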