Merge branch 'add_from_bliss_original_file_to_raw' into 'main'

Add 'from_bliss_original_file_to_raw' to tomoscan.esrf.scan.utils

See merge request !238

Commit 647d810b authored 5 months ago by payno
--- a/doc/tutorials/publish_processed_data_to_data_portal.ipynb
+++ b/doc/tutorials/publish_processed_data_to_data_portal.ipynb
@@ -50,7 +50,10 @@
    "\n",
    "You can get the original dataset path from an instance of `TomoScanBaseInstance` by calling `get_bliss_original_files()`.\n",
    "\n",
-    "Warning: this path can contain some '/mnt/multipath-shares' prefix that should be passed to ICAT/DRAC."
+    "Warning: this path can contain some '/mnt/multipath-shares' prefix that shouldn't be passed to ICAT/DRAC. To filter this you can use the 'from_bliss_original_file_to_raw' helper function.\n",
+    "``` python\n",
+    "from tomoscan.esrf.scan.utils import from_bliss_original_file_to_raw\n",
+    "```"
   ]
  },
  {
@@ -227,4 +230,3 @@
 "nbformat": 4,
 "nbformat_minor": 5
 }
-
 %% Cell type:markdown id:287b637a-4a01-4c11-81b8-68404d4c0727 tags:

 # How to publish reconstructed volume to the data portal?

 This tutorial explains how to publish a reconstructed volume (done by nabu) to the (ESRF) data portal.

 Today, the ESRF data portal catalog is based on DRAC (the successor of ICAT). As the switch is recent, both names are used interchangeably: consider ICAT == DRAC.

 %% Cell type:markdown id:dae0f793-0a6b-440c-8bae-d6781b144bad tags:

 ## DRAC processed dataset

 There are two types of datasets in DRAC: raw datasets (published automatically by Bliss-tomo) and processed datasets (the ones we will publish in this tutorial).

 A **DRAC processed dataset** is related to:

 * one or several raw datasets (usually one, but there can be several, for example in the case of stitching) (*a*)
 * a set of metadata keys (voxel size, phase retrieval options, etc.) (*b*)
 * one beamline (*c*)
 * one proposal (*d*)
 * one dataset (*e*)
 * one folder (the folder containing the reconstructed volume) (*f*)

 %% Cell type:markdown id:868b2c98-0bb3-43de-b0ce-c14daa2fbbd2 tags:

 ## Retrieve all the data needed for ICAT

 %% Cell type:markdown id:a7254799-34f7-474e-9674-9a02874cd1ba tags:

 ### (*a*) `raw` parameter

 Path(s) to the raw dataset(s) the processed dataset was obtained from. It should be a tuple, possibly of a single element.

 You can get the original dataset path from a `TomoScanBase` instance by calling `get_bliss_original_files()`.

-Warning: this path can contain some '/mnt/multipath-shares' prefix that should be passed to ICAT/DRAC.
+Warning: this path can contain a '/mnt/multipath-shares' prefix that must not be passed to ICAT/DRAC. To remove it, you can use the `from_bliss_original_file_to_raw` helper function, which also converts the file path into the folder containing it (the expected `raw` value).
+``` python
+from tomoscan.esrf.scan.utils import from_bliss_original_file_to_raw
+```
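For illustration, the snippet below builds the `raw` tuple from Bliss original file paths. It uses a local stand-in, `strip_multipath_prefix` (hypothetical, not part of tomoscan), that mirrors what `from_bliss_original_file_to_raw` is described to do: take the parent folder and drop a leading '/mnt/multipath-shares' or '/gpfs/easy' prefix. The paths are made-up examples; in practice, call the tomoscan helper on the output of `get_bliss_original_files()`.

```python
import os

# Prefixes that path resolution can add and that DRAC does not accept
# (taken from the tomoscan helper's docstring).
NOISY_PREFIXES = ("/gpfs/easy", "/mnt/multipath-shares")


def strip_multipath_prefix(bliss_original_file):
    """Hypothetical stand-in for from_bliss_original_file_to_raw.

    Returns the folder containing the Bliss file, without a leading noisy prefix.
    """
    if bliss_original_file is None:
        return None
    folder = os.path.dirname(bliss_original_file)
    for prefix in NOISY_PREFIXES:
        if folder.startswith(prefix):
            # str is immutable: keep the sliced result
            folder = folder[len(prefix):]
    return folder


# Made-up example paths, as get_bliss_original_files() might return them
bliss_files = (
    "/mnt/multipath-shares/data/visitor/proposal/beamline/session/sample/sample_0001/sample_0001.h5",
)
# raw must be a tuple, even with a single element
raw = tuple(strip_multipath_prefix(f) for f in bliss_files)
print(raw)
```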

 %% Cell type:markdown id:f0b764b9-93bd-46f5-bdda-ae93f1a3af40 tags:

 ### (*b*) `metadata` parameter

 The metadata to be published to ICAT can be obtained from an instance of `VolumeBase` by calling the `build_drac_metadata` function.

 For example, for an `HDF5Volume` you can have:

 ``` python
 from tomoscan.esrf.volume import HDF5Volume

 volume = HDF5Volume(
    file_path=...,
    data_path=...,
 )

 drac_metadata = volume.build_drac_metadata()
 ```

 Note: see the tutorial on volumes for more information.

 **Warning**: at the moment, the DRAC metadata does not contain the 'Sample_name' field, which is mandatory (without it, no processing will be done), so you need to add it:

 ``` python
 drac_metadata["Sample_name"] = scan.sample_name
 ```

 Here, `scan` is the `TomoScanBase` instance of the raw acquisition; the sample name is obtained through `scan.sample_name`.

 *Note*: Available DRAC keys are defined [here](https://gitlab.esrf.fr/icat/hdf5-master-config/-/blob/master/hdf5_cfg.xml?ref_type=heads) (see `Tomo` group, `reconstruction` section).
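Because a missing 'Sample_name' means no processing, it can help to check the metadata before publishing. The sketch below uses a plain dict in place of the output of `build_drac_metadata()`; the helper name `with_sample_name` and the placeholder metadata key are illustrative assumptions, not real DRAC keys or tomoscan API.

```python
def with_sample_name(drac_metadata, sample_name):
    """Return a copy of drac_metadata that contains the mandatory 'Sample_name' key.

    Raises ValueError if no sample name is available, since DRAC will not
    process a dataset without it.
    """
    if not sample_name:
        raise ValueError("'Sample_name' is mandatory for DRAC and no sample name was provided")
    metadata = dict(drac_metadata)  # copy: do not mutate the caller's dict
    metadata["Sample_name"] = sample_name
    return metadata


# Stand-in for volume.build_drac_metadata(); the key below is a placeholder
drac_metadata = {"placeholder_key": "value"}
metadata = with_sample_name(drac_metadata, "my_sample")  # in practice: scan.sample_name
```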

 %% Cell type:markdown id:2c10b63a-b52e-4cff-8dea-e3b6f9105f74 tags:

 ### (*c*) `beamline` parameter

 Name of the beamline, in lower case (e.g. 'bm05', 'bm18').

 %% Cell type:markdown id:c7ceb617-b64f-46c9-bfa4-8fa888f50b6b tags:

 ### (*d*) `proposal` parameter

 Name of the proposal.

 %% Cell type:markdown id:45a211a4-e203-4517-8edb-1e7515beab01 tags:

 ### (*e*) `dataset` parameter

 Name of the dataset. This is the (processed) dataset in the DRAC context.

 At the DRAC level, this dataset creates a key associated with the folder path, and this key must be unique.

 The default value we propose is 'reconstructed_volumes'.

 %% Cell type:markdown id:0e6ce848-cc68-47bf-bb4f-65083bbf789c tags:

 ### (*f*) `path` parameter

 Path to the folder containing the volume reconstructed by Nabu.

 **Warning**: All files contained in this folder will be published to ICAT. There is no mechanism to publish a single file or a set of files.

 Here is the recommended structure for an HDF5 reconstruction, with `path` pointing to a folder named 'reconstructed_volumes':

 ```
 reconstructed_volumes
    |
    |------ nabu_rec.hdf5                                - nabu reconstructed volume master file  (1)
    |------ nabu_rec
    |         |---------- nabu_rec_0000_0256.hdf5        - nabu reconstructed volume sub file 1
    |------ gallery                                      - gallery related to the processed dataset (2)
    |         |------ screenshot_1.png
    |         |------ screenshot_2.png
    |------ nabu_cfg_files                               - folder containing nabu configuration files (3)
              |------ nabu_config.cfg
 ```
 (1) The Nabu reconstruction. It can be replaced by a folder containing a volume saved as .tiff files.

 (2) **Optional**. A set of images (.png or .jpg) linked to the reconstructed volume, like 3 slices along each axis.

 (3) nabu_cfg_files: location of the configuration used to obtain the volume(s). In the future, it should be used to reprocess a volume.
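Since every file under `path` is published, it can be worth listing what will be sent before calling ICAT. The sketch below recreates the recommended layout in a temporary folder with empty placeholder files (names taken from the tree above) and lists every file that would be published. `files_to_publish` is a hypothetical helper, not part of tomoscan or pyicat_plus.

```python
import tempfile
from pathlib import Path


def files_to_publish(path):
    """Return all files under `path` (relative, POSIX style): all of them go to ICAT."""
    root = Path(path)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "reconstructed_volumes"
    # Recreate the recommended structure with empty placeholder files
    for relative in (
        "nabu_rec.hdf5",
        "nabu_rec/nabu_rec_0000_0256.hdf5",
        "gallery/screenshot_1.png",
        "gallery/screenshot_2.png",
        "nabu_cfg_files/nabu_config.cfg",
    ):
        file_path = root / relative
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.touch()

    published = files_to_publish(root)
    for name in published:
        print(name)
```

Any stray file (temporary output, logs) showing up in this list would be published too, so it is better removed or moved before publication.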

 %% Cell type:markdown id:28d6a1e3-2616-4112-a90d-736d6793ddd5 tags:

 ## Publication to DRAC / ICAT

 To publish a **processed dataset** to ICAT, we use [pyicat_plus](https://gitlab.esrf.fr/icat/pyicat-plus).

 %% Cell type:markdown id:eee28397-a05f-4aaa-a748-5bfa2d3e7944 tags:

 ### Instantiate the `IcatClient`

 ``` python
 from pyicat_plus.client.main import IcatClient
 icat_client = IcatClient(
    metadata_urls=("bcu-mq-01.esrf.fr:61613", "bcu-mq-02.esrf.fr:61613")
 )
 ```

 %% Cell type:markdown id:047debaa-bac2-431a-8c99-d1411aa73c71 tags:

 ### Publish to ICAT

 ``` python
 icat_client.store_processed_data(
    raw=raw,  # (a)
    metadata=metadata,  # (b)
    beamline="id16a",  # (c)
    proposal=proposal,  # (d)
    dataset="reconstructed_volumes",  # (e)
    path=path,  # (f)
 )
 ```
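The same call can be wrapped so that the constraints listed in this tutorial are checked before anything is sent. This is a sketch under assumptions: `publish_processed_dataset` is a hypothetical helper, its checks only restate the rules above (tuple `raw`, mandatory 'Sample_name', lower-case beamline, existing folder), and `FakeClient` merely records the call so the example runs without a message broker. With a real `IcatClient`, pass it as `client`.

```python
import os


def publish_processed_dataset(client, raw, metadata, beamline, proposal, path,
                              dataset="reconstructed_volumes"):
    """Check the tutorial's constraints, then call client.store_processed_data."""
    if not isinstance(raw, tuple) or not raw:
        raise TypeError("raw must be a non-empty tuple of raw dataset folders")
    if not metadata.get("Sample_name"):
        raise ValueError("metadata must contain the mandatory 'Sample_name' key")
    if not os.path.isdir(path):
        raise FileNotFoundError(f"processed data folder not found: {path}")
    client.store_processed_data(
        raw=raw,                    # (a)
        metadata=metadata,          # (b)
        beamline=beamline.lower(),  # (c) beamline name in lower case
        proposal=proposal,          # (d)
        dataset=dataset,            # (e)
        path=path,                  # (f)
    )


class FakeClient:
    """Stand-in for IcatClient that only records the keyword arguments."""

    def __init__(self):
        self.calls = []

    def store_processed_data(self, **kwargs):
        self.calls.append(kwargs)


client = FakeClient()
publish_processed_dataset(
    client,
    raw=("/data/visitor/proposal/beamline/session/sample/sample_0001",),  # made-up path
    metadata={"Sample_name": "my_sample"},
    beamline="BM18",
    proposal="my_proposal",  # placeholder
    path=os.getcwd(),  # any existing folder, for the example only
)
```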


--- a/tomoscan/esrf/scan/utils.py
+++ b/tomoscan/esrf/scan/utils.py
@@ -821,3 +821,22 @@ def get_series_slice(
    if start is not None:
        return slice(start, len(image_key_values), 1)
    return None
+
+
+def from_bliss_original_file_to_raw(bliss_original_file: str | None) -> str | None:
+    """
+    convert an NXtomo 'bliss_original_files' entry to the DRAC 'raw' parameter (folder containing the raw data),
+    removing the noise that 'realpath' may add, such as a '/mnt/multipath-shares' or '/gpfs/easy' prefix
+    """
+    if bliss_original_file is None:
+        return None
+
+    bliss_original_file = os.path.dirname(bliss_original_file)
+    for key in ("/gpfs/easy", "/mnt/multipath-shares"):
+        if bliss_original_file.startswith(key):
+            # no simple workaround: abspath returns a path with '/mnt/multipath-shares'
+            _logger.info(
+                f"looks like raw data is given with '{key}' prefix. DRAC will fail on it. Removing it."
+            )
+            # strings are immutable: reassign the result, stripping only the leading prefix
+            bliss_original_file = bliss_original_file[len(key):]
+    return bliss_original_file