Resolve "use XAS scan tags in CSV extraction"

Wout De Nolf requested to merge 1-use-xas-scan-tags-in-csv-extraction into main

Closes #1 (closed)

mainh5path  = "/data/visitor/es1473/id21/20240502/RAW_DATA/es1473_id21.h5" # Path and name of the main h5 file
outpath     = "/tmp/wdntest/" # Path to save all the output data ("." is the current working directory)
matchers    = ["CuO_1"]  # Filter to fit files with the matching name (e.g. "sample_1" or "region3"). Leave empty ("") to fit all
elemdet     = "fx2_det0_CuKa"  # Select the element and detector with data. Trans: "idet"; Fluor: "fx2_CuKa" fx2_MnKa
elemdetnorm = "CuKa_corr_norm0"  # Select the element and counter with the normalization and dead time correction. Trans: "absorp1"; Fluor: "CuKa_corr_norm0"
energycntr  = "energy_enc"  # Select the energy counter 

save_singles  = "no"  # Write "yes" to save each XANES spectra in a single .dat file for each of them "no" otherwise
save_average  = "no"  # Write "yes" to create files with the average of poi with several scans "no" otherwise
save_orange   = "yes"  # Write "yes" to create a single .csv file with all XANES spectrum "no" otherwise
orange_name   = "FluorescenceCu_es1473"  # Name for the orange output file
save_excel    = "no"  # Write "yes" to create a single .csv file with all XANES spectrum in columns layout "no" otherwise
excel_name    = ""  # Name for the excel output file
add_positions = "yes"  # Write "yes" to add the motor positions at the end of the orange and excel files and "Jump" value "no" otherwise

add_tags      = "yes"  # Write "yes" to add the sample and sub-sample tags to the CSV data, "no" otherwise

The Orange data frame looks like this

                      Energy values                       0                       1  ...       303                304                 305
0                       Energy(keV)                    8.95                  8.9505  ...      samz              sampy               sampz
1   ChiBSACuO_1_poi85791_100436_2.1   2.144796945249792e-05  0.00027039835967743517  ...  25.54608             67.148              28.435
2   ChiBSACuO_1_poi85792_100437_2.1   4.389032048838099e-05    0.000486483485934694  ...  25.54608             66.835             44.0099
3   ChiBSACuO_1_poi85793_100435_2.1  3.1869227505901145e-05   0.0003511254917898989  ...  25.54608  72.02799999999999              39.023
4   ChiBSACuO_1_poi85794_100434_2.1   5.762484587347931e-05   0.0006452849531306613  ...  25.54608  72.02799999999999   35.97000000000001
..                              ...                     ...                     ...  ...       ...                ...                 ...
59        CuO_1_poi85328_100070_2.1  1.0843628778200935e-05   9.122110089140406e-05  ...  24.64136             17.564              36.335
60        CuO_1_poi85329_100071_2.1   6.694355212519426e-06   8.503025209066485e-05  ...  24.64136             63.696              53.002
61        CuO_1_poi85330_100072_2.1   4.714904145736826e-05  0.00013355211261596354  ...  24.64136             41.742  49.946000000000005
62        CuO_1_poi85331_100073_2.1  6.0927169066538775e-06   7.436683114408147e-05  ...  24.64136             57.443              49.391
63        CuO_1_poi85332_100074_2.1   1.001690612073319e-05  0.00011880896946489141  ...  24.64136            46.0491              39.529

[64 rows x 307 columns]

Now it looks like this (the empty column before the motor columns is removed, the scan column is now an index level, and two extra index levels were added):

                                                                  8.95    8.9505     8.951  ...      samz    sampy    sampz
scan                            sample_tags subsample_tags                                  ...                            
ChiBSACuO_1_poi85791_100436_2.1 -           hotspot,surface   0.000021  0.000270  0.000319  ...  25.54608  67.1480  28.4350
ChiBSACuO_1_poi85792_100437_2.1 -           hotspot,surface   0.000044  0.000486  0.000484  ...  25.54608  66.8350  44.0099
ChiBSACuO_1_poi85793_100435_2.1 -           hotspot,surface   0.000032  0.000351  0.000372  ...  25.54608  72.0280  39.0230
ChiBSACuO_1_poi85794_100434_2.1 -           hotspot,surface   0.000058  0.000645  0.000537  ...  25.54608  72.0280  35.9700
ChiBSACuO_1_poi85795_100438_2.1 -           diffused,surface  0.000025  0.000306  0.000349  ...  25.54608  71.0100  32.4070
...                                                                ...       ...       ...  ...       ...      ...      ...
CuO_1_poi85328_100070_2.1       -           -                 0.000011  0.000091  0.000082  ...  24.64136  17.5640  36.3350
CuO_1_poi85329_100071_2.1       -           -                 0.000007  0.000085  0.000094  ...  24.64136  63.6960  53.0020
CuO_1_poi85330_100072_2.1       -           -                 0.000047  0.000134  0.000121  ...  24.64136  41.7420  49.9460
CuO_1_poi85331_100073_2.1       -           -                 0.000006  0.000074  0.000077  ...  24.64136  57.4430  49.3910
CuO_1_poi85332_100074_2.1       -           -                 0.000010  0.000119  0.000105  ...  24.64136  46.0491  39.5290

[63 rows x 305 columns]

In the original table, the "Energy values" header and the "Energy(keV)" row label were nonsensical, so I changed them in addition to supporting tags.
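The new layout can be sketched with plain pandas as below. This is a minimal illustration, not the code in this merge request: `build_scan_table`, the scan names and the numeric values are hypothetical, and "-" stands for "no tag" as in the output above.

```python
import pandas as pd


def build_scan_table(energies, spectra, tags):
    """Build a table with one row per scan and one column per energy.

    `spectra` maps a scan name to its list of intensities and `tags` maps a
    scan name to a (sample_tags, subsample_tags) pair. Scans without tags
    get "-" for both levels. All names here are hypothetical.
    """
    index = pd.MultiIndex.from_tuples(
        [(scan, *tags.get(scan, ("-", "-"))) for scan in spectra],
        names=["scan", "sample_tags", "subsample_tags"],
    )
    return pd.DataFrame(list(spectra.values()), index=index, columns=energies)


# Illustrative values only
energies = [8.95, 8.9505, 8.951]
spectra = {
    "scan_a": [0.1, 0.2, 0.3],
    "scan_b": [0.4, 0.5, 0.6],
}
tags = {"scan_a": ("-", "hotspot,surface")}

table = build_scan_table(energies, spectra, tags)
csv_text = table.to_csv()  # the three index levels become the first three CSV columns
```

Keeping the scan name and tags in the index keeps the data columns purely numeric, while `to_csv()` still writes them as ordinary leading columns.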

Feature Decomposition terminology in Machine Learning:

Classification Term  Orange Term    Pandas DataFrame  CSV Table       Description
Sample               Instance       data              row             XANES scan
Feature              Variable       columns           column          XANES energy or motor position
Meta Feature         Meta Variable  index             special column  scan name, sample tags or subsample tags

Edit: The Orange3 CSV importer somehow understands that the first row of the CSV contains the feature names and that the scan column is a meta variable. It does not understand that "sample_tags" and "subsample_tags" are also meta variables.
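One possible way to make the meta columns explicit is sketched below. It is not part of this merge request. It assumes Orange's documented CSV header convention, in which a column name can be prefixed with flags and a type followed by `#` (for example `mS#` for a string meta attribute); this should be checked against the installed Orange3 version. `to_orange_csv` and the sample values are hypothetical.

```python
import pandas as pd


def to_orange_csv(table, meta_columns):
    """Return CSV text in which the given columns carry an "mS#" prefix.

    Assumption: Orange reads "mS#name" in the header as "string meta
    attribute called name". Index levels are flattened to regular columns
    before renaming.
    """
    flat = table.reset_index()
    renamed = {name: f"mS#{name}" for name in meta_columns if name in flat.columns}
    return flat.rename(columns=renamed).to_csv(index=False)


# Illustrative one-row table with the same index levels as above
index = pd.MultiIndex.from_tuples(
    [("scan_a", "-", "hotspot,surface")],
    names=["scan", "sample_tags", "subsample_tags"],
)
example = pd.DataFrame([[0.1, 0.2]], index=index, columns=[8.95, 8.9505])

orange_csv = to_orange_csv(example, ["scan", "sample_tags", "subsample_tags"])
header = orange_csv.splitlines()[0]
```

The drawback is that the prefixed names are Orange-specific, so the same file is less convenient to read back with other tools.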

image

Edited by Wout De Nolf
