Resolve "use XAS scan tags in CSV extraction"
Closes #1 (closed)
mainh5path = "/data/visitor/es1473/id21/20240502/RAW_DATA/es1473_id21.h5" # Path and name of the main h5 file
outpath = "/tmp/wdntest/" # Path to save all the output data ("." is the current working directory)
matchers = ["CuO_1"] # Filter to fit files with the matching name (e.g. "sample_1" or "region3"). Leave empty ("") to fit all
elemdet = "fx2_det0_CuKa" # Select the element and detector with data. Trans: "idet"; Fluor: "fx2_CuKa" fx2_MnKa
elemdetnorm = "CuKa_corr_norm0" # Select the element and counter with the normalization and dead time correction. Trans: "absorp1"; Fluor: "CuKa_corr_norm0"
energycntr = "energy_enc" # Select the energy counter
save_singles = "no" # Write "yes" to save each XANES spectrum in its own .dat file, "no" otherwise
save_average = "no" # Write "yes" to create files with the average of each poi that has several scans, "no" otherwise
save_orange = "yes" # Write "yes" to create a single .csv file with all XANES spectra, "no" otherwise
orange_name = "FluorescenceCu_es1473" # Name for the orange output file
save_excel = "no" # Write "yes" to create a single .csv file with all XANES spectra in columns layout, "no" otherwise
excel_name = "" # Name for the excel output file
add_positions = "yes" # Write "yes" to add the motor positions and the "Jump" value at the end of the orange and excel files, "no" otherwise
add_tags = "yes" # Write "yes" to add the sample and sub-sample tags to the CSV data, "no" otherwise
Before this change, the Orange data frame looked like this:
Energy values 0 1 ... 303 304 305
0 Energy(keV) 8.95 8.9505 ... samz sampy sampz
1 ChiBSACuO_1_poi85791_100436_2.1 2.144796945249792e-05 0.00027039835967743517 ... 25.54608 67.148 28.435
2 ChiBSACuO_1_poi85792_100437_2.1 4.389032048838099e-05 0.000486483485934694 ... 25.54608 66.835 44.0099
3 ChiBSACuO_1_poi85793_100435_2.1 3.1869227505901145e-05 0.0003511254917898989 ... 25.54608 72.02799999999999 39.023
4 ChiBSACuO_1_poi85794_100434_2.1 5.762484587347931e-05 0.0006452849531306613 ... 25.54608 72.02799999999999 35.97000000000001
.. ... ... ... ... ... ... ...
59 CuO_1_poi85328_100070_2.1 1.0843628778200935e-05 9.122110089140406e-05 ... 24.64136 17.564 36.335
60 CuO_1_poi85329_100071_2.1 6.694355212519426e-06 8.503025209066485e-05 ... 24.64136 63.696 53.002
61 CuO_1_poi85330_100072_2.1 4.714904145736826e-05 0.00013355211261596354 ... 24.64136 41.742 49.946000000000005
62 CuO_1_poi85331_100073_2.1 6.0927169066538775e-06 7.436683114408147e-05 ... 24.64136 57.443 49.391
63 CuO_1_poi85332_100074_2.1 1.001690612073319e-05 0.00011880896946489141 ... 24.64136 46.0491 39.529
[64 rows x 307 columns]
Now it looks like this (the empty column before the motor columns is removed, the scan column is now an index, and two extra index levels were added):
8.95 8.9505 8.951 ... samz sampy sampz
scan sample_tags subsample_tags ...
ChiBSACuO_1_poi85791_100436_2.1 - hotspot,surface 0.000021 0.000270 0.000319 ... 25.54608 67.1480 28.4350
ChiBSACuO_1_poi85792_100437_2.1 - hotspot,surface 0.000044 0.000486 0.000484 ... 25.54608 66.8350 44.0099
ChiBSACuO_1_poi85793_100435_2.1 - hotspot,surface 0.000032 0.000351 0.000372 ... 25.54608 72.0280 39.0230
ChiBSACuO_1_poi85794_100434_2.1 - hotspot,surface 0.000058 0.000645 0.000537 ... 25.54608 72.0280 35.9700
ChiBSACuO_1_poi85795_100438_2.1 - diffused,surface 0.000025 0.000306 0.000349 ... 25.54608 71.0100 32.4070
... ... ... ... ... ... ... ...
CuO_1_poi85328_100070_2.1 - - 0.000011 0.000091 0.000082 ... 24.64136 17.5640 36.3350
CuO_1_poi85329_100071_2.1 - - 0.000007 0.000085 0.000094 ... 24.64136 63.6960 53.0020
CuO_1_poi85330_100072_2.1 - - 0.000047 0.000134 0.000121 ... 24.64136 41.7420 49.9460
CuO_1_poi85331_100073_2.1 - - 0.000006 0.000074 0.000077 ... 24.64136 57.4430 49.3910
CuO_1_poi85332_100074_2.1 - - 0.000010 0.000119 0.000105 ... 24.64136 46.0491 39.5290
[63 rows x 305 columns]
In the original table, the "Energy values" and "Energy(keV)" labels were nonsensical, so I changed this as well, in addition to adding tag support.
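The new layout can be sketched with plain pandas. This is only an illustration under assumptions, not the code of this merge request: the scan names, values and the column labels (kept as strings here) are made up, and only the index level names `scan`, `sample_tags` and `subsample_tags` come from the printout above.

```python
import io

import pandas as pd

# Hypothetical miniature data set: two energies and one motor position per scan.
scans = ["CuO_1_poi1_2.1", "CuO_1_poi2_2.1"]
sample_tags = ["-", "-"]
subsample_tags = ["hotspot,surface", "-"]

# The scan name and both tag lists become index levels (meta features),
# so the columns only hold features: energies and motor positions.
index = pd.MultiIndex.from_arrays(
    [scans, sample_tags, subsample_tags],
    names=["scan", "sample_tags", "subsample_tags"],
)

df = pd.DataFrame(
    [[2.1e-05, 2.7e-04, 67.148], [4.4e-05, 4.9e-04, 66.835]],
    index=index,
    columns=["8.95", "8.9505", "sampy"],
)

# Writing to CSV puts the index levels in front as ordinary columns.
buffer = io.StringIO()
df.to_csv(buffer)
csv_text = buffer.getvalue()
header = csv_text.splitlines()[0]
print(header)
```

With this layout the first CSV row holds the feature names directly, and a tag string containing a comma is quoted by the CSV writer so it stays in one cell.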
Feature Decomposition terminology in Machine Learning:
Classification Term | Orange Term | Pandas DataFrame | CSV Table | Description |
---|---|---|---|---|
Sample | Instance | data | row | XANES scan |
Feature | Variable | columns | column | XANES energy or motor position |
Meta Feature | Meta Variable | index | special column | scan name, sample tags or subsample tags |
Edit: The Orange3 CSV importer somehow understands that the first row of the CSV contains the feature names and that the scan column is a meta variable. It does not understand that "sample_tags" and "subsample_tags" are also meta variables.
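One possible workaround, which this merge request does not implement, would be to use Orange's header-flag convention, where a column name is prefixed with flags followed by `#` (for example `mS#` for a string meta attribute). The sketch below is an assumption about how the CSV could be post-processed; the helper name `flag_meta_columns` is hypothetical and the flag syntax should be checked against the Orange data-loading documentation.

```python
import csv
import io

# Columns that should be treated as meta variables (names from the table above).
META_COLUMNS = {"scan", "sample_tags", "subsample_tags"}


def flag_meta_columns(csv_text, meta_columns=META_COLUMNS):
    """Return the CSV text with 'mS#' prepended to the meta column names.

    Hypothetical helper: only the header row is changed, data rows are
    copied as they are.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    flagged = [f"mS#{name}" if name in meta_columns else name for name in header]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(flagged)
    writer.writerows(body)
    return out.getvalue()


# Made-up example input with one scan.
example = (
    "scan,sample_tags,subsample_tags,8.95,sampy\n"
    'CuO_1_poi1_2.1,-,"hotspot,surface",2.1e-05,67.148\n'
)
flagged_text = flag_meta_columns(example)
print(flagged_text)
```

Whether Orange then imports the tag columns as meta variables would still need to be verified in the Orange3 File widget.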
Edited by Wout De Nolf