speedup streamline
Closes #44 (closed)
Streamline overheads considered
- QR-reader tuning: 5 seconds
- Exposure/attenuator optimization: 3 sec per sample when using a ct (0.8 sec when using an ascan with save=False)
- ID31 scan overhead (p3 initialization + metadata gathering): 1 to 2 sec for an sct
- In addition, overheads with newsample and enddataset were discovered (see below, ~1 sec each when ~2500 datasets)
- Is the P3 saving temporary EDF files?
The current situation
- QR-reader tuning whenever we read "QRCODE_NOT_READABLE" (for example when there is no QR code)
- exposure/attenuator optimization for each sample individually with a ct (and making sure we base the calculation on a pixel value that is within the dynamic range of the camera and not too low)
This MR
Individual tuning and exposure/attenuator optimization is the most robust.
This MR adds streamline_scanner options to exchange robustness for speed when desired:
SIXC [5]: streamline_scanner
Robust vs. speed:
verify_qrcode False
autotune_qrreader_per 'baguette'
optimize_exposure_per 'baguette'
Testing:
dryrun False
- autotune_qrreader_per:
  - None: never tune
  - "sample": tune when a sample QR code is "QRCODE_NOT_READABLE"
  - "baguette": tune once after loading when the QR code of the first sample is "QRCODE_NOT_READABLE" (or the second, or the third, and so on: it looks for the first successful read)
- optimize_exposure_per:
  - None: keep the current attenuator and measure for the requested time (1 sec by default)
  - "sample": determine the optimal attenuator and exposure time for each sample individually to reach a certain max pixel value
  - "baguette": determine the optimal attenuator and exposure time for all samples with a single scan at fixed attenuator position
- verify_qrcode:
  - False: read the QR code only once (when moving to a hole)
  - True: verify the QR code before and after each measurement. QR codes are read 3 times per sample when enabled (autotune_qrreader_per applies to the two additional reads as well)
- dryrun:
  - False: scan normally (i.e. save data + trigger workflows)
  - True: scan without the actual measurement (i.e. without the actual sct that saves data + triggers a workflow) to check the total overhead
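The allowed values of the four options can be summarized in a small validation sketch. This is illustrative only: the StreamlineOptions class and its qr_reads_per_sample method are hypothetical names and not part of streamline_scanner. The defaults are copied from the option listing above, and the read counts follow the verify_qrcode description.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed granularities as listed above (None disables the step).
_GRANULARITIES = (None, "sample", "baguette")


@dataclass
class StreamlineOptions:
    """Hypothetical container mirroring the streamline_scanner options."""

    verify_qrcode: bool = False
    autotune_qrreader_per: Optional[str] = "baguette"
    optimize_exposure_per: Optional[str] = "baguette"
    dryrun: bool = False

    def __post_init__(self):
        # Reject anything other than None, "sample" or "baguette".
        for name in ("autotune_qrreader_per", "optimize_exposure_per"):
            value = getattr(self, name)
            if value not in _GRANULARITIES:
                raise ValueError(
                    f"{name} must be None, 'sample' or 'baguette', got {value!r}"
                )

    def qr_reads_per_sample(self) -> int:
        # One read when moving to a hole; two more (before and after the
        # measurement) when verification is enabled.
        return 3 if self.verify_qrcode else 1
```

With the defaults this gives one QR read per sample; StreamlineOptions(verify_qrcode=True) gives three, which is where part of the per-sample overhead in the original settings comes from.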
Profiling
For the tests I used a baguette with one missing QR code and measured the average time per sample for a full baguette run.
Profiling commands are
SIXC [1]: user_script_load("/users/blissadm/local/xrpd/blissprofile/id31_streamline.py")
SIXC [2]: user.timeit_holder_scan() # takes 2.5h to complete
SIXC [3]: user.timeit_holder_scan_original(dryrun=False)
SIXC [4]: user.timeit_holder_scan_new(dryrun=False)
SIXC [5]: user.profile_holder_scan()
SIXC [6]: user.timeit_optimize_exposure() # ascan vs. individual ct optimization
REMARK: as time goes on, the system becomes slower. For ~2500 datasets a newdataset() call takes almost 1 sec (see blissstats_opid31_pstat_newdataset.pyprof). So do not look at the absolute times but at the differences between settings.
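Because the absolute times drift, settings are best compared from runs taken back to back in the same session. The user.timeit_* functions themselves are not shown here, so the following is only a minimal sketch of how a mean ± spread per sample could be obtained. The name time_per_sample, its arguments, and the use of the standard deviation as the spread are all assumptions.

```python
import statistics
import time


def time_per_sample(run_once, n_samples, repeats=3):
    """Return (mean, stdev) of the wall time per sample over several runs.

    run_once is a callable performing one full run over n_samples samples
    (a hypothetical stand-in for a holder scan).
    """
    per_sample = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        run_once()
        per_sample.append((time.perf_counter() - t0) / n_samples)
    mean = statistics.mean(per_sample)
    # stdev needs at least two measurements.
    spread = statistics.stdev(per_sample) if len(per_sample) > 1 else 0.0
    return mean, spread
```

Timing two settings one after the other with such a helper and subtracting the means gives the difference between settings, which is less sensitive to the slow drift than either absolute value.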
These are the most important profiling results
dryrun    verify_qrcode    autotune_qrreader_per    optimize_exposure_per    time (sec/sample)
--------  ---------------  -----------------------  -----------------------  -------------------
False     True             sample                   sample                   12.038 ± 0.593 (original)
False     False            baguette                 baguette                 6.766 ± 0.037 (new)
True      True             sample                   sample                   6.444 ± 0.127 (original overhead only)
True      False            baguette                 baguette                 2.373 ± 0.051 (new overhead only)
True      False            -                        -                        1.014 ± 0.003 (raw overhead)
- original: like it was before
- new: like it is with the new default options
- raw: just baguette moving, QR-code reading and data policy commands
So the optimization improved the speed by roughly 1.8x (from 12.0 to 6.8 sec/sample) while still having baguette-wise tuning and exposure optimization.
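The speedup can be checked directly from the table. The snippet below only redoes the arithmetic with the values copied from the first profiling table; the variable names are chosen for illustration.

```python
# Times in sec/sample, copied from the first profiling table above.
original, new = 12.038, 6.766
original_overhead, new_overhead = 6.444, 2.373

# Ratio of the original to the new time per sample.
speedup_total = original / new
speedup_overhead = original_overhead / new_overhead

print(f"total speedup:    {speedup_total:.2f}x")     # 1.78x
print(f"overhead speedup: {speedup_overhead:.2f}x")  # 2.72x
```

The overhead alone shrinks by a larger factor than the total because the measurement itself (the sct) takes the same time in both settings.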
As said before, the absolute value of the time seems to vary a lot. Here is another run
dryrun    verify_qrcode    autotune_qrreader_per    optimize_exposure_per    time (sec/sample)
--------  ---------------  -----------------------  -----------------------  -------------------
False     True             sample                   sample                   12.837 ± 0.378 (original)
False     False            baguette                 baguette                 6.923 ± 0.026 (new)
True      True             sample                   sample                   6.993 ± 0.042 (original overhead only)
True      False            baguette                 baguette                 3.462 ± 0.068 (new overhead only)
True      False            -                        -                        1.955 ± 0.042 (raw overhead)
All permutations in a fresh proposal
dryrun    verify_qrcode    autotune_qrreader_per    optimize_exposure_per    time (sec/sample)
--------  ---------------  -----------------------  -----------------------  -------------------
True      True             -                        -                        1.312 ± 0.115
True      True             -                        sample                   4.376 ± 0.069
True      True             -                        baguette                 2.654 ± 0.052
True      True             sample                   -                        2.951 ± 0.192
True      True             sample                   sample                   6.444 ± 0.127
True      True             sample                   baguette                 4.303 ± 0.067
True      True             baguette                 -                        1.256 ± 0.021
True      True             baguette                 sample                   4.771 ± 0.043
True      True             baguette                 baguette                 2.651 ± 0.041
True      False            -                        -                        1.014 ± 0.003
True      False            -                        sample                   4.522 ± 0.067
True      False            -                        baguette                 2.495 ± 0.071
True      False            sample                   -                        1.535 ± 0.005
True      False            sample                   sample                   4.975 ± 0.010
True      False            sample                   baguette                 2.993 ± 0.066
True      False            baguette                 -                        1.020 ± 0.008
True      False            baguette                 sample                   4.540 ± 0.050
True      False            baguette                 baguette                 2.373 ± 0.051
False     True             -                        -                        4.569 ± 0.017
False     True             -                        sample                   10.409 ± 0.288
False     True             -                        baguette                 6.248 ± 0.093
False     True             sample                   -                        6.361 ± 0.116
False     True             sample                   sample                   12.038 ± 0.593
False     True             sample                   baguette                 7.901 ± 0.154
False     True             baguette                 -                        4.880 ± 0.014
False     True             baguette                 sample                   10.931 ± 0.292
False     True             baguette                 baguette                 6.508 ± 0.088
False     False            -                        -                        4.799 ± 0.022
False     False            -                        sample                   10.622 ± 0.234
False     False            -                        baguette                 6.236 ± 0.077
False     False            sample                   -                        5.446 ± 0.016
False     False            sample                   sample                   11.411 ± 0.249
False     False            sample                   baguette                 7.059 ± 0.070
False     False            baguette                 -                        5.157 ± 0.040
False     False            baguette                 sample                   11.156 ± 0.142
False     False            baguette                 baguette                 6.766 ± 0.037
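The 36 rows above are the full product of the option values (2 x 2 x 3 x 3). A sweep over them could be generated as sketched below; this is not the actual profiling script, and the loop body is only a placeholder for setting the options and timing a run.

```python
import itertools

# Option values in the same order as the table columns
# (None is shown as "-" in the table).
dryrun_values = (True, False)
verify_qrcode_values = (True, False)
per_values = (None, "sample", "baguette")

permutations = list(
    itertools.product(dryrun_values, verify_qrcode_values, per_values, per_values)
)

for dryrun, verify_qrcode, autotune_per, exposure_per in permutations:
    # A real sweep would apply the four options and time a full run here.
    pass
```

itertools.product varies the last argument fastest, so the resulting order matches the row order of the table.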
Related
https://jira.esrf.fr/browse/DPDEV-203
- profiling: dau/devops/bliss/blisshelpers/blissprofile!1
- streamline_changer: streamline_changer!28 (merged)
Reference
Qr-code reader
- SR700NL20.autotuning: starts from the last tuned bank or bank 3, lets the reader tune itself until it succeeds or fails
- SR700NL20.read(autoTuningAllowed=True): reads the QR code, calls SR700NL20.autotuning when the read fails
- SampleChanger.tune_qrreader: takes about 5 seconds with force=True
optimize exposure/attenuator
- limatake and ct take the same time but limatake (1 sec overhead) does not print a table of counters, so use that one
SIXC [56]: with bench():
...: limatake(0.2)
acquisition chain
└── p3
└── roi_counters
Scan 52 2024-03-20T15:00:19.478835+01:00 None sixc user = opid31
limatake 0.2000 1
p3 acq #1
Finished (took 0:00:01.006782)
Execution time: 1s 145ms 128μs
SIXC [57]: with bench():
...: ct(0.2)
ct: elapsed 1.012 s (abort with Ctrl-c)
Execution time: 1s 100ms 795μs
- Id31StreamlineScanner._optimize_sample_exposure takes about 3 sec, which is the sum of:
  - setup_globals.att(position) takes about 1 sec
  - moving the blades from 14 to 31 or v.v. takes between 0.5 and sec (we do it at least 2 times)
  - limatake takes about 1 sec
- determining all attenuator/exposure conditions with an ascan at fixed attenuator position takes ... sec/sample and with individual ct's at variable attenuator position (when counts are too high or low) takes ... sec/sample.
SIXC [1]: user_script_load("/users/blissadm/local/xrpd/blissprofile/id31_streamline.py")
SIXC [2]: user.timeit_optimize_exposure()