
speedup streamline

Wout De Nolf requested to merge optimize_streamline into main

Closes #44 (closed)

Streamline overheads considered

  • QR-reader tuning: 5 seconds
  • Exposure/attenuator optimization: 3 sec per sample when using a ct (0.8 sec when using an ascan with save=False)
  • ID31 scan overhead (p3 initialization + metadata gathering): 1 to 2 sec for an sct
  • In addition, overheads in newsample and enddataset were discovered (see below; ~1 sec each once there are ~2500 datasets)
  • Is the P3 saving temporary EDF files?


The current situation

  • QR-reader tuning whenever we read "QRCODE_NOT_READABLE" (for example when there is no QR code)
  • exposure/attenuator optimization for each sample individually with a ct (and making sure we base the calculation on a pixel value that is within the dynamic range of the camera and not too low)
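The per-sample calculation can be sketched as a linear extrapolation from one test image. This is a minimal sketch, assuming that detector counts scale linearly with exposure time at a fixed attenuator position. `ExposureLimits`, `next_step` and all threshold values are hypothetical placeholders, not the Id31StreamlineScanner code or the real camera limits.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ExposureLimits:
    # Hypothetical thresholds for illustration, not the ID31 settings.
    saturation: float = 60000.0  # counts at/above which pixels may be saturated
    min_usable: float = 500.0    # counts below which extrapolation is unreliable
    target: float = 30000.0      # desired maximum pixel value
    min_time: float = 0.01       # shortest allowed exposure (s)
    max_time: float = 10.0       # longest allowed exposure (s)


def next_step(test_time: float, max_pixel: float,
              limits: ExposureLimits) -> Tuple[str, float]:
    """Decide the next action after one test image at a fixed attenuator.

    Returns ("ok", exposure) when the exposure can be extrapolated, otherwise
    ("more_attenuation" | "less_attenuation", time) to request another try.
    Assumes counts scale linearly with exposure time.
    """
    if max_pixel >= limits.saturation:
        # Outside the dynamic range: a linear extrapolation would be wrong.
        return "more_attenuation", test_time
    if max_pixel < limits.min_usable:
        # Too few counts to base the calculation on.
        return "less_attenuation", test_time
    exposure = test_time * limits.target / max_pixel
    if exposure > limits.max_time:
        # Even the longest exposure would not reach the target: open up.
        return "less_attenuation", limits.max_time
    if exposure < limits.min_time:
        # Target reached faster than the shortest exposure: attenuate more.
        return "more_attenuation", limits.min_time
    return "ok", exposure
```

With these placeholder limits, a 0.2 sec test image with a maximum of 12000 counts gives an exposure of about 0.5 sec. Every "more/less attenuation" answer means another attenuator move plus another image, which is where the ~3 sec per sample comes from.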

This MR

Individual tuning and exposure/attenuator optimization are the most robust.

This MR adds streamline_scanner options to exchange robustness for speed when desired:

SIXC [5]: streamline_scanner
    Robust vs. speed:
        verify_qrcode           False     
        autotune_qrreader_per   'baguette'
        optimize_exposure_per   'baguette'

    Testing:
        dryrun   False
  • autotune_qrreader_per:
    • None: never tune
    • "sample": tune when a sample QR code is "QRCODE_NOT_READABLE"
    • "baguette": tune once per baguette, after loading, when a QR code is "QRCODE_NOT_READABLE"; this applies to the first sample (or the second, third, ... until the first successful read)
  • optimize_exposure_per:
    • None: keep the current attenuator and measure for the requested time (1 sec by default)
    • "sample": determine optimal attenuator and exposure time for each sample individually to reach a certain max pixel value
    • "baguette": determine optimal attenuator and exposure time for all samples with a single scan at fixed attenuator position
  • verify_qrcode:
    • False: read QR code only once (when moving to a hole)
    • True: verify the QR-code before and after each measurement. QR-codes are read 3 times per sample when enabled (autotune_qrreader_per applies to the two additional reads as well)
  • dryrun:
    • False: scan normally (i.e. save data + trigger workflows)
    • True: skip the actual measurement (i.e. the sct that saves data and triggers a workflow) to check the total overhead
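The options and the QR-reader tuning rule can be summarized in a few lines. This is a minimal sketch, not the streamline_scanner implementation: `StreamlineOptions`, `should_autotune` and the per-baguette state dict are hypothetical, and the "baguette" branch encodes one reading of the rule above (tune on unreadable codes only until the first successful read after loading).

```python
from dataclasses import dataclass
from typing import Dict, Optional

NOT_READABLE = "QRCODE_NOT_READABLE"
VALID_PER = (None, "sample", "baguette")


@dataclass
class StreamlineOptions:
    # Defaults mirror the values printed by `streamline_scanner` above.
    verify_qrcode: bool = False
    autotune_qrreader_per: Optional[str] = "baguette"
    optimize_exposure_per: Optional[str] = "baguette"
    dryrun: bool = False

    def __post_init__(self) -> None:
        for name in ("autotune_qrreader_per", "optimize_exposure_per"):
            if getattr(self, name) not in VALID_PER:
                raise ValueError(f"{name} must be one of {VALID_PER}")

    def qr_reads_per_sample(self) -> int:
        # One read when moving to the hole, plus one before and one after
        # the measurement when verification is enabled.
        return 3 if self.verify_qrcode else 1


def should_autotune(options: StreamlineOptions, result: str,
                    baguette_state: Dict[str, bool]) -> bool:
    """Decide whether to tune the QR reader after a read returned `result`.

    `baguette_state` is a fresh dict for every loaded baguette.
    """
    if result != NOT_READABLE:
        baguette_state["read_ok"] = True
        return False
    if options.autotune_qrreader_per is None:
        return False
    if options.autotune_qrreader_per == "sample":
        return True
    # "baguette": only tune until the first successful read after loading.
    return not baguette_state.get("read_ok", False)
```

With the "sample" policy and verify_qrcode=True, a sample without a QR code can trigger up to three tunings (one per read), which is why that combination is the slowest in the tables below.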

Profiling

For the tests I used a baguette with one missing QR code and measured the average time per sample for a full baguette run.
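The tables report the time per sample as "mean ± spread". A generic way to obtain such numbers is sketched below, assuming the spread is the sample standard deviation over repeated full-baguette runs. `time_per_sample` and `run_baguette` are hypothetical names; this is not the code in id31_streamline.py.

```python
import statistics
import time
from typing import Callable, Tuple


def time_per_sample(run_baguette: Callable[[], int],
                    repeats: int = 3) -> Tuple[float, float]:
    """Return (mean, standard deviation) of the time per sample in seconds.

    `run_baguette` scans one full baguette and returns the number of samples.
    """
    if repeats < 2:
        raise ValueError("need at least 2 repeats for a standard deviation")
    per_sample = []
    for _ in range(repeats):
        start = time.perf_counter()
        n_samples = run_baguette()
        per_sample.append((time.perf_counter() - start) / n_samples)
    return statistics.mean(per_sample), statistics.stdev(per_sample)


if __name__ == "__main__":
    def fake_baguette() -> int:
        time.sleep(0.05)  # stand-in for moving, reading and measuring
        return 10         # pretend 10 samples were measured

    mean, std = time_per_sample(fake_baguette)
    print(f"{mean:.3f} ± {std:.3f} (sec/sample)")
```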

Profiling commands are

SIXC [1]: user_script_load("/users/blissadm/local/xrpd/blissprofile/id31_streamline.py")
SIXC [2]: user.timeit_holder_scan()  # takes 2.5h to complete
SIXC [3]: user.timeit_holder_scan_original(dryrun=False)
SIXC [4]: user.timeit_holder_scan_new(dryrun=False)
SIXC [5]: user.profile_holder_scan()
SIXC [6]: user.timeit_optimize_exposure()  # ascan vs. individual ct optimization

REMARK: as time goes on, the system becomes slower. With ~2500 datasets a newdataset() call takes almost 1 sec (see blissstats_opid31_pstat_newdataset.pyprof). So do not look at the absolute times but at the differences between settings.

These are the most important profiling results

dryrun    verify_qrcode    autotune_qrreader_per    optimize_exposure_per    time (sec/sample)
--------  ---------------  -----------------------  -----------------------  -------------------
False     True             sample                   sample                   12.038 ± 0.593 (original)
False     False            baguette                 baguette                 6.766 ± 0.037  (new)

True      True             sample                   sample                   6.444 ± 0.127  (original overhead only)
True      False            baguette                 baguette                 2.373 ± 0.051  (new overhead only)

True      False            -                        -                        1.014 ± 0.003  (raw overhead)
  • original: like it was before
  • new: like it is with the new default options
  • raw: just baguette moving, QR-code reading and data policy commands

So the optimization improved the speed by almost 2x (from 12.0 to 6.8 sec/sample) while still having baguette-wise QR-reader tuning and exposure optimization.
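The speedup can be recomputed directly from the mean times of the two runs in this MR (the ± spread is ignored). This is plain arithmetic on the table values; `RUNS` and `speedup` are just illustrative names.

```python
# Mean times per sample (sec) copied from the two runs in this MR:
# (original, new, original overhead only, new overhead only)
RUNS = {
    "run 1": (12.038, 6.766, 6.444, 2.373),
    "run 2": (12.837, 6.923, 6.993, 3.462),
}


def speedup(before: float, after: float) -> float:
    """How many times faster `after` is compared to `before`."""
    return before / after


for name, (orig, new, orig_overhead, new_overhead) in RUNS.items():
    print(f"{name}: total {speedup(orig, new):.2f}x, "
          f"overhead only {speedup(orig_overhead, new_overhead):.2f}x")
```

This gives a total speedup of 1.78x and 1.85x, and an overhead-only speedup of 2.72x and 2.02x, for the two runs respectively.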

As mentioned above, the absolute times vary a lot. Here is another run:

dryrun    verify_qrcode    autotune_qrreader_per    optimize_exposure_per    time (sec/sample)
--------  ---------------  -----------------------  -----------------------  -------------------
False     True             sample                   sample                   12.837 ± 0.378 (original)
False     False            baguette                 baguette                 6.923 ± 0.026  (new)

True      True             sample                   sample                   6.993 ± 0.042  (original overhead only)
True      False            baguette                 baguette                 3.462 ± 0.068  (new overhead only)

True      False            -                        -                        1.955 ± 0.042  (raw overhead)

All permutations in a fresh proposal

dryrun    verify_qrcode    autotune_qrreader_per    optimize_exposure_per    time (sec/sample)
--------  ---------------  -----------------------  -----------------------  -------------------
True      True             -                        -                        1.312 ± 0.115
True      True             -                        sample                   4.376 ± 0.069
True      True             -                        baguette                 2.654 ± 0.052
True      True             sample                   -                        2.951 ± 0.192
True      True             sample                   sample                   6.444 ± 0.127
True      True             sample                   baguette                 4.303 ± 0.067
True      True             baguette                 -                        1.256 ± 0.021
True      True             baguette                 sample                   4.771 ± 0.043
True      True             baguette                 baguette                 2.651 ± 0.041
True      False            -                        -                        1.014 ± 0.003
True      False            -                        sample                   4.522 ± 0.067
True      False            -                        baguette                 2.495 ± 0.071
True      False            sample                   -                        1.535 ± 0.005
True      False            sample                   sample                   4.975 ± 0.010
True      False            sample                   baguette                 2.993 ± 0.066
True      False            baguette                 -                        1.020 ± 0.008
True      False            baguette                 sample                   4.540 ± 0.050
True      False            baguette                 baguette                 2.373 ± 0.051
False     True             -                        -                        4.569 ± 0.017
False     True             -                        sample                   10.409 ± 0.288
False     True             -                        baguette                 6.248 ± 0.093
False     True             sample                   -                        6.361 ± 0.116
False     True             sample                   sample                   12.038 ± 0.593
False     True             sample                   baguette                 7.901 ± 0.154
False     True             baguette                 -                        4.880 ± 0.014
False     True             baguette                 sample                   10.931 ± 0.292
False     True             baguette                 baguette                 6.508 ± 0.088
False     False            -                        -                        4.799 ± 0.022
False     False            -                        sample                   10.622 ± 0.234
False     False            -                        baguette                 6.236 ± 0.077
False     False            sample                   -                        5.446 ± 0.016
False     False            sample                   sample                   11.411 ± 0.249
False     False            sample                   baguette                 7.059 ± 0.070
False     False            baguette                 -                        5.157 ± 0.040
False     False            baguette                 sample                   11.156 ± 0.142
False     False            baguette                 baguette                 6.766 ± 0.037
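Differences between rows of this table isolate the cost of a single setting. A minimal sketch, using only the verify_qrcode=False rows copied from the table and assuming the row-to-row differences are meaningful despite the drift mentioned in the REMARK; `measurement_time` and `exposure_cost` are hypothetical helper names.

```python
from itertools import product

# Mean time per sample (sec) for verify_qrcode=False, copied from the table.
# Keys: (dryrun, autotune_qrreader_per, optimize_exposure_per); "-" means None.
TIMES = {
    (True, "-", "-"): 1.014,
    (True, "-", "sample"): 4.522,
    (True, "-", "baguette"): 2.495,
    (True, "sample", "-"): 1.535,
    (True, "sample", "sample"): 4.975,
    (True, "sample", "baguette"): 2.993,
    (True, "baguette", "-"): 1.020,
    (True, "baguette", "sample"): 4.540,
    (True, "baguette", "baguette"): 2.373,
    (False, "-", "-"): 4.799,
    (False, "-", "sample"): 10.622,
    (False, "-", "baguette"): 6.236,
    (False, "sample", "-"): 5.446,
    (False, "sample", "sample"): 11.411,
    (False, "sample", "baguette"): 7.059,
    (False, "baguette", "-"): 5.157,
    (False, "baguette", "sample"): 11.156,
    (False, "baguette", "baguette"): 6.766,
}
PER = ("-", "sample", "baguette")


def measurement_time(tune: str, expo: str) -> float:
    """Time attributed to the actual measurement: full run minus dry run."""
    return TIMES[(False, tune, expo)] - TIMES[(True, tune, expo)]


def exposure_cost(tune: str, expo: str) -> float:
    """Extra dry-run overhead of exposure optimization compared to none."""
    return TIMES[(True, tune, expo)] - TIMES[(True, tune, "-")]


for tune, expo in product(PER, PER):
    print(f"tune={tune:9s} exposure={expo:9s} "
          f"measurement={measurement_time(tune, expo):.3f} s "
          f"exposure overhead={exposure_cost(tune, expo):.3f} s")
```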

Related

https://jira.esrf.fr/browse/DPDEV-203

Reference

Qr-code reader

  • SR700NL20.autotuning: starts from the last tuned bank (or bank 3) and lets the reader tune itself until it succeeds or fails
  • SR700NL20.read(autoTuningAllowed=True): reads the qrcode, calls SR700NL20.autotuning when failed
  • SampleChanger.tune_qrreader: takes about 5 seconds with force=True
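The read/autotune interplay described above can be sketched with a fake reader. This is a hypothetical stand-in, not the SR700NL20 driver: the real device tunes itself, which is simulated here by trying banks in order, and the bank range 1 to 10 is an assumption. Only the "start from the last tuned bank or bank 3" rule and the `autoTuningAllowed` fallback come from the description above.

```python
from typing import Callable, Iterable, Optional

NOT_READABLE = "QRCODE_NOT_READABLE"


class QrReaderSketch:
    """Hypothetical stand-in for SR700NL20, for illustration only."""

    def __init__(self, read_once: Callable[[int], Optional[str]],
                 default_bank: int = 3, banks: Iterable[int] = range(1, 11)):
        # read_once(bank) simulates one hardware read using a tuning bank and
        # returns the decoded text, or None when nothing could be decoded.
        self._read_once = read_once
        self._default_bank = default_bank
        self._banks = list(banks)
        self.last_tuned_bank: Optional[int] = None

    def _start_bank(self) -> int:
        if self.last_tuned_bank is not None:
            return self.last_tuned_bank
        return self._default_bank

    def autotuning(self) -> bool:
        """Try banks, starting from the last tuned bank (or bank 3)."""
        start = self._start_bank()
        for bank in [start] + [b for b in self._banks if b != start]:
            if self._read_once(bank) is not None:
                self.last_tuned_bank = bank
                return True
        return False

    def read(self, autoTuningAllowed: bool = True) -> str:
        # Keyword name copied from the reference above; not PEP 8 style.
        text = self._read_once(self._start_bank())
        if text is None and autoTuningAllowed and self.autotuning():
            text = self._read_once(self._start_bank())
        return text if text is not None else NOT_READABLE
```

The sketch also shows why an empty hole is expensive: when there is no QR code, autotuning walks through every bank before giving up.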

optimize exposure/attenuator

  • limatake and ct take the same time, but limatake (1 sec overhead) does not print the table of counters, so we use limatake
SIXC [56]: with bench():
      ...:     limatake(0.2)
acquisition chain
└── p3
    └── roi_counters

Scan 52 2024-03-20T15:00:19.478835+01:00 None sixc user = opid31
limatake 0.2000 1
p3 acq #1
Finished (took 0:00:01.006782)

Execution time: 1s 145ms 128μs
SIXC [57]: with bench():
      ...:     ct(0.2)
ct: elapsed 1.012 s  (abort with Ctrl-c)
Execution time: 1s 100ms 795μs
  • Id31StreamlineScanner._optimize_sample_exposure takes about 3 sec which is the sum of

    • setup_globals.att(position) takes about 1 sec
    • moving the blades from 14 to 31 or v.v. takes between 0.5 and sec (we do it at least 2 times)
    • limatake takes about 1 sec
  • determining all attenuator/exposure conditions with an ascan at fixed attenuator position takes ... sec/sample and with individual ct's at variable attenuator position (when counts are too high or low) takes ... sec/sample.

SIXC [1]: user_script_load("/users/blissadm/local/xrpd/blissprofile/id31_streamline.py")
SIXC [2]: user.timeit_optimize_exposure()
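The baguette-wise alternative (one scan at a fixed attenuator position for all samples) can be sketched as a batch version of the linear extrapolation. This is a minimal sketch under stated assumptions: one maximum pixel value per sample from that scan, counts proportional to exposure time, and (our assumption, not stated in the MR) samples whose counts are out of range fall back to the individual ct procedure. `plan_baguette_exposures` and its default thresholds are hypothetical.

```python
from typing import Dict, List, Tuple


def plan_baguette_exposures(
    max_pixels: List[float],
    scan_time: float,
    target: float = 30000.0,
    min_usable: float = 500.0,
    saturation: float = 60000.0,
    min_time: float = 0.01,
    max_time: float = 10.0,
) -> Tuple[Dict[int, float], List[int]]:
    """Plan exposure times for all samples from one fixed-attenuator scan.

    max_pixels[i] is the maximum pixel value of sample i, measured with
    exposure `scan_time`. Returns ({sample index: exposure time}, [indices
    that need an individual optimization at another attenuator position]).
    """
    planned: Dict[int, float] = {}
    fallback: List[int] = []
    for index, max_pixel in enumerate(max_pixels):
        if not (min_usable <= max_pixel < saturation):
            # Counts too low or too high at this attenuator position.
            fallback.append(index)
            continue
        exposure = scan_time * target / max_pixel  # linear scaling assumption
        if min_time <= exposure <= max_time:
            planned[index] = exposure
        else:
            fallback.append(index)
    return planned, fallback
```

The saving comes from paying the attenuator move and scan overhead once per baguette instead of once (or more) per sample.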
Edited by Wout De Nolf
