Ptycho: shared probe
More stable optimisation, notably for near-field ptycho (NFP), could be achieved by sharing the probe between multiple scans.
In the case of NFP, this would be particularly useful if the relative scan positions are different for multiple projections, as this would increase the diversity of the probe shifts.
The basic principle would be to average or update the probe from the multiple scans.
Several approaches are possible:
- using MPI, periodically share the probe and average it:
  - in principle this could be done every cycle, but with terrible performance, so it is better to do it every N cycles
  - the advantage of using MPI is that it would be the least intrusive option (it only requires a special operator step which would gather and merge the probe between all instances)
  - the MPI approach assumes that a sufficiently large number of scans are running together, since this is how the greatest benefit could be obtained
  - since requesting 20-50 simultaneous GPUs is not a good idea (and is not compatible with preemptive queues), MPI is probably not the optimal approach
- loading multiple (20-50) scans to be analysed with a single GPU. This would be done by:
  - calling `PtychoRunnerScan.load_scan()` multiple times
  - keeping a stack of all the objects specific to each scan (`PtychoData`, object, probe), likely in the `Ptycho` object, ready to be swapped in/out. Likely a dictionary (key = the scan number) of dictionaries with the relevant objects.
  - adding a `LoopScan` operator which could be used to analyse the scans with the same probe, e.g. with a chain like `DM**1000` replaced by `LoopScan(DM**50, merge_probe=True)**20` to merge the probe every 50 cycles. This assumes all scans have almost identical parameters (number of positions, object size).
  - the final `save()` call would need to somehow handle looping over all the scans. This should probably be done in the `PtychoRunner` rather than the `PtychoRunnerScan` - do a regular save of the first scan, then in `PtychoRunner`, loop over all other scans to save the data.
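Whichever mechanism is used, the periodic probe merge amounts to an element-wise average of the complex probe arrays. A minimal NumPy sketch (`merge_probes` is a hypothetical helper; in the MPI approach the explicit average would be replaced by a reduction over ranks, as noted in the comment):

```python
import numpy as np

def merge_probes(probes):
    """Average complex probe arrays from several scans.

    Hypothetical helper. In the MPI approach each rank would instead
    contribute its probe through a collective reduction, e.g.
    comm.allreduce(probe, op=MPI.SUM) / comm.Get_size() with mpi4py.
    """
    stack = np.stack(probes)   # shape: (nb_scan, ny, nx)
    return stack.mean(axis=0)  # element-wise complex average

# Example: merge three slightly different 64x64 probes
rng = np.random.default_rng(0)
probes = [np.exp(1j * rng.normal(scale=0.1, size=(64, 64))) for _ in range(3)]
merged = merge_probes(probes)
```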
This last option could work by adding a `share_probe` (`--share_probe`) option which would change the behaviour of the `PtychoRunner` to load all the scans instead of loading them sequentially.
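The `LoopScan(DM**50, merge_probe=True)**20` chain could behave roughly as follows (a control-flow sketch with hypothetical stand-in classes, not the actual PyNX operator API):

```python
import numpy as np

class Scan:
    """Hypothetical stand-in for one loaded scan (PtychoData, object, probe)."""
    def __init__(self, probe):
        self.probe = probe

    def run_dm(self, n):
        # Placeholder for DM**n acting on this scan's object and probe
        self.probe = self.probe * np.exp(1j * 1e-3 * n)

def loop_scan(scans, inner=50, outer=20, merge_probe=True):
    """Sketch of LoopScan(DM**50, merge_probe=True)**20: run each scan
    for `inner` cycles, then merge the probes, repeated `outer` times."""
    for _ in range(outer):
        for s in scans:
            s.run_dm(inner)
        if merge_probe:
            merged = np.mean([s.probe for s in scans], axis=0)
            for s in scans:
                s.probe = merged.copy()
    return scans

# 4 scans sharing a probe, merged every 50 cycles over 20 outer iterations
scans = [Scan(np.full((32, 32), 1 + 0j)) for _ in range(4)]
loop_scan(scans)
```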
General strategy:
- avoid changing `PtychoRunnerScan`
- in the `Ptycho` object:
  - add a `_multiscan_dict` variable, each entry holding the `PtychoData`, object and probe for each scan
  - add a `swap_scan` or `select_scan` function?
- add a `LoopScan` operator
- modify `PtychoRunner` with a different loop over all the scans, also handling all the saving at the end.
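The per-scan storage and swap in the `Ptycho` object could look like the following sketch (`_multiscan_dict`, `store_scan` and `swap_scan` are names from this proposal, not an existing API):

```python
class Ptycho:
    """Minimal sketch of the proposed multi-scan storage."""

    def __init__(self):
        # key = scan number, value = dict with the per-scan objects
        self._multiscan_dict = {}
        self._current_scan = None
        self.data = self.obj = self.probe = None  # currently active scan

    def store_scan(self, scan_number):
        """Stash the currently active scan's objects."""
        self._multiscan_dict[scan_number] = {
            "data": self.data, "obj": self.obj, "probe": self.probe}

    def swap_scan(self, scan_number):
        """Store the current scan, then activate another scan's objects."""
        if self._current_scan is not None:
            self.store_scan(self._current_scan)
        d = self._multiscan_dict[scan_number]
        self.data, self.obj, self.probe = d["data"], d["obj"], d["probe"]
        self._current_scan = scan_number

# Example: two scans swapped in and out
p = Ptycho()
p.data, p.obj, p.probe = "data0", "obj0", "probe0"
p._current_scan = 0
p._multiscan_dict[1] = {"data": "data1", "obj": "obj1", "probe": "probe1"}
p.swap_scan(1)
```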
Note that there are several approaches possible regarding the shared probe:
- Use a single probe, updated sequentially by the different scans; it should eventually converge
- Use a different starting probe for each scan, and average them every N (50) cycles
- Use a different starting probe for each scan, and do a truncated SVD decomposition (or a similar approach) of the nb_scan probes every N (50) cycles, with a small number of coherent modes (e.g. 3) - this should allow studying the variations between the probes
The approach could be decided by adding a command-line option `--share-probe n`, where `n=-1` (default) would just re-use the probe in sequence, `n=1` would average the probes, and `n>1` would use a feature decomposition algorithm.
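For the `n>1` case, a truncated SVD over the flattened probes is one possible decomposition (a sketch only; the actual algorithm and mode count would be design choices):

```python
import numpy as np

def probe_modes(probes, n_modes=3):
    """Truncated SVD of nb_scan probes into a few coherent modes.

    Each (ny, nx) probe becomes a row of A; the first n_modes right
    singular vectors are the common modes, and U*S gives each scan's
    coefficients on those modes.
    """
    a = np.stack([p.ravel() for p in probes])       # (nb_scan, ny*nx)
    u, s, vh = np.linalg.svd(a, full_matrices=False)
    ny, nx = probes[0].shape
    modes = vh[:n_modes].reshape(n_modes, ny, nx)   # coherent modes
    coeffs = u[:, :n_modes] * s[:n_modes]           # per-scan weights
    return modes, coeffs

# Example: 5 nearly identical probes decomposed into 3 modes
rng = np.random.default_rng(0)
base = np.exp(1j * rng.normal(size=(32, 32)))
probes = [base * (1 + 0.05 * k) for k in range(5)]
modes, coeffs = probe_modes(probes)
```

Keeping the per-scan coefficients alongside the shared modes is what would allow studying how the probe varies between scans.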
This is related to #166