bliss issues
https://gitlab.esrf.fr/bliss/bliss/-/issues
2024-03-07T11:25:14+01:00

https://gitlab.esrf.fr/bliss/bliss/-/issues/4246
standard: move available in the shell is now exposed as a umove function
2024-03-07T11:25:14+01:00
Valentin Valls
This could be considered as a regression.
Valentin Valls

https://gitlab.esrf.fr/bliss/bliss/-/issues/4244
debugoff('*') should restore previous state
2024-03-06T11:46:55+01:00
Perceval Guillou
debugoff('*') sets all nodes to warning instead of restoring the state they had before `debugon('*')`.

https://gitlab.esrf.fr/bliss/bliss/-/issues/4243
remove deprecated user_* and maintain disable_print context
2024-03-06T11:44:18+01:00
Perceval Guillou
version 2.1.0

https://gitlab.esrf.fr/bliss/bliss/-/issues/4242
handle disable_user_output
2024-03-06T11:22:41+01:00
Perceval Guillou
Perceval Guillou

https://gitlab.esrf.fr/bliss/bliss/-/issues/4241
optimize toolbox.py:sort_counter_by_dependency_level
2024-03-07T13:02:22+01:00
Damien Naudet
Depending on the setup, **sort_counter_by_dependency_level** can take a very long time (2.5s on bm16) because it can call CounterController.counters thousands of times.
Tested the following on bm16, a simple ct(0.1) went from 4.5s to 2.3s:
```
import functools


def sort_counter_by_dependency_level(counters):
    # Read each controller's counters once instead of on every comparison.
    cnt_dict = {cnt: cnt._counter_controller.counters for cnt in counters}

    def cmp_sort2(cnt1, cnt2):
        cnt1_cntrs = cnt_dict[cnt1]
        cnt2_cntrs = cnt_dict[cnt2]
        if cnt1 in cnt2_cntrs:
            return -1
        elif cnt2 in cnt1_cntrs:
            return 1
        else:
            return len(cnt1_cntrs) - len(cnt2_cntrs)

    # The snippet is cut off here in the source; the final sort below is an
    # assumption about how the comparison function is used.
    return sorted(counters, key=functools.cmp_to_key(cmp_sort2))
```

https://gitlab.esrf.fr/bliss/bliss/-/issues/4240
Double Ctrl+C might leave the scan unsealed
2024-03-04T18:33:11+01:00
Samuel Debionne
Seen on ID10.
Running a loopscan and pressing Ctrl+C twice might leave the scan open (and Flint would then fail to properly display later scans).
This is relatively easy to reproduce.
```
Traceback (most recent call last):
File "/home/blissadm/local/bliss2.git/bliss/scanning/scan.py", line 1372, in wrapper
yield
File "/home/blissadm/local/bliss2.git/bliss/scanning/scan.py", line 1516, in _runctx_scan_data
self._scan_data.close()
File "/home/blissadm/local/bliss2.git/blissdata/blissdata/redis_engine/scan.py", line 56, in wrapper
return func(self, *args, **kwargs)
File "/home/blissadm/local/bliss2.git/blissdata/blissdata/redis_engine/scan.py", line 301, in close
self._close_stream_writers()
File "/home/blissadm/local/bliss2.git/blissdata/blissdata/redis_engine/scan.py", line 56, in wrapper
return func(self, *args, **kwargs)
File "/home/blissadm/local/bliss2.git/blissdata/blissdata/redis_engine/scan.py", line 271, in _close_stream_writers
stream_writer.seal()
File "/home/blissadm/local/bliss2.git/blissdata/blissdata/redis_engine/stream.py", line 96, in wrapper
return func(self, *args, **kwargs)
File "/home/blissadm/local/bliss2.git/blissdata/blissdata/redis_engine/stream.py", line 274, in seal
self._sink.stop()
File "/home/blissadm/local/bliss2.git/blissdata/blissdata/redis_engine/sink.py", line 72, in stop
self._cmd_queue.join()
...
KeyboardInterrupt
```
The scientist also suggested using the Escape key to abort scans.
version 2.0.x

https://gitlab.esrf.fr/bliss/bliss/-/issues/4237
ct(0.1) prints, but ct(0.01) does not print with eh3check.on at ID11
2024-03-02T08:44:10+01:00
Pierre-Olivier Autran
Not urgent (this is @wright). @papillon It looks like a threading glitch, or some printing to the console is not serialised? You should see "(abort with Ctrl-c) [elapsed time 0.0 s] Checking Undulator positions ..." in the output below, but the browser does not render it, so I guess there are invisible characters or "\r" issues?
```
NSCOPE [1945]: ct(.01)
0%| (abort with Ctrl-c) [elapsed time 0.0 s] Checking Undulator positions ...
48%|██████████████████████████████████████████████████████████████████████▊ (abort with Ctrl-c) [elapsed time 0.2 s] Checking OH1 valves ...
Out [1945]: Scan(name=ct, path='not saved')
NSCOPE [1946]: ct(.1)
0%| (abort with Ctrl-c) [elapsed time 0.0 s] Checking Undulator positions ...
4%|██████▎ (abort with Ctrl-c) [elapsed time 0.2 s] Checking OH1 valves ...
pico6 = 9114.69 ( 91146.9 /s) keithley
current = 67.6300 mA ( 676.300 mA/s) machinfo
sbcurr = 4.63000 mA ( 46.3000 mA/s) machinfo
refill = 1581.00 sec ( 15810.0 sec/s) machinfo
roi1_avg = 12.9414 ( 129.414 /s) basler_eh32
roi1_max = 1954.00 ( 19540.0 /s) basler_eh32
roi1_min = 0.00000 ( 0.00000 /s) basler_eh32
roi1_std = 36.2080 ( 362.080 /s) basler_eh32
roi1_sum = 1.01577e+06 ( 1.01577e+07 /s) basler_eh32
fpico6 = 917885. ( 9.17885e+06 /s) p201_21
sec = 0.100000 s ( 1.00000 s/s) p201_21
Au_det0 = 0 ( 0.00000 /s) mca
Ce_det0 = 0 ( 0.00000 /s) mca
W_det0 = 0 ( 0.00000 /s) mca
Zr_det0 = 0 ( 0.00000 /s) mca
all_det0 = 3 ( 30.0000 /s) mca
deadtime_det0 = 0.413822 ( 4.13822 /s) mca
energy_livetime_det0 = 0.0586176 ( 0.586176 /s) mca
events_det0 = 10 ( 100.000 /s) mca
icr_det0 = 170.597 ( 1705.97 /s) mca
ocr_det0 = 100.000 ( 1000.00 /s) mca
realtime_det0 = 0.0999997 ( 0.999997 /s) mca
trigger_livetime_det0 = 0.0996499 ( 0.996499 /s) mca
triggers_det0 = 17 ( 170.000 /s) mca
Out [1946]: Scan(name=ct, path='not saved')
```

https://gitlab.esrf.fr/bliss/bliss/-/issues/4236
Isolate bliss.shell.data clearly in the architecture
2024-03-01T09:53:41+01:00
Valentin Valls
This code mostly implements the F5 data display.
It would be good to split what is about this process and what is about the bliss shell, or what is shared.
For instance:
- `bliss.common.utils.nonblocking_print` is only used by the data display
- `bliss.shell.data.display.StepScanProgress` is only used by bliss repl
version 2.1.0
Valentin Valls

https://gitlab.esrf.fr/bliss/bliss/-/issues/4235
Shell tests can locally fail
2024-02-29T17:25:35+01:00
Valentin Valls
This is probably because of a user config which is loaded from the env.
This has to be hardcoded by the tests.
![image](/uploads/c2caec2db97c06042943c0ad25e56b9e/image.png)
version 2.1.0
Valentin Valls

https://gitlab.esrf.fr/bliss/bliss/-/issues/4232
blissterm: minor todo
2024-02-29T09:26:49+01:00
Stuart Fisher
- [ ] limit to single session
- [ ] handle malformed yaml
- [ ] import missing components in daiquiri-lib (beamviewer, intraled)
- [ ] terminal resizing (RemountOnResize)
Stuart Fisher

https://gitlab.esrf.fr/bliss/bliss/-/issues/4228
BM16: mca stuck the scan
2024-03-05T09:47:09+01:00
Valentin Valls
For information, at BM16, @naudet noticed that the non-mosca MCA was able to block the scan.
In this case CTRL-C no longer works.
py-spy was able to retrieve useful information.
```
# Install py-spy
ssh blissadm@id00ctrl
. blissenv
pip install py-spy
which py-spy
# /users/blissadm/conda/miniconda/envs/bliss_dev/bin/py-spy
# get the BLISS PID
ps aux | grep bliss | grep monocryo
```
```
# connect as yourself
# because py-spy needs the rights to read process memory
ssh me@id00ctrl
sudo /users/blissadm/conda/miniconda/envs/bliss_dev/bin/py-spy dump --nonblocking --pid 666
```
Here is the result:
```
Python v3.9.18 (/opt/bliss/conda/miniconda/envs/bliss_dev/bin/python3.9)
Thread 0x7F62DE863740 (active+gil): "MainThread"
start_acquisition (bliss/controllers/mca/xglab_dante.py:694)
start (bliss/scanning/acquisition/mca.py:206)
acq_start (bliss/scanning/chain.py:759)
acq_start (bliss/scanning/chain.py:109)
Thread 0x7F6253700700 (active)
acquire_with_timeout (gevent/_threading.py:36)
wait (gevent/_threading.py:86)
get (gevent/_threading.py:207)
run (gevent/threadpool.py:195)
Thread 0x7F61ACD3B700 (idle): "asyncio_0"
wait (threading.py:312)
acquire (threading.py:450)
get (queue.py:294)
_worker (concurrent/futures/thread.py:81)
run (threading.py:917)
_bootstrap_inner (threading.py:980)
_bootstrap (threading.py:937)
Thread 0x7F6252EFF700 (idle): "Thread-2"
wait (threading.py:316)
wait (threading.py:581)
run (tqdm/_monitor.py:60)
_bootstrap_inner (threading.py:980)
_bootstrap (threading.py:937)
Thread 0x7F61AF53C700 (active)
acquire_with_timeout (gevent/_threading.py:36)
wait (gevent/_threading.py:86)
get (gevent/_threading.py:207)
run (gevent/threadpool.py:195)
Thread 0x7F61A91D9700 (active)
```
We can guess that BLISS is stuck inside `xglab_dante`, which is very useful information.

https://gitlab.esrf.fr/bliss/bliss/-/issues/4227
ID22: lima events should reflect what goes in the lima HDF5 file
2024-02-23T15:17:48+01:00
Wout De Nolf
With this custom scan
https://gitlab.esrf.fr/bcu-vercors/id22/id22/-/blob/master/id22/scripts/tscan.py?ref_type=heads#L19
The NeXus writer creates a virtual dataset linking the `lima` data
```
/data/visitor/ch6690/id22/20240220/RAW_DATA/empty_PDF/empty_PDF_RT/empty_PDF_RT.h5::/1.1/instrument/perkinelmer/data
```
which cannot be read by `blissdata`
```
pip install ewoks ewoksxrpd
rm -rf /tmp/testresults
ewoks execute workflow.json --log=info # this hangs forever
```
The reason is that the virtual dataset links to files that do not and never will exist for the `tscan`, which is saving lima images in MANUAL mode (only saving half of them, the others are used for background subtraction) and not adapting the Redis events accordingly. So the NeXus writer (and Flint as well I assume) does not know about the missing images.
This fixes the issue but will break other online data processing:
```
@@ -53,10 +53,14 @@ class BlissDynamicHDF5Handler(hdf5.DynamicHDF5Handler):
"""Check that the last source of a virtual dataset is released by the writer."""
if not hasattr(item, "is_virtual"):
return
if not item.is_virtual:
return
+ print(f"PERKIN PATCH FOR BLISSDATA: ASSUME VDS {item.name} IS ACCESSIBLE")
+ # WDN: fix for perkin with half of the images saved
+ return
```

https://gitlab.esrf.fr/bliss/bliss/-/issues/4226
NeXus writer: dynamic disk space warnings
2024-02-22T17:13:45+01:00
Wout De Nolf
The 200TB disk quota of /data/visitor/in1172 was exceeded and our disk space checks were insufficient to catch this, most likely because the fixed disk space limits were too low for the data rate.
https://requests.esrf.fr/browse/SCHLP-24728
> Bliss and the NeXus writer already monitor disk space and notify the user or prevent starting a scan:
>
> 1. Do not start a scan when "free_space < required_disk_space"
>
> 2. During a scan, print a warning when "free_space < recommended_disk_space" (checked every 3 seconds)
>
> 3. The NeXus writer stops writing when "free_space < required_disk_space" (checked every 3 seconds)
>
> In all three cases, the user should get an easy to understand warning or error message. We check disk space like this:
>
```
import os
required_disk_space = 200 # MB, configurable
recommended_disk_space = 1024 # MB, configurable
dataset_directory = "/data/visitor/in1172/bm18/20240221/RAW_DATA/helical_HA2200_33.94um_Hauser_pte/helical_HA2200_33.94um_Hauser_pte_0001/"
stat = os.statvfs(dataset_directory)
free_space = stat.f_frsize * stat.f_bavail / 1024**2 # MB
```
>
> The problem is that when the data rate is larger than "required_disk_space/3" MB/s (which most likely is the case at BM18) this check will come too late. At least I think that's why the check was not working and the users did not get an informative error message. Another reason could have been that the os.statvfs call takes several seconds (we know this can happen on NFS). In that case you keep scanning until the call returns (executed in a separate thread).
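
To make the timing argument concrete, here is a minimal sketch, assuming the 3-second check period quoted above, that compares a fixed limit with one scaled by the data rate. The function names, the `safety_factor` parameter and the example data rates are hypothetical and are not part of BLISS or the NeXus writer.

```python
import os

CHECK_PERIOD_S = 3.0  # check interval quoted in the text above


def free_space_mb(directory):
    """Free space available to the current user, in MB (same formula as above)."""
    stat = os.statvfs(directory)
    return stat.f_frsize * stat.f_bavail / 1024**2


def max_safe_rate_mb_s(required_mb, check_period_s=CHECK_PERIOD_S):
    """Data rate above which a fixed limit can be crossed between two checks."""
    return required_mb / check_period_s


def dynamic_limit_mb(data_rate_mb_s, floor_mb,
                     check_period_s=CHECK_PERIOD_S, safety_factor=2.0):
    """Hypothetical limit: never below floor_mb, and large enough to cover
    safety_factor check periods at the current data rate."""
    return max(floor_mb, data_rate_mb_s * check_period_s * safety_factor)


if __name__ == "__main__":
    print(f"fixed 200 MB limit covers up to {max_safe_rate_mb_s(200):.1f} MB/s")
    for rate in (10, 100, 1000):  # hypothetical data rates in MB/s
        print(rate, "MB/s ->", dynamic_limit_mb(rate, floor_mb=200), "MB")
    print(f"free space in current directory: {free_space_mb('.'):.0f} MB")
```

This sketch does not address the case where `os.statvfs` itself takes several seconds to return; a slow call would delay any limit, fixed or dynamic.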
>
> The error message the user saw was
> RuntimeError: ('GroupingMaster', 'Nexus writer is in FAULT state (Driver truncate request failed (slist already enabled?))')
>
> In the writer error logs I see
> blissadm@lbm18ctrl:/var/log$ grep FAULT nexus_writer.log
>
> ...
> ERROR 2024-02-21 20:29:14,239 nexus_writer_service.subscribers.session_writer: [MRTOMO-4 (RUNNING)] [2_dark images-4 (FAULT)] [Errno 122] Unable to open file (unable to close file, errno = 122, error message = 'Disk quota exceeded')
> ...
> ERROR 2024-02-21 20:29:14,350 nexus_writer_service.subscribers.session_writer: [MRTOMO-3 (RUNNING)] [fullturn-3 (FAULT)] Driver truncate request failed (slist already enabled?)
>
> So the users saw the second error but not the first. What happened is that the scan "2_dark images" failed, which was ignored (I don't understand how), and then the second scan, "fullturn", was started and failed with a cryptic error message about a truncate request failing.
> What we can try to do is make the recommended_disk_space and required_disk_space limits dynamic, depending on the data rate during the scan (could be different for every scan). However if the os.statvfs call takes several seconds to resolve (not sure how often this happens) we cannot do anything.

https://gitlab.esrf.fr/bliss/bliss/-/issues/4225
XIA: HandelError: [HandelError 402] XERXES: XerXes returned an error
2024-03-05T09:46:48+01:00
Olivier Ulrich
```
.../bliss/common/cleanup.py", line 266, in capture yield
.../bliss/scanning/scan.py", line 1371, in wrapper yield
.../bliss/scanning/scan.py", line 1286, in run self._execute_scan_runner(runner)
.../bliss/scanning/scan.py", line 1358, in _execute_scan_runner runner.send(
.../bliss/scanning/scan.py", line 333, in send return self.runner.send(arg)
.../bliss/scanning/scan.py", line 391, in _run self._gwait(stop_tasks, masked_kill_nb=1)
/opt/.../python3.9/contextlib.py", line 126, in __exit__ next(self.gen)
.../bliss/common/cleanup.py", line 289, in capture_exceptions raise value.with_traceback(tb)
.../bliss/common/cleanup.py", line 266, in capture yield
.../bliss/scanning/scan.py", line 366, in _run t.get() # get the task result ; this may raise an exception
.../bliss/common/greenlet_utils/killmask.py", line 197, in get return super().get(*args, **keys)
.../bliss/scanning/scan.py", line 326, in _run_next for i in next_iter:
.../bliss/scanning/chain.py", line 1009, in __next__ gevent.joinall(this_level_tasks, raise_error=True)
.../bliss/scanning/chain.py", line 1079, in call_with_debug return getattr(acq_obj_iter, func_name)(*args, *kwargs)
.../bliss/scanning/chain.py", line 470, in acq_wait_ready gevent.joinall(tasks, count=1, raise_error=True)
.../bliss/scanning/chain.py", line 817, in wait_reading self._reading_task.get()
.../bliss/common/greenlet_utils/killmask.py", line 197, in get return super().get(*args, **keys)
.../bliss/scanning/acquisition/mca.py", line 243, in reading raise values
C:...\handel\interface.py", line 965, in trigger
C:...\handel\interface.py", line 257, in stop_run
C:...\handel\error.py", line 272, in check_error
HandelError: [HandelError 402] XERXES: XerXes returned an error
```

https://gitlab.esrf.fr/bliss/bliss/-/issues/4224
[Mosca] Counters depending on the trigger mode
2024-03-12T15:55:23+01:00
Damien Naudet
Support for when the number of statistics returned by the controller depends on the trigger mode.
Use case: xglab.

https://gitlab.esrf.fr/bliss/bliss/-/issues/4218
Handling heavy scan metadata in Blissdata
2024-02-20T16:48:21+01:00
Lucas Felix
Related to #4163
# Exploring some approaches...
1. Embed path to an external file into scan's JSON
   - pros:
     - no modification
     - data type can be anything if there is a file format for it
     - possibility to link a single file to multiple scans
   - cons:
     - not using Redis exposes us to slow filesystem problems
     - file should be managed rigorously for blissdata to be reliable (creation order, lifetime, no overwriting)
1. Embed directly into scan's JSON and tell that part of it is disposable
   - pros:
     - no need for type definition if human readable
   - cons:
     - JSON is not suited for large binaries (should be human readable, parsers don't like heavy stuff)
     - making JSON bigger will slow down _scan.load()_ and put extra load on Redis
     - need to modify blissdata protocol (including memory tracker)
1. Use a dedicated stream and publish to it once the scan is running
   - pros:
     - can reuse ndarray streams for typing
   - cons:
     - can only be published during a running scan, or requires blissdata protocol modifications (and potentially a new intermediate scan state)
     - need a new type of stream if it can't fit ndarray
1. Use additional Redis keys and make the memory tracker clean them
   - pros:
     - ...
   - cons:
     - need for explicit type definitions as neither stream encoder, JSON nor file will handle it (no pickle), msgpack?
     - need to modify blissdata protocol (including memory tracker)
1. Use a scan sequence with an extra channel containing the large metadata artifact along the actual scan
   - pros:
     - no modification at all
     - can link a single artifact to multiple scans
   - cons:
     - makes scan access more complex

Lucas Felix

https://gitlab.esrf.fr/bliss/bliss/-/issues/4217
session object is part of it's own bliss session
2024-02-28T13:57:39+01:00
Valentin Valls
At ID06 we have a session named `nanodac`
If I type `nanodac` I get
```
NANODAC [7]: nanodac
Out [7]: <bliss.common.session.Session object at 0x7f02f9172e80>
```
1. Does it provide anything useful?
I imagine if we have to expose this object it would be better to always use the same reference name, for example `SESSION`.
2. Does it prevent using "nanodac" for another object?
What do you think?

https://gitlab.esrf.fr/bliss/bliss/-/issues/4212
CI: flaky tests/config/test_wardrobe.py::test_creation_time
2024-02-16T14:04:06+01:00
Valentin Valls
```
FAILED tests/config/test_wardrobe.py::test_creation_time[True] - AssertionError: last_accessed was not cached
assert datetime.datetime(2024, 2, 16, 9, 35, 36) == datetime.datetime(2024, 2, 16, 9, 35, 37)
= 1 failed, 1931 passed, 24 skipped, 22 xfailed, 6 warnings, 1 rerun in 9379.60s (2:36:19) =
```
https://gitlab.esrf.fr/bliss/bliss/-/jobs/962026
version 2.1.0

https://gitlab.esrf.fr/bliss/bliss/-/issues/4206
speedgoat: Use textblock instead of ANSI char
2024-03-15T15:16:13+01:00
Valentin Valls
The class `SpeedgoatUtils` provides a `display_counters` that uses ANSI characters to update a block of text.
This is something we can rework with a textblock. See for example !6105
version 2.1.0
Valentin Valls

https://gitlab.esrf.fr/bliss/bliss/-/issues/4205
moco: Unconsistent state
2024-02-12T17:14:50+01:00
Valentin Valls
The moco device has an inconsistent implementation:
- state is a function
- state does not return anything but print things
- It would also be good to know the different values the state can have
It would be nice to make it consistent with other devices
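
As an illustration of the points above, here is a minimal sketch of a `state` exposed as a read-only property that returns a value from a documented enumeration instead of printing. The class `MocoLike`, the enum `DeviceState`, its members and the `_read_raw_state` helper are all hypothetical; this is neither the actual moco controller API nor the real set of moco states.

```python
import enum


class DeviceState(enum.Enum):
    """Hypothetical set of states; the real moco values are not listed in the issue."""

    IDLE = "IDLE"
    RUNNING = "RUNNING"
    FAULT = "FAULT"


class MocoLike:
    """Stand-in for a device controller (not the real BLISS moco class)."""

    def __init__(self, raw_state="IDLE"):
        self._raw_state = raw_state

    def _read_raw_state(self):
        # Placeholder for the hardware query; returns a raw string.
        return self._raw_state

    @property
    def state(self):
        """Return the state as a DeviceState member instead of printing it."""
        return DeviceState(self._read_raw_state())


if __name__ == "__main__":
    dev = MocoLike("RUNNING")
    print(dev.state)
```

With an enumeration, the possible values are listed in one place, and callers can compare against `DeviceState.FAULT` rather than parsing printed text.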