writeFrame (manual saving) deadlock
Kindly reported by @denolf
This issue affects scans with autof, for instance.
Repro:
ssh bcu-ci-lab1
mkdir output
cd output
git clone https://gitlab.esrf.fr/bliss/bliss.git
docker run -it -v $HOME/output:/output esrfbcu/bliss-ci-runner:master /bin/bash
rm /dev/random && ln -s /dev/urandom /dev/random || true
. /opt/conda/etc/profile.d/conda.sh
. /opt/conda/etc/profile.d/mamba.sh
mamba activate default_env
cd /output/bliss
export PYTHON_VERSION=3.9
mamba install --file conda-requirements.txt --file conda-requirements-dev.txt python=$PYTHON_VERSION -c esrf-bcu
pip install .[dev] ./blissdata
export LIMA_SIMULATOR_CONDA_ENV=lima_env
mamba env create --name $LIMA_SIMULATOR_CONDA_ENV -f bliss_lima_simulators/conda-environment.yml
while pytest tests/nexus_writer/test_nxw_autofilter.py::test_nxw_autofilter;do echo 'ok';done
leads to the following trace:
[2023/07/27 12:33:28.305962] 7fe47ef36700 *Application*Lima.Server.LimaCCDs::LimaCCDs::writeImage (LimaCCDs.py:2063)-Funct: Enter
[2023/07/27 12:33:28.306009] 7fe47ef36700 *Control*Control::Saving::writeFrame (control/src/CtSaving.cpp:1775)-Funct: Enter
[2023/07/27 12:33:28.306030] 7fe47ef36700 *Control*Control::Saving::writeFrame (control/src/CtSaving.cpp:1776)-Param: aFrameNumber=47
[2023/07/27 12:33:28.306054] 7fe47ef36700 *Control*Control::Saving::_waitWritingThreads (control/src/CtSaving.cpp:1081)-Funct: Enter
[2023/07/27 12:33:28.306076] 7fe47ef36700 *Control*Control::Control::ImageStatus::ImageStatus (control/src/CtControl.cpp:1358)-Funct: Enter
[2023/07/27 12:33:28.306101] 7fe47ef36700 *Control*Control::Control::ImageStatus::ImageStatus (control/src/CtControl.cpp:1358)-Funct: Exit
[2023/07/27 12:33:28.306127] 7fe47ef36700 *Control*Control::Control::Status::Status (control/src/CtControl.cpp:1393)-Funct: Enter
[2023/07/27 12:33:28.306150] 7fe47ef36700 *Control*Control::Control::Status::Status (control/src/CtControl.cpp:1393)-Funct: Exit
[2023/07/27 12:33:28.306172] 7fe3f1bc6700 *Control*Control::Control::ReadImage (control/src/CtControl.cpp:827)-Funct: Enter
[2023/07/27 12:33:28.306264] 7fe3f1bc6700 *Control*Control::Control::ReadImage (control/src/CtControl.cpp:828)-Param: frameNumber=-1, readBlockLen=1
[2023/07/27 12:33:28.306299] 7fe3f1bc6700 *Control*Control::Control::readBlock (control/src/CtControl.cpp:843)-Funct: Enter
[2023/07/27 12:33:28.306326] 7fe3f1bc6700 *Control*Control::Control::readBlock (control/src/CtControl.cpp:844)-Param: frameNumber=-1, readBlockLen=1, baseImage=0
[2023/07/27 12:33:28.306355] 7fe3f1bc6700 *Control*Control::Image::getImageDim (control/src/CtImage.cpp:652)-Funct: Enter
[2023/07/27 12:33:28.306382] 7fe3f1bc6700 *Control*Control::Acquisition::getAcqMode (control/src/CtAcquisition.cpp:444)-Funct: Enter
[2023/07/27 12:33:28.306406] 7fe3f1bc6700 *Control*Control::Acquisition::getAcqMode (control/src/CtAcquisition.cpp:448)-Return: mode=Single
[2023/07/27 12:33:28.306432] 7fe3f1bc6700 *Control*Control::Acquisition::getAcqMode (control/src/CtAcquisition.cpp:444)-Funct: Exit
[2023/07/27 12:33:28.306456] 7fe3f1bc6700 *Control*Control::Image::getImageType (control/src/CtImage.cpp:629)-Funct: Enter
[2023/07/27 12:33:28.306480] 7fe3f1bc6700 *Control*Control::Acquisition::getAcqMode (control/src/CtAcquisition.cpp:444)-Funct: Enter
[2023/07/27 12:33:28.306504] 7fe3f1bc6700 *Control*Control::Acquisition::getAcqMode (control/src/CtAcquisition.cpp:448)-Return: mode=Single
[2023/07/27 12:33:28.306528] 7fe3f1bc6700 *Control*Control::Acquisition::getAcqMode (control/src/CtAcquisition.cpp:444)-Funct: Exit
[2023/07/27 12:33:28.306552] 7fe3f1bc6700 *Control*Control::Image::getImageType (control/src/CtImage.cpp:641)-Return: type=Bpp32
[2023/07/27 12:33:28.306578] 7fe3f1bc6700 *Control*Control::Image::getImageType (control/src/CtImage.cpp:629)-Funct: Exit
[2023/07/27 12:33:28.306621] 7fe3f1bc6700 *Control*Control::Sofware BinRoiFlip::getSize (control/src/CtImage.cpp:141)-Funct: Enter
[2023/07/27 12:33:28.306670] 7fe3f1bc6700 *Control*Control::Sofware BinRoiFlip::getSize (control/src/CtImage.cpp:143)-Trace: m_max_size=<1024x1024>, m_bin=<1x1>, m_roi=<0,0>-<0x0>
[2023/07/27 12:33:28.306708] 7fe3f1bc6700 *Control*Control::Sofware BinRoiFlip::getSize (control/src/CtImage.cpp:150)-Return: m_size=<1024x1024>
[2023/07/27 12:33:28.306742] 7fe3f1bc6700 *Control*Control::Sofware BinRoiFlip::getSize (control/src/CtImage.cpp:141)-Funct: Exit
[2023/07/27 12:33:28.306774] 7fe3f1bc6700 *Control*Control::Image::getImageDim (control/src/CtImage.cpp:660)-Return: dim=<1024x1024x4-Bpp32>
[2023/07/27 12:33:28.306806] 7fe3f1bc6700 *Control*Control::Image::getImageDim (control/src/CtImage.cpp:652)-Funct: Exit
[2023/07/27 12:33:28.306839] 7fe3f1bc6700 *Control*Control::Saving::getManagedMode (control/src/CtSaving.cpp:1226)-Funct: Enter
[2023/07/27 12:33:28.306894] 7fe47ef36700 *Control*Control::Control::getStatus (control/src/CtControl.cpp:695)-Funct: Enter
where threads 7fe47ef36700 (thread1) and 7fe3f1bc6700 (thread2) deadlock:
- thread1, in CtSaving::writeFrame, takes the CtSaving mutex
- thread2, in CtControl::ReadImage, takes the CtControl mutex
- thread1, in writeFrame, calls Control::getStatus, which tries to take the CtControl mutex
- thread2, in ReadImage, calls Saving::getManagedMode, which tries to take the CtSaving mutex
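
The lock-order inversion described in the list above can be illustrated with a minimal Python sketch. This is not Lima code: `lock_order_inversion`, `saving_lock` and `control_lock` are hypothetical names standing in for the two C++ mutexes, and a timeout is used on the second acquisition so that the sketch reports the conflict instead of hanging the way the real server does.

```python
import threading


def lock_order_inversion(timeout=0.2):
    """Return, per thread, whether it managed to take its second lock."""
    saving_lock = threading.Lock()   # stands in for the CtSaving mutex
    control_lock = threading.Lock()  # stands in for the CtControl mutex
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            # Wait until both threads hold their first lock.
            both_hold_first.wait()
            # Without a timeout this call would block forever.
            acquired = second.acquire(timeout=timeout)
            got_second[name] = acquired
            if acquired:
                second.release()
            # Keep the first lock held until both attempts have finished.
            both_tried_second.wait()

    threads = [
        # thread1: takes the saving lock, then wants the control lock
        threading.Thread(target=worker,
                         args=("writeFrame", saving_lock, control_lock)),
        # thread2: takes the control lock, then wants the saving lock
        threading.Thread(target=worker,
                         args=("ReadImage", control_lock, saving_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second


if __name__ == "__main__":
    print(lock_order_inversion())
```

Under these assumptions neither thread obtains its second lock, which corresponds to the state the trace stops in: each thread holds one mutex and waits for the other.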
The mutexes are used with condition variables and thus are NOT recursive.
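
As a loose analogy only, the difference between a non-recursive and a recursive mutex can be sketched in Python. `threading.Lock` and `threading.RLock` are stand-ins here, not Lima's C++ mutex classes, and `can_reacquire` is a hypothetical helper. The sketch only shows that a thread already holding a non-recursive lock cannot take it a second time.

```python
import threading


def can_reacquire(lock):
    """Try to take `lock` twice from the same thread without blocking."""
    lock.acquire()
    try:
        # Non-blocking second attempt from the thread that already holds it.
        second = lock.acquire(blocking=False)
        if second:
            lock.release()
        return second
    finally:
        lock.release()


if __name__ == "__main__":
    print("Lock :", can_reacquire(threading.Lock()))   # non-recursive
    print("RLock:", can_reacquire(threading.RLock()))  # recursive
```

The plain lock refuses the second acquisition while the re-entrant one accepts it.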
Edited by Samuel Debionne