SMX PSI/Jungfrau-4M: SEGV in the middle of a long acquisition
- 14:41:44.367 INFO: Status: running
- 14:41:44.367 INFO: 1: PacketStream for port 33115
- 14:41:44.367 INFO: File Write Disabled
- 14:41:44.367 INFO: File Write Disabled
- 14:41:44.367 INFO: 0: PacketStream for port 32912
- 14:41:44.367 INFO: File Write Disabled
- 14:41:44.367 INFO: Ready ...
- 14:41:44.367 INFO: Ready ...
- 14:41:44.367 INFO: Receiver Started
- 14:41:44.367 INFO: Status: running
- 14:41:44.367 INFO: Ready ...
- 14:41:44.367 INFO: Receiver Started
- 14:41:44.367 INFO: Receiver Started
- 14:41:44.367 INFO: Status: running
- 14:41:44.367 INFO: Status: running
- 14:41:44.367 INFO: 1: UDP port opened at port 33112
- 14:41:44.367 INFO: 1: PacketStream for port 33112
- 14:41:44.367 INFO: File Write Disabled
- 14:41:44.367 INFO: Ready ...
- 14:41:44.367 INFO: Receiver Started
- 14:41:44.367 INFO: Status: running
[lid29p9jfrau2:3702542] *** Process received signal ***
[lid29p9jfrau2:3702542] Signal: Segmentation fault (11)
[lid29p9jfrau2:3702542] Signal code: (3)
[lid29p9jfrau2:3702542] Failing at address: 0x73736572676f7280
[lid29p9jfrau2:3702542] [ 0] linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x74c3e4a604c8]
[lid29p9jfrau2:3702542] [ 1] /lib/powerpc64le-linux-gnu/libc.so.6(+0xac0b0)[0x74c3e2e1c0b0]
[lid29p9jfrau2:3702542] [ 2] [0x74c39d7ad5e0]
[lid29p9jfrau2:3702542] [ 3] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libstdc++.so.6(_Znwm+0x2c)[0x74c3e30c5eb0]
[lid29p9jfrau2:3702542] [ 4] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/liblima_psi.so.0.0(_ZN5boost4json6detail16default_resource11do_allocateEmm+0x2c)[0x74c3e485509c]
[lid29p9jfrau2:3702542] [ 5] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libboost_json.so.1.80.0(_ZN5boost4json6objectixENS_4core17basic_string_viewIcEE+0x130)[0x74c3e3678360]
[lid29p9jfrau2:3702542] [ 6] lima2_psi_smx_recv(+0xf25ec)[0xd1a1ac425ec]
[lid29p9jfrau2:3702542] [ 7] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libtango.so.94(_ZN5Tango12Device_3Impl25read_attributes_no_exceptERKNS_17DevVarStringArrayERNS_17_AttributeIdlDataEbRSt6vectorIlSaIlEE+0x728)[0x74c3e3cb1a38]
[lid29p9jfrau2:3702542] [ 8] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libtango.so.94(_ZN5Tango12Device_5Impl17read_attributes_5ERKNS_17DevVarStringArrayENS_9DevSourceERKNS_9ClntIdentE+0xde4)[0x74c3e3cd7404]
[lid29p9jfrau2:3702542] [ 9] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libtango.so.94(+0x412918)[0x74c3e3b92918]
[lid29p9jfrau2:3702542] [10] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libomniORB4.so.2(_ZN14omniCallHandle6upcallEP11omniServantR18omniCallDescriptor+0x504)[0x74c3e4670b34]
[lid29p9jfrau2:3702542] [11] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libtango.so.94(_ZN5Tango14_impl_Device_59_dispatchER14omniCallHandle+0x6a0)[0x74c3e3b96f20]
[lid29p9jfrau2:3702542] [12] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libomniORB4.so.2(_ZN4omni10omniOrbPOA8dispatchER14omniCallHandleP17omniLocalIdentity+0x218)[0x74c3e4661188]
[lid29p9jfrau2:3702542] [13] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libomniORB4.so.2(_ZN17omniLocalIdentity8dispatchER14omniCallHandle+0x8c)[0x74c3e462234c]
[lid29p9jfrau2:3702542] [14] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libomniORB4.so.2(_ZN4omni6GIOP_S13handleRequestEv+0x1c4)[0x74c3e46a91f4]
[lid29p9jfrau2:3702542] [15] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libomniORB4.so.2(_ZN4omni6GIOP_S10dispatcherEv+0x344)[0x74c3e46aad74]
[lid29p9jfrau2:3702542] [16] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libomniORB4.so.2(_ZN4omni10giopWorker7executeEv+0x70)[0x74c3e46a2130]
[lid29p9jfrau2:3702542] [17] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libomniORB4.so.2(_ZN15omniAsyncWorker8real_runEv+0x144)[0x74c3e460f0a4]
[lid29p9jfrau2:3702542] [18] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libomniORB4.so.2(_ZN20omniServerWorkerInfo3runEv+0x64)[0x74c3e460f6a4]
[lid29p9jfrau2:3702542] [19] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libomniORB4.so.2(+0xe1288)[0x74c3e4611288]
[lid29p9jfrau2:3702542] [20] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libomniORB4.so.2(_ZN15omniAsyncWorker7mid_runEv+0x84)[0x74c3e460e934]
[lid29p9jfrau2:3702542] [21] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libomniORB4.so.2(_ZN19omniAsyncWorkerInfo3runEv+0x64)[0x74c3e460ef34]
[lid29p9jfrau2:3702542] [22] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libtango.so.94(+0x7defd4)[0x74c3e3f5efd4]
[lid29p9jfrau2:3702542] [23] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libomniORB4.so.2(_ZN19omniAsyncWorkerInfo3runEv+0x38)[0x74c3e460ef08]
[lid29p9jfrau2:3702542] [24] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libomniORB4.so.2(+0xe1158)[0x74c3e4611158]
[lid29p9jfrau2:3702542] [25] /users/blissadm/conda/miniconda/envs/jungfrau_lima2/lib/libomnithread.so.4(omni_thread_wrapper+0x19c)[0x74c3e4504c2c]
[lid29p9jfrau2:3702542] [26] /lib/powerpc64le-linux-gnu/libpthread.so.0(+0x8838)[0x74c3e47a8838]
[lid29p9jfrau2:3702542] [27] /lib/powerpc64le-linux-gnu/libc.so.6(clone+0x74)[0x74c3e2ebba44]
[lid29p9jfrau2:3702542] *** End of error message ***
[2024-03-06 15:38:18.225878][0x000944fc][0x000079e32a36eee0][tango/cpp/include/lima/tango/receiver_class.hpp:456][error][proc][] Deleting device from DB: id29/limaprocessing/414ee0ea-dbbf-11ee-bc47-d08e790a3bbd@0
[2024-03-06 15:38:18.226722][0x000944fd][0x000076d5babceee0][tango/cpp/include/lima/tango/receiver_class.hpp:456][error][proc][] Deleting device from DB: id29/limaprocessing/414ee0ea-dbbf-11ee-bc47-d08e790a3bbd@1
[2024-03-06 15:38:18.228833][0x00387f0d][0x00007c829d86eee0][tango/cpp/include/lima/tango/receiver_class.hpp:456][error][proc][] Deleting device from DB: id29/limaprocessing/414ee0ea-dbbf-11ee-bc47-d08e790a3bbd@2
[2024-03-06 15:38:18.227774][0x000944fc][0x000079e32a36eee0][tango/cpp/include/lima/tango/receiver_class.hpp:460][error][proc][] Erasing pipeline: 414ee0ea-dbbf-11ee-bc47-d08e790a3bbd
[2024-03-06 15:38:18.228637][0x000944fd][0x000076d5babceee0][tango/cpp/include/lima/tango/receiver_class.hpp:460][error][proc][] Erasing pipeline: 414ee0ea-dbbf-11ee-bc47-d08e790a3bbd
[2024-03-06 15:38:18.230840][0x00387f0d][0x00007c829d86eee0][tango/cpp/include/lima/tango/receiver_class.hpp:460][error][proc][] Erasing pipeline: 414ee0ea-dbbf-11ee-bc47-d08e790a3bbd
[2024-03-06 15:38:18.239006][0x000944fd][0x000076d5babceee0][tango/cpp/include/lima/tango/receiver_class.hpp:464][error][proc][] Deleting device from class: id29/limaprocessing/414ee0ea-dbbf-11ee-bc47-d08e790a3bbd@1
[2024-03-06 15:38:18.243846][0x000944fc][0x000079e32a36eee0][tango/cpp/include/lima/tango/receiver_class.hpp:464][error][proc][] Deleting device from class: id29/limaprocessing/414ee0ea-dbbf-11ee-bc47-d08e790a3bbd@0
[2024-03-06 15:38:18.246348][0x00387f0d][0x00007c829d86eee0][tango/cpp/include/lima/tango/receiver_class.hpp:464][error][proc][] Deleting device from class: id29/limaprocessing/414ee0ea-dbbf-11ee-bc47-d08e790a3bbd@2
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 4 with PID 3702542 on node lid29p9jfrau2 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Restoring default CPU affinity to other processes ...
Demangled, the top of the call stack (innermost frame first) seems to be:
* operator new(unsigned long)
* boost::json::detail::default_resource::do_allocate(unsigned long, unsigned long)
* boost::json::object::operator[](boost::core::basic_string_view<char>)
* lima::tango::JsonReadAttr<lima::tango::processing, &(lima::tango::processing::progress_counters() const)>::read(Tango::DeviceImpl*, Tango::Attribute&)
* Tango::Device_3Impl::read_attributes_no_except(Tango::DevVarStringArray const&, Tango::_AttributeIdlData&, bool, std::vector<long, std::allocator<long> >&)
* Tango::Device_5Impl::read_attributes_5(Tango::DevVarStringArray const&, Tango::DevSource, Tango::ClntIdent const&)
* _0RL_lcfn_6fe2f94a21a10053_84000000(omniCallDescriptor*, omniServant*)
* omniCallHandle::upcall(omniServant*, omniCallDescriptor&)
* Tango::_impl_Device_5::_dispatch(omniCallHandle&)
* omni::omniOrbPOA::dispatch(omniCallHandle&, omniLocalIdentity*)
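A side observation on the `Failing at address` value from the signal report: read as little-endian bytes (the host is ppc64le), it looks like ASCII text rather than a plausible pointer. The sketch below only decodes the number taken from the log. The interpretation is an assumption, not a confirmed diagnosis: it might hint that heap memory used by the allocator was overwritten with string data (the attribute being read is `progress_counters`), which would be consistent with the crash happening inside `operator new`.

```python
# Failing address reported by the signal handler in the log above.
failing_address = 0x73736572676F7280

# ppc64le is little-endian, so this is the byte order the value has in memory.
raw = failing_address.to_bytes(8, "little")
print(raw)

# Compare with the 8 ASCII bytes of "progress" read as a little-endian integer.
as_text = int.from_bytes(b"progress", "little")
offset = failing_address - as_text
print(hex(offset))
```

The bytes decode to `\x80rogress`, that is, the integer value of the string "progress" plus 0x10. If this reading is right, running the receiver under a memory checker (for example AddressSanitizer or Valgrind, if available for this build) may be a reasonable next step.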
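It seems that the fourth receiver (file index 3) accumulates a delay in file saving: its file 00140 was last modified at 15:35:56, more than 12 minutes after the files of the other three receivers (15:23:17):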
opid29@lid29control-2:/data/visitor/mx2545/id29/20240305/RAW_DATA/FAE_old/space5_24T/run_02_ssx_foil_collection$ ls -l --full-time FAE_old-FAE_old_dense_?_00140.h5
-rwxr-x--- 1 opid29 jsbg 10629068315 2024-03-06 15:23:17.707820196 +0100 FAE_old-FAE_old_dense_0_00140.h5
-rwxr-x--- 1 opid29 jsbg 10608225030 2024-03-06 15:23:17.720742759 +0100 FAE_old-FAE_old_dense_1_00140.h5
-rwxr-x--- 1 opid29 jsbg 10620634024 2024-03-06 15:23:17.721035942 +0100 FAE_old-FAE_old_dense_2_00140.h5
-rwxr-x--- 1 opid29 jsbg 10652841828 2024-03-06 15:35:56.088052736 +0100 FAE_old-FAE_old_dense_3_00140.h5
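To put a number on the delay, the modification times from the listing above can be compared directly. This is a minimal sketch: the timestamps are copied by hand from the `ls` output, and the dictionary keys are assumed to be the receiver index (the digit matched by `?` in the file name).

```python
from datetime import datetime

# Modification times copied from the `ls -l --full-time` listing,
# keyed by receiver index (assumed to be the digit matched by `?`).
mtimes = {
    0: "2024-03-06 15:23:17.707820196",
    1: "2024-03-06 15:23:17.720742759",
    2: "2024-03-06 15:23:17.721035942",
    3: "2024-03-06 15:35:56.088052736",
}

def parse(ts: str) -> datetime:
    # datetime only keeps microseconds, so drop the last three (nanosecond) digits.
    return datetime.strptime(ts[:-3], "%Y-%m-%d %H:%M:%S.%f")

times = {idx: parse(ts) for idx, ts in mtimes.items()}
earliest = min(times.values())

# Lag of each receiver's file relative to the first one that finished.
lags = {idx: (t - earliest).total_seconds() for idx, t in times.items()}
for idx, lag in sorted(lags.items()):
    print(f"receiver {idx}: +{lag:.3f} s")
```

Receivers 0 to 2 finish within a few tens of milliseconds of each other, while receiver 3 is roughly 758 s behind for the same file number.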
Edited by Alejandro Homs Puron