Segfault on init_device

For the controller and receiver devices we call init_device in the constructor, which sometimes segfaults:

https://gitlab.esrf.fr/bliss/bliss/-/jobs/2125316

[runner-f8unlx2w3-project-325-concurrent-5:9495 :0:9495] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8)
==== backtrace (tid:   9495) ====
 0  /opt/conda/envs/bliss_lima2_simulator/bin/../lib/libucs.so.0(ucs_handle_error+0x2fd) [0x709842da584d]
 1  /opt/conda/envs/bliss_lima2_simulator/bin/../lib/libucs.so.0(+0x2fa3f) [0x709842da5a3f]
 2  /opt/conda/envs/bliss_lima2_simulator/bin/../lib/libucs.so.0(+0x2fc0a) [0x709842da5c0a]
 3  /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x709843ff9420]
 4  /opt/conda/envs/bliss_lima2_simulator/bin/../lib/libtango.so.10.1(_ZN5Tango12AttrPropertyC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_+0x27) [0x7098449fd977]
 5  /opt/conda/envs/bliss_lima2_simulator/bin/../lib/libtango.so.10.1(_ZNSt6vectorIN5Tango12AttrPropertyESaIS1_EE17_M_realloc_appendIJRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESB_EEEvDpOT_+0xa4) [0x7098449fea34]
 6  /opt/conda/envs/bliss_lima2_simulator/bin/../lib/libtango.so.10.1(_ZN5Tango19MultiClassAttribute20init_class_attributeERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEl+0xba9) [0x7098449ff709]
 7  /opt/conda/envs/bliss_lima2_simulator/bin/../lib/libtango.so.10.1(_ZN5Tango7DServer11init_deviceEv+0x315) [0x709844a81755]
 8  /opt/conda/envs/bliss_lima2_simulator/bin/../lib/libtango.so.10.1(_ZN5Tango12DServerClass14device_factoryEPKNS_17DevVarStringArrayE+0xb7) [0x709844a8e1f7]
 9  /opt/conda/envs/bliss_lima2_simulator/bin/../lib/libtango.so.10.1(_ZN5Tango12DServerClassC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x3b1) [0x709844a8e931]
10  /opt/conda/envs/bliss_lima2_simulator/bin/../lib/libtango.so.10.1(_ZN5Tango12DServerClass4initEv+0x94) [0x709844a8ecd4]
11  /opt/conda/envs/bliss_lima2_simulator/bin/../lib/libtango.so.10.1(_ZN5Tango4Util11server_initEb+0x3f) [0x709844bb315f]
12  lima2_tango(main+0xf4c) [0x56c11394166c]
13  /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x709843b69083]
14  lima2_tango(+0x22c9e) [0x56c113941c9e]
=================================

Cleaned up, the backtrace looks like this:

==== backtrace (tid: 9495) ====
0  libucs.so.0(ucs_handle_error+0x2fd)
1  libucs.so.0(+0x2fa3f)
2  libucs.so.0(+0x2fc0a)
3  libpthread.so.0(+0x14420)
4  libtango.so.10.1(Tango::AttrProperty::AttrProperty(...) +0x27)
5  libtango.so.10.1(std::vector<Tango::AttrProperty>::_M_realloc_append(...) +0xa4)
6  libtango.so.10.1(Tango::MultiClassAttribute::init_class_attribute(...) +0xba9)
7  libtango.so.10.1(Tango::DServer::init_device(...) +0x315)
8  libtango.so.10.1(Tango::DServerClass::device_factory(...) +0xb7)
9  libtango.so.10.1(Tango::DServerClass::DServerClass(...) +0x3b1)
10 libtango.so.10.1(Tango::DServerClass::init(...) +0x94)
11 libtango.so.10.1(Tango::Util::server_init(...) +0x3f)
12 lima2_tango(main+0xf4c)
13 libc.so.6(__libc_start_main+0xf3)
14 lima2_tango(+0x22c9e)
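
The symbol names in the cleaned-up frames can be recovered from the raw backtrace by demangling them. A minimal sketch, assuming a GCC or Clang toolchain (it relies on abi::__cxa_demangle from <cxxabi.h>); the helper name demangle is hypothetical:

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <memory>
#include <string>

// Demangles an Itanium-ABI symbol name.
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle allocates the result with malloc, so release it with free.
    std::unique_ptr<char, decltype(&std::free)> out(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        &std::free);
    return (status == 0 && out) ? std::string(out.get()) : mangled;
}
```

For instance, passing the symbol from frame 7, _ZN5Tango7DServer11init_deviceEv, gives Tango::DServer::init_device().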

init_device seems to do a lot of work. Could some of it run too early when it is called from the constructor, and segfault?

Edit: Reynald confirmed that calling init_device in the constructor and delete_device in the destructor is the normal thing to do.

https://gitlab.esrf.fr/limagroup/lima2/-/blob/develop/tango/include/lima/tango/control.inl#L27

For example:

attribute_lock_guard lock(this->get_device_attr()->get_attr_by_name("acq_state"));
this->push_change_event("acq_state", dev_state, 1, 0, true);

If get_device_attr() or get_attr_by_name(...) returns nullptr, attribute_lock_guard will dereference it and blow up.
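
One way to guard against this is to check the lookup before taking the lock. The following is only a sketch with stand-in types: Attr, MultiAttr, try_lock_attr and push_state_if_ready are hypothetical names, not the Tango or lima2 API, and the real signatures may differ.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-ins, NOT the Tango or lima2 types.
struct Attr {
    std::mutex mtx;
};

struct MultiAttr {
    std::map<std::string, Attr> attrs;

    // Returns nullptr when the attribute is unknown.
    Attr* find(const std::string& name) {
        auto it = attrs.find(name);
        return it == attrs.end() ? nullptr : &it->second;
    }
};

// Locks the named attribute if it exists; otherwise returns a lock that owns nothing.
std::unique_lock<std::mutex> try_lock_attr(MultiAttr* multi, const std::string& name) {
    if (multi == nullptr) return {};
    Attr* attr = multi->find(name);
    if (attr == nullptr) return {};
    return std::unique_lock<std::mutex>(attr->mtx);
}

// Pushes a state change only when the attribute could be locked.
bool push_state_if_ready(MultiAttr* multi, const std::string& name, int& pushed_count) {
    auto lock = try_lock_attr(multi, name);
    if (!lock.owns_lock()) return false;  // attributes not set up yet: skip
    ++pushed_count;                       // stand-in for push_change_event(...)
    return true;
}
```

With this shape, a call that arrives before the attribute exists returns false instead of dereferencing a null pointer.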

Also, the callback (registered with m_ctrl->register_on_state_change(...)) can fire from another thread at any time. If it runs before Tango has finished setting up the attributes, it will crash.
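
A possible mitigation for the early callback is to gate it behind a flag that is only set once initialisation has finished. This is a sketch under assumptions: GatedDevice, mark_ready and mark_not_ready are hypothetical names, and the counters stand in for the real push_change_event call.

```cpp
#include <atomic>
#include <functional>

// Hypothetical device, NOT the lima2 class: it ignores state-change
// callbacks until initialisation has completed.
class GatedDevice {
public:
    // Returns the callback that would be handed to the controller.
    std::function<void(int)> make_state_callback() {
        return [this](int state) { on_state_change(state); };
    }

    // Intended to be the last step of init_device().
    void mark_ready() { m_ready.store(true, std::memory_order_release); }

    // Intended to be the first step of delete_device().
    void mark_not_ready() { m_ready.store(false, std::memory_order_release); }

    int pushed() const { return m_pushed.load(); }
    int dropped() const { return m_dropped.load(); }

private:
    void on_state_change(int /*state*/) {
        if (!m_ready.load(std::memory_order_acquire)) {
            ++m_dropped;  // attributes not set up yet: do not touch them
            return;
        }
        ++m_pushed;  // stand-in for push_change_event("acq_state", ...)
    }

    std::atomic<bool> m_ready{false};
    std::atomic<int> m_pushed{0};
    std::atomic<int> m_dropped{0};
};
```

The flag only covers callbacks that arrive before init has finished. A callback already in flight while delete_device runs would still need a lock or an explicit unregister step.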

Edited by Wout De Nolf