HDF5 file in nexus writer
@pithan @matias.guijarro @andy.gotz @sole
What is the final decision in terms of HDF5_USE_FILE_LOCKING
and holding the file open during scans?
Things to take into account:
- Multiple scans can run in parallel and write to the same file

  We need to write with `HDF5_USE_FILE_LOCKING=FALSE`. The scan writers run in the same Python process but in different greenlets, each with its own h5py.File instance. I have a lock pool (a map from filename to lock) that is shared by all scan writers of one bliss session. I acquire the lock when creating the file and when modifying the NXroot attributes, but nowhere else. Each scan creates and modifies its own NXentry concurrently. I haven't seen conflicts or corruption of the file so far, but I'm not sure this is safe in theory.
- Not all filesystems support flock (NFS, for example, if I understand correctly)

  We need to write with `HDF5_USE_FILE_LOCKING=FALSE` (#373 (closed)).
- File access and corruption

  Let me know if I got it wrong:

  | Nexus writer (mode=a): HDF5_USE_FILE_LOCKING | External application: mode / HDF5_USE_FILE_LOCKING | Issue |
  |---|---|---|
  | TRUE | r / UNSET (TRUE) | Either may never be able to open the file |
  | TRUE | r / FALSE | None |
  | TRUE | a / UNSET (TRUE) | Either may never be able to open the file |
  | TRUE | a / FALSE | Either may raise exception. File might get corrupted. |
  | FALSE | r / UNSET (TRUE) | External application may raise exception |
  | FALSE | r / FALSE | External application may raise exception |
  | FALSE | a / UNSET (TRUE) | Either may raise exception. File might get corrupted. |
  | FALSE | a / FALSE | Either may raise exception. File might get corrupted. |
  When we use file locking, data cannot be written while others are accessing the file, unless they use well-behaved software such as the latest versions of silx and pymca. However, the file never gets corrupted. If we don't use file locking, we can always write, but external applications can corrupt the file when opening it in append mode (only when actually writing, or just by opening?).
- File access: hold/flush or open/close

  If we set `HDF5_USE_FILE_LOCKING=FALSE`, it is mainly a question of performance. If we set `HDF5_USE_FILE_LOCKING=TRUE`, external applications cannot use the file while a scan is running.
- Single-writer multiple reader (SWMR)

  I could try to open the file in this mode and, if that doesn't work, fall back to the normal mode. Is this useful? Some scary stuff
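
To make the first point above more concrete, here is a minimal sketch of a per-file lock pool. It is not the actual nexus writer code: the names `FileLockPool`, `lock_for` and `acquire` are hypothetical, and it uses `threading` locks so that it runs with the standard library alone. A greenlet-based writer would presumably pass a greenlet-aware lock class (for example `gevent.lock.RLock`) as `lock_factory`.

```python
import os
import threading
from contextlib import contextmanager


class FileLockPool:
    """Hypothetical pool mapping a filename to one lock shared by all writers."""

    def __init__(self, lock_factory=threading.RLock):
        self._guard = threading.Lock()  # protects the dictionary itself
        self._locks = {}
        self._lock_factory = lock_factory

    def lock_for(self, filename):
        """Return the lock for this file, creating it on first use."""
        key = os.path.abspath(filename)  # "data.h5" and "./data.h5" share a lock
        with self._guard:
            lock = self._locks.get(key)
            if lock is None:
                lock = self._locks[key] = self._lock_factory()
            return lock

    @contextmanager
    def acquire(self, filename):
        """Hold the file's lock for the duration of the with-block."""
        with self.lock_for(filename):
            yield
```

A scan writer would then wrap only the shared parts (file creation, NXroot attributes) in `with pool.acquire(filename):` and write its own NXentry outside the lock. Whether that is enough to rule out corruption is exactly the open question above.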
Conclusion (tell me if you object): use `HDF5_USE_FILE_LOCKING=FALSE`, don't use SWMR, hold the file open and flush periodically (every 5 seconds? configurable? depending on memory and data size?), and live with the danger of external applications trying to modify the file.
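
A rough sketch of what the hold/flush part of this conclusion could look like, assuming a fixed but configurable interval. `PeriodicFlusher` and `maybe_flush` are hypothetical names, not existing writer code. The helper works on any object with a `flush()` method, which includes `h5py.File`.

```python
import os
import time

# Assumption: the variable must be set before the HDF5 library opens the
# file, so the safe place is before h5py is imported.
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"


class PeriodicFlusher:
    """Flush a held-open file at most once per `interval` seconds."""

    def __init__(self, fileobj, interval=5.0, clock=time.monotonic):
        self._file = fileobj
        self._interval = interval
        self._clock = clock
        self._last_flush = clock()

    def maybe_flush(self):
        """Flush if the interval has elapsed; return True if a flush happened."""
        now = self._clock()
        if now - self._last_flush >= self._interval:
            self._file.flush()
            self._last_flush = now
            return True
        return False
```

The writer would keep one `h5py.File(filename, "a")` open per scan, wrap it in `PeriodicFlusher(f, interval=5.0)`, and call `maybe_flush()` after each chunk of data is written. Making the interval depend on memory or data size would only change how `interval` is chosen.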