nexus.nxobject: issues when saving hundreds of thousands of files
Report
When saving hundreds of frames, closing the external files seems to take more and more time.
Most of the time is spent on the close calls in files.py, according to the profiling in splitter_profilling.prof, which only covers the first 50000 frames. And this part keeps increasing with time...
Notes
The file can be browsed with snakeviz:
splitter_profilling.prof
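For reference, the same data can be dumped with the standard library's pstats (the snakeviz command line is simply `snakeviz splitter_profilling.prof`):

```python
import pstats

# Print the 20 most expensive calls by cumulative time from the
# attached profile; this is where the close calls show up.
stats = pstats.Stats("splitter_profilling.prof")
stats.sort_stats("cumulative").print_stats(20)
```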
Code involved
It looks like most of the execution time is spent closing the file, in:
```python
import h5py
import h5py._hl.selections as selection  # h5py's private selections module
from silx.io.url import DataUrl
from tomoscan.io import HDF5File  # assumed import path for the HDF5File wrapper


def from_data_url_to_virtual_source(url: DataUrl) -> tuple:
    """
    :param DataUrl url: url to be converted to a virtual source. It must
        target a 2D detector.
    :return: (h5py.VirtualSource, tuple: shape of the virtual source,
        numpy.dtype: type of the dataset associated with the virtual source)
    :rtype: tuple
    """
    if not isinstance(url, DataUrl):
        raise TypeError(
            f"url is expected to be an instance of DataUrl and not {type(url)}"
        )
    # the file is opened (and closed) only to read the dataset geometry
    with HDF5File(url.file_path(), mode="r") as o_h5s:
        original_data_shape = o_h5s[url.data_path()].shape
        data_type = o_h5s[url.data_path()].dtype
    if len(original_data_shape) == 2:
        original_data_shape = (
            1,
            original_data_shape[0],
            original_data_shape[1],
        )
    vs_shape = original_data_shape
    if url.data_slice() is not None:
        vs_shape = (
            url.data_slice().stop - url.data_slice().start,
            original_data_shape[-2],
            original_data_shape[-1],
        )
    vs = h5py.VirtualSource(
        url.file_path(), url.data_path(), shape=vs_shape, dtype=data_type
    )
    if url.data_slice() is not None:
        vs.sel = selection.select(original_data_shape, url.data_slice())
    return vs, vs_shape, data_type
```
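For context, a virtual source produced this way is typically stitched into an h5py.VirtualLayout; a minimal sketch, where the `urls` list and the output file name are assumptions for illustration:

```python
import h5py

# Hypothetical list of DataUrl objects, one per external frame stack.
sources = [from_data_url_to_virtual_source(url) for url in urls]

n_frames = sum(shape[0] for _, shape, _ in sources)
frame_shape = sources[0][1][1:]  # (n_rows, n_cols) of the 2D detector

# Concatenate all virtual sources along the frame axis.
layout = h5py.VirtualLayout(shape=(n_frames, *frame_shape), dtype=sources[0][2])
start = 0
for vs, shape, _ in sources:
    layout[start : start + shape[0]] = vs
    start += shape[0]

with h5py.File("merged.h5", "w") as h5f:  # output name is illustrative
    h5f.create_virtual_dataset("data", layout)
```

With hundreds of thousands of frames, from_data_url_to_virtual_source is called once per frame stack, so the per-call open/close cost dominates.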
It should be simple to avoid opening and closing the file.
Some time may also be spent on `data_type = o_h5s[url.data_path()].dtype`, which I think we should also be able to skip.
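A minimal sketch of one way to do it, assuming many urls point into the same external file: cache the (shape, dtype) lookup so each file is opened at most once (the helper name is made up):

```python
from functools import lru_cache

import h5py


@lru_cache(maxsize=None)
def _get_shape_and_dtype(file_path: str, data_path: str):
    """Open the file once per (file, dataset) pair and remember the result."""
    # HDF5File from the snippet above could be used here instead of h5py.File.
    with h5py.File(file_path, mode="r") as h5f:
        dataset = h5f[data_path]
        return dataset.shape, dataset.dtype
```

from_data_url_to_virtual_source could then call `_get_shape_and_dtype(url.file_path(), url.data_path())` instead of opening the file itself, which would remove both the repeated open/close and the repeated dtype read.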