tomotools / integrator, issue #10
Created Dec 13, 2021 by Pierre Paleo (@paleo, Owner)

SKIP option does not immediately quit the process

The master file should be checked before launching the integration.
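
As a rough illustration of such a pre-launch check, here is a minimal sketch that only verifies that each master file exists and is readable before any worker is spawned. The function names `check_master_files` and `abort_if_invalid` are hypothetical and not part of integrator; what "checked" should cover in practice (for example the SKIP handling) is an assumption left open here.

```python
import os
import sys


def check_master_files(paths):
    """Return a list of (path, reason) for every master file that cannot be used.

    Hypothetical helper: it only tests existence and read permission.
    """
    problems = []
    for path in paths:
        if not os.path.isfile(path):
            problems.append((path, "file does not exist"))
        elif not os.access(path, os.R_OK):
            problems.append((path, "file is not readable"))
    return problems


def abort_if_invalid(paths):
    """Report every problem, then exit with a non-zero status if any was found."""
    problems = check_master_files(paths)
    for path, reason in problems:
        print(f"Cannot process {path}: {reason}", file=sys.stderr)
    if problems:
        sys.exit(1)
```

Calling something like `abort_if_invalid(...)` before the "Spawning workers" step would let the command quit right away instead of failing later inside the wait loop.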

(2021.1) slurm-nice-devel2904:ihma109/id15/test_dec2021 % integrate-slurm integrator.conf
/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/dask_jobqueue/core.py:20: FutureWarning: tmpfile is deprecated and will be removed in a future release. Please use dask.utils.tmpfile instead.
  from distributed.utils import tmpfile
Will process the following datasets:
[ID15Dataset(fname=/data/id15/inhouse4/ihma109/id15/ACL9011001b/ACL9011001b_0002/ACL9011001b_0002.h5, entry=1.1),
 ID15Dataset(fname=/data/id15/inhouse4/ihma109/id15/ACL9011001b/ACL9011001b_0003/ACL9011001b_0003.h5, entry=1.1)]
Spawning workers
Spawning integrators
New dataset /data/id15/inhouse4/ihma109/id15/ACL9011001b/ACL9011001b_0002/ACL9011001b_0002.h5
Will process dataset: /data/id15/inhouse4/ihma109/id15/ACL9011001b/ACL9011001b_0002/ACL9011001b_0002.h5 into output file: /data/id15/inhouse4/ihma109/id15/test_dec2021/ACL9011001b/ACL9011001b_0002/azint_ACL9011001b_0002_scan0001.h5
4/125 - ETA 606 secs
6/125 - ETA 597 secs
Traceback (most recent call last):
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/integrator/app/integrate_slurm.py", line 105, in integrate_slurm_cli
    wait(futures, timeout=healthcheck_period)
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/distributed/client.py", line 4329, in wait
    result = client.sync(_wait, fs, timeout=timeout, return_when=return_when)
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/distributed/client.py", line 865, in sync
    return sync(
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/distributed/utils.py", line 327, in sync
    raise exc.with_traceback(tb)
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/distributed/utils.py", line 310, in f
    result[0] = yield future
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/distributed/client.py", line 4300, in _wait
    await future
  File "/usr/lib/python3.8/asyncio/tasks.py", line 501, in wait_for
    raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/bin/integrate-slurm", line 8, in <module>
    sys.exit(integrate_slurm_cli())
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/integrator/app/integrate_slurm.py", line 109, in integrate_slurm_cli
    DI.get_eta()
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/integrator/distributed_integration.py", line 570, in get_eta
    eta = (len(self._tasks) - n_finished)/speed
ZeroDivisionError: float division by zero
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7fe26a3103a0>>, <Task finished name='Task-1445' coro=<Cluster._sync_cluster_info() done, defined at /scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/distributed/deploy/cluster.py:104> exception=CommClosedError("Exception while trying to call remote method 'set_metadata' before comm was established.")>)
Traceback (most recent call last):
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/distributed/comm/tcp.py", line 205, in read
    frames_nbytes = await stream.read_bytes(fmt_size)
tornado.iostream.StreamClosedError: Stream is closed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/distributed/core.py", line 787, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/distributed/core.py", line 640, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/distributed/comm/tcp.py", line 221, in read
    convert_stream_closed_error(self, e)
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/distributed/comm/tcp.py", line 128, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed) rpc.set_metadata local=tcp://160.103.228.95:46068 remote=tcp://160.103.228.95:36087>: Stream is closed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/distributed/deploy/cluster.py", line 105, in _sync_cluster_info
    await self.scheduler_comm.set_metadata(
  File "/scisoft/tomotools_env/integrator/ubuntu20.04/x86_64/2021.1/lib/python3.8/site-packages/distributed/core.py", line 790, in send_recv_from_rpc
    raise type(e)(
distributed.comm.core.CommClosedError: Exception while trying to call remote method 'set_metadata' before comm was established.
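
The second traceback ends in `get_eta`, where `(len(self._tasks) - n_finished)/speed` divides by a `speed` of zero. Independently of the master file check, the ETA computation could be guarded. The sketch below is a standalone function, not the actual method in `distributed_integration.py`; the name `estimate_eta` and its parameters are assumptions made for illustration.

```python
import math


def estimate_eta(n_tasks, n_finished, speed):
    """Return the estimated remaining time in seconds.

    Hypothetical stand-in for the ETA formula seen in the traceback.
    Returns 0.0 when nothing is left, and math.inf when the speed is
    not positive (no measurable progress), instead of dividing by zero.
    """
    remaining = n_tasks - n_finished
    if remaining <= 0:
        return 0.0
    if speed <= 0:
        # No progress measured yet: the ETA is unknown.
        return math.inf
    return remaining / speed
```

With this guard, the healthcheck branch that runs after the `wait(...)` timeout would report an unknown ETA rather than raise `ZeroDivisionError`.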