## "bamboo_hercules" on scisoft13 (19/05/2021 nabu 2021.1.1-dev)

Reconstruction took 22 minutes (2160 slices of 4120x4120 from 4400 projections).

The `LimitedMemory` pipeline was used.

- Read: 20 to 80 secs
- Process 8 groups (flat field, phase, unsharp, log): 16 secs
- Build sinos: 7 secs
- Reconstruct 6 stacks: 52 secs
- Write: 112 secs

Here I/O takes 75% to 80% of the time; write performance in particular is poor.

[bamboo_hercules.log](uploads/c423f10c8c3ae876f3095fa1bb6c48ac/bamboo_hercules.log)

## "stylo" on scisoft13 (10/05/2021 nabu 2021.1.0-rc3)

Reconstruction took 1 hour and 3 mins.

Read performance for data subvolumes varies quite a lot (from 0.016 GB/s to 1.5 GB/s!).

Typical figures on a chunk (size = 100):

- Read data: 13 secs
- Flat, dff: instant
- Phase: 2 secs
- Log: 2 secs
- Rec: 8 secs
- Save: 7 secs

[stylo_sci13_10-05-21.txt](uploads/2caf47c4f8f16782ad2bb9d47541b48c/stylo_sci13_10-05-21.txt)

## "Big" with scisoft14 (01/08/2020)

See [these comments on MR 60](https://gitlab.esrf.fr/tomotools/nabu/-/merge_requests/60#note_85135).

Reconstruction took 1h30, with 90% of the time spent in I/O.

## "Big" with scisoft13 (04/08/2020)

**Warning**: provide `--cpu_mem_fraction 0.5`, otherwise recurring `OSError: Cannot allocate memory` errors occur!
This does not seem to be linked to a memory leak: CPU memory usage is remarkably stable (it never exceeds 72.8 GB in this case).

- Disk speed: read 330 MB/s, write 180 MB/s
- Memory: 126 GB (72.8 GB used at peak)
- GPU: GTX 1080 Ti (11 GB), used 4.5 GB at peak

Reconstruction took 38 minutes.

**Timings for one chunk**

- Read 324 slices: 50 secs (330 MB/s)
- Process one group of radios: 1.5 secs
- Process all 17 groups (with GPU I/O): 25 secs
- Build sinos (on host): 25 secs
- Reconstruction + GPU->CPU of one stack: 7 secs (`(20, 4524, 4524)`)
- Reconstruction + GPU->GPU of all 11 stacks: 77 secs
- Save: 60 secs (300 MB/s)

This means that disk I/O accounts for almost 50% of the total time (GPU I/O is not counted as I/O here).

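
As a sanity check, the disk I/O share can be re-derived from the per-chunk timings listed for this run. The sketch below is only an illustration: it assumes the stages run one after the other, so that the total is the plain sum of the listed durations, and the dictionary keys are made-up labels, not nabu identifiers.

```python
# Per-chunk stage timings in seconds, copied from the "Timings for one chunk" list.
# The key names are illustrative labels (assumption), not nabu identifiers.
timings = {
    "read": 50,
    "process_radios": 25,   # all 17 groups, GPU I/O included
    "build_sinos": 25,
    "reconstruct": 77,      # all 11 stacks
    "save": 60,
}


def disk_io_fraction(timings, io_stages=("read", "save")):
    """Return the share of the total time spent in the given disk I/O stages.

    Assumes the stages are strictly sequential (no overlap between them).
    """
    total = sum(timings.values())
    io_time = sum(timings[name] for name in io_stages)
    return io_time / total


fraction = disk_io_fraction(timings)
print(f"disk I/O: {fraction:.1%} of {sum(timings.values())} secs")
```

With these figures, read + save is 110 secs out of 237 secs, i.e. about 46%, consistent with the "almost 50%" quoted for this run. This ignores any overlap between stages.
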
**Notes**

- Reconstruction could not be completed entirely, as disk `/tmp` was full. Fortunately, the "out of disk space" error happened only when writing the last subvolume `(1928, 2160)`.
- Writing the reconstructions is interleaved with reading the next chunk, so the metrics might be biased in favor of write speed rather than read speed.

[Reconstruction log: "big" on scisoft13](uploads/fdb42588e455bb46815ba2219e969bac/scisoft13_big.log)

## "Big" with scisoft15 (04/08/2020)

Reconstruction took 25 mins.

**Warning**: it must be run with `--gpu_mem_fraction 0.7`, otherwise GPU memory allocation does not happen (it hangs forever), probably because the memory block is too big.

Almost all the volume is handled in one iteration of `CudaFullFieldPipelineLimitedMemory` (the first subvolume is `(0, 1952)`).

**Timings for one chunk**

- Read 1952 lines: 4 mins 15 secs (392 MB/s)
- Process one group of radios: 3 secs
- Process all 25 groups (with GPU I/O): 90 secs
- Build sinos (on host): 2 mins 30 secs
- Reconstruction + GPU->CPU of one stack: 20.5 secs
- Reconstruction + GPU->GPU of all 24 stacks: 8 mins 11 secs
- Save: 2 mins 30 secs (1.0 GB/s)

Here disk I/O accounts for 36% of the total reconstruction time.

**Notes**

- Releasing memory is very slow on this machine: almost 2 minutes to destroy the first pipeline instance.

[Reconstruction logs for "big" on scisoft15](uploads/5c26936cfab3408ea1bf86e01bd8f32b/scisoft15_big.log)

New run on 02/10/2020:

Reconstruction took 1 hour and 11 mins.

For 1102 lines:

- Read: 5 mins 45 secs
- Process one group of radios: 3.5 secs
- Process all 14 groups of radios: 47 secs
- Build sinos: 1 min 11 secs
- Reconstruct (+ H2D / D2H) one chunk (80 sinos): 21 secs
- All 14 reconstructions: 4 mins 51 secs
- Histogram: 20 mins!
- Save: 40 secs

The histogram takes a lot of time: at 20 mins it is by far the longest step.

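
To see how much the histogram weighs in this run, the timings listed for the 1102 lines can be turned into per-step shares. A minimal sketch, assuming the steps run sequentially and using made-up step labels (they are not nabu identifiers):

```python
# Per-chunk step timings in seconds for the 02/10/2020 run (1102 lines).
# Step names are illustrative labels (assumption).
timings = {
    "read": 5 * 60 + 45,
    "process_radios": 47,          # all 14 groups
    "build_sinos": 60 + 11,
    "reconstruct": 4 * 60 + 51,    # all 14 reconstructions
    "histogram": 20 * 60,
    "save": 40,
}

total = sum(timings.values())

# Sort the steps from most to least expensive.
shares = sorted(
    ((seconds / total, name) for name, seconds in timings.items()),
    reverse=True,
)

for share, name in shares:
    print(f"{name:<15s} {share:6.1%}")
```

With these numbers the histogram comes out at roughly 60% of the time spent on this chunk, ahead of reading and reconstruction.
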
[Reconstruction logs for "big" on scisoft15](uploads/0ff96eb5297f03847e44f5dd74119eda/nabu.log)

## "Big" with p9-04 (07/08/2020)

The setup is roughly the same as with `scisoft15`, but without SSD.

Reconstruction took 42 minutes.

**Timings for one chunk**:

- Reading 1302 lines: 12 mins 14 secs (90 MB/s)
- Processing one radio group, 612 radios of 1302*2560 (flat-field, phase, unsharp, log): 3.5 secs
- Processing all 17 radio groups: 57 secs
- Build sinos: 2 mins 14 secs
- Reconstruct one sinos stack (with GPU I/O): 21 secs
- Reconstruct all 16 sinos stacks: 5 mins 36 secs
- Save data `(1250, 4524, 4524)`: 2 mins 11 secs (780 MB/s)

Here I/O accounts for 62% of the total time.

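
The quoted write speed can be cross-checked from the saved volume shape and the save time. This is a rough sketch that assumes the volume is saved as 32-bit floats (4 bytes per voxel) and that 1 MB = 10^6 bytes; neither is stated in the figures listed here, and `throughput_mb_per_s` is a hypothetical helper, not part of nabu.

```python
import math


def throughput_mb_per_s(shape, seconds, bytes_per_value=4):
    """Return the throughput in MB/s (1 MB = 1e6 bytes) for an array of `shape`.

    bytes_per_value=4 corresponds to float32, which is an assumption here.
    """
    n_bytes = math.prod(shape) * bytes_per_value
    return n_bytes / seconds / 1e6


# Save step on p9-04: volume (1250, 4524, 4524) written in 2 mins 11 secs.
save_seconds = 2 * 60 + 11
rate = throughput_mb_per_s((1250, 4524, 4524), save_seconds)
print(f"{rate:.0f} MB/s")
```

Under these assumptions the result is about 781 MB/s, in line with the 780 MB/s reported for the save step.
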
[Reconstruction logs for "big" on p9-04](uploads/0b540463543aa6228a3456f34a01471c/p9-04_big.log)

**When data is on /tmp (SSD) (28/08/2020):**

Reconstruction took 26 minutes.

Timings for one chunk:

- Read 1898 lines: 3 mins 45 secs (430 MB/s)
- Process one group: 6 secs
  - H2D: 3 secs
  - FF: instant
  - Phase: 2 secs
  - Log: instant
  - D2H: 1 sec
- Processing all 19 groups: 1 min 46 secs
- Build sinos: 2 mins 2 secs
- One stack of 1822/19 = 101 slices:
  - H2D: 1 sec
  - Reconstruction + D2H: 24 secs
- All 19 stacks: 7 mins 52 secs
- Write data: 5 mins 18 secs (460 MB/s)

Here disk I/O accounts for 43% of the total time.

[Reconstruction logs for "big" on p9-04 with data on local disk](uploads/d449083fadd1b39a901f4d7b085f78e1/big_rec_p904_tmp.log)

## "bamboo" with p9-04 (25/08/2020)

This was done to benchmark "reconstruction of a 2k*2k*2k volume".
Reconstruction took 4 minutes.

[p9-04_bamboo.log](uploads/ba5dc4996a90ad3f9c620c4bdca43c9b/p9-04_bamboo.log)

## "P1_P2_P3" (12/11/2020)

This is a dataset which cannot be made public.
The detector is 2048 * 1024, and there are 2999 projections.

The reconstruction was done on a power9 machine, which was able to ingest the data in only two chunks. Reconstruction took 5 minutes and 8 seconds.
Even in this near-ideal case (small phase margin, many detector rows loaded at once):

- Reading takes 85% and 91% of the time for the two chunks
- Pre-processing takes 2 seconds (!)
- Reconstruction takes 20 and 11 seconds

[Reconstruction log for P1_P2_P3](uploads/25c4cebc479898a12bd201c6990e559d/l.log)

When the data is in cache, reconstruction takes 1 minute and 8 seconds:

- I/O still takes 45% and 43% of the time
- Pre-processing: 2 seconds
- Reconstruction: 20 and 11 seconds

[Reconstruction log for P1_P2_P3](uploads/f4572d774d9d3713711f72b76abbd533/l2.txt)