Commit 991a0ead authored by Pierre Paleo

Merge branch 'flatfield' into 'master'

Flatfield

See merge request paleo/nabu!3
parents 250d2355 8db19e7e
.vscode/
*.geany
doc/_build/
# Created by https://www.gitignore.io/api/c,c++,python
### C ###
......
# Nabu concepts, definitions and notations
A tomography scan produces a collection of X-ray images. These can be of two types:
- Radios: raw intensity data acquired on the detector
- Flats, darks: for flat-field normalization
A **radiograph** (or **radio**) is the raw data at the output of an X-ray detector.
## Radios and sinograms
In this documentation, we distinguish between radios and sinograms.
As stated before, a radio is raw projection data, i.e. what comes out of the detector. "Corrected radios" are radios after CCD corrections and flat-field normalization.
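As a minimal sketch (assuming the conventional normalization formula; names and sizes below are hypothetical), flat-field normalization divides the dark-subtracted radio by the dark-subtracted flat:
```python
import numpy as np

def normalize_radio(radio, flat, dark):
    # Conventional flat-field normalization: (radio - dark) / (flat - dark)
    return (radio - dark) / (flat - dark)

radio = np.random.rand(2048, 2048).astype(np.float32)  # hypothetical raw radio
flat = np.full_like(radio, 1.2)                         # flat-field image
dark = np.full_like(radio, 0.1)                         # dark-current image
corrected = normalize_radio(radio, flat, dark)
```
When several flats are acquired during the scan, the flat value used for a given radio can be interpolated between the two nearest flats, which is what the CUDA kernel introduced in this merge request does.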
Each radio has *Nx* pixels horizontally and *Nz* pixels vertically. During a scan, a collection of *Na* radios is acquired, where *Na* is the number of projection angles.
![radios](images/radios_stack.png)
In short, there are *Na* radios of *(Nx, Nz)* pixels.
In parallel geometry with a circular trajectory, sinogram 0 is obtained by extracting the first line of each radio; more generally, sinogram *z* is obtained by extracting line *z* of each radio.
![sinos](images/sinos.png)
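For illustration, assuming the radios are stacked in a NumPy array of shape *(Na, Nz, Nx)* (angle, row, column), extracting a sinogram is a simple slicing operation (a sketch with hypothetical sizes):
```python
import numpy as np

Na, Nz, Nx = 500, 256, 320            # hypothetical dimensions
radios = np.random.rand(Na, Nz, Nx)   # one radio per projection angle

z = 10
sino_z = radios[:, z, :]              # sinogram number z, shape (Na, Nx)
```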
## Radios chunks
In Nabu, the radios are processed by chunks. A **chunk** is a collection of radio subregions: the same fixed subregion taken from each of the *Na* radios.
In the simplest setting, the first chunk (chunk 0) is obtained by reading the first *C* lines of each radio, i.e. lines [0, *C*[. The second chunk (chunk 1) is obtained by reading lines [*C*, 2*C*[, and so on.
Chunks are actually read with a slight overlap (see next figure).
Processing radios by chunks makes it possible to process sinograms in batches, as **a radios chunk of size *C* gives a stack of *C* sinograms (or slices)** once the chunk is "transposed" (see the sketch below).
![chunks with overlap](images/chunks.png)
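A minimal sketch of the chunk mechanism, ignoring the overlap and using hypothetical sizes:
```python
import numpy as np

Na, Nz, Nx = 500, 256, 320
C = 64                                       # chunk size (number of detector rows)
radios = np.random.rand(Na, Nz, Nx)

k = 1                                        # chunk index
chunk = radios[:, k * C : (k + 1) * C, :]    # lines [k*C, (k+1)*C[ of each radio
sinos = np.moveaxis(chunk, 1, 0)             # "transposed" chunk: C sinograms of shape (Na, Nx)
assert sinos.shape == (C, Na, Nx)
```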
## Processing done in the radios space
![Radios processing](images/ccd_processing.png)
The processing can also be started from any point. For example, we can start from phase-retrieved radios by deactivating steps 1-2-3.
## Processing done in the sinogram space
![Sinograms processing](images/sinos_processing.png)
The input is usually a chunk of radios; the data is then transposed so that it can be accessed as a stack of sinograms.
......@@ -10,6 +10,7 @@ Welcome to Nabu's documentation!
:maxdepth: 2
:caption: Contents:
definitions.md
nabu_config_file.md
validators.md
......
import numpy as np
from os.path import dirname
import pycuda.gpuarray as garray
from pycuda.compiler import SourceModule
from silx.opencl.utils import ConvolutionInfos
from silx.image.utils import gaussian_kernel
from ..utils import updiv, get_cuda_srcfile
from .processing import CudaProcessing
# Import useful stuff from silx.opencl.convolution
# Unfortunately, merely importing anything from a silx opencl module
# creates a context on all GPU devices (due to the OpenCL() singleton)
# so we use this hack to avoid this.
# The drawback is that to use silx actual OpenCL features, then silx should
# be imported first.
import os
env_silx_opencl = os.getenv("SILX_OPENCL")
os.environ["SILX_OPENCL"] = "0"
from silx.opencl.convolution import gaussian_kernel, ConvolutionInfos
if env_silx_opencl is not None:
os.environ["SILX_OPENCL"] = env_silx_opencl
else:
del os.environ["SILX_OPENCL"]
#
class Convolution(CudaProcessing):
"""
A class for performing convolution on GPU with CUDA, but with the following
......@@ -188,8 +175,11 @@ class Convolution(CudaProcessing):
)
# Compile source module
compile_options = [str("-DUSED_CONV_MODE=%d" % self._c_conv_mode)]
self.sourcemodule_kwargs["options"] = compile_options
fname = get_cuda_srcfile("convolution.cu")
nabu_cuda_dir = dirname(fname)
include_dirs = [nabu_cuda_dir]
self.sourcemodule_kwargs["options"] = compile_options
self.sourcemodule_kwargs["include_dirs"] = include_dirs
with open(fname) as fid:
cuda_src = fid.read()
self._module = SourceModule(
......
import numpy as np
from os.path import dirname
import pycuda.gpuarray as garray
from pycuda.compiler import SourceModule
from silx.opencl.utils import ConvolutionInfos
from silx.image.utils import gaussian_kernel
from ..utils import updiv, get_cuda_srcfile
from .processing import CudaProcessing
class MedianFilter(CudaProcessing):
"""
A class for performing median filtering on GPU with CUDA
"""
def __init__(
self,
shape,
footprint=(3, 3),
mode="reflect",
threshold=None,
device_id=None,
ctx=None,
cleanup_at_exit=True,
):
"""Constructor of Cuda Median Filter.
Parameters
----------
shape: tuple
Shape of the array, in the format (n_rows, n_columns)
footprint: tuple
Size of the median filter, in the format (y, x).
mode: str
Boundary handling mode. Available modes are:
"reflect": cba|abcd|dcb
"nearest": aaa|abcd|ddd
"wrap": bcd|abcd|abc
"constant": 000|abcd|000
Default is "reflect".
threshold: float, optional
Threshold for the "thresholded median filter".
A thresholded median filter only replaces a pixel value with the median
if that pixel value is greater than or equal to median + threshold.
Cuda Parameters
---------------
Please refer to the documentation of the CudaProcessing class for
the other parameters.
"""
super().__init__(device_id=device_id, ctx=ctx, cleanup_at_exit=cleanup_at_exit)
self._set_params(shape, footprint, mode, threshold)
self._init_arrays_to_none(["d_input", "d_output"])
self._init_kernels()
def _set_params(self, shape, footprint, mode, threshold):
self.data_ndim = len(shape)
if self.data_ndim == 2:
ny, nx = shape
nz = 1
elif self.data_ndim == 3:
nz, ny, nx = shape
else:
raise ValueError("Expected 2D or 3D data")
self.shape = shape
self.Nx = np.int32(nx)
self.Ny = np.int32(ny)
self.Nz = np.int32(nz)
if len(footprint) != 2:
raise ValueError("3D median filter is not implemented yet")
if not ((footprint[0] & 1) and (footprint[1] & 1)):
raise ValueError("Must have odd-sized footprint")
self.footprint = footprint
self._set_boundary_mode(mode)
self.do_threshold = False
if threshold is not None:
self.threshold = np.float32(threshold)
self.do_threshold = True
else:
self.threshold = np.float32(0)
def _set_boundary_mode(self, mode):
self.mode = mode
# Some code duplication from convolution
self._c_modes_mapping = {
"periodic": 2,
"wrap": 2,
"nearest": 1,
"replicate": 1,
"reflect": 0,
"constant": 3,
}
mp = self._c_modes_mapping
if self.mode.lower() not in mp:
raise ValueError(
"""
Mode %s is not available. Available modes are:
%s
"""
% (self.mode, str(mp.keys()))
)
if self.mode.lower() == "constant":
raise NotImplementedError("mode='constant' is not implemented yet")
self._c_conv_mode = mp[self.mode]
def _init_kernels(self):
# Compile source module
compile_options = [
"-DUSED_CONV_MODE=%d" % self._c_conv_mode,
"-DMEDFILT_X=%d" % self.footprint[1],
"-DMEDFILT_Y=%d" % self.footprint[0],
"-DDO_THRESHOLD=%d" % int(self.do_threshold),
]
fname = get_cuda_srcfile("medfilt.cu")
nabu_cuda_dir = dirname(fname)
include_dirs = [nabu_cuda_dir]
self.sourcemodule_kwargs = {}
self.sourcemodule_kwargs["options"] = compile_options
self.sourcemodule_kwargs["include_dirs"] = include_dirs
with open(fname) as fid:
cuda_src = fid.read()
self._module = SourceModule(cuda_src, **self.sourcemodule_kwargs)
self.cuda_kernel_2d = self._module.get_function("medfilt2d")
# Blocks, grid
self._block_size = {2: (32, 32, 1), 3: (16, 8, 8)}[self.data_ndim] # TODO tune
self._n_blocks = tuple(
[updiv(a, b) for a, b in zip(self.shape[::-1], self._block_size)]
)
def medfilt2(self, image, output=None):
"""
Perform a median filter on an image (or stack of images).
Parameters
----------
image: numpy.ndarray or pycuda.gpuarray.GPUArray
2D image or 3D stack of 2D images
output: numpy.ndarray or pycuda.gpuarray.GPUArray, optional
Output of filtering. If provided, it must have the same shape
as the input array.
"""
self._set_array("d_input", image, self.shape)
if output is not None:
self._set_array("d_output", output, self.shape)
else:
self._allocate_array("d_output", self.shape)
self.cuda_kernel_2d(
self.d_input.gpudata,
self.d_output.gpudata,
self.Nx,
self.Ny,
self.Nz,
self.threshold,
grid=self._n_blocks,
block=self._block_size,
)
self._recover_arrays_references(["d_input", "d_output"])
if output is None:
return self.d_output.get()
else:
return output
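# Hedged usage sketch (illustrative, not part of the original file): how the
# MedianFilter class above could be used on a CUDA-capable machine.
# Shapes and the threshold value are arbitrary.
if __name__ == "__main__":
    test_img = np.random.rand(256, 320).astype(np.float32)
    # Plain 3x3 median filter with "reflect" boundary handling
    medfilt = MedianFilter(test_img.shape, footprint=(3, 3), mode="reflect")
    filtered = medfilt.medfilt2(test_img)  # returns a numpy array when output is None
    # Thresholded variant: a pixel is replaced by the median only when its
    # value is >= median + threshold
    medfilt_thr = MedianFilter(test_img.shape, footprint=(3, 3), threshold=0.1)
    filtered_thr = medfilt_thr.medfilt2(test_img)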
import numpy as np
import pycuda.driver as cuda
import pycuda.gpuarray as garray
from .utils import get_cuda_context
dev_attrs = cuda.device_attribute
# TODO add logger ? inherit from Processing class with logger ?
# NB: we must detach from a context before creating another context
class CudaProcessing(object):
def __init__(self, device_id=None, cleanup_at_exit=True):
self.ctx = get_cuda_context(
device_id=device_id,
cleanup_at_exit=cleanup_at_exit
)
def __init__(self, device_id=None, ctx=None, cleanup_at_exit=True):
"""
Initialize a CudaProcessing instance.
CudaProcessing is a base class for all CUDA-based processing classes.
This class provides utilities for context/device management, and
arrays allocation.
Parameters
----------
device_id: int, optional
ID of the cuda device to use (as listed by the `nvidia-smi` command).
Ignored if ctx is not None.
ctx: pycuda.driver.Context, optional
Existing context to use. If provided, do not create a new context.
cleanup_at_exit: bool, optional
Whether to clean-up the context at exit.
Ignored if ctx is not None.
"""
if ctx is None:
self.ctx = get_cuda_context(
device_id=device_id,
cleanup_at_exit=cleanup_at_exit
)
else:
self.ctx = ctx
self.device = self.ctx.get_device()
self.device_name = self.device.name()
self.device_id = self.device.get_attribute(dev_attrs.MULTI_GPU_BOARD_GROUP_ID)
......@@ -19,6 +43,42 @@ class CudaProcessing(object):
self.ctx.push()
return self.ctx
def pop_context(self):
self.ctx.pop()
def _init_arrays_to_none(self, arrays_names):
self._allocated = {}
for array_name in arrays_names:
setattr(self, array_name, None)
setattr(self, "_old_" + array_name, None)
self._allocated[array_name] = False
def _recover_arrays_references(self, arrays_names):
for array_name in arrays_names:
old_arr = getattr(self, "_old_" + array_name)
if old_arr is not None:
setattr(self, array_name, old_arr)
def _allocate_array(self, array_name, shape, dtype=np.float32):
if not self._allocated[array_name]:
new_gpuarr = garray.zeros(shape, dtype=dtype)
setattr(self, array_name, new_gpuarr)
self._allocated[array_name] = True
def _set_array(self, array_name, array_ref, shape, dtype=np.float32):
if isinstance(array_ref, garray.GPUArray):
current_arr = getattr(self, array_name)
setattr(self, "_old_" + array_name, current_arr)
setattr(self, array_name, array_ref)
elif isinstance(array_ref, np.ndarray):
self._allocate_array(array_name, shape, dtype=dtype)
getattr(self, array_name).set(array_ref)
else:
raise ValueError("Expected numpy array or pycuda array")
......@@ -60,3 +60,29 @@ __global__ void inplace_complexreal_mul_2Dby2D(complex* arr2D_out, float* arr2D_
arr2D_out[i]._M_re *= b;
arr2D_out[i]._M_im *= b;
}
#ifndef DO_CLIP_MIN
#define DO_CLIP_MIN 0
#endif
#ifndef DO_CLIP_MAX
#define DO_CLIP_MAX 0
#endif
// output = -log(array), with optional clipping of the input values (see DO_CLIP_MIN / DO_CLIP_MAX)
__global__ void nlog(float* array, float* output, int Nx, int Ny, int Nz, float clip_min, float clip_max) {
int x = blockDim.x * blockIdx.x + threadIdx.x;
int y = blockDim.y * blockIdx.y + threadIdx.y;
int z = blockDim.z * blockIdx.z + threadIdx.z;
if ((x >= Nx) || (y >= Ny) || (z >= Nz)) return;
int pos = (z*Ny + y)*Nx + x;
float val = array[pos];
#if DO_CLIP_MIN
val = fmaxf(val, clip_min);
#endif
#if DO_CLIP_MAX
val = fminf(val, clip_max);
#endif
output[pos] = -logf(val);
}
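// Host-side reference (hedged sketch): in NumPy terms this kernel computes
//   output = -np.log(np.clip(array, clip_min, clip_max))
// where each clipping step is only compiled in when DO_CLIP_MIN / DO_CLIP_MAX is set.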
#ifndef BOUNDARY_H
#define BOUNDARY_H
// Get the center index of the filter,
// and the "half-Left" and "half-Right" lengths.
// In the case of an even-sized filter, the center is shifted to the left.
#define GET_CENTER_HL(hlen){\
if (hlen & 1) {\
c = hlen/2;\
hL = c;\
hR = c;\
}\
else {\
c = hlen/2 - 1;\
hL = c;\
hR = c+1;\
}\
}\
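// Example: hlen = 5 (odd)  -> c = 2, hL = 2, hR = 2 (symmetric window)
//          hlen = 4 (even) -> c = 1, hL = 1, hR = 2 (center shifted to the left)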
// Boundary handling modes
#define CONV_MODE_REFLECT 0 // cba|abcd|dcb
#define CONV_MODE_NEAREST 1 // aaa|abcd|ddd
#define CONV_MODE_WRAP 2 // bcd|abcd|abc
#define CONV_MODE_CONSTANT 3 // 000|abcd|000
#ifndef USED_CONV_MODE
#define USED_CONV_MODE CONV_MODE_NEAREST
#endif
#define CONV_PERIODIC_IDX_X int idx_x = gidx - c + jx; if (idx_x < 0) idx_x += Nx; if (idx_x >= Nx) idx_x -= Nx;
#define CONV_PERIODIC_IDX_Y int idx_y = gidy - c + jy; if (idx_y < 0) idx_y += Ny; if (idx_y >= Ny) idx_y -= Ny;
#define CONV_PERIODIC_IDX_Z int idx_z = gidz - c + jz; if (idx_z < 0) idx_z += Nz; if (idx_z >= Nz) idx_z -= Nz;
// clamp not in cuda
__device__ int clamp(int x, int min_, int max_) {
return min(max(x, min_), max_);
}
#define CONV_NEAREST_IDX_X int idx_x = clamp((int) (gidx - c + jx), 0, Nx-1);
#define CONV_NEAREST_IDX_Y int idx_y = clamp((int) (gidy - c + jy), 0, Ny-1);
#define CONV_NEAREST_IDX_Z int idx_z = clamp((int) (gidz - c + jz), 0, Nz-1);
#define CONV_REFLECT_IDX_X int idx_x = gidx - c + jx; if (idx_x < 0) idx_x = -idx_x-1; if (idx_x >= Nx) idx_x = Nx-(idx_x-(Nx-1));
#define CONV_REFLECT_IDX_Y int idx_y = gidy - c + jy; if (idx_y < 0) idx_y = -idx_y-1; if (idx_y >= Ny) idx_y = Ny-(idx_y-(Ny-1));
#define CONV_REFLECT_IDX_Z int idx_z = gidz - c + jz; if (idx_z < 0) idx_z = -idx_z-1; if (idx_z >= Nz) idx_z = Nz-(idx_z-(Nz-1));
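// Worked example (for illustration): with N = 5, valid indices are 0..4;
// a requested index of -2 maps to 3 (wrap/periodic), 0 (nearest) or 1 (reflect),
// and a requested index of 6 maps to 1 (wrap/periodic), 4 (nearest) or 3 (reflect).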
#if USED_CONV_MODE == CONV_MODE_REFLECT
#define CONV_IDX_X CONV_REFLECT_IDX_X
#define CONV_IDX_Y CONV_REFLECT_IDX_Y
#define CONV_IDX_Z CONV_REFLECT_IDX_Z
#elif USED_CONV_MODE == CONV_MODE_NEAREST
#define CONV_IDX_X CONV_NEAREST_IDX_X
#define CONV_IDX_Y CONV_NEAREST_IDX_Y
#define CONV_IDX_Z CONV_NEAREST_IDX_Z
#elif USED_CONV_MODE == CONV_MODE_WRAP
#define CONV_IDX_X CONV_PERIODIC_IDX_X
#define CONV_IDX_Y CONV_PERIODIC_IDX_Y
#define CONV_IDX_Z CONV_PERIODIC_IDX_Z
#elif USED_CONV_MODE == CONV_MODE_CONSTANT
#error "constant not implemented yet"
#else
#error "Unknown convolution mode"
#endif
// Image access patterns
#define READ_IMAGE_1D_X input[(gidz*Ny + gidy)*Nx + idx_x]
#define READ_IMAGE_1D_Y input[(gidz*Ny + idx_y)*Nx + gidx]
#define READ_IMAGE_1D_Z input[(idx_z*Ny + gidy)*Nx + gidx]
#define READ_IMAGE_2D_XY input[(gidz*Ny + idx_y)*Nx + idx_x]
#define READ_IMAGE_2D_XZ input[(idx_z*Ny + gidy)*Nx + idx_x]
#define READ_IMAGE_2D_YZ input[(idx_z*Ny + idx_y)*Nx + gidx]
#define READ_IMAGE_3D_XYZ input[(idx_z*Ny + idx_y)*Nx + idx_x]
#endif
......@@ -4,81 +4,8 @@
*
*/
/******************************************************************************/
/**************************** Macros ******************************************/
/******************************************************************************/
#include "boundary.h"
// Get the center index of the filter,
// and the "half-Left" and "half-Right" lengths.
// In the case of an even-sized filter, the center is shifted to the left.
#define GET_CENTER_HL(hlen){\
if (hlen & 1) {\
c = hlen/2;\
hL = c;\
hR = c;\
}\
else {\
c = hlen/2 - 1;\
hL = c;\
hR = c+1;\
}\
}\
// Boundary handling modes
#define CONV_MODE_REFLECT 0 // cba|abcd|dcb
#define CONV_MODE_NEAREST 1 // aaa|abcd|ddd
#define CONV_MODE_WRAP 2 // bcd|abcd|abc
#define CONV_MODE_CONSTANT 3 // 000|abcd|000
#ifndef USED_CONV_MODE
#define USED_CONV_MODE CONV_MODE_NEAREST
#endif
#define CONV_PERIODIC_IDX_X int idx_x = gidx - c + jx; if (idx_x < 0) idx_x += Nx; if (idx_x >= Nx) idx_x -= Nx;
#define CONV_PERIODIC_IDX_Y int idx_y = gidy - c + jy; if (idx_y < 0) idx_y += Ny; if (idx_y >= Ny) idx_y -= Ny;
#define CONV_PERIODIC_IDX_Z int idx_z = gidz - c + jz; if (idx_z < 0) idx_z += Nz; if (idx_z >= Nz) idx_z -= Nz;
#define CONV_NEAREST_IDX_X int idx_x = clamp((int) (gidx - c + jx), 0, Nx-1);
#define CONV_NEAREST_IDX_Y int idx_y = clamp((int) (gidy - c + jy), 0, Ny-1);
#define CONV_NEAREST_IDX_Z int idx_z = clamp((int) (gidz - c + jz), 0, Nz-1);
#define CONV_REFLECT_IDX_X int idx_x = gidx - c + jx; if (idx_x < 0) idx_x = -idx_x-1; if (idx_x >= Nx) idx_x = Nx-(idx_x-(Nx-1));
#define CONV_REFLECT_IDX_Y int idx_y = gidy - c + jy; if (idx_y < 0) idx_y = -idx_y-1; if (idx_y >= Ny) idx_y = Ny-(idx_y-(Ny-1));
#define CONV_REFLECT_IDX_Z int idx_z = gidz - c + jz; if (idx_z < 0) idx_z = -idx_z-1; if (idx_z >= Nz) idx_z = Nz-(idx_z-(Nz-1));
#if USED_CONV_MODE == CONV_MODE_REFLECT
#define CONV_IDX_X CONV_REFLECT_IDX_X
#define CONV_IDX_Y CONV_REFLECT_IDX_Y
#define CONV_IDX_Z CONV_REFLECT_IDX_Z
#elif USED_CONV_MODE == CONV_MODE_NEAREST
#define CONV_IDX_X CONV_NEAREST_IDX_X
#define CONV_IDX_Y CONV_NEAREST_IDX_Y
#define CONV_IDX_Z CONV_NEAREST_IDX_Z
#elif USED_CONV_MODE == CONV_MODE_WRAP
#define CONV_IDX_X CONV_PERIODIC_IDX_X
#define CONV_IDX_Y CONV_PERIODIC_IDX_Y
#define CONV_IDX_Z CONV_PERIODIC_IDX_Z
#elif USED_CONV_MODE == CONV_MODE_CONSTANT
#error "constant not implemented yet"
#else
#error "Unknown convolution mode"
#endif
// Image access patterns
#define READ_IMAGE_1D_X input[(gidz*Ny + gidy)*Nx + idx_x]
#define READ_IMAGE_1D_Y input[(gidz*Ny + idx_y)*Nx + gidx]
#define READ_IMAGE_1D_Z input[(idx_z*Ny + gidy)*Nx + gidx]
#define READ_IMAGE_2D_XY input[(gidz*Ny + idx_y)*Nx + idx_x]
#define READ_IMAGE_2D_XZ input[(idx_z*Ny + gidy)*Nx + idx_x]
#define READ_IMAGE_2D_YZ input[(idx_z*Ny + idx_y)*Nx + gidx]
#define READ_IMAGE_3D_XYZ input[(idx_z*Ny + idx_y)*Nx + idx_x]
// Misc.
typedef unsigned int uint;
......
#ifndef N_FLATS
#error "Please provide the N_FLATS variable"
#endif
#ifndef N_DARKS
#error "Please provide the N_FLATS variable"
#endif
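// NB (assumption): N_FLATS and N_DARKS are expected to be passed as compile-time
// definitions (e.g. -DN_FLATS=... -DN_DARKS=...), in the same way the Python
// wrappers in this merge request pass "-D" options to pycuda's SourceModule.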
/**
* In-place flat-field normalization.
* This kernel assumes that all the radios are loaded into memory
* (although not necessarily the full radio images),
* stored as radios[x, y, z] where z is the radio index (so it does not handle
* "missing radios").
*
* radios: 3D array
* flats: 3D array
* darks: 3D array
* Nx: number of pixels horizontally in the radios
* Ny: number of pixels vertically in the radios
* Nz: number of radios
* flats_indices: indices of flats, in sorted order
* darks_indices: indices of darks, in sorted order
**/
__global__ void flatfield_normalization(
float* radios,
float* flats,
float* darks,
int Nx,
int Ny,
int Nz,
int* flats_indices,
int* darks_indices
) {
int x = blockDim.x * blockIdx.x + threadIdx.x;
int y = blockDim.y * blockIdx.y + threadIdx.y;
int z = blockDim.z * blockIdx.z + threadIdx.z;
if ((x >= Nx) || (y >= Ny) || (z >= Nz)) return;
int pos = (z*Ny+y)*Nx + x;
float dark_val = 0.0f, flat_val = 1.0f;
#if N_FLATS == 1
flat_val = flats[y*Nx + x];
#else
// interpolation between 2 flats
for (int i = 0; i < N_FLATS-1; i++) {
int ind_prev = flats_indices[i];
int ind_next = flats_indices[i+1];
if (ind_prev == z) {
flat_val = flats[(i*Ny+y)*Nx + x];