Modularized GPU pipelines (split across multiple TBB nodes)

Currently the GPU code is rather monolithic: it bundles unrelated computations, e.g. pedestal correction and FAI, and now reconstruction as well.

We should allow GPU pipelines to be composed in a more modular way:

graph LR;
  transfer_to_device_node --> reconstruction_node ;
  subgraph GPU
  reconstruction_node -- Event -->gain_pedestal_correction;
  gain_pedestal_correction -- Event --> fai_node;
  end
  gain_pedestal_correction --> transfer_to_host_node;
  transfer_to_host_node --> io_hdf5_node;

The nodes would be constructed with the appropriate resources (context, queues and buffers). Computing resources need to be handled separately, as discussed in #141.

The data passed between nodes would be a boost::compute::event; this is still to be discussed.
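
To make the proposal easier to discuss, here is a minimal, standard-library-only sketch of the composition pattern: each stage is built with shared resources, takes an upstream event, and returns a new event for the next stage. Everything in it is an assumption made for illustration: `Event` is a single-threaded stand-in for `boost::compute::event`, `Resources` stands in for the context/queue/buffers, and `make_stage` and `run_pipeline` are hypothetical helpers, not existing code. In the real pipeline each stage would presumably be the body of a TBB node that enqueues a kernel with the upstream event in its wait list.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for boost::compute::event (assumption): a handle to work that
// is only guaranteed to be finished after wait() returns.
class Event {
public:
    // An already-completed event, e.g. "transfer to device has finished".
    Event() : state_(std::make_shared<State>()) { state_->done = true; }

    // An event that completes once `work` has run.
    explicit Event(std::function<void()> work)
        : state_(std::make_shared<State>()) {
        state_->work = std::move(work);
    }

    // Runs the pending work at most once, even if the event was copied.
    void wait() const {
        if (!state_->done) {
            state_->done = true;
            state_->work();
        }
    }

private:
    struct State {
        bool done = false;
        std::function<void()> work;
    };
    std::shared_ptr<State> state_;
};

// Hypothetical stand-in for the resources a node is constructed with.
struct Resources {
    std::vector<float> frame;      // plays the role of a device buffer
    std::vector<std::string> log;  // records the order in which stages ran
};

// A stage consumes the upstream event and produces its own event.
using Stage = std::function<Event(Event)>;
using Kernel = std::function<void(std::vector<float>&)>;

// Builds a stage bound to `res`. The returned event first waits on the
// upstream event, then runs this stage's kernel on the shared buffer.
Stage make_stage(Resources& res, std::string name, Kernel kernel) {
    return [&res, name, kernel](Event upstream) -> Event {
        return Event([&res, name, kernel, upstream] {
            upstream.wait();
            kernel(res.frame);
            res.log.push_back(name);
        });
    };
}

// Chains the stages: the event returned by one stage feeds the next.
Event run_pipeline(const std::vector<Stage>& stages, Event start) {
    Event ev = std::move(start);
    for (const auto& stage : stages) {
        ev = stage(ev);
    }
    return ev;
}
```

With this shape, a pipeline is just a list of stages, so `gain_pedestal_correction` and `fai_node` can be built independently and combined in any order. Nothing executes until the final event is waited on, which mirrors how only the last node (e.g. the transfer to host) would need to block.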