Programme Matter and Technologies
High-speed links for online data processing with distributed GPU computing
Timo Dritschler, Matthias Vogelgesang

Figure: Exemplary latency for a block size of 4 MB. Send side: Kernel (0.74 ms), main memory to graphics card (0.74 ms), InfiniBand (0.49 ms); receive side: InfiniBand (0.49 ms), graphics card to main memory (0.74 ms), Kernel (0.74 ms). Paths compared: Default and GPUDirect.
GPU Computing
Computing requirements for modern DAQ systems have increased significantly, and standard CPU computing can no longer provide the necessary computing power. GPUs (Graphics Processing Units) have become accessible for general-purpose computing and offer highly parallel, high-performance computing structures. Using GPUs in distributed computing systems can greatly increase processing performance. However, distributing data between the nodes of a GPU computing cluster is a challenge.
UFO Framework
The UFO framework provides means to use distributed GPU computing through simple programming mechanisms.
• Processing workflows are described as a graph of atomic tasks operating on input data.
• The framework detects subgraphs that can be efficiently parallelized, distributes those subgraphs onto multiple nodes, and schedules massively parallel tasks on all available GPUs.
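The task-graph idea above can be sketched in a few lines. This is a simplified illustration, not the actual UFO API: the `Task` class, `connect` method, and the example pipeline stages are hypothetical names chosen for the sketch. Linear chains of single-input/single-output tasks are exactly the kind of subgraph a scheduler can replicate across nodes and GPUs.

```python
# Simplified illustration of a task-graph workflow (NOT the UFO API):
# tasks are nodes, data flows along edges, and linear chains of tasks
# form subgraphs that can be replicated across nodes and GPUs.

class Task:
    def __init__(self, name, fn):
        self.name = name          # task name, e.g. "filter"
        self.fn = fn              # the atomic operation on input data
        self.successors = []      # outgoing graph edges

    def connect(self, other):
        """Add a graph edge from this task to `other`; returns `other`
        so connections can be chained."""
        self.successors.append(other)
        return other

def run_chain(task, value):
    """Run a linear chain of tasks on one input value."""
    while True:
        value = task.fn(value)
        if not task.successors:
            return value
        task = task.successors[0]

# A toy reconstruction pipeline: read -> filter -> backproject
read = Task("read", lambda x: x)
filt = Task("filter", lambda x: x * 2)
bp = Task("backproject", lambda x: x + 1)
read.connect(filt).connect(bp)

print(run_chain(read, 10))  # -> 21
```

In the real framework, each task would operate on image buffers rather than scalars, and independent chains would run concurrently on separate GPUs.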
GPU and FPGA
• The GPU RDMA mechanisms can also be used to connect GPUs with FPGAs.
• This allows highly flexible FPGA-based DAQ hardware to be combined with high-performance GPU computing.
GPU RDMA
Figure: Data paths between NETWORK, CPU, MEMORY, and GPU (red: conventional transfer, green: RDMA transfer).
• Conventional data transfers from the network or external devices require many copy operations.
• RDMA (Remote Direct Memory Access) allows direct access to GPU memory from the network or other external devices.
• This decreases transfer latency significantly.
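The latency gain can be estimated from the stage latencies in the poster's 4 MB example. The arithmetic below assumes, for illustration, that GPUDirect RDMA bypasses the kernel and host-memory staging stages, so only the InfiniBand hop remains; the stage breakdown is taken from the figure, the interpretation is an assumption.

```python
# One-way latency budget for a 4 MB block, using the stage latencies
# from the poster's figure (milliseconds).
KERNEL = 0.74        # kernel involvement
HOST_TO_GPU = 0.74   # main memory <-> graphics card staging copy
INFINIBAND = 0.49    # wire transfer

# Default path: kernel + staging copy + wire transfer
default_one_way = KERNEL + HOST_TO_GPU + INFINIBAND

# Assumed GPUDirect path: wire transfer straight into GPU memory
gpudirect_one_way = INFINIBAND

print(f"default:   {default_one_way:.2f} ms")    # 1.97 ms
print(f"GPUDirect: {gpudirect_one_way:.2f} ms")  # 0.49 ms
```

Under this assumption, eliminating the two host-side stages cuts the one-way latency by roughly a factor of four, which is why RDMA matters for online DAQ processing.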
Figure: Speedup vs. number of GPUs for the fbp, dfi, and art algorithms, compared against linear scalability.
Figure: Throughput (MB/s) over data size (B) for transfers from FPGA to GPU memory and from FPGA to system memory.