High-speed links for online data processing with distributed GPU computing
Timo Dritschler, Matthias Vogelgesang
Programme Matter and Technologies

GPU Computing
Computing requirements for modern DAQ systems have increased significantly, and standard CPU computing can no longer provide the necessary processing power. GPUs (Graphics Processing Units) have become accessible for general-purpose computing and provide highly parallel, high-performance computing architectures. Using GPUs in distributed computing systems can greatly increase processing performance. However, distributing data between the nodes of a GPU computing cluster remains a challenge.

UFO Framework
The UFO framework provides distributed GPU computing through simple programming mechanisms.
• Processing workflows are described as a graph of atomic tasks operating on input data.
• The framework detects subgraphs that can be efficiently parallelized, distributes those subgraphs onto multiple nodes, and schedules massively parallel tasks on all available GPUs.

GPU and FPGA
• The GPU RDMA mechanisms can also be used to connect GPUs with FPGAs.
• This allows highly flexible FPGA-based DAQ hardware to be combined with high-performance GPU computing.

GPU RDMA
• Conventional data transfers from the network or external devices require many copy operations.
• RDMA (Remote Direct Memory Access) allows the network or other external devices to access GPU memory directly.
• This decreases transfer latency significantly.

[Figure: Conventional transfer (red) vs. RDMA transfer (green) between network, CPU, main memory, and GPU.]
[Figure: Exemplary latency for a block size of 4 MB, comparing the default transfer path with GPUDirect. Individual steps (InfiniBand transfer, kernel, copies between main memory and the graphics card) range from 0.49 ms to 0.74 ms.]
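The UFO framework itself is implemented in C; as a minimal sketch of the graph-of-tasks idea described above (illustrative names only, not the real UFO API), independent subgraphs of a workflow can be executed concurrently, analogous to UFO scheduling them on separate nodes or GPUs:

```python
# Sketch of a "graph of atomic tasks": two independent branches
# (parallelizable subgraphs) are run concurrently, then merged.
# All names here are hypothetical, not the UFO framework's API.
from concurrent.futures import ThreadPoolExecutor

class Task:
    """An atomic processing step operating on its input data."""
    def __init__(self, name, func):
        self.name = name
        self.func = func

def run_branch(tasks, data):
    """Run a linear chain of tasks; in a distributed setting each
    such independent subgraph could live on its own node/GPU."""
    for task in tasks:
        data = task.func(data)
    return data

branch_a = [Task("read", lambda d: list(d)),
            Task("scale", lambda d: [x * 2 for x in d])]
branch_b = [Task("read", lambda d: list(d)),
            Task("shift", lambda d: [x + 1 for x in d])]

with ThreadPoolExecutor() as pool:
    fa = pool.submit(run_branch, branch_a, [1, 2, 3])
    fb = pool.submit(run_branch, branch_b, [1, 2, 3])
    merged = fa.result() + fb.result()  # final "merge" task
```

The point of the sketch is the scheduling structure: because the two branches share no data dependencies, a framework can detect them as parallelizable subgraphs and run them concurrently without changing the result.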
[Figure: Speedup vs. number of GPUs (1 to 6) for the fbp, dfi, and art algorithms, compared against linear scalability.]
[Figure: Throughput (MB/s) vs. data size (10^5 to 10^9 B) for FPGA-to-GPU-memory and FPGA-to-system-memory transfers.]
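The speedup plotted in the scalability figure is the single-GPU runtime divided by the n-GPU runtime, with the "linear scalability" line corresponding to a parallel efficiency of 1.0. A small sketch of that calculation (the timings below are hypothetical placeholders, not measured values from the poster):

```python
# Speedup and parallel efficiency from per-run timings.
# The timing values are hypothetical, for illustration only.

def speedup(t_single, t_parallel):
    """How many times faster the parallel run is."""
    return t_single / t_parallel

def efficiency(t_single, t_parallel, n_gpus):
    """1.0 corresponds to the 'linear scalability' line."""
    return speedup(t_single, t_parallel) / n_gpus

t1 = 12.0                            # seconds on 1 GPU (hypothetical)
timings = {2: 6.3, 4: 3.4, 6: 2.5}   # seconds on n GPUs (hypothetical)

for n, tn in sorted(timings.items()):
    print(f"{n} GPUs: speedup {speedup(t1, tn):.2f}, "
          f"efficiency {efficiency(t1, tn, n):.2f}")
```

Efficiency below 1.0 at higher GPU counts reflects the data-distribution overhead discussed above, which is exactly what the RDMA transfer path is meant to reduce.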