@@ -0,0 +1,14 @@
+ Current-generation GPUs reach compute rates on the order of TFLOP/s, which in
+ turn makes applications with large bandwidth requirements and simple algorithms
+ I/O-bound. Applications that receive data from external data sources are hit
+ twice because the data must first be transferred into system main memory before
+ being moved to the GPU in a second transfer.
+
+ To remedy this problem, we designed and implemented a system architecture
+ comprising a custom FPGA board with a flexible DMA transfer policy and a
+ heterogeneous compute framework that receives data through AMD's DirectGMA
+ OpenCL extension.
+
+ With the proposed system architecture, we sustain the bandwidth requirements of
+ applications such as real-time tomographic image reconstruction and signal
+ analysis, with a peak FPGA-to-GPU throughput of XXX GB/s.