@@ -1,14 +1,23 @@
- Current generation GPUs are capable of processing TFLOP/s which in turn makes
- application with large bandwidth requirements and simple algorithms
- I/O-bound. Applications that receive data from external data sources are hit
- twice because data first has to be transferred into system main memory before
- being moved to the GPU in a second transfer.
-
- To remedy this problem, we designed and implemented a system architecture
- comprising a custom FPGA board with a flexible DMA transfer policy and a
- heterogeneous compute framework receiving data using AMD's DirectGMA
- OpenCL extension.
-
- With our proposed system architecture we are able to sustain the bandwidth
- requirements of various applications such as real-time tomographic image
- reconstruction and signal analysis with a peak FPGA-GPU throughput of XXX GB/s.
+Motivation/Problem
+
+Current-generation GPUs deliver several TFLOP/s of compute, which causes
+I/O bottlenecks in applications with high bandwidth and low computational
+requirements. Moreover, applications that process data from external sources
+such as a frontend FPGA are affected twice by this problem because data first
+has to be copied into main system memory by the CPU before being
+moved to the GPU for final processing in a second transfer.
+
+Method/Solution
+
+To remedy this problem, we designed and implemented a system architecture
+comprising a custom FPGA board with a flexible DMA transfer policy and a
+heterogeneous compute framework receiving data using AMD's DirectGMA
+OpenCL extension.
+
+Results
+
+Conclusion
+
+With our proposed system architecture, we are able to sustain the bandwidth
+requirements of various applications such as real-time tomographic image
+reconstruction and signal analysis with a peak FPGA-GPU throughput of XXX GB/s.