@@ -116,12 +116,14 @@ developed a high-performance DMA engine based on Xilinx's PCIe Gen3 Core.To
process the data, we encapsulated the DMA setup and memory mapping in a plugin
for our scalable GPU processing framework~\cite{vogelgesang2012ufo}. This
framework allows for an easy construction of streamed data processing on
-heterogeneous multi-GPU systems. The framework is based on OpenCL, and
+heterogeneous multi-GPU systems. Because the framework is based on OpenCL,
integration with NVIDIA's CUDA functions for GPUDirect technology is not
-possible. We therefore integrated direct FPGA-to-GPU communication into our
-processing pipeline using AMD's DirectGMA technology. In this paper we report
-the performance of our DMA engine for FPGA-to-CPU communication and some
-preliminary measurements about DirectGMA's performance in low-latency applications.
+possible at the moment. Thus, we used AMD's DirectGMA technology to integrate
+direct FPGA-to-GPU communication into our processing pipeline. In this paper we
+report the performance of our DMA engine for FPGA-to-CPU communication and some
+preliminary measurements about DirectGMA's performance in low-latency
+applications.
+

\section{Architecture}

@@ -143,7 +145,7 @@ they are not directly involved in the data transfer anymore.
In a traditional DMA architecture (a), data are first written to the main
system memory and then sent to the GPUs for final processing. By using
GPUDirect/DirectGMA technology (b), the DMA engine has direct access to
- GPU's internal memory.
+ the GPU's internal memory.
}
\label{fig:trad-vs-dgpu}
\end{figure}