8 years ago · 54925c291e
--- a/paper.tex
+++ b/paper.tex
@@ -116,12 +116,14 @@ developed a high-performance DMA engine based on Xilinx's PCIe Gen3 Core.To
 process the data, we encapsulated the DMA setup and memory mapping in a plugin
 for our scalable GPU processing framework~\cite{vogelgesang2012ufo}. This
 framework allows for an easy construction of streamed data processing on
-heterogeneous multi-GPU systems. The framework is based on OpenCL,  and
+heterogeneous multi-GPU systems. Because the framework is based on OpenCL,
 integration with NVIDIA's CUDA functions for GPUDirect technology is not
-possible. We therefore integrated direct FPGA-to-GPU communication into our
-processing pipeline using AMD's DirectGMA technology. In this paper we report
-the performance of our DMA engine for FPGA-to-CPU communication and some
-preliminary measurements about DirectGMA's performance in low-latency applications.
+possible at the moment. Thus, we used AMD's DirectGMA technology to integrate
+direct FPGA-to-GPU communication into our processing pipeline. In this paper we
+report the performance of our DMA engine for FPGA-to-CPU communication and some
+preliminary measurements about DirectGMA's performance in low-latency
+applications.
+
 
 \section{Architecture}
 
@@ -143,7 +145,7 @@ they are not directly involved in the data transfer anymore.
     In a traditional DMA architecture (a), data are first written to the main
     system memory and then sent to the GPUs for final processing.  By using
     GPUDirect/DirectGMA technology (b), the DMA engine has direct access to
-    GPU's internal memory.
+    the GPU's internal memory.
   }
   \label{fig:trad-vs-dgpu}
 \end{figure}