Fixed Abstract

lorenzo 8 years ago
parent
commit
9898382e98
1 changed files with 16 additions and 26 deletions

+ 16 - 26
paper.tex

@@ -44,11 +44,13 @@ architecture consists of a   Direct Memory Access (DMA) engine compatible with
 the Xilinx PCI-Express core,   a Linux driver for register access, and high-
 level software to manage direct   memory transfers using AMD's DirectGMA
 technology. Measurements with a Gen3\,x8 link show a throughput of 6.4~GB/s
-for transfers to GPU memory and 6.6~GB/s to system memory.  We
-also evaluated DirectGMA performance for low latency applications: preliminary measurements show a round-trip latency of 2 \textmu s for data transfers up to 4 kB. However, the latency introduced by the OpenCL scheduling is in the order of 100 \textmu s. 
-Our implementation is suitable for real- time DAQ system applications ranging
-from photon science and medical imaging to High Energy Physics (HEP) trigger
-systems. }
+for transfers to GPU memory and 6.6~GB/s to system memory.  We also assessed
+the possibility of using DirectGMA in low latency systems: preliminary
+measurements show a latency as low as 1 \textmu s for data transfers to GPU
+memory. The additional latency introduced by OpenCL scheduling is the current
+performance bottleneck.  Our implementation is suitable for real-time DAQ
+system applications ranging from photon science and medical imaging to High
+Energy Physics (HEP) systems.}
 
 \keywords{FPGA; GPU; PCI-Express; OpenCL; DirectGMA}
 
@@ -75,11 +77,12 @@ continuous streaming mode to a computing stage. In order to collect data over
 long observation times, the readout architecture and the computing stages must
 be able to sustain high data rates.
 
-Recent years have also seen an increasing
-interest in GPU-based systems for High Energy Physics (HEP)  (\emph{e.g.}
-ATLAS~\cite{atlas_gpu}, ALICE~\cite{alice_gpu}, Mu3e~\cite{mu3e_gpu},
-PANDA~\cite{panda_gpu}) and photon science experiments. In time-deterministic
-applications, latency becomes the most stringent requirement for , \emph{e.g.} in Low/High-level trigger systems.  
+Recent years have also seen an increasing interest in GPU-based systems for
+High Energy Physics (HEP)  (\emph{e.g.} ATLAS~\cite{atlas_gpu},
+ALICE~\cite{alice_gpu}, Mu3e~\cite{mu3e_gpu}, PANDA~\cite{panda_gpu}) and
+photon science experiments. In time-deterministic applications, \emph{e.g.} in
+Low/High-level trigger systems, latency becomes the most stringent
+requirement.
 
 Due to its high bandwidth and modularity, PCIe quickly became the commercial
 standard for connecting high-throughput peripherals such as GPUs or solid
@@ -96,6 +99,7 @@ s, respectively.
 
 %LR: FPGA^2 it's the name of their thing... 
 %MV: best idea in the world :)
+%LR: Let's call ours FPGA^2_GPU
 
 When the FPGA is used as a master, a higher throughput can be achieved.  An
 example of this approach is the \emph{FPGA\textsuperscript{2}} framework by Thoma
@@ -183,12 +187,6 @@ utilization on a Virtex 7 device is reported in Table~\ref{table:utilization}.
   LUTRAM   & 56    & (0.03)           \\
   FF       & 5437  & (0.63)           \\
   BRAM     & 21    & (1.39)           \\
-  % Resource & Utilization & Available & Utilization \% \\
-  %   \midrule
-  % LUT      & 5331        & 433200    & 1.23           \\
-  % LUTRAM   & 56          & 174200    & 0.03           \\
-  % FF       & 5437        & 866400    & 0.63           \\
-  % BRAM     & 20.50       & 1470      & 1.39           \\
     \bottomrule
   \end{tabular}
 }{%
@@ -198,16 +196,6 @@ utilization on a Virtex 7 device is reported in Table~\ref{table:utilization}.
 \end{floatrow}
 \end{figure}
 
-
-% \begin{figure}[tb]
-%   \centering
-%   \includegraphics[width=0.6\textwidth]{figures/fpga-arch}
-%   \caption{%
-%     Architecture of the DMA engine.
-%   }
-%   \label{fig:fpga-arch}
-% \end{figure}
-
 The physical addresses of the host's memory buffers are stored into an internal
 memory and are dynamically updated by the driver or user, allowing highly
 efficient zero-copy data transfers. The maximum size associated with each
@@ -287,6 +275,8 @@ fashion. A complementary application programming interface allows users to
 develop custom applications written in C or high-level languages such as
 Python.
 
+
+%% --------------------------------------------------------------------------
 \section{Results}
 
 We carried out performance measurements on two different setups, which are