|
@@ -162,6 +162,15 @@ friendly interfaces with the custom logic with an input bandwidth of 7.45
|
|
|
GB/s. The user logic and the DMA engine are configured by the host through PIO
|
|
|
registers.
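Host-side configuration through PIO registers can be pictured as plain 32-bit writes into the FPGA's memory-mapped PCIe BAR. The minimal Python sketch below assumes a hypothetical register map: `REG_DMA_CTRL`, `REG_ADDR_INDEX`, `REG_ADDR_LO`, `REG_ADDR_HI`, their offsets and `DMA_CTRL_START` are invented for illustration and are not the design's actual layout. It also uses a `bytearray` in place of the mapped BAR so it runs without hardware. It shows how the host could start the DMA engine and how a 64-bit physical address of a host buffer could be split into low and high words and written into the FPGA's internal address table, as described in the following paragraph.

```python
import struct

# Hypothetical register map (byte offsets into the BAR). The real layout of
# the FPGA design is not given in the text; these values are assumptions.
REG_DMA_CTRL = 0x00
REG_ADDR_INDEX = 0x08
REG_ADDR_LO = 0x0C
REG_ADDR_HI = 0x10
DMA_CTRL_START = 0x1  # hypothetical "start" bit in the control register


class PioRegisters:
    """32-bit little-endian register access on a memory-mapped region.

    `bar` is any writable buffer. In this sketch a bytearray stands in for
    the mapped PCIe BAR; a production driver would do MMIO in C.
    """

    def __init__(self, bar):
        self.bar = bar

    def write32(self, offset, value):
        # Store the low 32 bits of `value` at `offset`.
        struct.pack_into("<I", self.bar, offset, value & 0xFFFFFFFF)

    def read32(self, offset):
        return struct.unpack_from("<I", self.bar, offset)[0]


def set_buffer_address(regs, index, phys_addr):
    """Write one host buffer's 64-bit physical address into table slot `index`."""
    regs.write32(REG_ADDR_INDEX, index)
    regs.write32(REG_ADDR_LO, phys_addr & 0xFFFFFFFF)
    regs.write32(REG_ADDR_HI, phys_addr >> 32)


def start_dma(regs):
    """Set the (hypothetical) start bit of the DMA control register."""
    regs.write32(REG_DMA_CTRL, DMA_CTRL_START)


if __name__ == "__main__":
    regs = PioRegisters(bytearray(0x20))  # stand-in for the mapped BAR
    set_buffer_address(regs, 3, 0x1234567000)
    start_dma(regs)
    print(hex(regs.read32(REG_ADDR_LO)), hex(regs.read32(REG_ADDR_HI)))
```

Splitting the address into two 32-bit writes mirrors the common case of 32-bit-wide PIO registers. On a real device, the order of the writes and which write latches the table entry would depend on the actual register specification.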
|
|
|
|
|
|
+\begin{figure}[t]
|
|
|
+ \centering
|
|
|
+ \includegraphics[width=0.5\textwidth]{figures/fpga-arch}
|
|
|
+ \caption{%
|
|
|
+ Architecture of the FPGA design.
|
|
|
+ }
|
|
|
+ \label{fig:fpga-arch}
|
|
|
+\end{figure}
|
|
|
+
|
|
|
The physical addresses of the host's memory buffers are stored in an internal
|
|
|
memory and are dynamically updated by the driver or user, allowing highly
|
|
|
efficient zero-copy data transfers. The maximum size associated with each
|
|
@@ -290,31 +299,6 @@ PCIe link (FPGA-GPU) & x8 Gen3 & x8 Gen3 \\
|
|
|
\label{fig:throughput}
|
|
|
\end{figure}
|
|
|
|
|
|
-% \begin{figure}
|
|
|
-% \centering
|
|
|
-% \begin{subfigure}[b]{.49\textwidth}
|
|
|
-% \centering
|
|
|
-% \includegraphics[width=\textwidth]{figures/throughput}
|
|
|
-% \caption{%
|
|
|
-% DMA data transfer throughput.
|
|
|
-% }
|
|
|
-% \label{fig:throughput}
|
|
|
-% \end{subfigure}
|
|
|
-% \begin{subfigure}[b]{.49\textwidth}
|
|
|
-% \includegraphics[width=\textwidth]{figures/latency}
|
|
|
-% \caption{%
|
|
|
-% Latency distribution.
|
|
|
-% % for a single 4 KB packet transferred
|
|
|
-% % from FPGA-to-CPU and FPGA-to-GPU.
|
|
|
-% }
|
|
|
-% \label{fig:latency}
|
|
|
-% \end{subfigure}
|
|
|
-% \caption{%
|
|
|
-% Measured throuhput for data transfers from FPGA to main memory
|
|
|
-% (CPU) and from FPGA to the global GPU memory (GPU).
|
|
|
-% }
|
|
|
-% \end{figure}
|
|
|
-
|
|
|
The measured results for the pure data throughput are shown in
|
|
|
\figref{fig:throughput} for transfers from the FPGA to the system's main
|
|
|
memory as well as to the global GPU memory, as explained in Section~\ref{sec:host}.
|
|
@@ -359,14 +343,20 @@ latency.
|
|
|
|
|
|
|
|
|
\subsection{Latency}
|
|
|
-
|
|
|
-\begin{figure}
|
|
|
- \includegraphics[width=\textwidth]{figures/latency-hist}
|
|
|
- \caption{%
|
|
|
- Latency distribution for a single 1024 B packet transferred from FPGA to
|
|
|
- GPU memory and to main memory.
|
|
|
- }
|
|
|
- \label{fig:latency-distribution}
|
|
|
+\begin{figure}[t]
|
|
|
+ \centering
|
|
|
+ \begin{subfigure}[b]{.8\textwidth}
|
|
|
+ \centering
|
|
|
+ \includegraphics[width=\textwidth]{figures/latency}
|
|
|
+ \caption{Latency versus data size.}
|
|
|
+ \label{fig:latency_vs_size}
|
|
|
+ \end{subfigure}
|
|
|
+ \begin{subfigure}[b]{.8\textwidth}
|
|
|
+ \includegraphics[width=\textwidth]{figures/latency-hist}
|
|
|
+ \caption{Latency distribution.}
|
|
|
+ \label{fig:latency_hist}
|
|
|
+ \end{subfigure}
|
|
|
+ \caption{Measured latency for data transfers from the FPGA to main memory and to GPU memory.}\label{fig:latency}
|
|
|
\end{figure}
|
|
|
|
|
|
For HEP experiments, low latencies are necessary to react in a reasonable time
|