|
@@ -178,18 +178,19 @@ automatized fashion.
|
|
|
\section{Results}
|
|
|
|
|
|
We measured the performance using a Xilinx VC709 evaluation board plugged into a
|
|
|
-desktop PC with an Intel Xeon E5-1630 3.7 GHz processor and an Intel C612 chipset.
|
|
|
+desktop PC with an Intel Xeon E5-1630 3.7 GHz processor and an Intel C612
|
|
|
+chipset.
|
|
|
|
|
|
Due to the size limitation of the DMA buffer as presented in Section
|
|
|
-\ref{sec:host}, we have to do intermediate copies in order to transfer data
|
|
|
-larger than the given size of 95 MB. In \figref{fig:intra-copy}, the throughput
|
|
|
-for a copy from a smaller sized buffer (representing the DMA buffer) to a larger
|
|
|
-buffer is shown. At a block size of about 384 KB, the throughput surpasses the
|
|
|
-maximum possible PCIe bandwidth, thus making a double buffering strategy a
|
|
|
-viable solution for large data transfers.
|
|
|
+\ref{sec:host}, we have to copy several sub-buffers in order to transfer data
|
|
|
+larger than the maximum transfer size of 95 MB. In \figref{fig:intra-copy}, the
|
|
|
+throughput for a copy from a smaller buffer (representing the DMA buffer)
|
|
|
+to a larger buffer is shown. At a block size of about 384 KB, the throughput
|
|
|
+surpasses the maximum possible PCIe bandwidth, thus making a double-buffering
|
|
|
+strategy a viable solution for very large data transfers.
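+As a rough illustration, assuming every intermediate copy fills the DMA buffer
+up to the 95 MB limit, a transfer of total size $S$ requires
+\[
+  N = \left\lceil \frac{S}{95\,\mathrm{MB}} \right\rceil
+\]
+copies, where the symbols $S$ and $N$ are introduced here only for this
+example. A transfer of 1024 MB would thus need
+$\lceil 1024 / 95 \rceil = 11$ copies.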
|
|
|
|
|
|
\begin{figure}
|
|
|
- \includegraphics[width=\textwidth]{figures/intra-copy.png}
|
|
|
+ \includegraphics[width=\textwidth]{figures/intra-copy}
|
|
|
\caption{%
|
|
|
Throughput in MB/s for an intra-GPU data transfer of smaller block sizes
|
|
|
     (4 KB -- 24 MB) into a larger destination buffer (32 MB -- 128 MB). The lower
|
|
@@ -201,11 +202,25 @@ viable solution for large data transfers.
|
|
|
\label{fig:intra-copy}
|
|
|
\end{figure}
|
|
|
|
|
|
+\subsection{Throughput}
|
|
|
+
|
|
|
+
|
|
|
+
|
|
|
+\subsection{Latency}
|
|
|
+
|
|
|
%% Change the specs for the small crate
|
|
|
For FPGA-to-GPU transfers, we also repeated the measurements using a low-end system
|
|
|
based on XXX and Intel Nano XXXX. The results do not show any significant difference
|
|
|
compared to the previous setup, making the low-end system a more cost-effective solution.
|
|
|
|
|
|
+\begin{figure}
|
|
|
+ \includegraphics[width=\textwidth]{figures/latency-michele}
|
|
|
+ \caption{%
|
|
|
+ FILL ME
|
|
|
+ }
|
|
|
+  \label{fig:latency}
|
|
|
+\end{figure}
|
|
|
+
|
|
|
\begin{figure}
|
|
|
\centering
|
|
|
\includegraphics[width=0.6\textwidth]{figures/through_plot}
|