\documentclass{JINST}
\usepackage{lineno}
\usepackage{ifthen}
\newboolean{draft}
\setboolean{draft}{true}
\title{A high-throughput readout architecture based on PCIe Gen3 and DirectGMA technology}
\author{N.~Zilio$^b$,
M.~Weber$^a$\\
\llap{$^a$}Institute for Data Processing and Electronics,\\
Karlsruhe Institute of Technology (KIT),\\
Hermann-von-Helmholtz-Platz 1, Karlsruhe, Germany\\
\llap{$^b$}Somewhere in France
}
\abstract{Abstract}
\begin{document}
\ifdraft
\setpagewiselinenumbers
\linenumbers
\fi
\section{Introduction}
Citation~\cite{lonardo2015nanet}
\section{Architecture}
\subsection{Host interface}
On the host side, AMD's DirectGMA technology, an implementation of the
bus-addressable memory extension for OpenCL 1.1 and later, is used to prepare
GPU buffers that the FPGA writes into, as well as to map the remote FPGA device
so that the GPU can write signals to it. To enable writes into the GPU, the
physical bus address of the GPU buffer is determined with a call to
\texttt{clEnqueueMakeBuffersResidentAMD}. This address is written to an FPGA
register and updated after each successful transfer of one or more pages of
data. Due to hardware restrictions, the largest possible GPU buffer size is
about 95~MB. Larger transfers are achieved with a double-buffering mechanism.
% MV: we should measure intra-GPU data transfers
\section{Results}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{figures/intra-copy.png}
\caption{Throughput in MB/s for an intra-GPU data transfer of smaller block
sizes (4~KB--24~MB) into a larger destination buffer (32~MB--128~MB). The lower
throughput for smaller block sizes is caused by the larger number of transfers
required to fill the destination buffer. The throughput has been estimated using
the host-side wall-clock time. On-GPU data transfer is about twice as fast.}
\label{fig:intra-copy}
\end{figure}
\section{Conclusion}
\acknowledgments
UFO? KSETA?
\bibliographystyle{JHEP}
\bibliography{literature}
\end{document}