@@ -43,15 +43,15 @@ HEP experiments.}
GPU computing has become the main driving force for high-performance computing
due to an unprecedented parallelism and a low cost-benefit factor. GPU
acceleration has found its way into numerous applications, ranging from
-simulation to image processing. Recent years have also seen an increasing interest in GPU-based systems for HEP applications, which require a combination of high data rates, high computational power and low latency (e.g.: ATLAS[cite], CMS[cite], ALICE[\cite{alice_gpu}], Mu3e[cite] and PANDA[cite] high/low-level triggers). Moreover, the volumes of data produced in recent photon science facilities have become comparable to those traditionally associated with HEP.
+simulation to image processing. Recent years have also seen an increasing interest in GPU-based systems for HEP applications, which require a combination of high data rates, high computational power and low latency (e.g.\ ATLAS[cite], CMS[cite], ALICE~\cite{alice_gpu}, Mu3e~\cite{mu3e_gpu} and PANDA[cite]). Moreover, the data volumes produced at recent photon science facilities have become comparable to those traditionally associated with HEP.
 
-In such experiments data is acquired by one or more read-out boards and then transmitted to GPUs in short bursts or in a continuous streaming mode. With expected data rates of several Gbytes/s, the data transmission link between the read-out boards and the host system can constitute the performance bottleneck. In case of high-level trigger, low-latency become the most stringent constraint.
+In such experiments, data is acquired by one or more read-out boards and then transmitted to GPUs in short bursts or in a continuous streaming mode. With expected data rates of several GB/s, the data transmission link between the read-out boards and the host system can constitute the performance bottleneck. In the case of High-Level Trigger applications, low latency becomes the most stringent constraint.
 
-To address these problems we have developed a high-throuhgput and low-latency architecture that connects FPGA-based devices and external GPUs by PCIe data links.
+To address these problems, we propose a complete hardware-software stack based on our own DMA design and on the integration of AMD's DirectGMA technology into our processing pipeline.
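+
+As an illustration of the host-side integration, the listing below sketches how a GPU buffer can be exposed on the PCIe bus with DirectGMA so that the FPGA can write into it directly. It is a minimal sketch only: the flag and function names are those of AMD's \texttt{cl\_amd\_bus\_addressable\_memory} OpenCL extension, the symbols may have to be resolved at run time through \texttt{clGetExtensionFunctionAddressForPlatform}, and all error handling is omitted.
+
+\begin{verbatim}
+/* Sketch: expose a GPU buffer on the PCIe bus with DirectGMA.     */
+/* Names follow the cl_amd_bus_addressable_memory extension and    */
+/* should be checked against the vendor header (CL/cl_ext.h).      */
+#include <CL/cl.h>
+#include <CL/cl_ext.h>
+
+cl_mem create_fpga_target(cl_context ctx, cl_command_queue queue,
+                          size_t size, cl_bus_address_amd *addr)
+{
+    cl_int err;
+
+    /* Allocate the buffer in GPU memory and mark it bus-addressable. */
+    cl_mem buf = clCreateBuffer(ctx, CL_MEM_BUS_ADDRESSABLE_AMD,
+                                size, NULL, &err);
+
+    /* Pin the buffer and query its physical bus address; the address
+       is then written into the DMA descriptors on the FPGA.          */
+    err = clEnqueueMakeBuffersResidentAMD(queue, 1, &buf, CL_TRUE,
+                                          addr, 0, NULL, NULL);
+    return buf;
+}
+\end{verbatim}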
 
In order to fully saturate the PCIe bus bandwidth\footnote{Net
-bandwidth of 6.7 GB/s for PCIe 3.0 x8.} and decrease the total latency, we propose complete hardware-software stack architecture based on our own DMA design and integration of AMD's
-DirectGMA technology into our processing pipeline.
+bandwidth of 6.7 GB/s for PCIe 3.0 x8.} and to minimize the overall latency, both the FPGA logic and the host-side data handling have to be optimized, as described in the following sections.
\section{Basic concepts}
@@ -66,11 +66,15 @@ datagram.
\section{Architecture}
-\subsection{FPGA side}
+\subsection{FPGA readout board}
-Our implementation has been optimized in order to achieve the maximum data throughput and to minimize the FPGA resource utilization, while still maintaining the flexibility of a Scatter-Gather memory policy. The architecture of the DMA engine described in \cite{rota2015dma} has been
-extended to support the PCI-Express Gen3 Core \cite{xilinxgen3}. The implementation features two
-separate engines to handle large data transfers from/to the host.
+In a typical HEP read-out chain, FPGA boards connect the front-end detectors with the high-level computing stage. Optical links are preferred over electrical solutions because of their high radiation hardness, low power consumption and high density.
+
+In our solution, PCI-Express has been chosen as the data link between the FPGA boards and the external computing stage. Thanks to its high bandwidth and modularity, PCI-Express has become the commercial standard for connecting high-performance peripherals, in particular CPUs and GPUs. Optical PCI-Express networks have also been demonstrated~\cite{optical_pcie}, opening the possibility of using this protocol in HEP experiments.
+
+\subsubsection{DMA engine}
+A Direct Memory Access (DMA) engine is needed in order to maximize the data throughput. We have developed a DMA architecture~\cite{rota2015dma} that achieves maximum throughput while minimizing FPGA resource utilization and maintaining the flexibility of a Scatter-Gather memory policy. The engine is now compatible with the Xilinx PCI-Express Gen3 IP-Core~\cite{xilinxgen3} and supports DMA data transfers from/to the host.
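+
+To illustrate the Scatter-Gather mechanism, the sketch below shows how a host driver could describe a physically fragmented buffer to a generic DMA engine. The descriptor layout is hypothetical and is not the format implemented in our engine; it only serves to clarify the concept.
+
+\begin{verbatim}
+/* Hypothetical scatter-gather descriptor list, for illustration only. */
+#include <stdint.h>
+#include <stddef.h>
+
+struct sg_descriptor {
+    uint64_t bus_address;  /* physical address of one memory page */
+    uint32_t length;       /* bytes to transfer from/to this page  */
+    uint32_t last;         /* non-zero marks the final descriptor  */
+};
+
+/* Describe a buffer scattered over n_pages physical pages. */
+static size_t build_sg_list(struct sg_descriptor *list,
+                            const uint64_t *page_addr, size_t n_pages,
+                            size_t page_size)
+{
+    for (size_t i = 0; i < n_pages; i++) {
+        list[i].bus_address = page_addr[i];
+        list[i].length      = (uint32_t)page_size;
+        list[i].last        = (i == n_pages - 1);
+    }
+    return n_pages;
+}
+\end{verbatim}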
 
With respect to the previous version based on PCI-Express Gen2, the PCI-Express IP-Core provided by Xilinx has undergone some modifications, which are reflected in our logic implementation:
@@ -134,10 +138,19 @@ the host side wall clock time. On-GPU data transfer is about twice as fast.}
\label{fig:intra-copy}
\end{figure}
-
\section{Conclusion}
-% Outlook
+
+\section{Outlook}
+
+\subsection{Hi-flex Board}
+
+A custom FPGA readout board, the Hi-flex board, is currently under development.
+% TODO: add a description and a picture of the board here.
+The board features a PCI-Express Gen3 x16 connection, with two PCI-Express x8 cores instantiated in the FPGA. The board will be used in conjunction with a PEX chip, which maps the two cores to a single x16 device as seen from the host. By using the dual-core architecture already employed with the Gen2 version of the IP-Core~\cite{rota2015dma}, we expect to achieve data throughputs of up to 13 GB/s.
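+
+The expected figure corresponds to roughly twice the net bandwidth of a single Gen3 x8 endpoint quoted in the introduction:
+\[
+2 \times 6.7\,\mathrm{GB/s} \approx 13.4\,\mathrm{GB/s}.
+\]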
\begin{itemize}
\item PCIe might be changed for InfiniBand which offers such and such
@@ -146,7 +159,7 @@ the host side wall clock time. On-GPU data transfer is about twice as fast.}
\acknowledgments
-UFO? KSETA?
+% TODO: acknowledgments still to be written (UFO? KSETA?).
\bibliographystyle{JHEP}