
Added some general info and started the Outlook section

Lorenzo 8 years ago
parent
commit
a1b27632a7
2 changed files with 48 additions and 12 deletions
  1. literature.bib  (+23 −0)
  2. paper.tex  (+25 −12)

+ 23 - 0
literature.bib

@@ -22,6 +22,29 @@ pages={1845-1851},
 ISSN={0018-9499}, 
 month={Aug},}
 
+@INPROCEEDINGS{optical_pcie, 
+author={Liboiron-Ladouceur, O. and Wang, H. and Bergman, K.}, 
+booktitle={Conference on Optical Fiber Communication and the National Fiber Optic Engineers Conference (OFC/NFOEC 2007)}, 
+title={An All-Optical PCI-Express Network Interface for Optical Packet Switched Networks}, 
+year={2007}, 
+pages={1-3}, 
+doi={10.1109/OFC.2007.4348447}, 
+month={March},}
+    
+@article{mu3e_gpu,
+  author={S Bachmann and N Berger and A Blondel and S Bravar and A Buniatyan and G Dissertori and P Eckert and P Fischer and C Grab and R Gredig and M
+Hildebrandt and P -R Kettle and M Kiehn and A Papa and I Peric and M Pohl and S Ritt and P Robmann and A Schöning and H -C
+Schultz-Coulon and W Shen and S Shresta and A Stoykov and U Straumann and R Wallny and D Wiedner and B Windelband},
+  title={The proposed trigger-less TBit/s readout for the Mu3e experiment},
+  journal={Journal of Instrumentation},
+  volume={9},
+  number={01},
+  pages={C01011},
+  url={http://stacks.iop.org/1748-0221/9/i=01/a=C01011},
+  year={2014},
+}
+  
+
 @manual{xilinxgen3,
   title        = {Virtex-7 FPGA Gen3 Integrated Block for PCI Express},
   author       = {Xilinx}, 

+ 25 - 12
paper.tex

@@ -43,15 +43,15 @@ HEP experiments.}
 GPU computing has become the main driving force for high-performance computing
 due to an unprecedented parallelism and a low cost-benefit factor.  GPU
 acceleration has found its way into numerous applications, ranging from
-simulation to image processing. Recent years have also seen an increasing interest in GPU-based systems for HEP applications, which require a combination of high data rates, high computational power and low latency (e.g.: ATLAS[cite], CMS[cite], ALICE[\cite{alice_gpu}], Mu3e[cite] and PANDA[cite] high/low-level triggers). Moreover, the volumes of data produced in recent photon science facilities have become comparable to those traditionally associated with HEP.
+simulation to image processing. Recent years have also seen an increasing interest in GPU-based systems for HEP applications, which require a combination of high data rates, high computational power and low latency (e.g. ATLAS[cite], CMS[cite], ALICE~\cite{alice_gpu}, Mu3e~\cite{mu3e_gpu} and PANDA[cite]). Moreover, the volumes of data produced in recent photon science facilities have become comparable to those traditionally associated with HEP.
 
-In such experiments data is acquired by one or more read-out boards and then transmitted to GPUs in short bursts or in a continuous streaming mode. With expected data rates of several Gbytes/s, the data transmission link between the read-out boards and the host system can constitute the performance bottleneck. In case of high-level trigger, low-latency become the most stringent constraint.
+In such experiments, data is acquired by one or more read-out boards and then transmitted to GPUs in short bursts or in a continuous streaming mode. With expected data rates of several GB/s, the data transmission link between the read-out boards and the host system can constitute the performance bottleneck. In the case of High-Level Trigger applications, low latency becomes the most stringent constraint.
 
-To address these problems we have developed a high-throuhgput and low-latency architecture that connects FPGA-based devices and external GPUs by PCIe data links.
+To address these problems, we propose a complete hardware-software stack architecture based on our own DMA design and on the integration of AMD's DirectGMA technology into our processing pipeline.
 
+%% move this part after
 In order to fully saturate the PCIe bus bandwidth\footnote{Net
-bandwidth of 6.7 GB/s for PCIe 3.0 x8.} and decrease the total latency, we propose complete hardware-software stack architecture based on our own DMA design and integration of AMD's
-DirectGMA technology into our processing pipeline.
+bandwidth of 6.7 GB/s for PCIe 3.0 x8.}
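The 6.7 GB/s net figure in this footnote is consistent with the usual back-of-the-envelope estimate for PCIe 3.0 x8; the sketch below assumes 8 GT/s per lane, 128b/130b line coding and roughly 15% protocol overhead for TLP headers and flow control (the overhead fraction is an assumption, not a number from the paper):

\[
  8~\mathrm{lanes} \times 8~\mathrm{GT/s} \times 128/130 \approx 63~\mathrm{Gbit/s} \approx 7.9~\mathrm{GB/s},
  \qquad
  7.9~\mathrm{GB/s} \times 0.85 \approx 6.7~\mathrm{GB/s}.
\]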
 
 \section{Basic concepts}
 
@@ -66,11 +66,15 @@ datagram.
 
 \section{Architecture}
 
-\subsection{FPGA side}
+\subsection{FPGA readout board}
 
-Our implementation has been optimized in order to achieve the maximum data throughput and to minimize the FPGA resource utilization, while still maintaining the flexibility of a Scatter-Gather memory policy. The architecture of the DMA engine described in \cite{rota2015dma} has been 
-extended to support the PCI-Express Gen3 Core \cite{xilinxgen3}. The implementation features two 
-separate engines to handle large data transfers from/to the host.  
+In a typical HEP data acquisition scheme, FPGA boards connect the front-end detectors with the high-level computing stage. Optical links are preferred over electrical solutions because of their high radiation hardness, low power consumption and high density. 
+
+In our solution, PCI-Express has been chosen as the data link between the FPGA boards and the external computing resources. Thanks to its high bandwidth and modularity, it has become the commercial standard for connecting high-performance peripherals, in particular CPUs and GPUs. Optical PCI-Express networks have been demonstrated~\cite{optical_pcie}, opening up the possibility of using PCI-Express also in HEP experiments.
+
+
+\subsubsection{DMA engine}
+A Direct Memory Access (DMA) engine is needed in order to maximize the data throughput. We have developed a DMA architecture~\cite{rota2015dma} that achieves maximum data throughput while minimizing resource utilization and maintaining the flexibility of a Scatter-Gather memory policy. The engine is now compatible with the Xilinx PCI-Express Gen3 IP-Core~\cite{xilinxgen3} and supports DMA data transfers from/to the host.  
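The Scatter-Gather policy mentioned above lets the engine stream into a host buffer that is not physically contiguous. The C sketch below only illustrates this general idea; the structure name, field layout and helper function are assumptions made for illustration and do not reproduce the actual descriptor format of the engine in \cite{rota2015dma}.

/* Illustrative scatter-gather descriptor: one entry per physically
 * contiguous segment of the pinned host buffer. A descriptor-based DMA
 * engine walks such a table and issues one PCIe transfer per entry. */
#include <stddef.h>
#include <stdint.h>

struct sg_descriptor {
    uint64_t bus_addr;  /* PCIe bus address of one contiguous host segment */
    uint32_t length;    /* bytes to transfer for this segment              */
    uint32_t flags;     /* e.g. bit 0 = last descriptor in the chain       */
};

/* Fill a descriptor table for a buffer already split into pinned segments. */
static void build_sg_table(struct sg_descriptor *table,
                           const uint64_t *seg_addr,
                           const uint32_t *seg_len,
                           size_t num_segments)
{
    for (size_t i = 0; i < num_segments; ++i) {
        table[i].bus_addr = seg_addr[i];
        table[i].length   = seg_len[i];
        table[i].flags    = (i + 1 == num_segments) ? 1u : 0u; /* mark last */
    }
}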
 
 With respect to the previous version based on PCI-Express Gen2, the PCI-Express IP-Core provided by Xilinx has undergone some modifications, which are reflected in our logic implementation:
 
@@ -134,10 +138,19 @@ the host side wall clock time. On-GPU data transfer is about twice as fast.}
   \label{fig:intra-copy}
 \end{figure}
 
-
 \section{Conclusion}
 
-% Outlook
+
+
+
+
+\section{Outlook}
+
+\subsection{Hi-flex Board}
+
+A custom FPGA board is currently under development. % TODO: describe the board.
+% TODO: add a picture of the board.
+The board features a PCI-Express Gen3 x16 connection, with two PCI-Express x8 cores instantiated in the FPGA. The board will be used in conjunction with a PEX chip, which allows the two cores to be mapped as a single x16 device. We expect to achieve data throughputs of up to 13 GB/s by using the dual-core architecture already employed with the Gen2 version of the IP-Core~\cite{rota2015dma}.
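As a rough consistency check, reusing the ~6.7 GB/s net bandwidth per Gen3 x8 link quoted earlier and neglecting any overhead introduced by the PEX chip:

\[
  2 \times 6.7~\mathrm{GB/s} \approx 13.4~\mathrm{GB/s},
\]

in line with the stated 13 GB/s target.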
 
 \begin{itemize}
   \item PCIe might be changed for InfiniBand which offers such and such
@@ -146,7 +159,7 @@ the host side wall clock time. On-GPU data transfer is about twice as fast.}
 
 \acknowledgments
 
-UFO? KSETA?
+UFO? KSETA? Are you joking?
 
 
 \bibliographystyle{JHEP}