CMS L1 Track Trigger Hardware Demonstrators
Luis Ardila-Perez, Timo Dritschler, Thomas Schuh
Programme Matter and Technologies
▸ RDMA enables low-latency data transfers
▸ Paired with efficient algorithms, GPUs can be used to realize tight feedback loops with less than 10 μs latency
▸ We demonstrated the performance of such systems with a prototype implementation of the L1 CMS Track Trigger HT
▸ Look at the performance of newer GPUs: High Bandwidth Memory (HBM) promises 2-4x higher throughput
▸ The total available L1 latency will be 12.5 μs, of which only ~4 μs is available for track finding
▸ Reduces very high data rates (~20,000 stubs per event) down to the O(10) genuine/interesting tracks expected on average
▸ Each Track Finding Processor (TFP) receives data links from adjacent detector octants in φ
▸ Fully time-multiplexed system: subsequent events are processed on parallel, independent nodes
▸ Each TFP processes 1 of 8 φ sectors and 1 event out of 36
▸ One TFP becomes the demonstrator slice unit
▸ Highly scalable system

The Track Finding Processor (TFP)
Hough Transform (HT): track finder that identifies groups of stubs consistent with a track in the r-φ plane
Kalman Filter (KF): a candidate-cleaning and precision-fitting algorithm
Duplicate Removal (DR): uses precise fit information to remove duplicate tracks generated by the HT
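The HT step described above can be sketched as a toy accumulator. All numbers here (bin counts, φ range, pT cut, stub radii) are illustrative assumptions, not the demonstrator's actual configuration; only the first-order stub relation φ_stub ≈ φ0 + k·(q/pT)·r is standard.

```python
# Toy r-phi Hough Transform: each stub (r, phi) votes along a line
# in the (q/pT, phi0) track-parameter plane.  A cell collecting
# stubs from enough layers becomes a track candidate.
B_FIELD = 3.8                    # CMS solenoid field [T]
K = 0.5 * 0.003 * B_FIELD        # phi_stub ~ phi0 + K * (q/pT) * r
                                 # (r in cm, pT in GeV; first order)

M_BINS, C_BINS = 32, 64          # accumulator granularity (assumed)
M_MAX = 0.5                      # |q/pT| < 0.5 / GeV, i.e. pT > 2 GeV

def hough_fill(stubs, phi_lo, phi_hi):
    """Fill the (q/pT, phi0) accumulator with one line per stub."""
    acc = [[[] for _ in range(C_BINS)] for _ in range(M_BINS)]
    for i, (r, phi) in enumerate(stubs):
        for mb in range(M_BINS):
            m = -M_MAX + (mb + 0.5) * 2 * M_MAX / M_BINS
            phi0 = phi - K * m * r              # invert the stub relation
            cb = int((phi0 - phi_lo) / (phi_hi - phi_lo) * C_BINS)
            if 0 <= cb < C_BINS:
                acc[mb][cb].append(i)
    return acc

def candidates(acc, min_stubs=4):
    """Cells with enough stubs form track candidates."""
    return [(mb, cb, cell)
            for mb, row in enumerate(acc)
            for cb, cell in enumerate(row)
            if len(cell) >= min_stubs]
```

Only cells whose (q/pT, φ0) matches a genuine track collect stubs from all layers; random combinations spread their votes, which is how the HT suppresses the ~20,000 stubs per event down to a handful of candidates.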
Motivation: High Luminosity LHC
▸ In 2026 the LHC will be upgraded to higher luminosity
▸ The tracker will be replaced due to radiation damage and to cope with high-occupancy conditions
▸ The new design will allow the tracker to be read out at 40 MHz
Pileup 200 simulation
CMS Tracker Upgrade
▸ High-pT tracks are indicative of interesting physics (decays of high-mass particles)
▸ Novel tracking modules use two silicon sensors spaced by 1.6-4.0 mm to discriminate pT > 2-3 GeV
▸ Rate reduction of O(100) inside the detector module
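The doublet-module pT discrimination can be illustrated with a back-of-envelope sketch. The geometry and acceptance window below are assumed for illustration, not the actual module design; the physics input is the standard bending radius R = pT / (0.3·B).

```python
# A track of transverse momentum pT bends with radius
# R = pT / (0.3 * B)   [R in m, pT in GeV, B in T].
# The hit in the outer of the two closely spaced sensors is
# displaced w.r.t. the inner one by roughly dx ~ d * r / (2 R),
# so low-pT tracks (large dx) can be vetoed inside the module.
B = 3.8  # CMS solenoid field [T]

def stub_displacement(pt_gev, r_m, spacing_m):
    """Hit displacement between the two sensors [m] (small-angle)."""
    R = pt_gev / (0.3 * B)              # bending radius [m]
    return spacing_m * r_m / (2 * R)

def accept_stub(pt_gev, r_m, spacing_m, window_m):
    """Keep the hit pair only if the displacement is inside the window."""
    return abs(stub_displacement(pt_gev, r_m, spacing_m)) <= window_m
```

For example, at module radius 0.5 m with 2.6 mm sensor spacing, a 3 GeV track is displaced by ~0.25 mm while a 1 GeV track is displaced by ~0.74 mm, so a ~0.4 mm window keeps the former and rejects the latter.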
➤ (poll) Read/uncompress data
➤ Ask for data
➤ Compute
▸ Hexagonal bins in Hough space
▸ Suppresses fake candidates by 80%
▸ Runtime comparable to the FPGA approach
▸ Only 1 bin activated per row in Hough space

Future FPGA Demonstrator Development
Outlook of Heterogeneous Demonstrator
FPGA-based Hardware Demonstrator
▸ Remote Direct Memory Access (RDMA) can be used to directly connect GPUs with FPGAs
▸ This allows highly flexible FPGA-based DAQ hardware to be combined with high-performance computing GPUs
[Figure: RDMA data path between CPU, GPU, memory, and network; steps 1-3 mark the transfer sequence. Red: conventional transfer; green: RDMA transfer.]
Heterogeneous Hardware Demonstrator
r-φ Floating-Point Hexagonal HT
▸ Less algorithmic branching
▸ Computational time of around 4 μs
▸ More complex algorithms are possible: the hexagonal approach
▸ Investigate new transfer technologies:
  - PCIe 4.0 (2x faster)
  - NVLink (5-10x faster)
Geometric Processor (GP): processes stub data and sub-divides the octant into 36 finer sub-sectors
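As a toy illustration of the GP routing step (the actual sub-sector geometry is not specified here; we assume 36 equal φ slices per octant purely for the sketch):

```python
import math

# Assumed geometry: 8 octants in phi, each split into 36 equal
# phi sub-sectors by the Geometric Processor before track finding.
OCTANT_WIDTH = 2 * math.pi / 8
N_SUB = 36

def sub_sector(phi, octant_lo):
    """Map a stub's phi to one of the 36 sub-sectors of its octant."""
    frac = (phi - octant_lo) / OCTANT_WIDTH
    return min(N_SUB - 1, max(0, int(frac * N_SUB)))
```

Routing stubs into finer sub-sectors up front keeps each downstream HT instance small, which is what makes the fully time-multiplexed system scale.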