CERN Computing Seminar

Fast parallel event reconstruction

by Dr Ivan Kisel (GSI, Gesellschaft fuer Schwerionenforschung mbH)

IT Auditorium (CERN)

Description

On-line processing of the large data volumes produced in modern HEP experiments requires exploiting the full capabilities of modern and future many-core CPU and GPU architectures.

One such powerful feature is the SIMD instruction set, which allows several data items to be packed into one register and operated on simultaneously, thus achieving more operations per clock cycle. Motivated by the idea of using the SIMD unit of modern processors, the Kalman filter (KF) based track fit has been adapted for parallelism, including memory optimization, numerical analysis, vectorization with inline operator overloading, and optimization using SDKs. The speed of the algorithm has been increased by a factor of 120000, to 0.1 ms/track, running in parallel on 16 SPEs of a Cell Blade computer. Running on a Nehalem CPU with 8 cores, it reaches a processing speed of 52 ns/track using the Intel Threading Building Blocks. The same KF algorithm running on an Nvidia GTX 280 in the CUDA framework provides a plain throughput of 22 tracks/ms.

In addition, a many-core architecture code-named Larrabee can be considered an interesting platform for further scaling the Kalman filter in the threading and vectorization dimensions. Less architecture-dependent programming frameworks, such as OpenCL and Intel Ct, may also better support future changes in architecture. For example, the KF algorithm implemented in the Intel Ct parallel language demonstrates linear many-core scalability.
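To make the idea of packing several data items into one register and vectorizing through operator overloading more concrete, here is a minimal C++ sketch. It is not the code presented in the seminar: the type `F4`, the helper `broadcast`, the struct `State4` and the function `kf_update` are hypothetical names, the packed type is a portable array-based stand-in for a hardware SIMD register, and the filter is reduced to a one-dimensional measurement update applied to four tracks at once.

```cpp
#include <cstddef>

// Hypothetical 4-wide packed float type: a portable stand-in for a SIMD
// register. A real implementation would wrap hardware intrinsics; plain
// loops are used here so the sketch compiles on any platform.
struct F4 {
    float v[4];
};

// Inline overloaded operators: scalar-looking formulas act on all four lanes.
inline F4 operator+(const F4& a, const F4& b) {
    F4 r;
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}
inline F4 operator-(const F4& a, const F4& b) {
    F4 r;
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] - b.v[i];
    return r;
}
inline F4 operator*(const F4& a, const F4& b) {
    F4 r;
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}
inline F4 operator/(const F4& a, const F4& b) {
    F4 r;
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] / b.v[i];
    return r;
}

// Fill all four lanes with the same value.
inline F4 broadcast(float x) { return F4{{x, x, x, x}}; }

// Simplified one-dimensional Kalman filter state for four tracks at once.
struct State4 {
    F4 x;  // state estimate, one lane per track
    F4 C;  // variance of the estimate, one lane per track
};

// Measurement update for four tracks in one pass.
// m: measurements, V: measurement variances (one lane per track).
inline State4 kf_update(const State4& s, const F4& m, const F4& V) {
    const F4 K = s.C / (s.C + V);         // Kalman gain
    State4 out;
    out.x = s.x + K * (m - s.x);          // corrected estimate
    out.C = (broadcast(1.0f) - K) * s.C;  // reduced variance
    return out;
}
```

With a prior of x = 0 and C = 1 in every lane, measurements of 1, 2, 3 and 4 and a measurement variance of 1, one call to `kf_update` gives a gain of 0.5 in each lane, so the four estimates become 0.5, 1, 1.5 and 2 with variance 0.5. The update formula is written once, in scalar notation, and the overloaded operators carry it across all lanes.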

The fully SIMDized cellular automaton (CA) track finder of the future heavy-ion CBM experiment (FAIR/GSI), with the SIMD KF track fit included, shows an overall reconstruction efficiency of 92%. High-energy particles are reconstructed with an efficiency of 97%. The efficiency for low-energy tracks is 82% because of significant multiple scattering in the detector material. The level of ghost tracks is only about 3%. The CA track finder demonstrates a maximum throughput of 150 central or 1100 minimum bias events/s running on a Nehalem CPU with 8 cores. The strong many-core scalability of the CA track finder makes it possible to keep the reconstruction at event-level parallelism.
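As a rough reading of the throughput figures above, the following C++ sketch converts events per second on a multi-core CPU into the average time one core spends on one event. It assumes ideal event-level parallelism, with events spread evenly over the 8 cores and no overhead; the function name `core_time_per_event_ms` is hypothetical and the result is an estimate, not a measurement quoted in the abstract.

```cpp
#include <cassert>

// Hypothetical helper: average time (in ms) that one core spends per event,
// assuming events are distributed evenly over the cores and scaling is ideal.
inline double core_time_per_event_ms(double events_per_second, int cores) {
    assert(events_per_second > 0.0 && cores > 0);
    return 1000.0 * cores / events_per_second;
}
```

Under these assumptions, 150 central events/s on 8 cores corresponds to about 53 ms per central event per core, and 1100 minimum bias events/s to about 7.3 ms per minimum bias event per core.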

More details on the parallelization of the event reconstruction algorithms of the CBM, ALICE and STAR experiments will be presented and discussed.

A short overview of the "Workshop for Future Challenges in Tracking and Trigger Concepts" (GSI, Germany, 7-11 June 2010) will also be given.


Organised by: Sverre Jarp and Miguel Angel Marquina - IT Department
CERN Computing Seminars and Colloquia
