
Additional Signal Models for the LHCO2020 R&D

Source: DataCite Commons. Updated 2026-05-05; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.18983506
Description:
This Zenodo record contains the data used for the paper "Kitchen Sink Anomaly Detection". The data are based on the LHC Olympics 2020 R&D dataset and on signal models inspired by the CMS study Rept.Prog.Phys. 88, 067802 (2024).

Model generation

For the generated models we provide the Event directories of the MadGraph5_aMC@NLO generation, as well as the clustered events and the features calculated from the Delphes output. We also provide the MadGraph5_aMC@NLO driver files (.mg5) for producing the signal event samples used in the paper; they are contained in the archive model_cards.tar.gz.

The primary generation chain used here was:
- MadGraph5_aMC@NLO 3.6.2
- Pythia 8.313 for parton showering/hadronization
- Delphes 3.5.0 for fast detector simulation

We use MadGraph5_aMC@NLO 3.6.2 at leading order (LO) to simulate the hard process, including the decay of all intermediate resonances, so that the final state consists of quarks only. The width of the BSM resonances is set to a small value so that they are effectively treated in the narrow-width approximation. The resulting parton-level events are passed to Pythia 8.313 for parton showering and hadronization, followed by Delphes 3.5.0 for detector simulation, using the same detector card as in the LHCO R&D setup. To ensure compatibility with the LHCO configuration, we use the default Pythia tune but switch off multi-parton interactions. For further details, see the .mg5 driver files in model_cards.tar.gz, which also contain the cards used for the Pythia and Delphes steps.

The models are inspired by the CMS study Rept.Prog.Phys. 88, 067802 (2024), and many of the cards come from the genproductions collection. UFO model tarballs are available from the CMS project generators site.
XtoWRto3W

The signal $W_{KK} \to W R \to 3W$ with $m_R=500$ GeV consists of a heavy Kaluza-Klein vector boson $W_{KK}$ decaying into a $W$ boson and a scalar radion $R$ (see JHEP01(2017)016 and PhysRevD.99.075016). The radion decays into two $W$ bosons. We analyze the fully hadronic channel, where each $W$ boson decays into two light quarks, resulting in a $2+4$ prong structure.

XtoYYprime

The signal $X \to Y Y^\prime \to 4q$ with $m_Y=100$ GeV and $m_{Y^\prime}=500$ GeV has a $2+2$ prong topology like the LHCO 2-prong signal, but while $X$ and $Y$ are again vector bosons, $Y^\prime$ is a scalar particle. $Y^\prime$ decays into a pair of bottom quarks, while $Y$ can also decay into pairs of light quarks.

YtoHHto4T

The signal $G_{KK} \to H H \to 4t$ with $m_H=400$ GeV consists of a heavy spin-2 Randall-Sundrum graviton $G_{KK}$ that decays into two Higgs-like scalars $H$ (see arXiv.1404.0102). Each scalar decays into two top quarks, resulting in a $6+6$ prong structure in the fully hadronic channel, which is the one considered here.

ZpToTpTp

The signal $Z^\prime \to T^\prime T^\prime \to t Z t Z$ with $m_{T^\prime}=400$ GeV consists of a $Z^\prime$ vector boson that decays into two vector-like quarks $T^\prime$ (see https://DOI.org/10.1155/2013/364936 and https://DOI.org/10.1016/j.nuclphysb.2013.08.010). Each $T^\prime$ decays into a top quark and a $Z$ boson. Again we only consider the fully hadronic channel, where all the intermediate vector bosons decay into light quarks, resulting in a $5+5$ prong topology.
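The prong counts quoted for the four signals follow from counting final-state quarks on each side of the heavy resonance decay. The sketch below reproduces that arithmetic. The decay table and the names `DECAYS`, `SIGNALS`, `count_prongs` and `topology` are illustrative assumptions, not part of the dataset or the paper code, and the top decay $t \to bW$ is the standard one rather than something stated in the description.

```python
# Illustrative decay table for the fully hadronic channel (an assumption for
# this sketch). Anything not listed is a final-state quark and counts as one prong.
DECAYS = {
    "W": ["q", "q"],
    "Z": ["q", "q"],
    "t": ["b", "W"],            # standard top decay, assumed here
    "R": ["W", "W"],            # radion in XtoWRto3W
    "Y": ["q", "q"],
    "Yprime": ["b", "b"],
    "H": ["t", "t"],            # Higgs-like scalar in YtoHHto4T
    "Tprime": ["t", "Z"],       # vector-like quark in ZpToTpTp
}

# The two daughters of the heavy resonance in each signal model.
SIGNALS = {
    "XtoWRto3W": ("W", "R"),
    "XtoYYprime": ("Y", "Yprime"),
    "YtoHHto4T": ("H", "H"),
    "ZpToTpTp": ("Tprime", "Tprime"),
}


def count_prongs(particle):
    """Count final-state quarks produced by one particle."""
    if particle not in DECAYS:
        return 1
    return sum(count_prongs(daughter) for daughter in DECAYS[particle])


def topology(model):
    """Return the prong counts of the two daughters, smaller first."""
    first, second = SIGNALS[model]
    return tuple(sorted((count_prongs(first), count_prongs(second))))


if __name__ == "__main__":
    for name in SIGNALS:
        low, high = topology(name)
        print(f"{name}: {low}+{high} prongs")
```
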
Folder overview

For a given MODEL_NAME, the data is organized as follows:

MODEL_NAME.tar.gz
├── MODEL_NAME_subjettinesses.h5   \\ Subjettiness features
├── MODEL_NAME_efps.h5             \\ EFP features
├── events.h5                      \\ unordered 3-momentum of the reconstructed jet constituents
├── run_01_tag_1_banner.txt
├── run_shower.sh
├── tag_1_delphes.log
├── tag_1_delphes_events.root
├── tag_1_djrs.dat
├── tag_1_pts.dat
├── tag_1_pythia8.cmd
├── tag_1_pythia8.log
├── tag_1_pythia8_events.hepmc.gz
└── unweighted_events.lhe.gz

This structure can be used directly by the feature set generation code provided in the paper repository: https://github.com/lang-lukas/KitchenSink.

events.h5

The events.h5 file contains the unordered 3-momentum of the reconstructed jet constituents for each event. All particles are treated as massless. The data frame has the shape (NUMBER_OF_EVENTS, 2101), where the first 2100 entries correspond to the 3-momentum of up to 700 jet constituents ($p_T$, $\eta$, $\phi$ for each constituent) and the last entry is the corresponding label (0 for background, 1 for signal). The columns are named pt0, eta0, phi0, pt1, eta1, phi1, ..., pt699, eta699, phi699, and signal.

MODEL_NAME_subjettinesses.h5

This file contains the N-subjettiness features for each event, calculated using the FastJet contrib package. The data frame has the shape (NUMBER_OF_EVENTS, 63), where the first 31 columns correspond to the 4-momenta (pxj1, pyj1, pzj1, mj1) and the N-subjettiness features $\tau_N^{(\beta)}$ for $1 \leq N \leq 9$ and $\beta \in \{0.5, 1.0, 2.0\}$ for the leading jet, and the next 31 columns correspond to the same features for the subleading jet. The columns are named pxj1, pyj1, pzj1, mj1, tau1j1_5, ..., tau9j1_5, tau1j1_1, ..., tau9j1_1, tau1j1_2, ..., tau9j1_2, followed by the same features for the subleading jet with j1 replaced by j2 in the column names, and end with the label column.
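The column layouts described for events.h5 and MODEL_NAME_subjettinesses.h5 can be written out explicitly, which is useful for checking a loaded data frame against the expected shape. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, the name of the subjettiness label column is not given in the description (the `label_name` default is a placeholder), and reading the suffixes `_5`, `_1`, `_2` as $\beta = 0.5, 1.0, 2.0$ is an assumption. The last helper converts one constituent from ($p_T$, $\eta$, $\phi$) to a Cartesian 4-momentum under the stated massless treatment.

```python
import math

N_CONSTITUENTS = 700              # up to 700 constituents per event
MAX_N = 9                         # tau_1 ... tau_9
BETA_SUFFIXES = ("5", "1", "2")   # assumed to denote beta = 0.5, 1.0, 2.0


def events_columns():
    """Column names of events.h5: pt0, eta0, phi0, ..., phi699, signal."""
    cols = []
    for i in range(N_CONSTITUENTS):
        cols += [f"pt{i}", f"eta{i}", f"phi{i}"]
    cols.append("signal")
    return cols


def subjettiness_columns(label_name="label"):
    """Column names of MODEL_NAME_subjettinesses.h5.

    label_name is a placeholder: the description only says the file
    ends with the label column, without naming it.
    """
    cols = []
    for jet in ("j1", "j2"):
        cols += [f"px{jet}", f"py{jet}", f"pz{jet}", f"m{jet}"]
        for suffix in BETA_SUFFIXES:
            cols += [f"tau{n}{jet}_{suffix}" for n in range(1, MAX_N + 1)]
    cols.append(label_name)
    return cols


def massless_four_momentum(pt, eta, phi):
    """Return (E, px, py, pz) for a massless particle."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = pt * math.cosh(eta)  # equals |p| for a massless particle
    return energy, px, py, pz


if __name__ == "__main__":
    print(len(events_columns()), len(subjettiness_columns()))
```

If the files are read into a data frame, comparing `len(df.columns)` with the lengths returned here (2101 and 63) gives a quick consistency check before any further processing.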
MODEL_NAME_efps.h5

This file contains the Energy Flow Polynomial (EFP) features for each event, calculated using the EnergyFlow package. The set includes only prime EFPs with number of edges $d \leq 7$, $\beta = 1$ and $\kappa = 1$, calculated in the hadronic default (hdr) measure. The data frame has the shape (NUMBER_OF_EVENTS, 980), where the first 490 columns correspond to the EFP features for the leading jet and the next 490 columns correspond to the EFP features for the subleading jet. The columns are indexed from 0 to 979, and the corresponding graph structure can be found in the EFPs_d<=7_graphs.txt file in the root directory of this Zenodo record.

Features for the LHC Olympics 2020 R&D dataset

The low-level features for the LHCO dataset are provided by the original authors under https://zenodo.org/records/6466204, and the extra background events can be found under https://zenodo.org/records/8370758. For these events the subjettiness features have been calculated for the "Back to the roots" paper and are available in the repository https://github.com/uhh-pd-ml/treebased_anomaly_detection. We provide the EFP features for these events in the following files:
- LHCO_2-prong_efps.h5: EFP features for the 1 million LHCO background events and the 100k 2-prong signal events.
- LHCO_3-prong_efps.h5: EFP features for the 100k 3-prong signal events.
- LHCO_extra_bkg_efps.h5: EFP features for the additional background events in the signal region (SR).

The EFP features are saved in the same format as the features for the generated models, as described above, and include the same set of EFPs (prime EFPs with number of edges $d \leq 7$, $\beta = 1$ and $\kappa = 1$, calculated in the hadronic default (hdr) measure).
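Since all EFP files share the 980-column layout, a small helper can translate a column index into the jet it belongs to and the position of the graph within the 490-entry set. The sketch below rests on two assumptions: the subleading-jet block repeats the graph order of the leading-jet block, and the graph position corresponds to the ordering in EFPs_d<=7_graphs.txt. The function name is hypothetical. The per-degree counts are the numbers of connected multigraphs with $d = 0, \dots, 7$ edges as commonly tabulated for prime EFPs; they add up to 490 only if the trivial $d = 0$ graph is included, which is assumed here.

```python
N_EFPS_PER_JET = 490

# Number of connected multigraphs with d = 0..7 edges (assumed to match
# the prime EFP set, including the trivial d = 0 graph).
PRIME_EFP_COUNTS = [1, 1, 2, 5, 12, 33, 103, 333]


def efp_column_to_jet_and_graph(column):
    """Map a column index (0..979) to (jet, graph position within the set)."""
    if not 0 <= column < 2 * N_EFPS_PER_JET:
        raise ValueError(f"column index out of range: {column}")
    jet_index, graph_index = divmod(column, N_EFPS_PER_JET)
    jet = "leading" if jet_index == 0 else "subleading"
    return jet, graph_index


if __name__ == "__main__":
    print(sum(PRIME_EFP_COUNTS))
    print(efp_column_to_jet_and_graph(490))
```
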
Provider:
Zenodo
Created:
2026-04-22