Additional Signal Models for the LHCO2020 R&D
Source: DataCite Commons. Updated 2026-05-05; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.18983506
Resource description:
This Zenodo record contains the data used in the paper "Kitchen Sink Anomaly Detection". The data is based on the LHC Olympics 2020 R&D dataset and on signal models inspired by the CMS study Rept.Prog.Phys. 88, 067802 (2024).
Model generation
For the generated models we provide the Event directories of the MadGraph5_aMC@NLO generation, as well as the clustered events and features calculated from the Delphes output. Furthermore, we provide the MadGraph5_aMC@NLO driver files (.mg5) for producing the signal event samples used in the paper, which can be found in the folder model_cards.tar.gz. The primary generation chain used here was:
- MadGraph5_aMC@NLO 3.6.2
- Pythia 8.313 for parton showering/hadronization
- Delphes 3.5.0 for fast detector simulation
We use MadGraph5_aMC@NLO 3.6.2 at leading order (LO) to simulate the hard process, including the decay of all intermediate resonances, so that the final state consists of quarks only. The width of the BSM resonances is set to a small value so that they are effectively treated in the narrow-width approximation. The resulting parton-level events are passed to Pythia 8.313 for parton showering and hadronization, followed by Delphes 3.5.0 for detector simulation, using the same detector card as in the LHCO R&D setup. To ensure compatibility with the LHCO configuration, we use the default Pythia tune but switch off multi-parton interactions. For further details, we recommend consulting the provided .mg5 driver files in model_cards.tar.gz, which also contain the cards used for the Pythia and Delphes steps.
The models are inspired by the CMS study Rept.Prog.Phys. 88, 067802 (2024) and many of the cards come from the genproductions collection. UFO model tarballs are available from the CMS project generators site.
XtoWRto3W
The signal $W_{KK} \to W R \to 3W$ with $m_R=500$ GeV consists of a heavy Kaluza-Klein vector boson $W_{KK}$ decaying into a $W$ boson and a scalar radion $R$ (see JHEP01(2017)016 and PhysRevD.99.075016). The radion decays into two $W$ bosons. We analyze the fully hadronic channel, where each $W$ boson decays into two light quarks, resulting in a $2+4$ prong structure.
XtoYYprime
The signal $X \to Y Y^\prime \to 4 q$ with $m_Y=100$ GeV and $m_{Y^\prime}=500$ GeV has a $2+2$ prong topology like the LHCO 2-prong signal, but while $X$ and $Y$ are again vector bosons, $Y^\prime$ is a scalar particle. $Y^\prime$ decays into a pair of bottom quarks while $Y$ can also decay into pairs of light quarks.
YtoHHto4T
The signal $G_{KK} \to H H \to 4t$ with $m_H=400$ GeV consists of a heavy spin-2 Randall-Sundrum graviton $G_{KK}$ that decays into two Higgs-like scalars $H$ (see arXiv:1404.0102). Each scalar decays into two top quarks, resulting in a $6+6$ prong structure in the fully hadronic channel, which is the one considered here.
ZpToTpTp
The signal $Z^\prime \to T^\prime T^\prime \to t Z t Z$ with $m_{T^\prime}=400$ GeV consists of a $Z^\prime$ vector boson that decays into two vector-like quarks $T^\prime$ (see https://DOI.org/10.1155/2013/364936 and https://DOI.org/10.1016/j.nuclphysb.2013.08.010). The $T^\prime$ particles decay into a top quark and a $Z$ boson. Again we only consider the fully hadronic channel, where all the intermediate vector bosons decay into light quarks, resulting in a $5+5$ prong topology.
Folder overview
For a given MODEL_NAME, the data is organized as follows:
MODEL_NAME.tar.gz
├── MODEL_NAME_subjettinesses.h5    \\ Subjettiness features
├── MODEL_NAME_efps.h5              \\ EFP features
├── events.h5                       \\ unordered 3-momentum of the reconstructed jet constituents
├── run_01_tag_1_banner.txt
├── run_shower.sh
├── tag_1_delphes.log
├── tag_1_delphes_events.root
├── tag_1_djrs.dat
├── tag_1_pts.dat
├── tag_1_pythia8.cmd
├── tag_1_pythia8.log
├── tag_1_pythia8_events.hepmc.gz
└── unweighted_events.lhe.gz
This structure can be used directly by the feature set generation code provided in the paper repository: https://github.com/lang-lukas/KitchenSink.
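As a quick sanity check after downloading, the listing above can be compared against the contents of an archive. The following is a minimal sketch using only the Python standard library. The helper names `expected_files` and `missing_files` are hypothetical, and the sketch assumes that the file names inside the archive match the listing; any leading directory inside the archive is ignored.

```python
import os
import tarfile

# File names copied from the folder overview; {model} stands for MODEL_NAME.
FEATURE_FILES = ["{model}_subjettinesses.h5", "{model}_efps.h5", "events.h5"]
GENERATION_FILES = [
    "run_01_tag_1_banner.txt",
    "run_shower.sh",
    "tag_1_delphes.log",
    "tag_1_delphes_events.root",
    "tag_1_djrs.dat",
    "tag_1_pts.dat",
    "tag_1_pythia8.cmd",
    "tag_1_pythia8.log",
    "tag_1_pythia8_events.hepmc.gz",
    "unweighted_events.lhe.gz",
]


def expected_files(model_name):
    """Return the file names listed in the folder overview for one model."""
    features = [name.format(model=model_name) for name in FEATURE_FILES]
    return features + GENERATION_FILES


def missing_files(archive_path, model_name):
    """Return the expected files that are absent from a .tar.gz archive.

    Assumption: only base names are compared, so it does not matter whether
    the archive stores the files at top level or inside a subdirectory.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        present = {
            os.path.basename(member.name)
            for member in archive.getmembers()
            if member.isfile()
        }
    return [name for name in expected_files(model_name) if name not in present]
```

Assuming the model section headings are the MODEL_NAME values, a call such as `missing_files("XtoYYprime.tar.gz", "XtoYYprime")` would return an empty list for a complete archive.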
events.h5
The events.h5 file contains the unordered 3-momentum of the reconstructed jet constituents for each event. All the particles are considered to be massless. The data frame has the shape (NUMBER_OF_EVENTS, 2101) where the first 2100 entries correspond to the 3-momentum of up to 700 jet constituents ($p_T$, $\eta$, $\phi$ for each constituent), and the last entry is the corresponding label (0 for background, 1 for signal). The columns are named as pt0, eta0, phi0, pt1, eta1, phi1, ..., pt699, eta699, phi699, and signal.
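The column layout described above can be unpacked into a per-constituent array as follows. This is a minimal sketch with NumPy, not the code used for the paper: the function names are hypothetical, and `count_constituents` additionally assumes that unused constituent slots are zero-padded (so that $p_T = 0$ marks an empty slot), which the description does not state explicitly.

```python
import numpy as np

N_CONSTITUENTS = 700  # maximum number of constituents per event, from the text


def constituent_columns(n=N_CONSTITUENTS):
    """Build the column names pt0, eta0, phi0, ..., followed by 'signal'."""
    columns = []
    for i in range(n):
        columns += [f"pt{i}", f"eta{i}", f"phi{i}"]
    return columns + ["signal"]


def split_events(table):
    """Split an (N, 3*n + 1) table into constituents (N, n, 3) and labels (N,)."""
    table = np.asarray(table, dtype=float)
    n_features = table.shape[1] - 1  # the last column is the label
    if n_features % 3 != 0:
        raise ValueError("expected 3 values per constituent plus one label")
    constituents = table[:, :-1].reshape(len(table), n_features // 3, 3)
    labels = table[:, -1].astype(int)
    return constituents, labels


def count_constituents(constituents):
    """Count non-empty slots per event (assumes zero padding, i.e. pt == 0)."""
    return (constituents[..., 0] > 0).sum(axis=1)
```

With pandas, the input could be obtained as `pandas.read_hdf("events.h5")[constituent_columns()]`. The HDF5 key is not given in the description, so this assumes the file holds a single data frame.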
MODEL_NAME_subjettinesses.h5
This file contains the N-subjettiness features for each event, calculated using the FastJet contrib package. The data frame has the shape (NUMBER_OF_EVENTS, 63) where the first 31 columns correspond to the 4-momenta (pxj1, pyj1, pzj1, mj1) and the N-subjettiness features $\tau_N^{(\beta)}$ for $N \leq 9$ and $\beta \in \{0.5, 1.0, 2.0\}$ for the leading jet, and the next 31 columns correspond to the same features for the subleading jet. The columns are named as:
pxj1, pyj1, pzj1, mj1, tau1j1_5, ... , tau9j1_5, tau1j1_1, ... , tau9j1_1, tau1j1_2, ... , tau9j1_2
followed by the same features for the subleading jet, with j1 replaced by j2 in the column names, and ending with the label column.
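The naming scheme can be spelled out programmatically, which also confirms the count of $2 \times (4 + 9 \times 3) + 1 = 63$ columns. The sketch below is illustrative only. It reads the suffixes `_5`, `_1`, `_2` as $\beta = 0.5, 1.0, 2.0$ (inferred from the ordering in the text), uses a placeholder name for the label column because the description does not name it for this file, and adds a hypothetical `dijet_mass` helper showing how the stored (px, py, pz, m) values combine into a dijet invariant mass.

```python
import numpy as np

# Assumed meaning of the column suffixes, inferred from the listing order.
BETA_SUFFIXES = {"5": 0.5, "1": 1.0, "2": 2.0}


def jet_columns(jet):
    """Return the 31 column names for one jet tag, e.g. 'j1' or 'j2'."""
    columns = [f"px{jet}", f"py{jet}", f"pz{jet}", f"m{jet}"]
    for suffix in BETA_SUFFIXES:
        columns += [f"tau{n}{jet}_{suffix}" for n in range(1, 10)]
    return columns


def subjettiness_columns(label="label"):
    """Return all 63 names; 'label' is a placeholder for the label column."""
    return jet_columns("j1") + jet_columns("j2") + [label]


def dijet_mass(px1, py1, pz1, m1, px2, py2, pz2, m2):
    """Invariant mass of the two-jet system, using E = sqrt(|p|^2 + m^2)."""
    e1 = np.sqrt(px1**2 + py1**2 + pz1**2 + m1**2)
    e2 = np.sqrt(px2**2 + py2**2 + pz2**2 + m2**2)
    mass_sq = (
        (e1 + e2) ** 2
        - (px1 + px2) ** 2
        - (py1 + py2) ** 2
        - (pz1 + pz2) ** 2
    )
    # Clip small negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(mass_sq, 0.0))
```

The mass helper accepts scalars or NumPy arrays, so it can be applied to whole columns at once.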
MODEL_NAME_efps.h5
This file contains the Energy Flow Polynomial (EFP) features for each event, calculated using the EnergyFlow package. The set includes only prime EFPs with number of edges $d \leq 7$, $\beta = 1$ and $\kappa = 1$, calculated in the hadronic default (hdr) measure.
The data frame has the shape (NUMBER_OF_EVENTS, 980) where the first 490 columns correspond to the EFP features for the leading jet and the next 490 columns correspond to the EFP features for the subleading jet. The columns are indexed from 0 to 979 and the corresponding graph structure can be found in the EFPs_d<=7_graphs.txt file in the root directory of this Zenodo.
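The split into leading-jet and subleading-jet blocks can be sketched as below. The function names are hypothetical, and `graph_index` rests on an assumption not stated in the description: that both jets use the same ordering of the 490 graphs, so that columns $k$ and $k + 490$ refer to the same entry of the graph file.

```python
import numpy as np

N_EFPS_PER_JET = 490  # from the text: 980 columns, 490 per jet


def split_efps(efps):
    """Split an (N, 980) array into leading and subleading blocks of (N, 490)."""
    efps = np.asarray(efps)
    if efps.ndim != 2 or efps.shape[1] != 2 * N_EFPS_PER_JET:
        raise ValueError("expected an array of shape (N, 980)")
    return efps[:, :N_EFPS_PER_JET], efps[:, N_EFPS_PER_JET:]


def graph_index(column):
    """Map a column index 0..979 to (jet, position within that jet's block).

    Assumption: the position within the block matches the graph ordering
    used in the graph file for both jets.
    """
    if not 0 <= column < 2 * N_EFPS_PER_JET:
        raise ValueError("column index out of range")
    jet, position = divmod(column, N_EFPS_PER_JET)
    return ("leading" if jet == 0 else "subleading", position)
```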
Features for the LHC Olympics 2020 R&D dataset
The low-level features for the LHCO dataset are provided by the original authors under https://zenodo.org/records/6466204 and the extra background events can be found under https://zenodo.org/records/8370758. For these events, the subjettiness features have been calculated for the "Back to the roots" paper and are available in the repository https://github.com/uhh-pd-ml/treebased_anomaly_detection.
We provide the EFP features for these events with:
- LHCO_2-prong_efps.h5: EFP features for the 1 million LHCO background events and the 100k 2-prong signal events.
- LHCO_3-prong_efps.h5: EFP features for the 100k 3-prong signal events.
- LHCO_extra_bkg_efps.h5: EFP features for the additional background events in the signal region (SR).
The EFP features are saved in the same format as the features for the generated models, as described above, and include the same set of EFPs (prime EFPs with number of edges $d \leq 7$, $\beta = 1$ and $\kappa = 1$, calculated in the hadronic default (hdr) measure).
Provider:
Zenodo
Created:
2026-04-22



