Test Sets for Jet Anomaly Detection at the LHC
Mendeley Data; updated 2024-03-27; indexed 2024-06-27
Download link:
https://zenodo.org/record/4614656
Description:
A few datasets were updated in Version 2.1. The updated files are tagged with 'new' in their file names.
- pT ~ 1.2 TeV top jets: the mass of the mother Z' was slightly adjusted so that the jet pT peaks closer to 1.2 TeV.
- Top jets with mass 80 GeV (`*top_m80_100k*`): the previous version had a bug in which the top jet mass was not set correctly, so please be careful when using those datasets.

Data Description
These datasets were generated as a series of test sets for anomalous jet tagging at the LHC. They include boosted W jets, top jets, and Higgs jets. Jet transverse momenta are concentrated around 600 GeV and 1200 GeV (the 1200 GeV samples carry the prefix "pt1200_" in their file names). Each file starts from 100k original MadGraph events, but the final h5 files may contain slightly fewer events because of the fat-jet pre-selection.

Production processes include:
- pp -> W' -> W (jj) Z(\(\nu \nu\)); \(m_{W} = 59, 80, 120, 174 ~ \textrm{GeV}\)
- pp -> Z' -> t t~; \(m_t=80, 174 ~ \textrm{GeV}\). For \(m_t=80 ~ \textrm{GeV}\), the mass of the decay-product W is set to 20 GeV.
- pp -> HH -> (hh) (hh), (h -> jj); \(m_H=174 ~ \textrm{GeV}\), \(m_h = 20, 80 ~ \textrm{GeV}\)

Data Generation
The jet samples were generated with MadGraph, Pythia8, and Delphes (no pile-up effects simulated). Jets are clustered from particle-flow objects with FastJet, using the anti-kt algorithm with cone size R=1.0.
Pre-selection: leading jet \(p_T>450 ~ \textrm{GeV}\); sub-leading jet \(p_T>200 ~ \textrm{GeV}\)

Data Structure
Jets are stored under `f['objects/jets']`, which holds two datasets: `constituents` and `obs`.
Jet information is stored with the higher-pT jet first.
- `obs[:, n_j - 1]`: jet four-vector and N-subjettiness of the \(n_j\)-th jet, in the order (pt, eta, phi, m, tau1, tau2, tau3, tau4, tau5).
- `constituents[:, n_j - 1]`: constituents of the \(n_j\)-th jet, pT-sorted (highest first) and stored in variable-length arrays as \(\{ E_i, P_{xi}, P_{yi}, P_{zi}, \textrm{PID}_i\}\) (PID: PDG code for tracks; 22 for photons; 0 for neutral hadrons).

Extra Notes
Because the dataset is structured by event, only the leading jet is available for the W jet samples, while for the top and Higgs samples both the leading and sub-leading jets are valid. You may need to restrict the jet \(p_T\) range when using the data. For example, to get the leading-jet constituents: `f["objects/jets/constituents"][:, 0]`

The file names indicate the corresponding generation process. Each file was generated from 100k original MadGraph events; a small fraction of events is discarded by the pre-selection.

Contact
We welcome any feedback, suggestions, or requests for new test samples. Please contact chengtaoli.1990@gmail.com for more information.
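The access pattern described under Data Structure and Extra Notes can be sketched roughly as follows. This is a minimal illustration, not the authors' own tooling: the helper names (`load_jets`, `pt_window_mask`, `constituent_mass`) and the `path` argument are hypothetical, and it assumes that each variable-length `constituents` entry can be reshaped into rows of (E, px, py, pz, PID), which the description implies but does not state explicitly.

```python
import numpy as np

# Column order of `obs` as listed in the dataset description.
OBS_COLUMNS = ("pt", "eta", "phi", "m", "tau1", "tau2", "tau3", "tau4", "tau5")


def load_jets(path, jet_index=0):
    """Read `obs` and `constituents` for one jet slot (0 = leading jet).

    `path` is a placeholder for a downloaded h5 file. h5py is imported
    lazily so the helpers below also work on plain NumPy arrays.
    """
    import h5py

    with h5py.File(path, "r") as f:
        jets = f["objects/jets"]
        obs = jets["obs"][:, jet_index]
        constituents = jets["constituents"][:, jet_index]
    return obs, constituents


def pt_window_mask(obs, pt_min, pt_max):
    """Boolean mask selecting jets with pt_min <= pt < pt_max (GeV)."""
    pt = np.asarray(obs)[:, OBS_COLUMNS.index("pt")]
    return (pt >= pt_min) & (pt < pt_max)


def constituent_mass(entry):
    """Invariant mass of one jet, summed over its constituents.

    Assumption: `entry` holds (E, px, py, pz, PID) per constituent,
    either flattened or already shaped (n_constituents, 5).
    """
    parts = np.asarray(entry, dtype=float).reshape(-1, 5)
    e, px, py, pz = parts[:, :4].sum(axis=0)
    m2 = e**2 - px**2 - py**2 - pz**2
    # Clip small negative values caused by floating-point rounding.
    return float(np.sqrt(max(m2, 0.0)))
```

With a downloaded file, `obs, constituents = load_jets(path)` followed by `mask = pt_window_mask(obs, 550.0, 650.0)` would keep leading jets near the 600 GeV peak (the window values are only illustrative). Comparing `constituent_mass(constituents[i])` against the `m` column of `obs[i]` is one way to check whether the assumed constituent layout matches the actual files.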
Created:
2023-06-28



