RODEM Jet Datasets
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/12793615
Description:
A detailed description of the RODEM Jet Datasets is provided at arXiv:2408.11616.
Jet types
There are five different types of datasets:
1. Light jets: simulated via QCD dijet events (QCD.tar.gz)
2. Jets from W bosons: simulated via WZ production (WZ.tar.gz)
3. Jets from top quarks: simulated via ttbar production (ttbar.tar.gz)
4. Semi-visible jets: simulated via dark-sector quarks (SIMP.tar.gz)
5. Resonant Higgs boson production: simulated via type-II two-Higgs-doublet models (2HDM.tar.gz)
The tar.gz archives contain files in the HDF5 format, compressed using 7z. For the first four types (QCD, WZ, ttbar, and SIMP), validation and test splits of 5% of the total event count are provided. The remaining events, which form the training set, are split into chunks no larger than 8 GB when decompressed.
For the 2HDM models, two production modes (via g-g fusion and b-bbar annihilation) and two decay modes (h --> jj and t --> tb) are simulated. In addition, various heavy-Higgs and light-Higgs mass combinations were produced.
Dataset content
The HDF5 files contain up to four dataset objects:
jet1_obs – observables for the leading jet
jet1_cnsts – constituent array for the leading jet
jet2_obs – observables for the subleading jet
jet2_cnsts – constituent array for the subleading jet
The latter two (jet2_obs and jet2_cnsts) are not present in the WZ files, which contain only the leading-jet objects.
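Because the subleading-jet objects are missing from the WZ files, code that reads several sample types may want to check which objects exist before indexing them. The following is a minimal sketch, not part of the dataset's own tooling: the helper name available_jets is hypothetical, and plain dictionaries stand in for opened files so that the snippet runs without any data on disk.

```python
from typing import Container, List

# Dataset object names as listed above, grouped per jet for illustration.
JET_OBJECTS = {
    "jet1": ("jet1_obs", "jet1_cnsts"),
    "jet2": ("jet2_obs", "jet2_cnsts"),
}


def available_jets(group: Container[str]) -> List[str]:
    """Return the jets whose observable and constituent objects both exist.

    `group` can be anything that supports `name in group`, such as an
    h5py.Group or a plain dict (hypothetical helper).
    """
    return [
        jet
        for jet, names in JET_OBJECTS.items()
        if all(name in group for name in names)
    ]


# Stand-ins for opened files: one QCD-like layout and one WZ-like layout.
qcd_like = {"jet1_obs": None, "jet1_cnsts": None, "jet2_obs": None, "jet2_cnsts": None}
wz_like = {"jet1_obs": None, "jet1_cnsts": None}

print(available_jets(qcd_like))  # ['jet1', 'jet2']
print(available_jets(wz_like))   # ['jet1']
```

With a real file, the same helper could be called on the group that holds the objects, for example available_jets(f["objects/jets"]) inside a with h5py.File(path, "r") as f: block, assuming the objects are stored under objects/jets as in the usage example.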
The observable dataset objects contain one row per event with 11 entries (in this order): pT, eta, phi, mass, tau1, tau2, tau3, d12, d23, ECF2, ECF3 (for details on the calculation, see arXiv:2408.11616).
The constituent dataset objects contain 100 rows per event with seven entries each. The 100 rows represent (up to) 100 jet constituents; if the jet has fewer, the remaining rows are zero-padded. The seven entries per row are (in this order): pT, eta, phi, mass, charge, D0, DZ (for details, see arXiv:2408.11616).
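The column orders above can be turned into name-to-index lookups so that analysis code does not rely on bare integers. The sketch below is an illustration under stated assumptions: the names OBS, CNST and n_constituents are hypothetical, the arrays are synthetic stand-ins with the documented shapes, and a constituent row is treated as padding when its pT is exactly zero.

```python
import numpy as np

# Column orders as documented above.
OBS_COLUMNS = ["pT", "eta", "phi", "mass", "tau1", "tau2", "tau3",
               "d12", "d23", "ECF2", "ECF3"]
CNST_COLUMNS = ["pT", "eta", "phi", "mass", "charge", "D0", "DZ"]
OBS = {name: i for i, name in enumerate(OBS_COLUMNS)}
CNST = {name: i for i, name in enumerate(CNST_COLUMNS)}


def n_constituents(cnsts: np.ndarray) -> np.ndarray:
    """Count non-padded constituents per jet.

    Assumption: a row is padding if its pT entry is exactly zero.
    """
    return (cnsts[:, :, CNST["pT"]] != 0).sum(axis=1)


# Synthetic stand-ins: 2 jets, 100 constituent slots, 7 entries per slot.
cnsts = np.zeros((2, 100, 7))
cnsts[0, :3, CNST["pT"]] = [50.0, 30.0, 20.0]  # jet 0 has 3 constituents
cnsts[1, :1, CNST["pT"]] = [100.0]             # jet 1 has 1 constituent

obs = np.zeros((2, 11))
obs[:, OBS["mass"]] = [80.4, 172.5]            # made-up jet masses

counts = n_constituents(cnsts)
masses = obs[:, OBS["mass"]]
print(counts.tolist())   # [3, 1]
print(masses.tolist())   # [80.4, 172.5]
```

The same lookups apply unchanged to arrays read from the files, since only the last axis is indexed by name.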
Usage Example
The following snippet loads 100,000 jets and their constituents from one of the QCD input files, then creates distributions of the jet transverse momenta and the number of constituents:
import h5py
import numpy as np
import matplotlib.pyplot as plt

# The input HDF5 file containing the QCD jets.
input_qcd = "h5files/QCDjj_pT_450_1200_train01.h5"

# The number of jets to load.
n_jets = 100_000


def load_jets(ifile: str, n_jets: int):
    """Load jets and constituents from an HDF5 file."""
    with h5py.File(ifile, "r") as f:
        cnsts = f["objects/jets/jet1_cnsts"][:n_jets]
        jets = f["objects/jets/jet1_obs"][:n_jets]

    # Mask zero-padded constituents (rows whose pT entry is zero).
    zeros = np.repeat(cnsts[:, :, 0] == 0, cnsts.shape[2])
    zeros = zeros.reshape(-1, cnsts.shape[1], cnsts.shape[2])
    cnsts = np.ma.masked_where(zeros, cnsts)
    return jets, cnsts


qcd_jets, qcd_constituents = load_jets(input_qcd, n_jets=n_jets)

# Plot the transverse momentum of the jets.
plt.hist(qcd_jets[:, 0], label="QCD jets", bins=30)
plt.xlabel(r"$p_{\mathrm{T}}$ [GeV]")
plt.ylabel("Number of jets")
plt.show()

# Plot the number of constituents in the jets.
plt.hist(qcd_constituents.count(axis=1)[:, 0], label="QCD jets", bins=100, range=(0.5, 100.5))
plt.xlabel("Number of constituents")
plt.ylabel("Number of jets")
plt.show()
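The snippet above reads a fixed number of jets into memory at once. For the larger chunks it can be preferable to process a file in batches and accumulate results. The following is a minimal sketch of that idea, not part of the original example: iter_batches is a hypothetical helper, the arrays are random stand-ins with the documented shapes, and the histogram range of 450 to 1200 GeV is an assumption taken from the file name.

```python
from typing import Iterator, Tuple

import numpy as np


def iter_batches(obs, cnsts, batch_size: int) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Yield (observables, constituents) slices of at most `batch_size` jets.

    Works with objects that support len() and slicing along the first axis,
    such as NumPy arrays or h5py datasets (hypothetical helper).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    n_total = min(len(obs), len(cnsts))
    for start in range(0, n_total, batch_size):
        stop = min(start + batch_size, n_total)
        yield np.asarray(obs[start:stop]), np.asarray(cnsts[start:stop])


# Stand-in arrays with the documented shapes (random values, not real jets).
rng = np.random.default_rng(0)
obs = rng.uniform(450.0, 1200.0, size=(250, 11))
cnsts = np.zeros((250, 100, 7))

# Accumulate a pT histogram batch by batch using fixed bin edges.
edges = np.linspace(450.0, 1200.0, 31)  # 30 bins over the assumed pT range
counts = np.zeros(len(edges) - 1, dtype=np.int64)
n_batches = 0
for batch_obs, batch_cnsts in iter_batches(obs, cnsts, batch_size=100):
    hist, _ = np.histogram(batch_obs[:, 0], bins=edges)
    counts += hist
    n_batches += 1

print(n_batches, int(counts.sum()))  # 3 250
```

With a real file, the two h5py dataset objects could be passed in place of the stand-in arrays while the file is open, so that only one batch is held in memory at a time.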
Citing this work
Please cite the work as follows:
K. Zoch, J. A. Raine, D. Sengupta, and T. Golling. RODEM Jet Datasets. Available on Zenodo: 10.5281/zenodo.12793616. Aug. 2024. arXiv: 2408.11616 [hep-ph].
BibTeX entry:
@misc{Zoch:2024eyp,
author = "Zoch, Knut and Raine, John Andrew and Sengupta, Debajyoti and Golling, Tobias",
title = "{RODEM Jet Datasets}",
eprint = "2408.11616",
archivePrefix = "arXiv",
primaryClass = "hep-ph",
month = "8",
year = "2024",
note = "Available on Zenodo: \href{https://doi.org/10.5281/zenodo.12793616}{10.5281/zenodo.12793616}."
}
Created:
2024-08-23



