Fuτure - dataset for studies, development, and training of algorithms for reconstructing and identifying hadronically decaying tau leptons
Source: NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/12664633
Description:
Data description
MC Simulation
The Fuτure dataset is intended for studies, development, and training of algorithms for reconstructing and identifying hadronically decaying tau leptons. The events are generated with Pythia 8, and the full detector simulation is performed with Geant4 using the CLIC-like detector setup CLICdet (CLIC_o3_v14). Events are reconstructed with the Marlin reconstruction framework, interfaced with Key4HEP. Particle candidates in the reconstructed events are built with the PandoraPF algorithm.
In this version of the dataset no γγ -> hadrons background is included.
Samples
This dataset contains e+e- samples with Z -> ττ, ZH (H -> ττ) and Z -> qq events, with approximately 2 million events simulated in each category.
The following e+e- processes were simulated with Pythia 8 at sqrt(s) = 380 GeV:
p8_ee_qq_ecm380 [Z -> qq events]
p8_ee_ZH_Htautau [ZH, H -> tautau events]
p8_ee_Z_Ztautau_ecm380 [Z -> tautau events]
The .root files from the MC simulation chain are then processed by the software found on GitHub to create flat ntuples as the final product.
Features
The ntuples are based on the particle flow (PF) candidates from PandoraPF. Each PF candidate has a four-momentum, a charge and a particle label (electron / muon / photon / charged hadron / neutral hadron). The PF candidates in a given event are clustered into jets using the generalized kt algorithm for ee collisions, with parameters p=-1 and R=0.4. The minimum pT is set to 0 GeV for both generator-level and reconstructed jets. The dataset contains the four-momenta of the jets, together with the PF candidates in each jet and the properties listed above.
Additionally, a set of variables describing the tau lifetime is calculated using the software on GitHub. As the tau lifetime is very short, these variables are sensitive to true tau decays. The lifetime variables are calculated using a linear approximation.
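For orientation, the distance measures of the generalized kt algorithm for e+e- collisions are written out below. This assumes the convention used in the FastJet manual; the dataset description does not name the clustering implementation, so treat the exact form as an assumption. Here E_i is the energy of particle i, θ_ij the opening angle between particles i and j, and d_iB the beam distance.

```latex
d_{ij} = \min\!\left(E_i^{2p},\, E_j^{2p}\right)\,
         \frac{1 - \cos\theta_{ij}}{1 - \cos R},
\qquad
d_{iB} = E_i^{2p},
\qquad
p = -1,\quad R = 0.4
```

With p = -1 the energy weights are inverse squares, so clustering starts from the most energetic particles, in the same spirit as the anti-kt algorithm used at hadron colliders.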
In summary, the features found in the flat ntuples are:
| Name | Description |
| --- | --- |
| reco_cand_p4s | 4-momenta per particle in the reco jet. |
| reco_cand_charge | Charge per particle in the jet. |
| reco_cand_pdg | PDGid per particle in the jet. |
| reco_jet_p4s | RecoJet 4-momenta. |
| reco_cand_dz | Longitudinal impact parameter per particle in the jet. For future steps. Fill value used for neutral particles as no track parameters can be calculated. |
| reco_cand_dz_err | Uncertainty of the longitudinal impact parameter per particle in the jet. For future steps. Fill value used for neutral particles as no track parameters can be calculated. |
| reco_cand_dxy | Transverse impact parameter per particle in the jet. For future steps. Fill value used for neutral particles as no track parameters can be calculated. |
| reco_cand_dxy_err | Uncertainty of the transverse impact parameter per particle in the jet. For future steps. Fill value used for neutral particles as no track parameters can be calculated. |
| gen_jet_p4s | GenJet 4-momenta. Matched with RecoJet within a cone of radius dR < 0.3. |
| gen_jet_tau_decaymode | Decay mode of the associated genTau. Jets that have associated leptonically decaying taus are removed, so there are no DM=16 jets. If no GenTau can be matched to GenJet within dR < 0.4, a fill value is used. |
| gen_jet_tau_p4s | Visible 4-momenta of the genTau. If no GenTau can be matched to GenJet within dR < 0.4, a fill value is used. |
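The cone requirements quoted for gen_jet_p4s and the genTau fields can be illustrated with a small matching helper. This is a minimal sketch, not the dataset's own code: it assumes the conventional definition dR = sqrt(Δη² + Δφ²), and it assumes the closest object inside the cone is taken as the match. The function names and the (eta, phi) tuple layout are hypothetical.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, assuming dR = sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def match_within_cone(jet, candidates, max_dr):
    """Return the index of the closest candidate with dR < max_dr, or -1.

    `jet` and each entry of `candidates` are (eta, phi) tuples.
    Picking the closest candidate is an assumption made for this sketch.
    """
    best_index, best_dr = -1, max_dr
    for index, (eta, phi) in enumerate(candidates):
        dr = delta_r(jet[0], jet[1], eta, phi)
        if dr < best_dr:
            best_index, best_dr = index, dr
    return best_index


# GenJet to genTau matching uses a cone of 0.4; RecoJet to GenJet uses 0.3.
gen_jet = (0.0, 0.0)
gen_taus = [(1.0, 1.0), (0.1, -0.1)]
tau_index = match_within_cone(gen_jet, gen_taus, max_dr=0.4)
```

A return value of -1 mirrors the situation in which the dataset stores a fill value because nothing lies inside the cone.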
The ground truth is based on stable particles at the generator level, before detector simulation. These particles are clustered into generator-level jets, which are matched to generator-level τ leptons as well as to reconstructed jets. For a generator-level jet to be matched to a generator-level τ lepton, the τ lepton needs to lie within a cone of dR < 0.4 around the jet. The same applies to the matching with the reconstructed jet, where the requirement is dR < 0.3. For each reconstructed jet, we define three target values related to τ lepton reconstruction:
a binary flag isTau, set if the jet was matched to a generator-level hadronically decaying τ lepton. A gen_jet_tau_decaymode value of -1 indicates no match to a generator-level hadronically decaying τ.
the categorical decay mode of the τ, gen_jet_tau_decaymode, in terms of the numbers of generator-level charged and neutral hadrons. Possible gen_jet_tau_decaymode values are {0, 1, ..., 15}.
if matched, the visible (neglecting neutrinos), reconstructable pT of the τ lepton. This is inferred from gen_jet_tau_p4s.
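The three targets can be derived from the stored fields roughly as follows. This is a sketch under stated assumptions: the -1 fill value for unmatched jets is taken from the description above, while the function name and the idea of passing the x and y momentum components of gen_jet_tau_p4s as plain lists are hypothetical, since the on-disk layout of the four-momenta is not spelled out here.

```python
import math

NO_MATCH = -1  # decay-mode value that marks "no matched hadronic tau"


def tau_targets(decay_modes, tau_px, tau_py):
    """Build per-jet targets: isTau flag, decay mode, visible tau pT.

    decay_modes: values of gen_jet_tau_decaymode, one per jet
    tau_px, tau_py: transverse momentum components of the visible tau
                    (hypothetical flat inputs, one value per jet)
    Unmatched jets get a pT of None instead of whatever fill value
    the four-momentum field may carry.
    """
    is_tau = [mode != NO_MATCH for mode in decay_modes]
    visible_pt = [
        math.hypot(px, py) if matched else None
        for matched, px, py in zip(is_tau, tau_px, tau_py)
    ]
    return is_tau, list(decay_modes), visible_pt


is_tau, modes, visible_pt = tau_targets(
    decay_modes=[-1, 0, 10],
    tau_px=[0.0, 3.0, 6.0],
    tau_py=[0.0, 4.0, 8.0],
)
```

Masking the pT with the decay-mode flag avoids relying on the specific fill value used for unmatched four-momenta.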
Contents:
qq_test.parquet
qq_train.parquet
zh_test.parquet
zh_train.parquet
z_test.parquet
z_train.parquet
data_intro.ipynb
Dataset characteristics
| File | Number of jets | Size |
| --- | --- | --- |
| z_test.parquet | 870 843 | 171 MB |
| z_train.parquet | 3 483 369 | 681 MB |
| zh_test.parquet | 1 068 606 | 213 MB |
| zh_train.parquet | 4 274 423 | 851 MB |
| qq_test.parquet | 6 366 715 | 1.4 GB |
| qq_train.parquet | 25 466 858 | 5.6 GB |
The dataset consists of 6 files of 8.9 GB in total.
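The jet counts listed above also fix the train/test proportions of each sample. The short calculation below is only a convenience sketch: the counts are copied from the listing, while the dictionary layout and the function name are made up for illustration.

```python
# Jet counts per sample and split, copied from the dataset characteristics.
JET_COUNTS = {
    "z": {"train": 3_483_369, "test": 870_843},
    "zh": {"train": 4_274_423, "test": 1_068_606},
    "qq": {"train": 25_466_858, "test": 6_366_715},
}


def held_out_fraction(counts):
    """Fraction of jets of one sample that sit in the test file."""
    return counts["test"] / (counts["train"] + counts["test"])


fractions = {name: held_out_fraction(c) for name, c in JET_COUNTS.items()}
total_jets = sum(c["train"] + c["test"] for c in JET_COUNTS.values())

for name, fraction in fractions.items():
    print(f"{name}: test fraction = {fraction:.3f}")
```

For all three samples the test fraction comes out at about 0.2, so each sample is split roughly 80/20 between the train and test files.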
How can you use these data?
The .parquet files can be loaded directly with the Awkward Array Python library. An example of how one might use the dataset and its features is given in data_intro.ipynb.
Created:
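A minimal loading sketch is shown below; data_intro.ipynb remains the authoritative example. It assumes that the awkward and pyarrow packages are installed and that the files sit in a local directory. The helper names, the base_dir argument, and the choice of columns are hypothetical.

```python
from pathlib import Path

SAMPLES = ("z", "zh", "qq")
SPLITS = ("train", "test")


def parquet_path(sample, split, base_dir="."):
    """Build the file name used in this dataset, e.g. z_test.parquet."""
    if sample not in SAMPLES or split not in SPLITS:
        raise ValueError(f"unknown sample/split: {sample!r}, {split!r}")
    return Path(base_dir) / f"{sample}_{split}.parquet"


def load_jets(sample, split, base_dir=".", columns=None):
    """Load one file as an Awkward Array, optionally only some columns."""
    import awkward as ak  # imported lazily; also needs pyarrow installed

    path = parquet_path(sample, split, base_dir)
    return ak.from_parquet(str(path), columns=columns)


# Example (requires the downloaded files in the current directory):
# jets = load_jets("z", "test", columns=["reco_jet_p4s", "gen_jet_tau_decaymode"])
# print(len(jets), jets.fields)
```

Reading only the columns that are needed keeps memory use down, which matters most for the multi-gigabyte qq files.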
2024-10-03



