
HEPMASS-IMB

Zenodo | Updated 2022-04-20 | Indexed 2026-05-25
Download link:
https://zenodo.org/record/6453047
Description:
<strong>HEPMASS-IMB</strong> is a benchmark dataset for <em>signal-background classification</em> in High-Energy Physics (HEP), derived from HEPMASS (Baldi et al.) by imbalancing it twice: once on the class labels and once on the mass labels. It has 27 feature columns (named <em>f0</em> to <em>f26</em>) and a 28th feature column holding the mass (named <em>mass</em>). The 27 features are <em>already normalized</em> to approximately zero mean and unit variance. The mass feature has five unique values: <em>500</em>, <em>750</em>, <em>1000</em>, <em>1250</em>, and <em>1500</em>. There are two class labels: 1 (signal) and 0 (background). The dataset describes the decay of a hypothetical particle: \(X \to t\bar{t} \to W^+bW^-\bar{b}\). Further details about the original dataset are available here, whereas a description of our modifications is presented in our paper. NOTE: The files provided here contain only the <em>training set</em>, since that is the part that differs from the original HEPMASS. The label column has been renamed from "# label" to "type". There are two new columns: <em>name</em> and <em>weight</em>. Steps to adapt `all_test.csv` (from HEPMASS) to the same format: <pre><code class="language-python">import numpy as np
import pandas as pd

# 1. Load the original HEPMASS test csv
df = pd.read_csv('&lt;your-path&gt;/all_test.csv')

# 2. Rename the label column to match HEPMASS-IMB
df.rename(columns={'# label': 'type'}, inplace=True)

# 3. Adjust the mass column: set the smallest mass value to exactly 500.0
mass = np.sort(df['mass'].unique())
df.loc[df['mass'] == mass[0], 'mass'] = 500.0

# 4. Finally, save the new csv
df.to_csv('&lt;your-path&gt;/test.csv', index=False)</code></pre>
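Since the dataset is imbalanced on both the class labels and the mass labels, it can be useful to inspect the per-mass class counts before training. The following is a minimal sketch, assuming the data are already loaded into a pandas DataFrame with the `type` and `mass` columns described above. The function name `summarize_balance` is hypothetical and is not part of the dataset or of the authors' code.

```python
import pandas as pd


def summarize_balance(df: pd.DataFrame) -> pd.DataFrame:
    """Count background/signal rows per mass value and compute the signal fraction.

    Assumes `df` has a numeric `mass` column and a `type` column
    holding 1 (signal) or 0 (background).
    """
    # Cast labels to int so that 0.0/1.0 and 0/1 are treated the same way.
    labels = df["type"].astype(int)

    # Rows: mass values; columns: class labels; cells: number of events.
    counts = df.groupby([df["mass"], labels]).size().unstack(fill_value=0)

    # Make sure both classes appear as columns even if one is absent.
    counts = counts.reindex(columns=[0, 1], fill_value=0)
    counts.columns = ["background", "signal"]

    total = counts["background"] + counts["signal"]
    counts["signal_fraction"] = counts["signal"] / total
    return counts
```

A call such as `summarize_balance(pd.read_csv('<your-path>/<training-file>.csv'))` would return one row per mass value. The file name is a placeholder here, since the exact names of the files in the record are not stated above.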
Provider:
Zenodo
Created:
2022-04-12