DeepLearnPhysics/PILArNet-M
Hugging Face | Updated 2025-12-02 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/DeepLearnPhysics/PILArNet-M
Resource description:
---
license: apache-2.0
task_categories:
- image-segmentation
- object-detection
tags:
- particle
- physics
- 3D
- simulation
- lartpc
- pointcloud
pretty_name: >-
  Public Dataset for Particle Imaging Liquid Argon Detectors in High Energy
  Physics - Medium
size_categories:
- 1M<n<10M
---
# Public Dataset for Particle Imaging Liquid Argon Detectors in High Energy Physics
We provide the 168 GB **PILArNet-Medium** dataset, a continuation of the [PILArNet](https://arxiv.org/abs/2006.01993) dataset, consisting of ~1.2 million events from liquid argon time projection chambers ([LArTPCs](https://www.symmetrymagazine.org/article/october-2012/time-projection-chambers-a-milestone-in-particle-detector-technology?language_content_entity=und)).
Each event contains 3D ionization trajectories of particles as they traverse the detector. Typical downstream tasks include:
- Semantic segmentation of voxels into particle-like categories
- Particle-level (instance-level) segmentation and identification
- Interaction-level grouping of particles that belong to the same interaction
## Directory structure
The dataset is stored in HDF5 format and organized as:
```plaintext
/path/to/dataset/
    /train/
        /generic_v2_196200_v2.h5
        /generic_v2_153600_v1.h5
        ...
    /val/
        /generic_v2_66800_v2.h5
        ...
    /test/
        /generic_v2_50000_v1.h5
        ...
```
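The number between the `generic_v2_` prefix and the trailing version suffix (`_v1` or `_v2`) is the number of events contained in the file. For example, `generic_v2_196200_v2.h5` holds 196,200 events.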
Dataset split:
* **Train:** 1,082,400 events
* **Validation:** 66,800 events
* **Test:** 50,000 events
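Because the event count is encoded in each filename, the split sizes can be cross-checked without opening any file. The following is a minimal sketch, assuming the `generic_v2_<events>_v<k>.h5` pattern seen in the listing above holds for every file; the helper names `events_in_filename` and `events_in_split` are illustrative and not part of any dataset tooling.

```python
import re
from pathlib import Path

# Assumed filename pattern: generic_v2_<num_events>_v<k>.h5
_FILENAME_RE = re.compile(r"^generic_v2_(\d+)_v\d+\.h5$")


def events_in_filename(name: str) -> int:
    """Return the event count encoded in a dataset filename."""
    match = _FILENAME_RE.match(Path(name).name)
    if match is None:
        raise ValueError(f"unexpected filename: {name}")
    return int(match.group(1))


def events_in_split(split_dir: str) -> int:
    """Sum the encoded event counts over all .h5 files in one split directory."""
    return sum(events_in_filename(p.name) for p in Path(split_dir).glob("*.h5"))
```

If the naming assumption holds, `events_in_split("/path/to/dataset/val")` should agree with the validation count listed above.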
## Data format
Each HDF5 file contains three main datasets: `point`, `cluster`, and `cluster_extra`.
Entries are stored as variable length 1D arrays and should be reshaped event by event.
### `point` dataset
Each entry of `point` corresponds to a single event and encodes all spacepoints for that event in a flattened array. After reshaping, each row corresponds to a point:
Shape per event: `(N, 8)`
Columns (per point):
1. `x` coordinate (integer voxel index, 0 to 768)
2. `y` coordinate (integer voxel index, 0 to 768)
3. `z` coordinate (integer voxel index, 0 to 768)
4. Voxel value (in MeV)
5. Energy deposit `dE` (in MeV)
6. Absolute time in nanoseconds
7. Number of electrons
8. `dx` in millimeters
Example:
```python
import h5py

EVENT_IDX = 0

with h5py.File("/path/to/dataset/train/generic_v2_196200_v2.h5", "r") as h5f:
    point_flat = h5f["point"][EVENT_IDX]  # flattened 1D array for one event

points = point_flat.reshape(-1, 8)  # (N, 8)
```
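For readability, the eight columns can be addressed by name rather than by index. The sketch below uses a small synthetic array in place of a real event. The names in `POINT_COLUMNS` are hypothetical labels chosen to mirror the column list above; the dataset itself does not define them.

```python
import numpy as np

# Hypothetical names for the 8 point columns, in the order listed above.
POINT_COLUMNS = ("x", "y", "z", "value", "dE", "t", "n_electrons", "dx")


def split_columns(points: np.ndarray) -> dict:
    """Map each column name to a 1D view of the (N, 8) points array."""
    if points.ndim != 2 or points.shape[1] != len(POINT_COLUMNS):
        raise ValueError(f"expected shape (N, 8), got {points.shape}")
    return {name: points[:, i] for i, name in enumerate(POINT_COLUMNS)}


# Synthetic stand-in for one flattened event containing 2 points.
point_flat = np.arange(16, dtype=np.float32)
cols = split_columns(point_flat.reshape(-1, 8))
total_dE = float(cols["dE"].sum())  # summed energy deposit over the event
```

Each dictionary value is a view into the original array, so no point data is copied.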
### `cluster` dataset
Each entry of `cluster` corresponds to the set of clusters for a single event. After reshaping, each row corresponds to a cluster:
Shape per event: `(M, 6)`
Columns (per cluster):
1. Number of points in the cluster
2. Fragment ID
3. Group ID
4. Interaction ID
5. Semantic type (class ID, see below)
6. Particle ID (PID, see below)
Example:
```python
import h5py

EVENT_IDX = 0

with h5py.File("/path/to/dataset/train/generic_v2_196200_v2.h5", "r") as h5f:
    cluster_flat = h5f["cluster"][EVENT_IDX]  # flattened 1D array for one event

clusters = cluster_flat.reshape(-1, 6)  # (M, 6)
```
### `cluster_extra` dataset
Each entry of `cluster_extra` provides additional per-cluster information for a single event. After reshaping, each row corresponds to a cluster:
Shape per event: `(M, 5)`
Columns (per cluster):
1. Particle mass (from PDG)
2. Particle momentum (magnitude)
3. Particle vertex `x` coordinate
4. Particle vertex `y` coordinate
5. Particle vertex `z` coordinate
Example:
```python
import h5py

EVENT_IDX = 0

with h5py.File("/path/to/dataset/train/generic_v2_196200_v2.h5", "r") as h5f:
    cluster_extra_flat = h5f["cluster_extra"][EVENT_IDX]  # flattened 1D array

cluster_extra = cluster_extra_flat.reshape(-1, 5)  # (M, 5)
```
### Cluster and point ordering
Points in the `point` array are ordered by the cluster they belong to. For a given event:
* Let `clusters[i, 0]` be the number of points in cluster `i`
* Then points for cluster `0` occupy the first `clusters[0, 0]` rows in `points`
* Points for cluster `1` occupy the next `clusters[1, 0]` rows, and so on
This ordering allows you to map cluster-level attributes (`cluster` and `cluster_extra`) back to the underlying points.
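The ordering rule can be turned into a per-point cluster index with `np.repeat`, or into one array per cluster with `np.split`. This is a minimal sketch that takes already-reshaped `points` and `clusters` arrays; the function names are illustrative, and the cast to integer is a precaution because the stored dtype is not stated here.

```python
import numpy as np


def point_cluster_index(clusters: np.ndarray) -> np.ndarray:
    """Return, for every point, the row index of its cluster in `clusters`."""
    counts = clusters[:, 0].astype(np.int64)
    return np.repeat(np.arange(len(clusters)), counts)


def split_points_by_cluster(points: np.ndarray, clusters: np.ndarray) -> list:
    """Split the (N, 8) points array into one array per cluster."""
    counts = clusters[:, 0].astype(np.int64)
    if counts.sum() != len(points):
        raise ValueError("cluster point counts do not add up to N")
    # Split boundaries are the cumulative counts, excluding the final total.
    return np.split(points, np.cumsum(counts)[:-1])
```

With the index from `point_cluster_index`, any per-cluster attribute can be broadcast to points by fancy indexing, for example `cluster_extra[point_cluster_index(clusters)]`.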
### Removing low energy deposits (LED)
By construction, the first cluster in each event (`cluster[0]`) corresponds to amorphous low energy deposits or blips: these are treated as uncountable "stuff" and labeled as LED.
To remove LED points from an event:
```python
import h5py

EVENT_IDX = 0

with h5py.File("/path/to/dataset/train/generic_v2_196200_v2.h5", "r") as h5f:
    point_flat = h5f["point"][EVENT_IDX]
    cluster_flat = h5f["cluster"][EVENT_IDX]

points = point_flat.reshape(-1, 8)  # (N, 8)
clusters = cluster_flat.reshape(-1, 6)  # (M, 6)

# Number of points belonging to LED (cluster 0); cast to int so it can be used as a slice bound
n_led_points = int(clusters[0, 0])

# Drop LED points
points_no_led = points[n_led_points:]  # points belonging to non-LED clusters
```
LED clusters also have special values in the ID fields, described in the label schema below.
## Label schema
This section summarizes the label conventions used in the dataset for semantic segmentation, particle identification, and instance or interaction level grouping.
### Semantic segmentation classes
Semantic labels are stored in `cluster[:, 4]`.
The mapping is:
| Semantic ID | Class name |
| ----------- | ---------- |
| 0 | Shower |
| 1 | Track |
| 2 | Michel |
| 3 | Delta |
| 4 | LED |
Here, LED denotes low energy deposits or amorphous "stuff" that is not counted as a particle instance.
To perform semantic segmentation at the point level, use the cluster ordering:
1. Expand cluster semantic labels to per-point labels according to the point counts per cluster.
2. Optionally remove LED points (Semantic ID 4) as shown above.
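The two steps can be sketched as follows. Note that this variant removes LED points with a mask on the semantic ID rather than by slicing off the first cluster, so it does not depend on LED being stored first. The function names and the constant `LED_SEMANTIC_ID` are illustrative.

```python
import numpy as np

LED_SEMANTIC_ID = 4  # from the semantic class table above


def per_point_semantics(clusters: np.ndarray) -> np.ndarray:
    """Expand cluster-level semantic IDs (column 4) to one label per point."""
    counts = clusters[:, 0].astype(np.int64)
    return np.repeat(clusters[:, 4].astype(np.int64), counts)


def drop_led(points: np.ndarray, clusters: np.ndarray):
    """Return the points and labels that are not tagged as LED."""
    labels = per_point_semantics(clusters)
    keep = labels != LED_SEMANTIC_ID
    return points[keep], labels[keep]
```

The returned labels stay aligned row by row with the returned points, which is the form most segmentation losses expect.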
### Particle identification (PID) labels
Particle identification uses the Particle ID field in `cluster[:, 5]`.
The mapping is:
| ID | Particle type |
| --- | ---------------------------------- |
| 0 | Photon |
| 1 | Electron |
| 2 | Muon |
| 3 | Pion |
| 4 | Proton |
| 5 | Kaon (not present in this dataset) |
| 6 | None (LED) |
LED clusters that correspond to low energy deposits use `PID = 6`.
These clusters are typically also `Semantic ID = 4` and treated as "stuff".
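As a quick sanity check on an event, the PID column can be translated to names and tallied. This is a sketch with an illustrative `PID_NAMES` dictionary copied from the table above. The tally is per cluster row, so a particle split into several fragments is counted once per fragment.

```python
from collections import Counter

import numpy as np

# Mapping copied from the PID table above.
PID_NAMES = {
    0: "Photon",
    1: "Electron",
    2: "Muon",
    3: "Pion",
    4: "Proton",
    5: "Kaon",
    6: "None (LED)",
}


def count_pids(clusters: np.ndarray) -> Counter:
    """Count cluster rows per particle type using the PID column (index 5)."""
    pids = clusters[:, 5].astype(np.int64)
    return Counter(PID_NAMES.get(int(p), f"Unknown ({int(p)})") for p in pids)
```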
### Instance and interaction IDs
The `cluster` dataset contains several integer IDs to support different grouping granularities:
* **Fragment ID** (`cluster[:, 1]`):
  Identifies contiguous fragments of a particle. Multiple fragments may belong to the same particle.
* **Group ID** (`cluster[:, 2]`):
  Identifies particle-level instances. All clusters with the same group ID correspond to the same physical particle.
  * Use `Group ID` for particle instance segmentation or particle-level identification tasks.
* **Interaction ID** (`cluster[:, 3]`):
  Identifies interaction-level groups. All particles with the same interaction ID belong to the same interaction (for example a neutrino interaction and its secondaries).
  * Use `Interaction ID` for interaction-level segmentation or classification.
For LED clusters, all three IDs (Fragment ID, Group ID, and Interaction ID) are set to `-1`. This differentiates LED clusters from genuine particle or interaction instances.
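To illustrate how the IDs combine, the sketch below merges fragment rows that share a Group ID into particle-level point counts and expands any ID column to per-point instance labels. The function names are illustrative, and rows with Group ID `-1` are skipped as LED, following the convention described above.

```python
import numpy as np


def per_point_ids(clusters: np.ndarray, column: int) -> np.ndarray:
    """Expand an ID column (1: fragment, 2: group, 3: interaction) to per-point labels."""
    counts = clusters[:, 0].astype(np.int64)
    return np.repeat(clusters[:, column].astype(np.int64), counts)


def particle_point_counts(clusters: np.ndarray) -> dict:
    """Total number of points per Group ID, summed over its fragments (LED skipped)."""
    counts = clusters[:, 0].astype(np.int64)
    group_ids = clusters[:, 2].astype(np.int64)
    is_particle = group_ids != -1
    unique_ids, inverse = np.unique(group_ids[is_particle], return_inverse=True)
    totals = np.bincount(inverse, weights=counts[is_particle], minlength=len(unique_ids))
    return {int(g): int(n) for g, n in zip(unique_ids, totals)}
```

Calling `per_point_ids(clusters, 3)` gives interaction labels aligned with the rows of `points`, with `-1` marking LED points.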
## Reconstruction Tasks
Typical uses of this dataset include:
* **Semantic segmentation**:
  Predict voxelwise semantic labels (shower, track, Michel, delta, LED) using the `Semantic type` field.
* **Particle-level segmentation and PID**:
  * Use `Group ID` to define particle instances.
  * Use `PID` to assign particle type (photon, electron, muon, pion, proton, None).
* **Interaction-level reconstruction**:
  * Use `Interaction ID` to group particles belonging to the same physics interaction.
  * Use `cluster_extra` for per-particle momentum and vertex information.
## Getting started
A [Colab notebook](https://colab.research.google.com/drive/1x8WatdJa5D7Fxd3sLX5XSJiMkT_sG_im) is provided for a hands-on introduction to loading and inspecting the dataset.
## Citation
```bibtex
@misc{young2025particletrajectoryrepresentationlearning,
  title={Particle Trajectory Representation Learning with Masked Point Modeling},
  author={Sam Young and Yeon-jae Jwa and Kazuhiro Terao},
  year={2025},
  eprint={2502.02558},
  archivePrefix={arXiv},
  primaryClass={hep-ex},
  doi={10.48550/arXiv.2502.02558},
  url={https://arxiv.org/abs/2502.02558},
}
```
Provided by:
DeepLearnPhysics



