avrecum/opensam-discrimination-coco
Source: Hugging Face. Updated 2026-04-16; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/avrecum/opensam-discrimination-coco
Description:
# SAM Discrimination Fine-tuning Dataset
A dataset of images that each contain multiple objects of the same class. Every
annotated object carries a discriminative referring expression and, where
available, a SAM-generated segmentation mask.
**Total images:** 3,962
**Parquet shards:** 4
## Schema
Each parquet row represents one **image** with all its annotated objects.
| Column | Type | Description |
|--------|------|-------------|
| `image` | `binary` | Raw image bytes (JPEG) |
| `image_id` | `string` | Unique image identifier |
| `image_width` | `int32` | Image width in pixels |
| `image_height` | `int32` | Image height in pixels |
| `dataset` | `string` | Source dataset (e.g. `"coco_train2017"`, `"cc3m_train"`) |
| `objects_json` | `string` | JSON array of object annotations (see below) |
| `num_objects` | `int32` | Number of annotated objects in this image |
### Object annotation format (`objects_json`)
Each element in the `objects_json` array is a dict with:
| Field | Type | Description |
|-------|------|-------------|
| `class_name` | `string` | Object class from RT-DETR (e.g. `"person"`, `"car"`) |
| `bbox` | `list[float]` | Bounding box `[x1, y1, x2, y2]` in absolute pixels |
| `prompt` | `string` | Discriminative referring expression (max ~20 words) |
| `mask_rle` | `dict \| null` | SAM-generated mask in COCO RLE format (see below) |
| `peer_indices` | `list[int]` | Indices of same-class distractor objects in this list |
| `detector_score` | `float` | RT-DETR detection confidence (0–1) |
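The fields above map naturally onto a small typed record. The following is a minimal sketch, assuming the `objects_json` string of one row has already been read; `ObjectAnnotation`, `parse_objects`, and `bbox_xywh` are illustrative names and not part of the dataset's own tooling.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectAnnotation:
    """Illustrative container for one entry of `objects_json` (hypothetical name)."""
    class_name: str
    bbox: list[float]          # [x1, y1, x2, y2] in absolute pixels
    prompt: str
    mask_rle: Optional[dict]   # None when the row stores no mask for this object
    peer_indices: list[int]
    detector_score: float

    def bbox_xywh(self) -> list[float]:
        """Convert the [x1, y1, x2, y2] box to COCO-style [x, y, width, height]."""
        x1, y1, x2, y2 = self.bbox
        return [x1, y1, x2 - x1, y2 - y1]


def parse_objects(objects_json: str) -> list[ObjectAnnotation]:
    """Parse the `objects_json` string of one row into typed records."""
    return [
        ObjectAnnotation(
            class_name=o["class_name"],
            bbox=[float(v) for v in o["bbox"]],
            prompt=o["prompt"],
            mask_rle=o.get("mask_rle"),
            peer_indices=list(o["peer_indices"]),
            detector_score=float(o["detector_score"]),
        )
        for o in json.loads(objects_json)
    ]
```

The `bbox_xywh` helper is included only because many COCO-style tools expect `[x, y, width, height]` rather than corner coordinates.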
### Mask RLE format (`mask_rle`)
Masks use COCO-style run-length encoding, compatible with `pycocotools`:
| Field | Type | Description |
|-------|------|-------------|
| `size` | `list[int]` | `[height, width]` of the mask |
| `counts` | `string` | Base64-encoded RLE bytes |
To decode a mask:
```python
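import base64

import pycocotools.mask as mask_util

# `obj` is one element of the parsed `objects_json` array; `mask_rle` may be null.
if obj["mask_rle"] is not None:
    rle = {
        "size": obj["mask_rle"]["size"],  # [height, width]
        "counts": base64.b64decode(obj["mask_rle"]["counts"]),  # raw RLE bytes
    }
    mask = mask_util.decode(rle)  # np.ndarray [H, W], dtype=uint8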
```
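Since `mask_rle` can be `null` and `counts` is Base64 text rather than raw bytes, a small guard before decoding can surface malformed entries early. This is a sketch using only the standard library; `to_pycocotools_rle` is a hypothetical helper name, and the optional `expected_size` check is an assumption about how a caller might want to validate masks, not something the dataset card specifies.

```python
import base64
import binascii
from typing import Optional


def to_pycocotools_rle(
    mask_rle: Optional[dict],
    expected_size: Optional[tuple[int, int]] = None,
) -> Optional[dict]:
    """Return an RLE dict with raw-byte counts, or None for a missing mask."""
    if mask_rle is None:
        return None
    height, width = mask_rle["size"]  # stored as [height, width]
    if expected_size is not None and (height, width) != tuple(expected_size):
        raise ValueError(
            f"mask size {(height, width)} != expected {tuple(expected_size)}"
        )
    try:
        # validate=True rejects characters outside the Base64 alphabet
        counts = base64.b64decode(mask_rle["counts"], validate=True)
    except binascii.Error as exc:
        raise ValueError("`counts` is not valid Base64") from exc
    return {"size": [height, width], "counts": counts}
```

A non-`None` result has the shape that `mask_util.decode` expects, so it can be passed to the decoding call directly.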
### Peer indices
Each object's `peer_indices` lists the indices of the other objects in the same
`objects_json` array that belong to the **same class**. This encodes the
discrimination structure: the `prompt` for object *i* should uniquely identify
it among the objects at indices `[i] + objects[i]["peer_indices"]`, that is,
object *i* together with its same-class distractors.
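To make the structure concrete, the sketch below builds the candidate set for one object and checks the two properties stated above: peers are other objects, and they share the target's class. The function names are illustrative, and `objects` is assumed to be the already-parsed `objects_json` list of plain dicts.

```python
def candidate_set(objects: list[dict], i: int) -> list[int]:
    """Indices the prompt of object i must discriminate among: i plus its peers."""
    return [i] + list(objects[i]["peer_indices"])


def check_peer_indices(objects: list[dict]) -> list[str]:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    for i, obj in enumerate(objects):
        for j in obj["peer_indices"]:
            if not 0 <= j < len(objects):
                problems.append(f"object {i}: peer index {j} out of range")
            elif j == i:
                problems.append(f"object {i}: lists itself as a peer")
            elif objects[j]["class_name"] != obj["class_name"]:
                problems.append(f"object {i}: peer {j} has a different class")
    return problems
```

The check deliberately does not require the peer relation to be symmetric, because the card does not state that it is.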
## Usage
```python
from project.discrimination.schema import DiscriminationDatasetReader

reader = DiscriminationDatasetReader("path/to/parquet_dir")
for image_row in reader:
    # Each row exposes image_row.image_bytes, image_row.objects, etc.
    for obj in image_row.objects:
        print(obj.prompt, obj.bbox, obj.mask_rle)
```
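If the `project.discrimination` package is not available, the shards can also be read directly from the documented schema. The following is a rough sketch, not the project's reader: `parse_row` and `iter_rows` are hypothetical names, `pyarrow` is assumed to be installed for the streaming part, and the `num_objects` consistency check assumes that column always equals the length of `objects_json`.

```python
import json
from typing import Iterator


def parse_row(row: dict) -> dict:
    """Decode `objects_json` in one row and sanity-check `num_objects`."""
    objects = json.loads(row["objects_json"])
    if len(objects) != row["num_objects"]:
        raise ValueError(
            f"{row['image_id']}: num_objects does not match objects_json"
        )
    return {**row, "objects": objects}


def iter_rows(parquet_path: str, batch_size: int = 64) -> Iterator[dict]:
    """Stream parsed rows from one parquet shard (requires `pyarrow`)."""
    # Imported lazily so parse_row stays usable without pyarrow installed.
    import pyarrow.parquet as pq

    for batch in pq.ParquetFile(parquet_path).iter_batches(batch_size=batch_size):
        for row in batch.to_pylist():
            yield parse_row(row)
```

Reading in batches keeps memory bounded, which matters here because every row carries the raw JPEG bytes in `image`. A call such as `iter_rows("path/to/shard.parquet")` would then yield one dict per image, where the path is a placeholder.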
## Generation
Produced by the OpenSAM discrimination mining pipeline:
1. RT-DETR object detection → find same-class duplicates
2. InternVL-14B captioning, distinctness filtering, occlusion filtering
3. Colored-box discriminative prompt generation + validation
4. SAM 3 mask generation (text + box prompt, clipped to bbox)
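The "find same-class duplicates" part of step 1 can be illustrated with plain Python. This is only a sketch of the idea under simple assumptions (detections arrive as dicts with a `class_name` key, and every class with at least two instances is kept); it is not the pipeline's actual code, and `assign_peer_indices` is a hypothetical name.

```python
from collections import defaultdict


def assign_peer_indices(detections: list[dict]) -> list[dict]:
    """Keep detections whose class occurs at least twice and attach peer_indices."""
    by_class = defaultdict(list)
    for det in detections:
        by_class[det["class_name"]].append(det)

    kept = []
    for group in by_class.values():
        if len(group) < 2:
            continue  # no same-class distractor, so nothing to discriminate
        start = len(kept)
        indices = range(start, start + len(group))
        for idx, det in zip(indices, group):
            # Peers are all other members of the same class group.
            kept.append({**det, "peer_indices": [j for j in indices if j != idx]})
    return kept
```

The later steps in the list (distinctness and occlusion filtering, prompt validation) may discard further objects, so the stored `peer_indices` need not match what this grouping alone would produce.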
Provider:
avrecum



