
avrecum/opensam-discrimination-coco

Source: Hugging Face. Updated 2026-04-16. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/avrecum/opensam-discrimination-coco
Description:
# SAM Discrimination Fine-tuning Dataset

A dataset of images containing multiple same-class objects, each annotated with discriminative referring expressions and SAM-generated segmentation masks.

**Total images:** 3,962
**Parquet shards:** 4

## Schema

Each parquet row represents one **image** with all its annotated objects.

| Column | Type | Description |
|--------|------|-------------|
| `image` | `binary` | Raw image bytes (JPEG) |
| `image_id` | `string` | Unique image identifier |
| `image_width` | `int32` | Image width in pixels |
| `image_height` | `int32` | Image height in pixels |
| `dataset` | `string` | Source dataset (e.g. `"coco_train2017"`, `"cc3m_train"`) |
| `objects_json` | `string` | JSON array of object annotations (see below) |
| `num_objects` | `int32` | Number of annotated objects in this image |

### Object annotation format (`objects_json`)

Each element in the `objects_json` array is a dict with:

| Field | Type | Description |
|-------|------|-------------|
| `class_name` | `string` | Object class from RT-DETR (e.g. `"person"`, `"car"`) |
| `bbox` | `list[float]` | Bounding box `[x1, y1, x2, y2]` in absolute pixels |
| `prompt` | `string` | Discriminative referring expression (max ~20 words) |
| `mask_rle` | `dict \| null` | SAM-generated mask in COCO RLE format (see below) |
| `peer_indices` | `list[int]` | Indices of same-class distractor objects in this list |
| `detector_score` | `float` | RT-DETR detection confidence (0–1) |

### Mask RLE format (`mask_rle`)

Masks use COCO-style run-length encoding, compatible with `pycocotools`:

| Field | Type | Description |
|-------|------|-------------|
| `size` | `list[int]` | `[height, width]` of the mask |
| `counts` | `string` | Base64-encoded RLE bytes |

To decode a mask:

```python
import base64
import numpy as np
import pycocotools.mask as mask_util

rle = {"size": obj["mask_rle"]["size"], "counts": base64.b64decode(obj["mask_rle"]["counts"])}
mask = mask_util.decode(rle)  # np.ndarray [H, W], dtype=uint8
```

### Peer indices

Each object's `peer_indices` lists the indices of other objects in the same `objects_json` array that belong to the **same class**. This encodes the discrimination structure: the `prompt` for object *i* should uniquely identify it among objects `[i] + peer_indices[i]`.

## Usage

```python
from project.discrimination.schema import DiscriminationDatasetReader

reader = DiscriminationDatasetReader("path/to/parquet_dir")
for image_row in reader:
    # image_row.image_bytes, image_row.objects, etc.
    for obj in image_row.objects:
        print(obj.prompt, obj.bbox, obj.mask_rle)
```

## Generation

Produced by the OpenSAM discrimination mining pipeline:

1. RT-DETR object detection → find same-class duplicates
2. InternVL-14B captioning, distinctness filtering, occlusion filtering
3. Colored-box discriminative prompt generation + validation
4. SAM 3 mask generation (text + box prompt, clipped to bbox)
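The peer-index structure described in the schema can also be explored without the project-specific reader. The following is a minimal sketch, using only the standard library, of how one row's `objects_json` string might be expanded into (target, distractors) pairs. The helper name `distractor_sets` and the sample row are hypothetical and made up for illustration; the sanity checks reflect one reading of the field descriptions (peers are other objects of the same class), not a documented guarantee.

```python
import json


def distractor_sets(objects_json):
    """Return a list of (target, distractors) pairs for one row's `objects_json`.

    Hypothetical helper, not part of the dataset's own tooling.
    """
    objects = json.loads(objects_json)
    pairs = []
    for i, obj in enumerate(objects):
        peers = []
        for j in obj["peer_indices"]:
            # Assumed invariant: a peer is a different, in-range object.
            if j == i or not 0 <= j < len(objects):
                raise ValueError(f"object {i}: invalid peer index {j}")
            # Assumed invariant: peers share the target's class.
            if objects[j]["class_name"] != obj["class_name"]:
                raise ValueError(f"object {i}: peer {j} has a different class")
            peers.append(objects[j])
        pairs.append((obj, peers))
    return pairs


# Hypothetical sample row, invented for illustration (not taken from the dataset).
sample_objects_json = json.dumps([
    {
        "class_name": "person",
        "bbox": [10.0, 20.0, 110.0, 220.0],
        "prompt": "the person in the red jacket on the left",
        "mask_rle": None,  # the schema allows a null mask
        "peer_indices": [1],
        "detector_score": 0.93,
    },
    {
        "class_name": "person",
        "bbox": [200.0, 30.0, 300.0, 230.0],
        "prompt": "the person holding an umbrella on the right",
        "mask_rle": None,
        "peer_indices": [0],
        "detector_score": 0.88,
    },
])

for target, distractors in distractor_sets(sample_objects_json):
    print(target["prompt"], "| distractors:", [d["prompt"] for d in distractors])
```

When reading real shards, the `objects_json` column of each parquet row would be passed to the helper in place of the sample string; objects whose `mask_rle` is null need to be skipped before mask decoding.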
Provider:
avrecum