moondream/refcoco-m

Name: moondream/refcoco-m
Creator: moondream
Published: 2025-11-17 22:43:51
License: MIT

Hugging Face, updated 2025-11-17, indexed 2026-01-03

Download link:

https://hf-mirror.com/datasets/moondream/refcoco-m

Description:

---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: image_id
    dtype: int64
  - name: file_name
    dtype: string
  - name: samples
    list:
    - name: id
      dtype: int64
    - name: image_id
      dtype: int64
    - name: category
      dtype: string
    - name: supercategory
      dtype: string
    - name: label
      dtype: string
    - name: sentences
      list: string
    - name: bbox
      list: float32
    - name: mask
      struct:
      - name: counts
        dtype: string
      - name: size
        list: int32
  splits:
  - name: validation
    num_bytes: 610011498
    num_examples: 1190
  download_size: 609559173
  dataset_size: 610011498
configs:
- config_name: default
  data_files:
  - split: validation
    path: data/validation-*
license: mit
language:
- en
pretty_name: RefCOCO-M
size_categories:
- 1K<n<10K
---

![RefCOCO-M banner](assets/refcocom_banner.png)

### RefCOCO-M: Refined Referring Expression Segmentation

RefCOCO has long been a standard benchmark for [referring expression segmentation](https://arxiv.org/pdf/1603.06180), but it has two major issues: poor mask quality and harmful referring expressions. Modern models now produce masks that are more accurate than the ground-truth annotations, which makes RefCOCO an imprecise measure of segmentation quality.

RefCOCO-M is a cleaned version of the RefCOCO (UNC) validation split. We replace the original instance masks with pixel-accurate masks and remove harmful samples. RefCOCO-M contains 1,190 images, 2,080 instance masks, and 5,598 referring expressions. The images and referring expressions remain identical to the original RefCOCO validation set.

---

#### Construction

For each referred instance in the original RefCOCO validation set, we run a re-segmentation pipeline with an ensemble of models and keep only high-confidence masks. This removes 47% of masks due to unrecoverable quality. A separate model removes a further 0.5% of samples for harmful language.

#### Before/After Re-segmentation

The original RefCOCO masks are hand-drawn polygons and can be highly inaccurate: they are coarse, with inflated boundaries and missing fine structure. The examples below show that RefCOCO-M masks have tighter boundaries and capture details that are missing from the original masks.

![RefCOCO-M banner](assets/refcocom_old_new.png)

#### Harmful Examples

The original RefCOCO validation set includes descriptions with slurs, sexualized language, and degrading phrases. The examples below are drawn from the 46 samples removed by the RefCOCO-M safety pipeline.

![RefCOCO-M banner](assets/refcocom_filtered.png)

---

#### Data Format

The data is structured in COCO format. Each image-level record contains:

* `file_name`: COCO 2014 file name.
* `image_meta`: dict containing `width`, `height`, and `image_id`.
* `image`: dict with raw bytes and a relative path: `{"bytes": ..., "path": "images/<file_name>"}`.
* `samples`: list of instance annotations for that image.

Each sample entry describes one referred instance and its mask:

* `id`: unique instance id.
* `category`: COCO category label.
* `supercategory`: COCO supercategory label.
* `sentences`: list of referring expressions for this instance.
* `bbox`: `[x, y, w, h]` in COCO pixel coordinates.
* `mask`: single COCO-style RLE mask, given as `{"counts": str, "size": [H, W]}`, where `H` and `W` are the image height and width.

---

#### Evaluation Protocol

For each sample and each sentence in `sample["sentences"]`, we treat (image, sentence) as one evaluation example with ground-truth mask `sample["mask"]`. Given a predicted binary mask for each example, we compute IoU with respect to the corresponding ground-truth mask and average IoU across all examples:

$$
\mathrm{IoU} = \frac{|\hat{M} \cap M|}{|\hat{M} \cup M|}, \qquad \mathrm{mIoU} = \frac{1}{N} \sum_{i=1}^N \mathrm{IoU}_i
$$

where $N$ is the total number of evaluation examples (image, sentence) in RefCOCO-M.
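
The evaluation protocol above could be sketched as follows. This is a minimal illustration, not the authors' evaluation script. `predict` and `decode_mask` are hypothetical callables supplied by the user: `predict(image, sentence)` returns a binary mask, and `decode_mask` turns the RLE dict in `sample["mask"]` into a binary mask (for example with a COCO RLE decoder such as the one in `pycocotools`). Masks are assumed to be `H x W` grids of 0/1 values; plain Python lists keep the sketch dependency-free, though in practice one would likely vectorize with NumPy.

```python
from typing import Any, Callable, Iterable, Mapping, Sequence

Mask = Sequence[Sequence[int]]  # H x W grid of 0/1 values


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += bool(p and g)
            union += bool(p or g)
    # Assumption: an empty union (both masks empty) scores 0.0;
    # the dataset card does not define this case.
    return inter / union if union else 0.0


def mean_iou(
    records: Iterable[Mapping[str, Any]],
    predict: Callable[[Any, str], Mask],      # hypothetical model wrapper
    decode_mask: Callable[[Any], Mask],       # hypothetical RLE decoder
) -> float:
    """Average IoU over every (image, sentence) pair in the records."""
    scores = []
    for record in records:
        for sample in record["samples"]:
            gt = decode_mask(sample["mask"])
            # Each referring expression is its own evaluation example.
            for sentence in sample["sentences"]:
                pred = predict(record["image"], sentence)
                scores.append(iou(pred, gt))
    if not scores:
        raise ValueError("no evaluation examples found")
    return sum(scores) / len(scores)
```

Because the average runs over (image, sentence) pairs, an instance with several referring expressions contributes once per expression, so `N` here corresponds to the number of expressions rather than the number of masks.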
