refcoco-m

Name: refcoco-m
Creator: maas
Published: 2025-12-05 16:56:47
License: no description available

Source: ModelScope community. Updated: 2025-12-05. Indexed: 2025-11-22.

Download link:

https://modelscope.cn/datasets/moondream/refcoco-m

Description:

![RefCOCO-M banner](assets/refcocom_banner.png)

### RefCOCO-M: Refined Referring Expression Segmentation

RefCOCO has long been a standard benchmark for [referring expression segmentation](https://arxiv.org/pdf/1603.06180), but it has two major issues: poor mask quality and harmful referring expressions. Modern models now produce masks that are more accurate than the ground-truth annotations, which makes RefCOCO an imprecise measure of segmentation quality.

RefCOCO-M is a cleaned version of the RefCOCO (UNC) validation split. We replace the original instance masks with pixel-accurate masks and remove harmful samples. RefCOCO-M contains 1,190 images, 2,080 instance masks, and 5,598 referring expressions. The images and referring expressions remain identical to the original RefCOCO validation set.

---

#### Construction

For each referred instance in the original RefCOCO validation set, we run a re-segmentation pipeline with an ensemble of models and keep only high-confidence masks. This removes 47% of masks due to unrecoverable quality. A separate model removes a further 0.5% of samples for harmful language.

#### Before/After Re-segmentation

The original RefCOCO masks are hand-drawn polygons and can be highly inaccurate: they are coarse, with inflated boundaries and missing fine structure. The examples below show that RefCOCO-M masks have tighter boundaries and capture details that are missing from the original masks.

![RefCOCO-M banner](assets/refcocom_old_new.png)

#### Harmful Examples

The original RefCOCO validation set includes descriptions with slurs, sexualized language, and degrading phrases. The examples below are drawn from the 46 samples removed by the RefCOCO-M safety pipeline.

![RefCOCO-M banner](assets/refcocom_filtered.png)

---

#### Data Format

The data is structured in COCO format. Each image-level record contains:

* `file_name`: COCO 2014 file name.
* `image_meta`: dict containing `width`, `height`, and `image_id`.
* `image`: dict with raw bytes and a relative path: `{"bytes": ..., "path": "images/<file_name>"}`.
* `samples`: list of instance annotations for that image.

Each sample entry describes one referred instance and its mask:

* `id`: unique instance id.
* `category`: COCO category label.
* `supercategory`: COCO supercategory label.
* `sentences`: list of referring expressions for this instance.
* `bbox`: `[x, y, w, h]` in COCO pixel coordinates.
* `mask`: single COCO-style RLE mask, given as `{"counts": str, "size": [H, W]}`, where `H` and `W` are the image height and width.

---

#### Evaluation Protocol

For each sample and each sentence in `sample["sentences"]`, we treat (image, sentence) as one evaluation example with ground-truth mask `sample["mask"]`. Given a predicted binary mask for each example, we compute IoU with respect to the corresponding ground-truth mask and average IoU across all examples:

$$
\mathrm{IoU} = \frac{|\hat{M} \cap M|}{|\hat{M} \cup M|}, \qquad \mathrm{mIoU} = \frac{1}{N} \sum_{i=1}^N \mathrm{IoU}_i
$$

where $\hat{M}$ is the predicted mask, $M$ is the ground-truth mask, and $N$ is the total number of (image, sentence) evaluation examples in RefCOCO-M.
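The evaluation protocol above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the dataset authors' evaluation code: `predict` and `decode_mask` are hypothetical callables supplied by the caller, `records` is assumed to be an iterable of image-level records with the fields listed under Data Format, and scoring two empty masks as IoU 1.0 is a convention chosen here, not one stated by the dataset.

```python
import numpy as np


def iou(pred, gt):
    """Intersection over union of two binary masks with the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def mean_iou(records, predict, decode_mask):
    """Average IoU over every (image, sentence) evaluation example.

    predict(record, sentence) and decode_mask(rle) are hypothetical
    callables; each must return an (H, W) binary array.
    """
    scores = []
    for record in records:
        for sample in record["samples"]:
            # One ground-truth mask per referred instance.
            gt = decode_mask(sample["mask"])
            # Each sentence of the instance is its own evaluation example.
            for sentence in sample["sentences"]:
                scores.append(iou(predict(record, sentence), gt))
    if not scores:
        raise ValueError("no evaluation examples found")
    return sum(scores) / len(scores)
```

Because every sentence counts as a separate example, an instance with three referring expressions contributes three terms to the average. For the RLE masks in this dataset, `decode_mask` could plausibly wrap a COCO RLE decoder such as `pycocotools.mask.decode`; that pairing is a suggestion and has not been verified against these files.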

Provider:

maas

Created:

2025-11-18
