refcoco-m
Source: ModelScope community (updated 2025-12-05, indexed 2025-11-22)
Download link: https://modelscope.cn/datasets/moondream/refcoco-m

### RefCOCO-M: Refined Referring Expression Segmentation
RefCOCO has long been a standard benchmark for [referring expression segmentation](https://arxiv.org/pdf/1603.06180), but it has two major issues: poor mask quality and harmful referring expressions. Modern models now produce masks that are more accurate than the ground-truth annotations, which makes RefCOCO an imprecise measure of segmentation quality.
RefCOCO-M is a cleaned version of the RefCOCO (UNC) validation split. We replace the original instance masks with pixel-accurate masks and remove harmful samples. RefCOCO-M contains 1,190 images, 2,080 instance masks, and 5,598 referring expressions. The retained images and referring expressions are unchanged from the original RefCOCO validation set.
---
#### Construction
For each referred instance in the original RefCOCO validation set, we run a re-segmentation pipeline with an ensemble of models and keep only high-confidence masks. This step discards 47% of the original masks, whose quality could not be recovered. A separate model then removes a further 0.5% of samples for harmful language.
#### Before/After Re-segmentation
The original RefCOCO masks are hand-drawn polygons and can be highly inaccurate: they are coarse, with inflated boundaries and missing fine structure. The examples below show that RefCOCO-M masks have tighter boundaries and capture details that are missing from the original masks.

#### Harmful Examples
The original RefCOCO validation set includes descriptions with slurs, sexualized language, and degrading phrases. The examples below are drawn from the 46 samples removed by the RefCOCO-M safety pipeline.

---
#### Data Format
The data is structured in COCO format. Each image-level record contains:
* `file_name`: COCO 2014 file name.
* `image_meta`: dict containing `width`, `height`, and `image_id`.
* `image`: dict with raw bytes and a relative path: `{"bytes": ..., "path": "images/<file_name>"}`.
* `samples`: list of instance annotations for that image.
Each sample entry describes one referred instance and its mask:
* `id`: unique instance id.
* `category`: COCO category label.
* `supercategory`: COCO supercategory label.
* `sentences`: list of referring expressions for this instance.
* `bbox`: [x, y, w, h] in COCO pixel coordinates.
* `mask`: single COCO-style RLE mask, given as `{"counts": str, "size": [H, W]}`, where `H` and `W` are the image height and width.
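
To make the `mask` field concrete, here is a minimal, dependency-free sketch of how a compressed COCO-style RLE string can be expanded into a binary grid. It assumes that `counts` uses the same compressed string encoding as the COCO API (run lengths packed into printable characters, pixels laid out column by column); the function names `rle_string_to_counts` and `decode_mask` and the toy 2x2 mask are illustrative and not part of the dataset. In practice, `pycocotools.mask.decode` is the reference decoder and returns a NumPy array.

```python
from typing import Dict, List


def rle_string_to_counts(s: str) -> List[int]:
    """Unpack a compressed COCO-style RLE string into a list of run lengths."""
    counts: List[int] = []
    p = 0
    while p < len(s):
        x, k, more = 0, 0, True
        while more:
            c = ord(s[p]) - 48            # characters are offset by 48
            x |= (c & 0x1F) << (5 * k)    # low 5 bits carry the payload
            more = bool(c & 0x20)         # bit 5 marks a continuation character
            p += 1
            k += 1
            if not more and (c & 0x10):   # sign-extend negative values
                x |= -1 << (5 * k)
        if len(counts) > 2:               # later runs are stored as deltas
            x += counts[-2]               # relative to the run two positions back
        counts.append(x)
    return counts


def decode_mask(rle: Dict) -> List[List[int]]:
    """Expand {"counts": str, "size": [H, W]} into H rows of W values in {0, 1}."""
    h, w = rle["size"]
    flat = bytearray(h * w)               # column-major buffer, all background
    pos, value = 0, 0                     # runs alternate, starting with 0
    for run in rle_string_to_counts(rle["counts"]):
        if value:
            flat[pos:pos + run] = b"\x01" * run
        pos += run
        value ^= 1
    if pos != h * w:
        raise ValueError("run lengths do not cover the full H x W grid")
    # Convert the column-major buffer into a row-major list of rows.
    return [[flat[x * h + y] for x in range(w)] for y in range(h)]


# Toy example (not taken from the dataset): one background pixel, then three
# foreground pixels, on a 2 x 2 grid.
toy_mask = {"counts": "13", "size": [2, 2]}
print(decode_mask(toy_mask))  # [[0, 1], [1, 1]]
```

The pure-Python version is only meant to show the layout of the field; for full-resolution images a vectorized decoder is the practical choice.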
---
#### Evaluation Protocol
For each sample and each sentence in `sample["sentences"]`, we treat (image, sentence) as one evaluation example with ground-truth mask `sample["mask"]`. Given a predicted binary mask for each example, we compute IoU with respect to the corresponding ground-truth mask and average IoU across all examples:
$$
\mathrm{IoU} = \frac{|\hat{M} \cap M|}{|\hat{M} \cup M|}, \qquad
\mathrm{mIoU} = \frac{1}{N} \sum_{i=1}^N \mathrm{IoU}_i
$$
where $N$ is the total number of (image, sentence) evaluation examples in RefCOCO-M.
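
The protocol above can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the official evaluation script: `predict` is a hypothetical callable that maps an image and a sentence to a binary mask, `decode_mask` is a hypothetical callable that turns the RLE dict into an array-like of 0/1 values, each entry of `sentences` is assumed to be a plain string, and an empty union is scored as 0, a case the text above does not define.

```python
import numpy as np


def iou(pred, gt) -> float:
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0  # assumption: both masks empty counts as 0
    return float(np.logical_and(pred, gt).sum() / union)


def iter_examples(records):
    """Yield one (record, sample, sentence) triple per evaluation example."""
    for record in records:
        for sample in record["samples"]:
            for sentence in sample["sentences"]:
                yield record, sample, sentence


def mean_iou(records, predict, decode_mask) -> float:
    """Average IoU over every (image, sentence) pair.

    predict(image, sentence) and decode_mask(rle) are supplied by the caller;
    both are placeholders for a model and an RLE decoder.
    """
    scores = []
    for record, sample, sentence in iter_examples(records):
        gt = decode_mask(sample["mask"])
        pred = predict(record["image"], sentence)
        scores.append(iou(pred, gt))
    return float(np.mean(scores)) if scores else float("nan")
```

Because every sentence contributes its own term, an instance with several referring expressions counts several times in the average, which matches the definition of $N$ above.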

Provider: maas
Created: 2025-11-18



