refcoco-m

ModelScope community, updated 2025-12-05, indexed 2025-11-22
Download link: https://modelscope.cn/datasets/moondream/refcoco-m
Description:
![RefCOCO-M banner](assets/refcocom_banner.png)

### RefCOCO-M: Refined Referring Expression Segmentation

RefCOCO has long been a standard benchmark for [referring expression segmentation](https://arxiv.org/pdf/1603.06180), but it has two major issues: poor mask quality and harmful referring expressions. Modern models now produce masks that are more accurate than the ground-truth annotations, which makes RefCOCO an imprecise measure of segmentation quality.

RefCOCO-M is a cleaned version of the RefCOCO (UNC) validation split. We replace the original instance masks with pixel-accurate masks and remove harmful samples. RefCOCO-M contains 1,190 images, 2,080 instance masks, and 5,598 referring expressions. The images and referring expressions remain identical to the original RefCOCO validation set.

---

#### Construction

For each referred instance in the original RefCOCO validation set, we run a re-segmentation pipeline with an ensemble of models and keep only high-confidence masks. This removes 47% of masks due to unrecoverable quality. A separate model removes a further 0.5% of samples for harmful language.

#### Before/After Re-segmentation

The original RefCOCO masks are hand-drawn polygons and can be highly inaccurate: they are coarse, with inflated boundaries and missing fine structure. The examples below show that RefCOCO-M masks have tighter boundaries and capture details that are missing from the original masks.

![RefCOCO-M banner](assets/refcocom_old_new.png)

#### Harmful Examples

The original RefCOCO validation set includes descriptions with slurs, sexualized language, and degrading phrases. The examples below are drawn from the 46 samples removed by the RefCOCO-M safety pipeline.

![RefCOCO-M banner](assets/refcocom_filtered.png)

---

#### Data Format

The data is structured in COCO format. Each image-level record contains:

* `file_name`: COCO 2014 file name.
* `image_meta`: dict containing `width`, `height`, and `image_id`.
* `image`: dict with raw bytes and a relative path: `{"bytes": ..., "path": "images/<file_name>"}`.
* `samples`: list of instance annotations for that image.

Each sample entry describes one referred instance and its mask:

* `id`: unique instance id.
* `category`: COCO category label.
* `supercategory`: COCO supercategory label.
* `sentences`: list of referring expressions for this instance.
* `bbox`: `[x, y, w, h]` in COCO pixel coordinates.
* `mask`: single COCO-style RLE mask, given as `{"counts": str, "size": [H, W]}`, where `H` and `W` are the image height and width.

---

#### Evaluation Protocol

For each sample and each sentence in `sample["sentences"]`, we treat (image, sentence) as one evaluation example with ground-truth mask `sample["mask"]`. Given a predicted binary mask for each example, we compute IoU with respect to the corresponding ground-truth mask and average IoU across all examples:

$$
\mathrm{IoU} = \frac{|\hat{M} \cap M|}{|\hat{M} \cup M|}, \qquad \mathrm{mIoU} = \frac{1}{N} \sum_{i=1}^N \mathrm{IoU}_i
$$

where $N$ is the total number of evaluation examples (image, sentence) in RefCOCO-M.
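The `mask` field described under Data Format has to be turned into a binary array before it can be compared with a prediction. The sketch below is one way to do that in plain Python. It assumes that `counts` uses the compressed string encoding of the COCO API (5-bit chunks offset by ASCII 48, with counts from the fourth onward stored as differences from the count two positions earlier) and that pixels are laid out column by column, as in the COCO API. The helper names `rle_counts_from_string` and `decode_rle` are hypothetical and not part of the dataset; in practice `pycocotools` is the usual tool for this step.

```python
from typing import Dict, List


def rle_counts_from_string(s: str) -> List[int]:
    """Unpack a COCO-style compressed RLE string into run lengths.

    Assumption: the string follows the COCO API encoding, where each
    character carries 5 payload bits plus a continuation bit.
    """
    counts: List[int] = []
    pos = 0
    while pos < len(s):
        x = 0
        k = 0
        more = True
        while more:
            c = ord(s[pos]) - 48
            x |= (c & 0x1F) << (5 * k)
            more = bool(c & 0x20)
            pos += 1
            k += 1
            if not more and (c & 0x10):
                # Sign-extend: the last chunk marks a negative difference.
                x |= -1 << (5 * k)
        if len(counts) > 2:
            # Later counts are stored relative to the count two steps back.
            x += counts[-2]
        counts.append(x)
    return counts


def decode_rle(rle: Dict) -> List[List[int]]:
    """Decode {"counts": str, "size": [H, W]} into an H x W list of 0/1 rows."""
    height, width = rle["size"]
    flat: List[int] = []
    value = 0  # runs alternate, starting with background (0)
    for run in rle_counts_from_string(rle["counts"]):
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("run lengths do not cover the H x W mask")
    # Pixels are stored column by column, so index as col * H + row.
    return [[flat[col * height + row] for col in range(width)]
            for row in range(height)]
```

A call such as `decode_rle(sample["mask"])` would then yield the ground-truth mask for one sample, under the assumptions stated above.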

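The evaluation protocol can be sketched in a few lines of plain Python: expand every record into (image, sentence) examples, score each prediction against the sample's ground-truth mask, and average. The names `iou`, `mean_iou`, `predict`, and `decode` are hypothetical and only illustrate the loop structure; `predict` stands in for whatever model is being evaluated, and `decode` for whatever turns `sample["mask"]` into a binary mask. Returning 1.0 when both masks are empty is an assumption, since the protocol does not define that case.

```python
from typing import Any, Callable, Dict, Iterable, Sequence

Mask = Sequence[Sequence[int]]  # H x W binary mask as rows of 0/1


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumption: two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


def mean_iou(
    records: Iterable[Dict[str, Any]],
    predict: Callable[[Dict[str, Any], Any], Mask],
    decode: Callable[[Any], Mask],
) -> float:
    """Average IoU over every (image, sentence) evaluation example."""
    total = 0.0
    n = 0
    for record in records:
        for sample in record["samples"]:
            gt = decode(sample["mask"])
            # Each sentence of a sample is its own evaluation example
            # and shares the sample's ground-truth mask.
            for sentence in sample["sentences"]:
                total += iou(predict(record, sentence), gt)
                n += 1
    if n == 0:
        raise ValueError("no evaluation examples found")
    return total / n
```

Because the average runs over (image, sentence) pairs rather than over instances, an instance with more referring expressions contributes more terms to the mean.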
Provider: maas
Created: 2025-11-18