jxu124/refcoco-benchmark

Name: jxu124/refcoco-benchmark
Creator: jxu124
Published: 2023-10-30 13:15:05
License: No description available

Hugging Face | Updated 2023-10-30 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/jxu124/refcoco-benchmark

Resource description:

---
configs:
- config_name: default
  data_files:
  - split: refcoco_unc_val
    path: data/refcoco_unc_val-*
  - split: refcoco_unc_testA
    path: data/refcoco_unc_testA-*
  - split: refcoco_unc_testB
    path: data/refcoco_unc_testB-*
  - split: refcoco_google_val
    path: data/refcoco_google_val-*
  - split: refcoco_google_test
    path: data/refcoco_google_test-*
  - split: refcocog_umd_val
    path: data/refcocog_umd_val-*
  - split: refcocog_umd_test
    path: data/refcocog_umd_test-*
  - split: refcocog_google_val
    path: data/refcocog_google_val-*
  - split: refcoco_plus_unc_val
    path: data/refcoco_plus_unc_val-*
  - split: refcoco_plus_unc_testA
    path: data/refcoco_plus_unc_testA-*
  - split: refcoco_plus_unc_testB
    path: data/refcoco_plus_unc_testB-*
dataset_info:
  features:
  - name: ref_list
    list:
    - name: ann_info
      struct:
      - name: area
        dtype: float64
      - name: bbox
        sequence: float64
      - name: category_id
        dtype: int64
      - name: id
        dtype: int64
      - name: image_id
        dtype: int64
      - name: iscrowd
        dtype: int64
      - name: segmentation
        sequence:
          sequence: float64
    - name: ref_info
      struct:
      - name: ann_id
        dtype: int64
      - name: category_id
        dtype: int64
      - name: file_name
        dtype: string
      - name: image_id
        dtype: int64
      - name: ref_id
        dtype: int64
      - name: sent_ids
        sequence: int64
      - name: sentences
        list:
        - name: raw
          dtype: string
        - name: sent
          dtype: string
        - name: sent_id
          dtype: int64
        - name: tokens
          sequence: string
      - name: split
        dtype: string
  - name: image_info
    struct:
    - name: coco_url
      dtype: string
    - name: date_captured
      dtype: string
    - name: file_name
      dtype: string
    - name: flickr_url
      dtype: string
    - name: height
      dtype: int64
    - name: id
      dtype: int64
    - name: license
      dtype: int64
    - name: width
      dtype: int64
  - name: image
    dtype: image
  splits:
  - name: refcoco_unc_val
    num_bytes: 264438667.5
    num_examples: 1500
  - name: refcoco_unc_testA
    num_bytes: 129028843.0
    num_examples: 750
  - name: refcoco_unc_testB
    num_bytes: 133102482.0
    num_examples: 750
  - name: refcoco_google_val
    num_bytes: 814855470.214
    num_examples: 4559
  - name: refcoco_google_test
    num_bytes: 800980159.978
    num_examples: 4527
  - name: refcocog_umd_val
    num_bytes: 220021282.2
    num_examples: 1300
  - name: refcocog_umd_test
    num_bytes: 442746080.0
    num_examples: 2600
  - name: refcocog_google_val
    num_bytes: 800691386.6
    num_examples: 4650
  - name: refcoco_plus_unc_val
    num_bytes: 264451297.5
    num_examples: 1500
  - name: refcoco_plus_unc_testA
    num_bytes: 129035632.0
    num_examples: 750
  - name: refcoco_plus_unc_testB
    num_bytes: 133095545.0
    num_examples: 750
  download_size: 4072689321
  dataset_size: 4132446845.9919996
---

# Dataset Card for "refcoco-benchmark"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

Provider:

jxu124

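Original information summary

Dataset overview

Dataset configuration

Default configuration: contains multiple data files, one group per split, with paths of the form data/<split name>-* (for example, data/refcoco_unc_val-*).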

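Dataset information

Features:
- ref_list: a list of records, each with two sub-structures, ann_info and ref_info. ann_info holds the annotation fields (area, bbox, category_id, id, image_id, iscrowd, segmentation); ref_info holds the referring-expression fields (ann_id, category_id, file_name, image_id, ref_id, sent_ids, sentences, split).
- image_info: image metadata, such as coco_url, date_captured, file_name, flickr_url, height, width, id, and license.
- image: the image itself (image data type).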

Dataset splits

refcoco_unc_val: 1500 examples, 264438667.5 bytes.
refcoco_unc_testA: 750 examples, 129028843.0 bytes.
refcoco_unc_testB: 750 examples, 133102482.0 bytes.
refcoco_google_val: 4559 examples, 814855470.214 bytes.
refcoco_google_test: 4527 examples, 800980159.978 bytes.
refcocog_umd_val: 1300 examples, 220021282.2 bytes.
refcocog_umd_test: 2600 examples, 442746080.0 bytes.
refcocog_google_val: 4650 examples, 800691386.6 bytes.
refcoco_plus_unc_val: 1500 examples, 264451297.5 bytes.
refcoco_plus_unc_testA: 750 examples, 129035632.0 bytes.
refcoco_plus_unc_testB: 750 examples, 133095545.0 bytes.

Dataset size

Download size: 4072689321 bytes.
Dataset size: 4132446845.9919996 bytes.
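The per-split figures can be cross-checked against the reported totals with a few lines of arithmetic. The sketch below simply copies the numbers from the split list; the variable names are illustrative and not part of the dataset card.

```python
# (num_examples, num_bytes) per split, copied from the split list in the card.
SPLITS = {
    "refcoco_unc_val": (1500, 264438667.5),
    "refcoco_unc_testA": (750, 129028843.0),
    "refcoco_unc_testB": (750, 133102482.0),
    "refcoco_google_val": (4559, 814855470.214),
    "refcoco_google_test": (4527, 800980159.978),
    "refcocog_umd_val": (1300, 220021282.2),
    "refcocog_umd_test": (2600, 442746080.0),
    "refcocog_google_val": (4650, 800691386.6),
    "refcoco_plus_unc_val": (1500, 264451297.5),
    "refcoco_plus_unc_testA": (750, 129035632.0),
    "refcoco_plus_unc_testB": (750, 133095545.0),
}

REPORTED_DATASET_SIZE = 4132446845.9919996  # dataset_size from the card
REPORTED_DOWNLOAD_SIZE = 4072689321         # download_size from the card

total_examples = sum(n for n, _ in SPLITS.values())
total_bytes = sum(b for _, b in SPLITS.values())

print(f"examples across all splits: {total_examples}")
print(f"bytes across all splits:    {total_bytes:.3f}")
print(f"matches dataset_size:       {abs(total_bytes - REPORTED_DATASET_SIZE) < 1e-3}")
print(f"download / dataset ratio:   {REPORTED_DOWNLOAD_SIZE / total_bytes:.4f}")
```

The split sizes sum to the reported dataset size, and the eleven splits together hold 23636 examples. Note that one example is one image record, which may carry several referring expressions in ref_list.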

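Collected summary

Dataset introduction

Construction

In vision-language understanding, high-quality datasets are essential to progress on referring expression comprehension. This dataset builds on the RefCOCO family of benchmarks: it combines the RefCOCO, RefCOCO+, and RefCOCOg subsets and merges data from different sources of annotations and splits (such as UNC and Google). Construction aligns the bounding-box annotation of each target object with its natural-language descriptions. Every sample contains the image information, the target annotations, and the corresponding referring expressions, so the visual and language modalities correspond directly.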

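Characteristics

The dataset is notable for its multimodal structure and layered annotations. Each sample provides the image and its metadata together with detailed annotations of the target objects (bounding box, segmentation, and category ID), and each target is linked to one or more natural-language description sentences. The card lists only evaluation splits (validation and test sets), which supports comparing models across RefCOCO, RefCOCO+, and RefCOCOg under one schema. The structured features can be used directly for referring expression comprehension, giving the research community a standardized benchmark.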

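Usage

Researchers can load a specific split directly with the Hugging Face datasets library, for example refcoco_unc_val or refcocog_umd_test. Once loaded, each record exposes the image, the annotation information, and the corresponding text descriptions, which can then be fed to a vision-language model. A typical application is referring expression grounding, where the model must predict the target region in the image from a given text description. The standardized format fits end-to-end pipelines, and an evaluation script can compare performance across the splits to study how well a model generalizes to complex scenes.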
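The loading step described above might look like the following minimal sketch. The `load_dataset` call is the standard entry point of the Hugging Face `datasets` library; the helper names `load_split` and `iter_expressions`, the toy record, and the assumption that list-of-struct features come back as lists of dictionaries are illustrative and not taken from the dataset card.

```python
from typing import Any, Dict, Iterator, List, Tuple


def load_split(split: str = "refcoco_unc_val"):
    """Load one split from the Hub (needs `pip install datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the rest runs offline

    return load_dataset("jxu124/refcoco-benchmark", split=split)


def iter_expressions(example: Dict[str, Any]) -> Iterator[Tuple[str, List[float]]]:
    """Yield (sentence, bbox) for every referring expression in one record."""
    for ref in example["ref_list"]:
        bbox = ref["ann_info"]["bbox"]
        for sentence in ref["ref_info"]["sentences"]:
            yield sentence["sent"], bbox


# Hand-written record that mimics the schema; all values are invented.
toy_example = {
    "ref_list": [
        {
            "ann_info": {"bbox": [10.0, 20.0, 30.0, 40.0], "category_id": 1},
            "ref_info": {
                "ref_id": 0,
                "sentences": [
                    {"sent": "man in red shirt", "sent_id": 0},
                    {"sent": "person on the left", "sent_id": 1},
                ],
            },
        }
    ],
    "image_info": {"file_name": "example.jpg", "height": 480, "width": 640},
}

pairs = list(iter_expressions(toy_example))
for sentence, bbox in pairs:
    print(sentence, bbox)
```

With network access, `for example in load_split("refcoco_unc_val")` would feed real records into `iter_expressions` in the same way; the full download is roughly 4 GB according to the card.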

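Background and challenges

Background overview

At the intersection of computer vision and natural language processing, referring expression comprehension aims to locate a specific object in an image from a natural-language description. The RefCOCO family is a core benchmark in this area. It was built by research teams at the University of North Carolina at Chapel Hill and other institutions, growing out of the ReferItGame collection effort of 2014, and its central research question is the semantic alignment between vision and language. By pairing MS COCO images with human-written referring expressions, it has advanced visual grounding, visual question answering, and multimodal understanding, and it provides a standardized test environment for model evaluation.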

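Current challenges

The core challenge of referring expression comprehension is handling both the diversity of language and the complexity of visual scenes. A model must correctly parse expressions that contain pronouns, relative clauses, and abstract concepts, and still localize the target under occlusion, with small objects, or among many distracting objects. During data construction, annotation consistency is a major difficulty: different annotators may describe the same object in very different ways, and each expression must correspond exactly to its bounding box. The balance between dataset scale and diversity also needs careful design so that the data covers the referring patterns common in everyday scenes.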

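Common scenarios

Classic use cases

In research at the intersection of vision and language, the jxu124/refcoco-benchmark dataset provides a classic evaluation framework for referring expression comprehension. It combines the RefCOCO, RefCOCO+, and RefCOCOg subsets, and each sample contains the image, bounding-box annotations of the target objects, and the corresponding natural-language descriptions. Researchers typically use it to train and test models that locate a specific object in an image from a text description, which in turn advances vision-language understanding.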
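Grounding models on this kind of benchmark are commonly scored by whether the predicted box overlaps the annotated box closely enough. The sketch below assumes two things that the card does not state: that `bbox` follows the COCO convention `[x, y, width, height]`, and that a prediction counts as correct when its intersection over union (IoU) is at least 0.5. The function names are illustrative.

```python
from typing import List, Sequence, Tuple


def xywh_to_xyxy(box: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert [x, y, width, height] (assumed COCO style) to corner form."""
    x, y, w, h = box
    return x, y, x + w, y + h


def iou(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """Intersection over union of two [x, y, width, height] boxes."""
    ax1, ay1, ax2, ay2 = xywh_to_xyxy(box_a)
    bx1, by1, bx2, by2 = xywh_to_xyxy(box_b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(
    predictions: List[Sequence[float]],
    targets: List[Sequence[float]],
    threshold: float = 0.5,  # assumed threshold, not specified by the card
) -> float:
    """Fraction of predictions whose IoU with the target reaches the threshold."""
    if not targets:
        return 0.0
    hits = sum(iou(p, t) >= threshold for p, t in zip(predictions, targets))
    return hits / len(targets)


# Invented boxes: one exact match and one half-shifted prediction.
targets = [[0.0, 0.0, 10.0, 10.0], [0.0, 0.0, 10.0, 10.0]]
predictions = [[0.0, 0.0, 10.0, 10.0], [5.0, 0.0, 10.0, 10.0]]
score = accuracy_at_iou(predictions, targets)
print(f"accuracy at IoU 0.5: {score:.2f}")
```

The second prediction shares half of the target area, which gives an IoU of one third and therefore counts as a miss under the assumed threshold.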

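Derived and related work

A body of well-known research has grown up around this dataset. For example, attention-based vision-language models such as MAttNet and ViLBERT improved referring expression comprehension accuracy by fusing multimodal features; later work explored cross-dataset generalization, few-shot learning, and robustness to adversarial examples. These studies deepened the understanding of how vision and language interact and laid groundwork for more complex multimodal tasks.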

Recent research on the dataset