crag-mm-2025/crag-mm-single-turn-public
Source: ModelScope community. Updated 2025-09-24; indexed 2025-04-19.
Download link: https://modelscope.cn/datasets/gaojing8500/crag-mm-single-turn-public
Description:
# CRAG-MM: Comprehensive multi-modal, multi-turn RAG Benchmark

This repository contains the CRAG-MM dataset, a high-quality conversational benchmark for multimodal assistants. The dataset features conversations about images with varied complexity levels, designed to evaluate AI systems' visual understanding and conversational abilities.
CRAG-MM is a visual question-answering benchmark that focuses on factual questions, offering a unique collection of image and question-answering sets to enable comprehensive assessment of wearable devices.
The benchmark includes egocentric images captured by RayBan Meta smart glasses as well as public images (provided as URLs), covering 13 domains. It features 4 types of questions, ranging from simple queries answerable by looking at the image to complex ones requiring multi-source retrieval and reasoning.
CRAG-MM encompasses both single-turn and multi-turn conversations, providing a comprehensive evaluation of MM-RAG solutions.
Currently, only the `validation` split is available, as the other splits are used for evaluation in the **Meta CRAG-MM Challenge** at **KDD Cup 2025**. More details about the dataset and the associated tasks are available on the [KDD Cup 2025 Challenge Page](https://www.aicrowd.com/challenges/meta-crag-mm-challenge-2025).
## Dataset Description
CRAG-MM is available in two variants:
- **Single-turn**: One question-answer exchange per image
- **Multi-turn**: Extended conversations with multiple questions and answers about the same image
Both variants feature rich, human-generated questions and expert answers about diverse images, covering various visual reasoning tasks.
## Usage
You can easily load and explore the dataset using the Hugging Face `datasets` library:
```python
import matplotlib.pyplot as plt
from datasets import load_dataset

# Load the single-turn dataset (repo ID as in the dataset name at the top of this page)
dataset = load_dataset("crag-mm-2025/crag-mm-single-turn-public", revision="v0.1.1")

# Or load the multi-turn dataset instead
# dataset = load_dataset("crag-mm-2025/crag-mm-multi-turn-public", revision="v0.1.1")

# View available splits
print(f"Available splits: {', '.join(dataset.keys())}")

# Access an example
example = dataset["validation"][0]
print(f"Session ID: {example['session_id']}")
print(f"Image: {example['image']}")
print(f"Image URL: {example['image_url']}")

# Note: either 'image' or 'image_url' is provided in the dataset, but not
# necessarily both. When the actual image cannot be included, only the
# image_url is available. The evaluation servers will nevertheless always
# include the loaded 'image' field.

# Show the image when it is included
if example["image"] is not None:
    plt.imshow(example["image"])
    plt.axis("off")
    plt.show()
```
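Because an example may carry only `image_url`, downstream code usually needs a fallback. The following is a minimal sketch of one way to handle that, not part of the dataset tooling: `resolve_image`, `_download_bytes`, and the `fetch` parameter are hypothetical names introduced here, and the default fetcher simply downloads raw bytes with the standard library.

```python
from typing import Any, Callable, Dict
from urllib.request import urlopen


def _download_bytes(url: str) -> bytes:
    """Default fetcher (assumption): download the raw bytes behind a URL."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def resolve_image(example: Dict[str, Any],
                  fetch: Callable[[str], bytes] = _download_bytes) -> Any:
    """Return the embedded image if present, else the bytes fetched from image_url.

    Hypothetical helper: only the field names 'image' and 'image_url' come
    from the dataset card; everything else is an illustrative assumption.
    """
    image = example.get("image")
    if image is not None:
        return image  # already-decoded image object from the dataset
    url = example.get("image_url")
    if not url:
        raise ValueError("example has neither 'image' nor 'image_url'")
    return fetch(url)  # raw bytes; decode with an image library of your choice
```

Passing `fetch` explicitly makes it easy to add caching or retries, or to stub out the network in tests. If Pillow is installed, the returned bytes can be decoded with `PIL.Image.open(io.BytesIO(data))`.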
## Example Visualization
Here's how to print a complete conversation from the dataset:
```python
from typing import Any, Dict


def _prepare_feature_vocabularies(dataset_split):
    """Extract feature vocabularies for category encoding from dataset.

    These vocabularies allow conversion between integer indices and string labels.
    """
    return {
        "domain": dataset_split.features["turns"][0]["domain"],
        "query_category": dataset_split.features["turns"][0]["query_category"],
        "dynamism": dataset_split.features["turns"][0]["dynamism"],
        "image_quality": dataset_split.features["image_quality"],
    }


def print_conversation(example: Dict[str, Any], feature_vocabularies: Dict[str, Any]) -> None:
    """Print a conversation in an indented format.

    Args:
        example: A single dataset example containing conversation turns
        feature_vocabularies: Mapping of features to their vocabularies for
            encoding/decoding from idx to str
    """
    # Print session ID
    print(f"Session ID: {example['session_id']}")

    # Print image info.
    # Note: either 'image' or 'image_url' is provided in the dataset, but not
    # necessarily both. When the actual image cannot be included, only the
    # image_url is available. The evaluation servers will nevertheless always
    # include the loaded 'image' field.
    print(f"Image: {example['image']}")
    print(f"Image URL: {example['image_url']}")

    image_quality_str = feature_vocabularies["image_quality"].int2str(example["image_quality"])
    print(f"Image Quality: {image_quality_str}")

    # Determine if single-turn or multi-turn
    is_single_turn = len(example["turns"]) == 1
    print(f"Type: {'Single-turn' if is_single_turn else 'Multi-turn'} ({len(example['turns'])} turns)")

    # Create answer lookup dictionary if answers exist
    answer_lookup = {}
    if "answers" in example and example["answers"] is not None:
        answer_lookup = {a["interaction_id"]: a["ans_full"] for a in example["answers"]}

    # Print each turn
    print("\nConversation:")
    for i, turn in enumerate(example["turns"]):
        # For multi-turn, show turn number
        if not is_single_turn:
            print(f"\tTurn {i+1}:")

        # Convert metadata to string representations
        domain_str = feature_vocabularies["domain"].int2str(turn["domain"])
        category_str = feature_vocabularies["query_category"].int2str(turn["query_category"])
        dynamism_str = feature_vocabularies["dynamism"].int2str(turn["dynamism"])

        # Print metadata
        prefix = "\t\t" if not is_single_turn else "\t"
        print(f"{prefix}Domain: {domain_str} | Category: {category_str} | Dynamism: {dynamism_str}")

        # Print query and answer with fixed tab indentation
        print(f"{prefix}Q: {turn['query']}")
        ans = answer_lookup.get(turn["interaction_id"], "No answer available")
        print(f"{prefix}A: {ans}")

        # Add blank line between turns in multi-turn conversations
        if not is_single_turn and i < len(example["turns"]) - 1:
            print()

    print("\n" + "-" * 60 + "\n")  # Add separator between examples


split_to_use = "validation"
feature_vocabularies = _prepare_feature_vocabularies(dataset[split_to_use])
print_conversation(dataset[split_to_use][0], feature_vocabularies)
```
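The same per-turn metadata can also be tallied for a quick overview of a split. This is only a sketch under stated assumptions: `count_turn_labels` is a hypothetical helper, and `decode` stands in for whatever maps a stored label to a readable string (for the encoded fields above, that would be the vocabulary's `int2str`).

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable


def count_turn_labels(examples: Iterable[Dict[str, Any]],
                      field: str,
                      decode: Callable[[Any], str] = str) -> Counter:
    """Count how often each value of a per-turn field occurs across examples.

    Hypothetical helper: 'field' is a key of each turn (e.g. "domain"), and
    'decode' converts the stored value to a label before counting.
    """
    counts: Counter = Counter()
    for example in examples:
        for turn in example["turns"]:
            counts[decode(turn[field])] += 1
    return counts
```

With the objects defined earlier, a call could look like `count_turn_labels(dataset["validation"], "domain", feature_vocabularies["domain"].int2str)`. Iterating a full split also decodes every image, so this may be slow on large splits.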
## Dataset Splits
The dataset includes the following splits:
- `validation`: A small subset for quick testing and exploration
- Additional splits may be available depending on the specific version
## Versions
The dataset is versioned using the `revision` parameter. The latest version is `v0.1.1`.
## Dataset Structure
### Single-Turn Format
```json
{
"session_id": "string",
"image": Image(),
"image_url": "string",
"image_quality": "string",
"turns": [
{
"interaction_id": "string",
"domain": "string",
"query_category": "string",
"dynamism": "string",
"query": "string",
}
],
"answers": [
{
"interaction_id": "string",
"ans_full": "string"
}
]
}
```
### Multi-Turn Format
```json
{
"session_id": "string",
"image": Image(),
"image_url": "string",
"image_quality": "string",
"turns": [
{
"interaction_id": "string",
"domain": "string",
"query_category": "string",
"dynamism": "string",
"query": "string",
},
...
],
"answers": [
{
"interaction_id": "string",
"ans_full": "string"
},
...
]
}
```
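In both formats, `turns` and `answers` are separate lists linked by `interaction_id`. As a minimal sketch of that join (the helper name `qa_pairs` is hypothetical, and it assumes each `interaction_id` appears at most once in `answers`):

```python
from typing import Any, Dict, List, Optional, Tuple


def qa_pairs(example: Dict[str, Any]) -> List[Tuple[str, Optional[str]]]:
    """Pair each turn's query with its answer, matched on interaction_id.

    Turns without a matching answer (or examples without 'answers')
    are paired with None.
    """
    answers = example.get("answers") or []
    by_id = {a["interaction_id"]: a["ans_full"] for a in answers}
    return [(turn["query"], by_id.get(turn["interaction_id"]))
            for turn in example["turns"]]
```

A single-turn example yields a one-element list, while a multi-turn example yields one pair per turn, in conversation order.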
## Citation
If you use this dataset in your research, please cite:
```
@inproceedings{crag-mm-2025,
title = {CRAG-MM: A Comprehensive RAG Benchmark for Multi-modal, Multi-turn Question Answering},
author = {CRAG-MM Team},
year = {2025},
url = {https://www.aicrowd.com/challenges/meta-crag-mm-challenge-2025}
}
```
## License
[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0)
## Contact
For questions or issues related to the dataset, please reach out to us on the [challenge forums](https://discourse.aicrowd.com/c/meta-comprehensive-rag-benchmark-kdd-cup-2-524854/2929) or email us at: [crag-mm-2025@aicrowd.com](mailto:crag-mm-2025@aicrowd.com).
Provider: maas
Created: 2025-04-17
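Dataset introduction
Background overview
CRAG-MM is a high-quality multimodal conversational benchmark focused on image-grounded visual question answering, used to evaluate AI systems' visual understanding and conversational abilities. The dataset has single-turn and multi-turn variants; currently only the validation split is available, and the other splits are used for the Meta CRAG-MM Challenge at KDD Cup 2025. The data comes from egocentric images captured by RayBan Meta smart glasses and from public images, covering 13 domains and 4 question types.
The summary above was collected and generated by 遇见数据集.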



