SEED-Bench-2
ModelScope Community · Updated 2025-12-26 · Indexed 2024-08-31
Download link:
https://modelscope.cn/datasets/TencentARC/SEED-Bench-2
Resource description:
# SEED-Bench Card
## Benchmark details
**Benchmark type:**
SEED-Bench-2 is a comprehensive large-scale benchmark for evaluating Multimodal Large Language Models (MLLMs), featuring 24K multiple-choice questions with precise human annotations.
It spans 27 evaluation dimensions, assessing both text and image generation.
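Every question is multiple-choice and belongs to one of the 27 dimensions, so results are naturally reported as accuracy per dimension. The sketch below is illustrative only. The record keys `dimension`, `answer`, and `prediction` and the function `per_dimension_accuracy` are hypothetical, not the schema of the released files. The overall score pools all questions; averaging the per-dimension accuracies instead is an equally plausible convention, and this card does not specify which one to use.

```python
from collections import defaultdict


def per_dimension_accuracy(records):
    """Return ({dimension: accuracy}, overall_accuracy) for multiple-choice records.

    Each record is a dict with hypothetical keys:
      "dimension":  evaluation dimension id (1..27)
      "answer":     gold option letter, e.g. "A"
      "prediction": the model's chosen option letter
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        dim = r["dimension"]
        total[dim] += 1
        # Compare option letters case-insensitively, ignoring surrounding whitespace.
        if r["prediction"].strip().upper() == r["answer"].strip().upper():
            correct[dim] += 1
    per_dim = {d: correct[d] / total[d] for d in sorted(total)}
    # Pooled accuracy over all questions; 0.0 when there are no records.
    overall = sum(correct.values()) / sum(total.values()) if total else 0.0
    return per_dim, overall


if __name__ == "__main__":
    # Toy predictions, not real SEED-Bench-2 data.
    records = [
        {"dimension": 1, "answer": "A", "prediction": "A"},
        {"dimension": 1, "answer": "B", "prediction": "C"},
        {"dimension": 9, "answer": "D", "prediction": "d"},
    ]
    per_dim, overall = per_dimension_accuracy(records)
    print(per_dim)           # {1: 0.5, 9: 1.0}
    print(f"{overall:.3f}")  # 0.667
```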
**Benchmark date:**
SEED-Bench-2 was collected in November 2023.
**Paper or resources for more information:**
https://github.com/AILab-CVC/SEED-Bench
**License:**
Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). Use of the benchmark must also comply with OpenAI's terms of use: https://openai.com/policies/terms-of-use.
**Data Sources:**
- Dimensions 1-9, 23 (In-Context Captioning): Conceptual Captions Dataset (https://ai.google.com/research/ConceptualCaptions/) under its license (https://github.com/google-research-datasets/conceptual-captions/blob/master/LICENSE). Copyright belongs to the original dataset owner.
- Dimension 9 (Text Recognition): ICDAR2003 (http://www.imglab.org/db/index.html), ICDAR2013 (https://rrc.cvc.uab.es/?ch=2), IIIT5k (https://cvit.iiit.ac.in/research/projects/cvit-projects/the-iiit-5k-word-dataset), and SVT (http://vision.ucsd.edu/~kai/svt/). Copyright belongs to the original dataset owners.
- Dimension 10 (Celebrity Recognition): MME (https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation) and MMBench (https://github.com/open-compass/MMBench) under the MMBench license (https://github.com/open-compass/MMBench/blob/main/LICENSE). Copyright belongs to the original dataset owners.
- Dimension 11 (Landmark Recognition): Google Landmark Dataset v2 (https://github.com/cvdfoundation/google-landmark) under CC-BY licenses without ND restrictions.
- Dimension 12 (Chart Understanding): PlotQA (https://github.com/NiteshMethani/PlotQA) under its license (https://github.com/NiteshMethani/PlotQA/blob/master/LICENSE).
- Dimension 13 (Visual Referring Expression): VCR (http://visualcommonsense.com) under its license (http://visualcommonsense.com/license/).
- Dimension 14 (Science Knowledge): ScienceQA (https://github.com/lupantech/ScienceQA) under its license (https://github.com/lupantech/ScienceQA/blob/main/LICENSE-DATA).
- Dimension 15 (Emotion Recognition): FER2013 (https://www.kaggle.com/competitions/challenges-in-representation-learning-facial-expression-recognition-challenge/data) under its license (https://www.kaggle.com/competitions/challenges-in-representation-learning-facial-expression-recognition-challenge/rules#7-competition-data).
- Dimension 16 (Visual Mathematics): MME (https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation) and data from the internet under CC-BY licenses.
- Dimension 17 (Difference Spotting): MIMICIT (https://github.com/Luodian/Otter/blob/main/mimic-it/README.md) under its license (https://github.com/Luodian/Otter/tree/main/mimic-it#eggs).
- Dimension 18 (Meme Comprehension): Data from the internet under CC-BY licenses.
- Dimension 19 (Global Video Understanding): Charades (https://prior.allenai.org/projects/charades) under its license (https://prior.allenai.org/projects/data/charades/license.txt). SEED-Bench-2 provides 8 frames per video.
- Dimensions 20-22 (Action Recognition, Action Prediction, Procedure Understanding): Something-Something v2 (https://developer.qualcomm.com/software/ai-datasets/something-something), Epic-Kitchen 100 (https://epic-kitchens.github.io/2023), and Breakfast (https://serre-lab.clps.brown.edu/resource/breakfast-actions-dataset/). SEED-Bench-2 provides 8 frames per video.
- Dimension 24 (Interleaved Image-Text Analysis): Data from the internet under CC-BY licenses.
- Dimension 25 (Text-to-Image Generation): CC-500 (https://github.com/weixi-feng/Structured-Diffusion-Guidance) and ABC-6k (https://github.com/weixi-feng/Structured-Diffusion-Guidance) under their license (https://github.com/weixi-feng/Structured-Diffusion-Guidance/blob/master/LICENSE), with images generated by Stable-Diffusion-XL (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) under its license (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md).
- Dimension 26 (Next Image Prediction): Epic-Kitchen 100 (https://epic-kitchens.github.io/2023) under its license, CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/).
- Dimension 27 (Text-Image Creation): Data from the internet under CC-BY licenses.
Please contact us if you believe any data infringes upon your rights, and we will remove it.
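For the video dimensions (19-22), the card states that SEED-Bench-2 provides 8 frames per video but does not say how those frames were chosen. The sketch below shows one common way to reduce a clip to a fixed number of frames: uniform segment-centre sampling. This is an assumption for illustration, not the benchmark's confirmed procedure, and the function name `uniform_frame_indices` is hypothetical.

```python
def uniform_frame_indices(num_frames: int, num_samples: int = 8) -> list[int]:
    """Return num_samples frame indices spread evenly over a video of num_frames frames.

    The video is split into num_samples equal-length segments and the centre frame
    of each segment is taken. Short videos (num_frames < num_samples) yield repeated
    indices rather than an error.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    segment = num_frames / num_samples
    # Centre of segment i is (i + 0.5) * segment; int() floors it to an index,
    # and min() guards against any float rounding past the last frame.
    return [min(int((i + 0.5) * segment), num_frames - 1) for i in range(num_samples)]


if __name__ == "__main__":
    # An 80-frame clip reduced to 8 frames.
    print(uniform_frame_indices(80))  # [5, 15, 25, 35, 45, 55, 65, 75]
```

Taking segment centres rather than segment starts keeps the samples symmetric over the clip. Random offsets within each segment are another common choice when sampling for training rather than evaluation.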
**Where to send questions or comments about the benchmark:**
https://github.com/AILab-CVC/SEED-Bench/issues
## Intended use
**Primary intended uses:**
SEED-Bench-2 is primarily designed to evaluate Multimodal Large Language Models in text and image generation tasks.
**Primary intended users:**
Researchers and enthusiasts in computer vision, natural language processing, machine learning, and artificial intelligence are the main target users of the benchmark.
Provider: maas
Created: 2024-07-09



