SEED-Bench-H
Source: ModelScope community (updated 2026-01-06, indexed 2024-08-31)
Download link:
https://modelscope.cn/datasets/TencentARC/SEED-Bench-H
Resource description:
---
license: cc-by-nc-4.0
task_categories:
- visual-question-answering
language:
- en
pretty_name: SEED-Bench-H
size_categories:
- 1K<n<10K
---
# SEED-Bench-H Card
## Benchmark details
**Benchmark type:**
SEED-Bench-H is a large-scale benchmark to evaluate Multimodal Large Language Models (MLLMs).
It consists of 28K multiple-choice questions with precise human annotations, spanning 34 dimensions, including the evaluation of both text and image generation.
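Because every question is multiple-choice and belongs to one evaluation dimension, results are naturally reported as accuracy per dimension. The following is a minimal sketch of that bookkeeping, not the official evaluation script: the record keys `question_id`, `dimension`, and `answer`, and the function name `accuracy_by_dimension`, are hypothetical and may not match the released annotation files.

```python
from collections import defaultdict


def accuracy_by_dimension(records, predictions):
    """Return {dimension: accuracy} for multiple-choice predictions.

    records: iterable of dicts with the hypothetical keys "question_id",
             "dimension", and "answer" (the correct option letter).
    predictions: dict mapping question_id -> predicted option letter.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        dim = rec["dimension"]
        total[dim] += 1
        pred = predictions.get(rec["question_id"])
        # A missing prediction counts as wrong; letters compare case-insensitively.
        if pred is not None and pred.strip().upper() == rec["answer"].strip().upper():
            correct[dim] += 1
    return {dim: correct[dim] / total[dim] for dim in total}


# Toy data for illustration only; not taken from the benchmark.
records = [
    {"question_id": "q1", "dimension": 1, "answer": "A"},
    {"question_id": "q2", "dimension": 1, "answer": "C"},
    {"question_id": "q3", "dimension": 12, "answer": "B"},
]
predictions = {"q1": "A", "q2": "B", "q3": "b"}
scores = accuracy_by_dimension(records, predictions)
```

Keeping the per-dimension counts separate, rather than pooling all questions, prevents dimensions with many questions from dominating the reported score.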
**Benchmark date:**
SEED-Bench-H was collected in April 2024.
**Paper or resources for more information:**
https://github.com/AILab-CVC/SEED-Bench
**License:**
Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). Use of the benchmark should also abide by OpenAI's terms of use: https://openai.com/policies/terms-of-use
**Data Sources:**
- Dimensions 1-9, 23 (In-Context Captioning): Conceptual Captions Dataset (https://ai.google.com/research/ConceptualCaptions/) under its license (https://github.com/google-research-datasets/conceptual-captions/blob/master/LICENSE). Copyright belongs to the original dataset owner.
- Dimension 9 (Text Recognition): ICDAR2003 (http://www.imglab.org/db/index.html), ICDAR2013 (https://rrc.cvc.uab.es/?ch=2), IIIT5k (https://cvit.iiit.ac.in/research/projects/cvit-projects/the-iiit-5k-word-dataset), and SVT (http://vision.ucsd.edu/~kai/svt/). Copyright belongs to the original dataset owners.
- Dimension 10 (Celebrity Recognition): MME (https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation) and MMBench (https://github.com/open-compass/MMBench) under MMBench license (https://github.com/open-compass/MMBench/blob/main/LICENSE). Copyright belongs to the original dataset owners.
- Dimension 11 (Landmark Recognition): Google Landmark Dataset v2 (https://github.com/cvdfoundation/google-landmark) under CC-BY licenses without ND restrictions.
- Dimension 12 (Chart Understanding): PlotQA (https://github.com/NiteshMethani/PlotQA) under its license (https://github.com/NiteshMethani/PlotQA/blob/master/LICENSE).
- Dimension 13 (Visual Referring Expression): VCR (http://visualcommonsense.com) under its license (http://visualcommonsense.com/license/).
- Dimension 14 (Science Knowledge): ScienceQA (https://github.com/lupantech/ScienceQA) under its license (https://github.com/lupantech/ScienceQA/blob/main/LICENSE-DATA).
- Dimension 15 (Emotion Recognition): FER2013 (https://www.kaggle.com/competitions/challenges-in-representation-learning-facial-expression-recognition-challenge/data) under its license (https://www.kaggle.com/competitions/challenges-in-representation-learning-facial-expression-recognition-challenge/rules#7-competition-data).
- Dimension 16 (Visual Mathematics): MME (https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation) and data from the internet under CC-BY licenses.
- Dimension 17 (Difference Spotting): MIMICIT (https://github.com/Luodian/Otter/blob/main/mimic-it/README.md) under its license (https://github.com/Luodian/Otter/tree/main/mimic-it#eggs).
- Dimension 18 (Meme Comprehension): Data from the internet under CC-BY licenses.
- Dimension 19 (Global Video Understanding): Charades (https://prior.allenai.org/projects/charades) under its license (https://prior.allenai.org/projects/data/charades/license.txt). SEED-Bench-2 provides 8 frames per video.
- Dimensions 20-22 (Action Recognition, Action Prediction, Procedure Understanding): Something-Something v2 (https://developer.qualcomm.com/software/ai-datasets/something-something), Epic-Kitchen 100 (https://epic-kitchens.github.io/2023), and Breakfast (https://serre-lab.clps.brown.edu/resource/breakfast-actions-dataset/). SEED-Bench-2 provides 8 frames per video.
- Dimension 24 (Interleaved Image-Text Analysis): Data from the internet under CC-BY licenses.
- Dimension 25 (Text-to-Image Generation): CC-500 (https://github.com/weixi-feng/Structured-Diffusion-Guidance) and ABC-6k (https://github.com/weixi-feng/Structured-Diffusion-Guidance) under their license (https://github.com/weixi-feng/Structured-Diffusion-Guidance/blob/master/LICENSE), with images generated by Stable-Diffusion-XL (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) under its license (https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md).
- Dimension 26 (Next Image Prediction): Epic-Kitchen 100 (https://epic-kitchens.github.io/2023) under its license (https://creativecommons.org/licenses/by-nc/4.0/).
- Dimension 27 (Text-Image Creation): Data from the internet under CC-BY licenses.
- Dimension 28 (Few-shot Segmentation): MSCOCO dataset (https://cocodataset.org/) under its licenses (https://creativecommons.org/licenses/by/4.0/legalcode).
- Dimension 29 (Few-shot Keypoint): MSCOCO dataset (https://cocodataset.org/) under its licenses (https://creativecommons.org/licenses/by/4.0/legalcode).
- Dimension 30 (Few-shot Depth): Middlebury stereo dataset (https://vision.middlebury.edu/stereo/) under CC-BY licenses.
- Dimension 31 (Few-shot Object): MSCOCO dataset (https://cocodataset.org/) under its licenses (https://creativecommons.org/licenses/by/4.0/legalcode).
- Dimension 32 (Image to Latex): Im2Latex dataset (https://lstmvis.vizhub.ai/) under its licenses (https://github.com/HendrikStrobelt/LSTMVis/blob/master/LICENSE.md).
- Dimension 33 (Text-Rich Visual Comprehension): Data from the internet under CC-BY licenses.
**Where to send questions or comments about the benchmark:**
https://github.com/AILab-CVC/SEED-Bench/issues
## Intended use
**Primary intended uses:**
The primary use of SEED-Bench-H is to evaluate Multimodal Large Language Models on text and image generation tasks.
**Primary intended users:**
The primary intended users of the Benchmark are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
Provider: maas
Created: 2024-07-09



