WildVision/wildvision-internal-data
Source: Hugging Face. Updated 2024-08-21; indexed 2024-06-22.
Download link:
https://hf-mirror.com/datasets/WildVision/wildvision-internal-data
Resource overview:
---
dataset_info:
- config_name: battle
features:
- name: question_id
dtype: string
- name: model_a
dtype: string
- name: model_b
dtype: string
- name: conversation_a
list:
- name: role
dtype: string
- name: content
dtype: string
- name: conversation_b
list:
- name: role
dtype: string
- name: content
dtype: string
- name: language
dtype: string
- name: image
dtype: image
- name: turn
dtype: int32
- name: anony
dtype: bool
- name: winner
dtype: string
- name: tstamp
dtype: int32
- name: judge
dtype: string
- name: domain
dtype: string
splits:
- name: test
num_bytes: 18605192639.8
num_examples: 6200
download_size: 8818061879
dataset_size: 18605192639.8
- config_name: battle_2024_08_21
features:
- name: question_id
dtype: string
- name: model_a
dtype: string
- name: model_b
dtype: string
- name: conversation_a
list:
- name: role
dtype: string
- name: content
dtype: string
- name: conversation_b
list:
- name: role
dtype: string
- name: content
dtype: string
- name: language
dtype: string
- name: image
dtype: image
- name: turn
dtype: int32
- name: anony
dtype: bool
- name: winner
dtype: string
- name: tstamp
dtype: int32
- name: judge
dtype: string
splits:
- name: test
num_bytes: 39514031276.948
num_examples: 13126
download_size: 15521524077
dataset_size: 39514031276.948
- config_name: battle_2024_08_21_raw
features:
- name: question_id
dtype: string
- name: model_a
dtype: string
- name: model_b
dtype: string
- name: conversation_a
list:
- name: role
dtype: string
- name: content
dtype: string
- name: conversation_b
list:
- name: role
dtype: string
- name: content
dtype: string
- name: language
dtype: string
- name: image
dtype: image
- name: turn
dtype: int32
- name: anony
dtype: bool
- name: winner
dtype: string
- name: tstamp
dtype: int32
- name: judge
dtype: string
splits:
- name: test
num_bytes: 39227303456.13
num_examples: 13070
download_size: 15359156748
dataset_size: 39227303456.13
- config_name: battle_5_29
features:
- name: question_id
dtype: string
- name: model_a
dtype: string
- name: model_b
dtype: string
- name: conversation_a
list:
- name: role
dtype: string
- name: content
dtype: string
- name: conversation_b
list:
- name: role
dtype: string
- name: content
dtype: string
- name: language
dtype: string
- name: image
dtype: image
- name: turn
dtype: int32
- name: anony
dtype: bool
- name: winner
dtype: string
- name: tstamp
dtype: int32
- name: judge
dtype: string
splits:
- name: test
num_bytes: 26549445231.573
num_examples: 8847
download_size: 11520256673
dataset_size: 26549445231.573
- config_name: chat
features:
- name: question_id
dtype: string
- name: model
dtype: string
- name: conversation
list:
- name: role
dtype: string
- name: content
dtype: string
- name: language
dtype: string
- name: image
dtype: image
- name: turn
dtype: int32
- name: domain
dtype: string
- name: tstamp
dtype: int32
splits:
- name: test
num_bytes: 76283030751.608
num_examples: 34577
download_size: 28317275024
dataset_size: 76283030751.608
- config_name: chat_and_battle_image
features:
- name: question_id
dtype: string
- name: model
dtype: string
- name: conversation
list:
- name: role
dtype: string
- name: content
dtype: string
- name: language
dtype: string
- name: image
dtype: image
- name: turn
dtype: int32
- name: source
dtype: string
splits:
- name: train
num_bytes: 10500475382.445
num_examples: 3977
download_size: 7732811345
dataset_size: 10500475382.445
- config_name: chat_image
features:
- name: question_id
dtype: string
- name: model
dtype: string
- name: conversation
list:
- name: role
dtype: string
- name: content
dtype: string
- name: language
dtype: string
- name: image
dtype: image
- name: turn
dtype: int32
- name: domain
dtype: string
- name: tstamp
dtype: int32
splits:
- name: train
num_bytes: 123011255696.48
num_examples: 55745
download_size: 42601616538
dataset_size: 123011255696.48
- config_name: keep_bad_only
features:
- name: question_id
dtype: string
- name: model
dtype: string
- name: conversation
list:
- name: role
dtype: string
- name: content
dtype: string
- name: language
dtype: string
- name: image
dtype: image
- name: turn
dtype: int32
splits:
- name: test
num_bytes: 4760442474.92
num_examples: 1654
download_size: 3093490423
dataset_size: 4760442474.92
- config_name: release_100_as_bench
features:
- name: question_id
dtype: string
- name: model
dtype: string
- name: conversation
list:
- name: role
dtype: string
- name: content
dtype: string
- name: language
dtype: string
- name: image
dtype: image
- name: turn
dtype: int32
splits:
- name: test
num_bytes: 306531348.0
num_examples: 144
- name: val
num_bytes: 75199805.0
num_examples: 52
download_size: 492304000
dataset_size: 381731153.0
- config_name: release_100_as_bench_battle
features:
- name: question_id
dtype: string
- name: model_a
dtype: string
- name: model_b
dtype: string
- name: conversation_a
list:
- name: role
dtype: string
- name: content
dtype: string
- name: conversation_b
list:
- name: role
dtype: string
- name: content
dtype: string
- name: language
dtype: string
- name: image
dtype: image
- name: turn
dtype: int32
- name: anony
dtype: bool
- name: winner
dtype: string
- name: tstamp
dtype: int32
- name: judge
dtype: string
splits:
- name: precompute_gpt4v_vote
num_bytes: 8584763789.0
num_examples: 4032
- name: woprecompute_user_vote
num_bytes: 168025531.0
num_examples: 73
- name: precompute_evaluator_vote
num_bytes: 8584863881.0
num_examples: 4032
download_size: 906902218
dataset_size: 17337653201.0
- config_name: taxonmy
features:
- name: question_id
dtype: string
- name: model_a
dtype: string
- name: model_b
dtype: string
- name: conversation_a
list:
- name: role
dtype: string
- name: content
dtype: string
- name: conversation_b
list:
- name: role
dtype: string
- name: content
dtype: string
- name: language
dtype: string
- name: image
dtype: image
- name: turn
dtype: int32
- name: anony
dtype: bool
- name: winner
dtype: string
- name: tstamp
dtype: int32
- name: judge
dtype: string
- name: question_category
dtype: string
- name: question_subcategory
dtype: string
- name: image_domain
dtype: string
- name: image_subdomain
dtype: string
splits:
- name: test_with_taxnomy
num_bytes: 13170968746.43
num_examples: 5695
- name: test_with_taxnomy_100
num_bytes: 182934614.0
num_examples: 100
download_size: 8261937043
dataset_size: 13353903360.43
- config_name: taxonomy_battle_5_29
features:
- name: question_id
dtype: string
- name: model_a
dtype: string
- name: model_b
dtype: string
- name: conversation_a
list:
- name: role
dtype: string
- name: content
dtype: string
- name: conversation_b
list:
- name: role
dtype: string
- name: content
dtype: string
- name: language
dtype: string
- name: image
dtype: image
- name: turn
dtype: int32
- name: anony
dtype: bool
- name: winner
dtype: string
- name: tstamp
dtype: int32
- name: judge
dtype: string
- name: question_category
dtype: string
- name: question_subcategory
dtype: string
- name: image_domain
dtype: string
- name: image_subdomain
dtype: string
splits:
- name: test_with_taxonomy
num_bytes: 17273443740.424
num_examples: 8076
download_size: 10659233517
dataset_size: 17273443740.424
configs:
- config_name: battle
data_files:
- split: test
path: battle/test-*
- config_name: battle_2024_08_21
data_files:
- split: test
path: battle_2024_08_21/test-*
- config_name: battle_2024_08_21_raw
data_files:
- split: test
path: battle_2024_08_21_raw/test-*
- config_name: battle_5_29
data_files:
- split: test
path: battle_5_29/test-*
- config_name: chat
data_files:
- split: test
path: chat/test-*
- config_name: chat_and_battle_image
data_files:
- split: train
path: chat_and_battle_image/train-*
- config_name: chat_image
data_files:
- split: train
path: chat_image/train-*
- config_name: keep_bad_only
data_files:
- split: test
path: keep_bad_only/test-*
- config_name: release_100_as_bench
data_files:
- split: test
path: release_100_as_bench/test-*
- split: val
path: release_100_as_bench/val-*
- config_name: release_100_as_bench_battle
data_files:
- split: precompute_gpt4v_vote
path: release_100_as_bench_battle/precompute_gpt4v_vote-*
- split: woprecompute_user_vote
path: release_100_as_bench_battle/woprecompute_user_vote-*
- split: precompute_evaluator_vote
path: release_100_as_bench_battle/precompute_evaluator_vote-*
- config_name: taxonmy
data_files:
- split: test_with_taxnomy
path: taxonmy/test_with_taxnomy-*
- split: test_with_taxnomy_100
path: taxonmy/test_with_taxnomy_100-*
- config_name: taxonomy_battle_5_29
data_files:
- split: test_with_taxonomy
path: taxonomy_battle_5_29/test_with_taxonomy-*
---
# Dataset Card for WildVision/wildvision-internal-data
<!-- Provide a quick summary of the dataset. -->
This dataset card aims to be a base template for new datasets. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md?plain=1).
## Dataset Details
### Dataset Description
<!-- Provide a longer summary of what this dataset is. -->
- **Curated by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
### Dataset Sources [optional]
<!-- Provide the basic links for the dataset. -->
- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
## Uses
<!-- Address questions around how the dataset is intended to be used. -->
### Direct Use
<!-- This section describes suitable use cases for the dataset. -->
[More Information Needed]
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the dataset will not work well for. -->
[More Information Needed]
## Dataset Structure
<!-- This section provides a description of the dataset fields, and additional information about the dataset structure such as criteria used to create the splits, relationships between data points, etc. -->
[More Information Needed]
## Dataset Creation
### Curation Rationale
<!-- Motivation for the creation of this dataset. -->
[More Information Needed]
### Source Data
<!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). -->
#### Data Collection and Processing
<!-- This section describes the data collection and processing process such as data selection criteria, filtering and normalization methods, tools and libraries used, etc. -->
[More Information Needed]
#### Who are the source data producers?
<!-- This section describes the people or systems who originally created the data. It should also include self-reported demographic or identity information for the source data creators if this information is available. -->
[More Information Needed]
### Annotations [optional]
<!-- If the dataset contains annotations which are not part of the initial data collection, use this section to describe them. -->
#### Annotation process
<!-- This section describes the annotation process such as annotation tools used in the process, the amount of data annotated, annotation guidelines provided to the annotators, interannotator statistics, annotation validation, etc. -->
[More Information Needed]
#### Who are the annotators?
<!-- This section describes the people or systems who created the annotations. -->
[More Information Needed]
#### Personal and Sensitive Information
<!-- State whether the dataset contains data that might be considered personal, sensitive, or private (e.g., data that reveals addresses, uniquely identifiable names or aliases, racial or ethnic origins, sexual orientations, religious beliefs, political opinions, financial or health data, etc.). If efforts were made to anonymize the data, describe the anonymization process. -->
[More Information Needed]
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
[More Information Needed]
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.
## Citation [optional]
<!-- If there is a paper or blog post introducing the dataset, the APA and Bibtex information for that should go in this section. -->
**BibTeX:**
[More Information Needed]
**APA:**
[More Information Needed]
## Glossary [optional]
<!-- If relevant, include terms and calculations in this section that can help readers understand the dataset or dataset card. -->
[More Information Needed]
## More Information [optional]
[More Information Needed]
## Dataset Card Authors [optional]
[More Information Needed]
## Dataset Card Contact
[More Information Needed]
Provided by:
WildVision
Summary of original information
Dataset overview
Dataset configuration details
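Configuration: battle
- Features: question_id (string), model_a (string), model_b (string), conversation_a (list of role and content, both strings), conversation_b (list of role and content, both strings), language (string), image (image), turn (integer), anony (boolean), winner (string), tstamp (integer), judge (string), domain (string)
- Splits: test (18605192639.8 bytes, 6200 examples)
- Download size: 8818061879 bytes
- Dataset size: 18605192639.8 bytes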
Configuration: battle_5_29
- Features: question_id (string), model_a (string), model_b (string), conversation_a (list of role and content, both strings), conversation_b (list of role and content, both strings), language (string), image (image), turn (integer), anony (boolean), winner (string), tstamp (integer), judge (string)
- Splits: test (26549445231.573 bytes, 8847 examples)
- Download size: 11520256673 bytes
- Dataset size: 26549445231.573 bytes
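Configuration: chat
- Features: question_id (string), model (string), conversation (list of role and content, both strings), language (string), image (image), turn (integer), domain (string), tstamp (integer)
- Splits: test (76283030751.608 bytes, 34577 examples)
- Download size: 28317275024 bytes
- Dataset size: 76283030751.608 bytes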
Configuration: chat_and_battle_image
- Features: question_id (string), model (string), conversation (list of role and content, both strings), language (string), image (image), turn (integer), source (string)
- Splits: train (10500475382.445 bytes, 3977 examples)
- Download size: 7732811345 bytes
- Dataset size: 10500475382.445 bytes
Configuration: keep_bad_only
- Features: question_id (string), model (string), conversation (list of role and content, both strings), language (string), image (image), turn (integer)
- Splits: test (4760442474.92 bytes, 1654 examples)
- Download size: 3093490423 bytes
- Dataset size: 4760442474.92 bytes
Configuration: release_100_as_bench
- Features: question_id (string), model (string), conversation (list of role and content, both strings), language (string), image (image), turn (integer)
- Splits: test (306531348.0 bytes, 144 examples); val (75199805.0 bytes, 52 examples)
- Download size: 492304000 bytes
- Dataset size: 381731153.0 bytes
Configuration: release_100_as_bench_battle
- Features: question_id (string), model_a (string), model_b (string), conversation_a (list of role and content, both strings), conversation_b (list of role and content, both strings), language (string), image (image), turn (integer), anony (boolean), winner (string), tstamp (integer), judge (string)
- Splits: precompute_gpt4v_vote (8584763789.0 bytes, 4032 examples); woprecompute_user_vote (168025531.0 bytes, 73 examples); precompute_evaluator_vote (8584863881.0 bytes, 4032 examples)
- Download size: 906902218 bytes
- Dataset size: 17337653201.0 bytes
Configuration: taxonmy
- Features: question_id (string), model_a (string), model_b (string), conversation_a (list of role and content, both strings), conversation_b (list of role and content, both strings), language (string), image (image), turn (integer), anony (boolean), winner (string), tstamp (integer), judge (string), question_category (string), question_subcategory (string), image_domain (string), image_subdomain (string)
- Splits: test_with_taxnomy (13170968746.43 bytes, 5695 examples); test_with_taxnomy_100 (182934614.0 bytes, 100 examples)
- Download size: 8261937043 bytes
- Dataset size: 13353903360.43 bytes
Configuration: taxonomy_battle_5_29
- Features: question_id (string), model_a (string), model_b (string), conversation_a (list of role and content, both strings), conversation_b (list of role and content, both strings), language (string), image (image), turn (integer), anony (boolean), winner (string), tstamp (integer), judge (string), question_category (string), question_subcategory (string), image_domain (string), image_subdomain (string)
- Splits: test_with_taxonomy (17273443740.424 bytes, 8076 examples)
- Download size: 10659233517 bytes
- Dataset size: 17273443740.424 bytes
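For the configurations with more than one split, the reported dataset size should equal the sum of the split sizes. The sketch below checks that arithmetic using the byte counts copied from the card metadata; the helper names `split_total` and `sizes_consistent` are hypothetical and introduced only for illustration.

```python
import math

# Split sizes in bytes and the declared dataset_size, copied from the card metadata.
CARD_SIZES = {
    "release_100_as_bench": {
        "splits": {"test": 306531348.0, "val": 75199805.0},
        "dataset_size": 381731153.0,
    },
    "release_100_as_bench_battle": {
        "splits": {
            "precompute_gpt4v_vote": 8584763789.0,
            "woprecompute_user_vote": 168025531.0,
            "precompute_evaluator_vote": 8584863881.0,
        },
        "dataset_size": 17337653201.0,
    },
    "taxonmy": {
        "splits": {
            "test_with_taxnomy": 13170968746.43,
            "test_with_taxnomy_100": 182934614.0,
        },
        "dataset_size": 13353903360.43,
    },
}


def split_total(config: str) -> float:
    """Sum the byte sizes of every split in one configuration."""
    return sum(CARD_SIZES[config]["splits"].values())


def sizes_consistent(config: str, tol_bytes: float = 1.0) -> bool:
    """True if the split sizes add up to the declared dataset size (within tol_bytes)."""
    declared = CARD_SIZES[config]["dataset_size"]
    return math.isclose(split_total(config), declared, abs_tol=tol_bytes)


if __name__ == "__main__":
    for name in CARD_SIZES:
        print(name, split_total(name), sizes_consistent(name))
```

The tolerance exists only because some sizes are stored as fractional floats; it is an arbitrary choice, not something the card specifies.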
Data file configuration
Configuration: battle
- Data files: test: battle/test-*
Configuration: battle_5_29
- Data files: test: battle_5_29/test-*
Configuration: chat
- Data files: test: chat/test-*
Configuration: chat_and_battle_image
- Data files: train: chat_and_battle_image/train-*
Configuration: keep_bad_only
- Data files: test: keep_bad_only/test-*
Configuration: release_100_as_bench
- Data files: test: release_100_as_bench/test-*; val: release_100_as_bench/val-*
Configuration: release_100_as_bench_battle
- Data files: precompute_gpt4v_vote: release_100_as_bench_battle/precompute_gpt4v_vote-*; woprecompute_user_vote: release_100_as_bench_battle/woprecompute_user_vote-*; precompute_evaluator_vote: release_100_as_bench_battle/precompute_evaluator_vote-*
Configuration: taxonmy
- Data files: test_with_taxnomy: taxonmy/test_with_taxnomy-*; test_with_taxnomy_100: taxonmy/test_with_taxnomy_100-*
Configuration: taxonomy_battle_5_29
- Data files: test_with_taxonomy: taxonomy_battle_5_29/test_with_taxonomy-*
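A minimal sketch of how one configuration and split from the list above might be loaded with the Hugging Face `datasets` library. The helpers `check_split` and `load_split` are hypothetical. Whether the repository is publicly readable is not stated on this page, so the network call is an assumption and may need authentication. Streaming is used here because the splits run to tens of gigabytes.

```python
REPO_ID = "WildVision/wildvision-internal-data"

# Configuration -> split names, as listed in the data file configuration above.
# The spellings "taxonmy" and "test_with_taxnomy" are kept exactly as published.
CONFIG_SPLITS = {
    "battle": ["test"],
    "battle_5_29": ["test"],
    "chat": ["test"],
    "chat_and_battle_image": ["train"],
    "keep_bad_only": ["test"],
    "release_100_as_bench": ["test", "val"],
    "release_100_as_bench_battle": [
        "precompute_gpt4v_vote",
        "woprecompute_user_vote",
        "precompute_evaluator_vote",
    ],
    "taxonmy": ["test_with_taxnomy", "test_with_taxnomy_100"],
    "taxonomy_battle_5_29": ["test_with_taxonomy"],
}


def check_split(config: str, split: str) -> tuple:
    """Validate a (config, split) pair locally before any download starts."""
    if config not in CONFIG_SPLITS:
        raise KeyError(f"unknown configuration: {config!r}")
    if split not in CONFIG_SPLITS[config]:
        raise ValueError(f"{config!r} has no split {split!r}")
    return config, split


def load_split(config: str, split: str, streaming: bool = True):
    """Load one split; assumes `datasets` is installed and the repo is accessible."""
    check_split(config, split)
    from datasets import load_dataset  # imported lazily so validation works offline

    return load_dataset(REPO_ID, config, split=split, streaming=streaming)


# Example (requires network access and possibly a Hugging Face token):
# rows = load_split("taxonmy", "test_with_taxnomy_100")
# first = next(iter(rows))
```

Validating names first gives a quick local error for a misspelled split, which is easy to hit here given the `taxonmy` / `taxonomy` inconsistency.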
Collected summary
Dataset introduction

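Construction
In the field of vision-language model evaluation, the WildVision/wildvision-internal-data dataset is built by systematically collecting multi-turn dialogue and image-interaction data. Its core mechanism has different models respond to the same visual question, then labels the winner through human or automated judging. Collection spans diverse visual scenes and languages; timestamps and anonymization support data traceability and fairness, yielding structured records of model battles and chats.
Characteristics
The dataset centers on multimodal dialogue and model battles, tightly coupling image content with text interaction. Its schema includes the number of dialogue turns, language, image domain, and fine-grained category labels, which supports multi-dimensional analysis of model performance. The data is large and continually updated, with configurations ranging from general chat to targeted evaluation, providing rich, well-layered experimental material for research on the robustness, fairness, and domain adaptability of vision-language models.
Usage
Researchers can access the dataset directly by loading a specific configuration, for example using a battle configuration for head-to-head model comparison or a chat configuration to analyze single-model generation quality. The dataset supports subset filtering on labels such as image domain and question category, which makes targeted experiments easier. Typical applications include visual question answering benchmarking, multimodal dialogue system optimization, and model bias analysis; the precomputed judgments can also serve as a reference for automated evaluation.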
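The label-based subset filtering mentioned above can be sketched as a plain generator over row dictionaries. The field names `image_domain` and `question_category` come from the taxonomy configurations; the helper `filter_rows` and the label values in `SAMPLE_ROWS` are made up for illustration, since this page does not list the actual label vocabulary.

```python
from typing import Dict, Iterable, Iterator, Optional

Row = Dict[str, object]


def filter_rows(
    rows: Iterable[Row],
    image_domain: Optional[str] = None,
    question_category: Optional[str] = None,
) -> Iterator[Row]:
    """Yield rows matching every label that was given; None means no constraint."""
    for row in rows:
        if image_domain is not None and row.get("image_domain") != image_domain:
            continue
        if question_category is not None and row.get("question_category") != question_category:
            continue
        yield row


# Hypothetical rows: the label values are placeholders, not taken from the dataset.
SAMPLE_ROWS = [
    {"question_id": "q1", "image_domain": "domain_x", "question_category": "cat_1"},
    {"question_id": "q2", "image_domain": "domain_y", "question_category": "cat_1"},
    {"question_id": "q3", "image_domain": "domain_x", "question_category": "cat_2"},
]

if __name__ == "__main__":
    for row in filter_rows(SAMPLE_ROWS, image_domain="domain_x"):
        print(row["question_id"])
```

Because the function only iterates, it works the same way on an in-memory list and on a streamed split, and it never needs to hold the image data for rejected rows.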
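Background and challenges
Background
Evaluating and optimizing multimodal large models is a core topic in current AI research. The WildVision/wildvision-internal-data dataset was built by the WildVision team to systematically evaluate vision-language models on dialogue and image understanding tasks. It contains many multi-turn conversation records and images; in its "battle" configurations, different models respond to the same question, and human or automated evaluators judge which response is better. Its creation reflects the research community's concern with model robustness, generalization, and alignment with human preferences, and it provides a data foundation for making multimodal agents practical.
Current challenges
The dataset addresses a core challenge in evaluating multimodal dialogue models: how to measure, objectively and comprehensively, how models perform in complex vision-language interaction tasks. Specific challenges include designing fair and diverse evaluation scenarios that cover a broad range of domains, keeping evaluation criteria consistent to reduce subjective bias, and handling the interplay between context dependence in multi-turn dialogue and image semantic understanding. During construction, data collection must balance scale against quality, integrating large volumes of heterogeneous data while keeping annotations accurate. Establishing the anonymization and winner-judging mechanisms also adds engineering complexity and places high demands on the dataset's reliability and scalability.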
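Common scenarios
Classic use case
In vision-language model evaluation, the WildVision/wildvision-internal-data dataset offers a classic use case through its multimodal conversation records and model battle data. By recording how different models perform on image dialogue tasks and labeling the winning model, it lets researchers systematically compare performance across vision-language models. This battle-style evaluation framework covers text-image interaction and also incorporates anonymization and diverse domain divisions, laying a solid foundation for side-by-side comparison of model capabilities.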
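As one illustration of comparing models from battle records, the sketch below computes a simple per-model win rate. It uses the `model_a`, `model_b`, and `winner` fields from the battle schema. It assumes that `winner` holds the literal strings "model_a" or "model_b" for decisive votes and treats any other value as a tie to skip. That encoding is an assumption, since this page does not document the field's values, and `win_rates` is a hypothetical helper rather than the dataset authors' method.

```python
from collections import defaultdict
from typing import Dict, Iterable


def win_rates(battles: Iterable[dict]) -> Dict[str, float]:
    """Fraction of decisive battles each model won (ties and unknown labels skipped)."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for battle in battles:
        a, b, winner = battle["model_a"], battle["model_b"], battle["winner"]
        if winner not in ("model_a", "model_b"):
            continue  # assumed tie / other label: not counted for either side
        games[a] += 1
        games[b] += 1
        wins[a if winner == "model_a" else b] += 1
    return {model: wins[model] / games[model] for model in games}


# Hypothetical records with placeholder model names.
SAMPLE_BATTLES = [
    {"model_a": "model-x", "model_b": "model-y", "winner": "model_a"},
    {"model_a": "model-y", "model_b": "model-z", "winner": "model_b"},
    {"model_a": "model-x", "model_b": "model-z", "winner": "tie"},
]

if __name__ == "__main__":
    print(win_rates(SAMPLE_BATTLES))
```

A raw win rate ignores opponent strength; rating schemes such as Elo or Bradley-Terry are common alternatives for pairwise data, though this page does not say which scheme the dataset's authors use.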
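Derived work
A number of research efforts have built on this dataset, mainly in two directions: automated evaluation frameworks and fine-grained analysis of model capabilities. Some studies use the battle data to train lightweight evaluator models as a replacement for costly human annotation; others use the dataset's domain classification fields to examine how model performance varies across image types and question categories. These results enrich the evaluation ecosystem for vision-language models and offer methodological guidance for building more efficient and more comprehensive benchmarks.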
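Recent research on the dataset
Latest research directions
In vision-language model evaluation, the WildVision dataset, with its multimodal dialogue and battle structure, is driving exploration at the frontier of model performance evaluation. It integrates image and text interaction and records different models' responses through an anonymous battle mechanism, giving researchers fine-grained performance comparison data. Current research focuses on using the dataset to develop automated evaluation frameworks that reduce human annotation cost, and on exploring how well models generalize across languages and domains. As multimodal AI develops rapidly, the dataset plays an important role in promoting fair model comparison and revealing potential biases, and it lays a data foundation for more robust and transparent vision-language systems.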
The content above was collected and summarized by 遇见数据集.



