WildVision/wildvision-internal-data

Name: WildVision/wildvision-internal-data
Creator: WildVision
Published: 2024-08-21 20:32:43
License: No description available

Source: Hugging Face. Updated 2024-08-21. Indexed 2024-06-22.

Download link:

https://hf-mirror.com/datasets/WildVision/wildvision-internal-data

Resource description:

---
dataset_info:
- config_name: battle
  features:
  - name: question_id
    dtype: string
  - name: model_a
    dtype: string
  - name: model_b
    dtype: string
  - name: conversation_a
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: conversation_b
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: language
    dtype: string
  - name: image
    dtype: image
  - name: turn
    dtype: int32
  - name: anony
    dtype: bool
  - name: winner
    dtype: string
  - name: tstamp
    dtype: int32
  - name: judge
    dtype: string
  - name: domain
    dtype: string
  splits:
  - name: test
    num_bytes: 18605192639.8
    num_examples: 6200
  download_size: 8818061879
  dataset_size: 18605192639.8
- config_name: battle_2024_08_21
  features:
  - name: question_id
    dtype: string
  - name: model_a
    dtype: string
  - name: model_b
    dtype: string
  - name: conversation_a
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: conversation_b
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: language
    dtype: string
  - name: image
    dtype: image
  - name: turn
    dtype: int32
  - name: anony
    dtype: bool
  - name: winner
    dtype: string
  - name: tstamp
    dtype: int32
  - name: judge
    dtype: string
  splits:
  - name: test
    num_bytes: 39514031276.948
    num_examples: 13126
  download_size: 15521524077
  dataset_size: 39514031276.948
- config_name: battle_2024_08_21_raw
  features:
  - name: question_id
    dtype: string
  - name: model_a
    dtype: string
  - name: model_b
    dtype: string
  - name: conversation_a
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: conversation_b
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: language
    dtype: string
  - name: image
    dtype: image
  - name: turn
    dtype: int32
  - name: anony
    dtype: bool
  - name: winner
    dtype: string
  - name: tstamp
    dtype: int32
  - name: judge
    dtype: string
  splits:
  - name: test
    num_bytes: 39227303456.13
    num_examples: 13070
  download_size: 15359156748
  dataset_size: 39227303456.13
- config_name: battle_5_29
  features:
  - name: question_id
    dtype: string
  - name: model_a
    dtype: string
  - name: model_b
    dtype: string
  - name: conversation_a
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: conversation_b
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: language
    dtype: string
  - name: image
    dtype: image
  - name: turn
    dtype: int32
  - name: anony
    dtype: bool
  - name: winner
    dtype: string
  - name: tstamp
    dtype: int32
  - name: judge
    dtype: string
  splits:
  - name: test
    num_bytes: 26549445231.573
    num_examples: 8847
  download_size: 11520256673
  dataset_size: 26549445231.573
- config_name: chat
  features:
  - name: question_id
    dtype: string
  - name: model
    dtype: string
  - name: conversation
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: language
    dtype: string
  - name: image
    dtype: image
  - name: turn
    dtype: int32
  - name: domain
    dtype: string
  - name: tstamp
    dtype: int32
  splits:
  - name: test
    num_bytes: 76283030751.608
    num_examples: 34577
  download_size: 28317275024
  dataset_size: 76283030751.608
- config_name: chat_and_battle_image
  features:
  - name: question_id
    dtype: string
  - name: model
    dtype: string
  - name: conversation
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: language
    dtype: string
  - name: image
    dtype: image
  - name: turn
    dtype: int32
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 10500475382.445
    num_examples: 3977
  download_size: 7732811345
  dataset_size: 10500475382.445
- config_name: chat_image
  features:
  - name: question_id
    dtype: string
  - name: model
    dtype: string
  - name: conversation
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: language
    dtype: string
  - name: image
    dtype: image
  - name: turn
    dtype: int32
  - name: domain
    dtype: string
  - name: tstamp
    dtype: int32
  splits:
  - name: train
    num_bytes: 123011255696.48
    num_examples: 55745
  download_size: 42601616538
  dataset_size: 123011255696.48
- config_name: keep_bad_only
  features:
  - name: question_id
    dtype: string
  - name: model
    dtype: string
  - name: conversation
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: language
    dtype: string
  - name: image
    dtype: image
  - name: turn
    dtype: int32
  splits:
  - name: test
    num_bytes: 4760442474.92
    num_examples: 1654
  download_size: 3093490423
  dataset_size: 4760442474.92
- config_name: release_100_as_bench
  features:
  - name: question_id
    dtype: string
  - name: model
    dtype: string
  - name: conversation
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: language
    dtype: string
  - name: image
    dtype: image
  - name: turn
    dtype: int32
  splits:
  - name: test
    num_bytes: 306531348.0
    num_examples: 144
  - name: val
    num_bytes: 75199805.0
    num_examples: 52
  download_size: 492304000
  dataset_size: 381731153.0
- config_name: release_100_as_bench_battle
  features:
  - name: question_id
    dtype: string
  - name: model_a
    dtype: string
  - name: model_b
    dtype: string
  - name: conversation_a
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: conversation_b
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: language
    dtype: string
  - name: image
    dtype: image
  - name: turn
    dtype: int32
  - name: anony
    dtype: bool
  - name: winner
    dtype: string
  - name: tstamp
    dtype: int32
  - name: judge
    dtype: string
  splits:
  - name: precompute_gpt4v_vote
    num_bytes: 8584763789.0
    num_examples: 4032
  - name: woprecompute_user_vote
    num_bytes: 168025531.0
    num_examples: 73
  - name: precompute_evaluator_vote
    num_bytes: 8584863881.0
    num_examples: 4032
  download_size: 906902218
  dataset_size: 17337653201.0
- config_name: taxonmy
  features:
  - name: question_id
    dtype: string
  - name: model_a
    dtype: string
  - name: model_b
    dtype: string
  - name: conversation_a
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: conversation_b
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: language
    dtype: string
  - name: image
    dtype: image
  - name: turn
    dtype: int32
  - name: anony
    dtype: bool
  - name: winner
    dtype: string
  - name: tstamp
    dtype: int32
  - name: judge
    dtype: string
  - name: question_category
    dtype: string
  - name: question_subcategory
    dtype: string
  - name: image_domain
    dtype: string
  - name: image_subdomain
    dtype: string
  splits:
  - name: test_with_taxnomy
    num_bytes: 13170968746.43
    num_examples: 5695
  - name: test_with_taxnomy_100
    num_bytes: 182934614.0
    num_examples: 100
  download_size: 8261937043
  dataset_size: 13353903360.43
- config_name: taxonomy_battle_5_29
  features:
  - name: question_id
    dtype: string
  - name: model_a
    dtype: string
  - name: model_b
    dtype: string
  - name: conversation_a
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: conversation_b
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  - name: language
    dtype: string
  - name: image
    dtype: image
  - name: turn
    dtype: int32
  - name: anony
    dtype: bool
  - name: winner
    dtype: string
  - name: tstamp
    dtype: int32
  - name: judge
    dtype: string
  - name: question_category
    dtype: string
  - name: question_subcategory
    dtype: string
  - name: image_domain
    dtype: string
  - name: image_subdomain
    dtype: string
  splits:
  - name: test_with_taxonomy
    num_bytes: 17273443740.424
    num_examples: 8076
  download_size: 10659233517
  dataset_size: 17273443740.424
configs:
- config_name: battle
  data_files:
  - split: test
    path: battle/test-*
- config_name: battle_2024_08_21
  data_files:
  - split: test
    path: battle_2024_08_21/test-*
- config_name: battle_2024_08_21_raw
  data_files:
  - split: test
    path: battle_2024_08_21_raw/test-*
- config_name: battle_5_29
  data_files:
  - split: test
    path: battle_5_29/test-*
- config_name: chat
  data_files:
  - split: test
    path: chat/test-*
- config_name: chat_and_battle_image
  data_files:
  - split: train
    path: chat_and_battle_image/train-*
- config_name: chat_image
  data_files:
  - split: train
    path: chat_image/train-*
- config_name: keep_bad_only
  data_files:
  - split: test
    path: keep_bad_only/test-*
- config_name: release_100_as_bench
  data_files:
  - split: test
    path: release_100_as_bench/test-*
  - split: val
    path: release_100_as_bench/val-*
- config_name: release_100_as_bench_battle
  data_files:
  - split: precompute_gpt4v_vote
    path: release_100_as_bench_battle/precompute_gpt4v_vote-*
  - split: woprecompute_user_vote
    path: release_100_as_bench_battle/woprecompute_user_vote-*
  - split: precompute_evaluator_vote
    path: release_100_as_bench_battle/precompute_evaluator_vote-*
- config_name: taxonmy
  data_files:
  - split: test_with_taxnomy
    path: taxonmy/test_with_taxnomy-*
  - split: test_with_taxnomy_100
    path: taxonmy/test_with_taxnomy_100-*
- config_name: taxonomy_battle_5_29
  data_files:
  - split: test_with_taxonomy
    path: taxonomy_battle_5_29/test_with_taxonomy-*
---

# Dataset Card for Dataset Name

This dataset card aims to be a base template for new datasets. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md?plain=1).

## Dataset Details

### Dataset Description

- **Curated by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]

### Dataset Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

### Direct Use

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Dataset Structure

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Data Collection and Processing

[More Information Needed]

#### Who are the source data producers?

[More Information Needed]

### Annotations [optional]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

#### Personal and Sensitive Information

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Dataset Card Authors [optional]

[More Information Needed]

## Dataset Card Contact

[More Information Needed]

Provider:

WildVision

Summary of original information

Dataset overview

Dataset configuration details

Configuration name: battle

Features:
- question_id: string
- model_a: string
- model_b: string
- conversation_a: list of entries with role and content, both strings
- conversation_b: list of entries with role and content, both strings
- language: string
- image: image
- turn: integer (int32)
- anony: boolean
- winner: string
- tstamp: integer (int32)
- judge: string
- domain: string
Splits:
- test: 18605192639.8 bytes, 6200 examples
Download size: 8818061879 bytes
Dataset size: 18605192639.8 bytes

Configuration name: battle_5_29

Features:
- question_id: string
- model_a: string
- model_b: string
- conversation_a: list of entries with role and content, both strings
- conversation_b: list of entries with role and content, both strings
- language: string
- image: image
- turn: integer (int32)
- anony: boolean
- winner: string
- tstamp: integer (int32)
- judge: string
Splits:
- test: 26549445231.573 bytes, 8847 examples
Download size: 11520256673 bytes
Dataset size: 26549445231.573 bytes

Configuration name: chat

Features:
- question_id: string
- model: string
- conversation: list of entries with role and content, both strings
- language: string
- image: image
- turn: integer (int32)
- domain: string
- tstamp: integer (int32)
Splits:
- test: 76283030751.608 bytes, 34577 examples
Download size: 28317275024 bytes
Dataset size: 76283030751.608 bytes

Configuration name: chat_and_battle_image

Features:
- question_id: string
- model: string
- conversation: list of entries with role and content, both strings
- language: string
- image: image
- turn: integer (int32)
- source: string
Splits:
- train: 10500475382.445 bytes, 3977 examples
Download size: 7732811345 bytes
Dataset size: 10500475382.445 bytes

Configuration name: keep_bad_only

Features:
- question_id: string
- model: string
- conversation: list of entries with role and content, both strings
- language: string
- image: image
- turn: integer (int32)
Splits:
- test: 4760442474.92 bytes, 1654 examples
Download size: 3093490423 bytes
Dataset size: 4760442474.92 bytes

Configuration name: release_100_as_bench

Features:
- question_id: string
- model: string
- conversation: list of entries with role and content, both strings
- language: string
- image: image
- turn: integer (int32)
Splits:
- test: 306531348.0 bytes, 144 examples
- val: 75199805.0 bytes, 52 examples
Download size: 492304000 bytes
Dataset size: 381731153.0 bytes

Configuration name: release_100_as_bench_battle

Features:
- question_id: string
- model_a: string
- model_b: string
- conversation_a: list of entries with role and content, both strings
- conversation_b: list of entries with role and content, both strings
- language: string
- image: image
- turn: integer (int32)
- anony: boolean
- winner: string
- tstamp: integer (int32)
- judge: string
Splits:
- precompute_gpt4v_vote: 8584763789.0 bytes, 4032 examples
- woprecompute_user_vote: 168025531.0 bytes, 73 examples
- precompute_evaluator_vote: 8584863881.0 bytes, 4032 examples
Download size: 906902218 bytes
Dataset size: 17337653201.0 bytes

Configuration name: taxonmy

Features:
- question_id: string
- model_a: string
- model_b: string
- conversation_a: list of entries with role and content, both strings
- conversation_b: list of entries with role and content, both strings
- language: string
- image: image
- turn: integer (int32)
- anony: boolean
- winner: string
- tstamp: integer (int32)
- judge: string
- question_category: string
- question_subcategory: string
- image_domain: string
- image_subdomain: string
Splits:
- test_with_taxnomy: 13170968746.43 bytes, 5695 examples
- test_with_taxnomy_100: 182934614.0 bytes, 100 examples
Download size: 8261937043 bytes
Dataset size: 13353903360.43 bytes

Configuration name: taxonomy_battle_5_29

Features:
- question_id: string
- model_a: string
- model_b: string
- conversation_a: list of entries with role and content, both strings
- conversation_b: list of entries with role and content, both strings
- language: string
- image: image
- turn: integer (int32)
- anony: boolean
- winner: string
- tstamp: integer (int32)
- judge: string
- question_category: string
- question_subcategory: string
- image_domain: string
- image_subdomain: string
Splits:
- test_with_taxonomy: 17273443740.424 bytes, 8076 examples
Download size: 10659233517 bytes
Dataset size: 17273443740.424 bytes
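
The raw byte counts listed for each configuration are hard to compare at a glance. The sketch below converts a few of them into GiB, mean MiB per example, and the ratio of download size to dataset size. It uses only figures copied from the listing; the names `CONFIG_SIZES` and `summarize` are illustrative and not part of the dataset.

```python
# Sizes copied from the configuration details:
# (dataset size in bytes, download size in bytes, number of examples).
CONFIG_SIZES = {
    "battle": (18605192639.8, 8818061879, 6200),
    "battle_5_29": (26549445231.573, 11520256673, 8847),
    "chat": (76283030751.608, 28317275024, 34577),
    "keep_bad_only": (4760442474.92, 3093490423, 1654),
}

GIB = 1024 ** 3
MIB = 1024 ** 2


def summarize(dataset_bytes, download_bytes, num_examples):
    """Return dataset size in GiB, mean MiB per example, and download/dataset ratio."""
    return {
        "dataset_gib": dataset_bytes / GIB,
        "mib_per_example": dataset_bytes / num_examples / MIB,
        "download_ratio": download_bytes / dataset_bytes,
    }


if __name__ == "__main__":
    for name, sizes in CONFIG_SIZES.items():
        stats = summarize(*sizes)
        print(
            f"{name}: {stats['dataset_gib']:.1f} GiB, "
            f"{stats['mib_per_example']:.2f} MiB/example, "
            f"download ratio {stats['download_ratio']:.2f}"
        )
```

The per-example figure is dominated by the image column, so it is mainly useful for estimating disk and bandwidth needs before choosing a configuration.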

Data file configuration

Configuration name: battle

Data files:
- test: battle/test-*

Configuration name: battle_5_29

Data files:
- test: battle_5_29/test-*

Configuration name: chat

Data files:
- test: chat/test-*

Configuration name: chat_and_battle_image

Data files:
- train: chat_and_battle_image/train-*

Configuration name: keep_bad_only

Data files:
- test: keep_bad_only/test-*

Configuration name: release_100_as_bench

Data files:
- test: release_100_as_bench/test-*
- val: release_100_as_bench/val-*

Configuration name: release_100_as_bench_battle

Data files:
- precompute_gpt4v_vote: release_100_as_bench_battle/precompute_gpt4v_vote-*
- woprecompute_user_vote: release_100_as_bench_battle/woprecompute_user_vote-*
- precompute_evaluator_vote: release_100_as_bench_battle/precompute_evaluator_vote-*

Configuration name: taxonmy

Data files:
- test_with_taxnomy: taxonmy/test_with_taxnomy-*
- test_with_taxnomy_100: taxonmy/test_with_taxnomy_100-*

Configuration name: taxonomy_battle_5_29

Data files:
- test_with_taxonomy: taxonomy_battle_5_29/test_with_taxonomy-*
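
Each split is bound to its files by a glob pattern ending in `-*`. The sketch below shows how such patterns separate shard files into splits, using Python's `fnmatch` as a stand-in. The shard file names are hypothetical (the listing does not give them), and the Hugging Face Hub resolves patterns with its own glob implementation, whose edge cases may differ from `fnmatch`.

```python
from fnmatch import fnmatch

# Split-to-pattern mapping copied from the data file configuration.
DATA_FILES = {
    "release_100_as_bench": {
        "test": "release_100_as_bench/test-*",
        "val": "release_100_as_bench/val-*",
    },
    "taxonmy": {
        "test_with_taxnomy": "taxonmy/test_with_taxnomy-*",
        "test_with_taxnomy_100": "taxonmy/test_with_taxnomy_100-*",
    },
}


def splits_for(config, path):
    """Return the splits of `config` whose pattern matches the file `path`."""
    return [
        split
        for split, pattern in DATA_FILES[config].items()
        if fnmatch(path, pattern)
    ]


if __name__ == "__main__":
    # Hypothetical shard names, for illustration only.
    for path in (
        "taxonmy/test_with_taxnomy-00000-of-00027.parquet",
        "taxonmy/test_with_taxnomy_100-00000-of-00001.parquet",
    ):
        print(path, "->", splits_for("taxonmy", path))
```

The hyphen before `*` is what keeps `test_with_taxnomy` and `test_with_taxnomy_100` apart: a file whose name continues with `_100` cannot match the shorter pattern.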

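Collected summary

Dataset introduction

Construction

In the field of vision-language model evaluation, the WildVision/wildvision-internal-data dataset was built by systematically collecting multi-turn conversations and image interactions. Its core mechanism has different models respond to the same visual question, after which a human or automated judge labels the winner. Collection covers diverse visual scenes and languages. Timestamps make records traceable and anonymization supports fair comparison, yielding structured records of model battles and conversations.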

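Characteristics

The dataset centers on multimodal conversation and model battles, combining image content with text interaction. Its schema includes conversation turn, language, image domain, and fine-grained category labels, which supports analysis of model performance along several dimensions. It is large and has been updated over time, with configurations ranging from general conversation to targeted evaluation, and it offers layered experimental material for research on the robustness, fairness, and domain adaptability of vision-language models.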

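Usage

Researchers can access the dataset by loading a specific configuration, for example a battle configuration for pairwise model comparison or a chat configuration for analyzing the output quality of a single model. Labels such as image domain and question category allow subset filtering for targeted experiments. Typical applications include visual question answering benchmarks, tuning of multimodal dialogue systems, and model bias analysis. The precomputed judgments can also serve as a reference for automated evaluation.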
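
A minimal sketch of loading one configuration and filtering by label, assuming the Hugging Face `datasets` package is installed and that you have access to the repository. The helper names and the label values in `SAMPLE_ROWS` (such as "Natural" and "Descriptive") are made up for illustration; the actual values stored in `image_domain` and `question_category` are not documented here.

```python
def load_config(config_name, split, streaming=True):
    """Load one configuration of the dataset.

    Requires the `datasets` package and access to the repository.
    Streaming avoids downloading a multi-gigabyte configuration up front.
    """
    from datasets import load_dataset  # lazy import: the filter below works without it

    return load_dataset(
        "WildVision/wildvision-internal-data",
        config_name,
        split=split,
        streaming=streaming,
    )


def filter_rows(rows, **criteria):
    """Yield rows whose fields equal every given criterion."""
    for row in rows:
        if all(row.get(field) == value for field, value in criteria.items()):
            yield row


# Made-up records that mirror the taxonomy fields; values are hypothetical.
SAMPLE_ROWS = [
    {"question_id": "q1", "image_domain": "Natural", "question_category": "Descriptive"},
    {"question_id": "q2", "image_domain": "Natural", "question_category": "Analytical"},
    {"question_id": "q3", "image_domain": "Urban", "question_category": "Descriptive"},
]

if __name__ == "__main__":
    for row in filter_rows(SAMPLE_ROWS, image_domain="Natural"):
        print(row["question_id"])
```

With repository access, the same filter can be applied to real rows, for example `filter_rows(load_config("taxonomy_battle_5_29", "test_with_taxonomy"), image_domain=...)`, since a streamed split yields one dictionary per example.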

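Background and challenges

Background overview

In artificial intelligence, the evaluation and optimization of large multimodal models is a central research topic. The WildVision/wildvision-internal-data dataset was built by the WildVision team to systematically evaluate vision-language models on conversation and image understanding tasks. It contains multi-turn conversation records and image data. In its "battle" configurations, different models respond to the same question and a human or automated evaluator judges which response is better. Its creation reflects the research community's concern with model robustness, generalization, and alignment with human preferences, and it provides a data foundation for making multimodal agents practical.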

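Current challenges

The dataset addresses a core challenge in evaluating multimodal dialogue models: how to measure model performance on complex vision-language interaction tasks objectively and comprehensively. Specific challenges include designing fair and diverse evaluation scenarios that cover a wide range of domains, keeping evaluation criteria consistent to reduce subjective bias, and handling the interplay between context dependence in multi-turn dialogue and semantic understanding of images. During construction, data collection must balance scale against quality, integrating large volumes of heterogeneous data while keeping annotations accurate. Anonymization and the mechanism for deciding winners add engineering complexity and place high demands on the dataset's reliability and scalability.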

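Common scenarios

Classic usage scenario

In vision-language model evaluation, the classic use of the WildVision/wildvision-internal-data dataset draws on its multimodal conversation records and model battle data. The dataset records how different models perform on image conversation tasks and labels the winning model, so researchers can systematically compare vision-language models. This battle-style evaluation framework covers text and image interaction, and it also records whether a battle was anonymous and which domain it belongs to, which provides a basis for side-by-side comparison of model capabilities.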
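
The comparison described above can be sketched as a per-model win rate over battle records. This is a minimal illustration, not the WildVision team's method. It assumes the `winner` field holds the string "model_a" or "model_b" and treats any other value as a tie; the listing gives only the field's type, so check the actual values before relying on this. The model names in the sample are placeholders.

```python
from collections import Counter


def win_rates(battles):
    """Return {model: wins / battles played} from battle records.

    Assumption: `winner` is "model_a" or "model_b"; anything else counts
    as a tie, which adds a game but no win for either side.
    """
    wins, games = Counter(), Counter()
    for battle in battles:
        model_a, model_b = battle["model_a"], battle["model_b"]
        games[model_a] += 1
        games[model_b] += 1
        if battle["winner"] == "model_a":
            wins[model_a] += 1
        elif battle["winner"] == "model_b":
            wins[model_b] += 1
    return {model: wins[model] / games[model] for model in games}


# Placeholder records with only the fields the function reads.
SAMPLE_BATTLES = [
    {"model_a": "model-x", "model_b": "model-y", "winner": "model_a"},
    {"model_a": "model-x", "model_b": "model-z", "winner": "model_b"},
    {"model_a": "model-y", "model_b": "model-z", "winner": "tie"},
]

if __name__ == "__main__":
    for model, rate in sorted(win_rates(SAMPLE_BATTLES).items()):
        print(f"{model}: {rate:.2f}")
```

Raw win rates ignore opponent strength, so they are only a first look; rating schemes that account for the opponent are a common next step.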

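Derived related work

Several lines of research have grown out of this dataset, mainly on automated evaluation frameworks and fine-grained analysis of model capabilities. Some studies use the battle data to train lightweight evaluator models that replace costly human annotation. Others use the dataset's category fields to examine how model performance varies across image types and question categories. These results broaden the evaluation ecosystem for vision-language models and offer methodological guidance for building more efficient and more comprehensive benchmarks.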
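
One way to check whether an automated evaluator can stand in for human votes is to measure how often the two agree on the same battle. The sketch below assumes that two vote sets can be joined on `(question_id, model_a, model_b)` and that `winner` values are directly comparable strings. Both are assumptions; the listing does not document how the vote splits relate to each other.

```python
def battle_key(row):
    """Identify a battle by its question and the ordered model pair (assumed join key)."""
    return (row["question_id"], row["model_a"], row["model_b"])


def agreement_rate(votes_a, votes_b):
    """Return the fraction of shared battles with the same `winner`, or None if none are shared."""
    winners_a = {battle_key(row): row["winner"] for row in votes_a}
    shared = [
        (winners_a[battle_key(row)], row["winner"])
        for row in votes_b
        if battle_key(row) in winners_a
    ]
    if not shared:
        return None
    return sum(first == second for first, second in shared) / len(shared)


# Placeholder votes; identifiers and winners are made up.
EVALUATOR_VOTES = [
    {"question_id": "q1", "model_a": "model-x", "model_b": "model-y", "winner": "model_a"},
    {"question_id": "q2", "model_a": "model-x", "model_b": "model-y", "winner": "model_b"},
    {"question_id": "q3", "model_a": "model-x", "model_b": "model-y", "winner": "tie"},
]
USER_VOTES = [
    {"question_id": "q1", "model_a": "model-x", "model_b": "model-y", "winner": "model_a"},
    {"question_id": "q2", "model_a": "model-x", "model_b": "model-y", "winner": "model_a"},
    {"question_id": "q4", "model_a": "model-x", "model_b": "model-y", "winner": "model_b"},
]

if __name__ == "__main__":
    print(agreement_rate(EVALUATOR_VOTES, USER_VOTES))
```

Plain agreement does not correct for chance, so a chance-corrected statistic may be preferable when one outcome dominates the votes.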

Recent research on the dataset