five

WildVision/wildvision-internal-data

收藏
Hugging Face2024-08-21 更新2024-06-22 收录
下载链接:
https://hf-mirror.com/datasets/WildVision/wildvision-internal-data
下载链接
链接失效反馈
官方服务:
资源简介:
--- dataset_info: - config_name: battle features: - name: question_id dtype: string - name: model_a dtype: string - name: model_b dtype: string - name: conversation_a list: - name: role dtype: string - name: content dtype: string - name: conversation_b list: - name: role dtype: string - name: content dtype: string - name: language dtype: string - name: image dtype: image - name: turn dtype: int32 - name: anony dtype: bool - name: winner dtype: string - name: tstamp dtype: int32 - name: judge dtype: string - name: domain dtype: string splits: - name: test num_bytes: 18605192639.8 num_examples: 6200 download_size: 8818061879 dataset_size: 18605192639.8 - config_name: battle_2024_08_21 features: - name: question_id dtype: string - name: model_a dtype: string - name: model_b dtype: string - name: conversation_a list: - name: role dtype: string - name: content dtype: string - name: conversation_b list: - name: role dtype: string - name: content dtype: string - name: language dtype: string - name: image dtype: image - name: turn dtype: int32 - name: anony dtype: bool - name: winner dtype: string - name: tstamp dtype: int32 - name: judge dtype: string splits: - name: test num_bytes: 39514031276.948 num_examples: 13126 download_size: 15521524077 dataset_size: 39514031276.948 - config_name: battle_2024_08_21_raw features: - name: question_id dtype: string - name: model_a dtype: string - name: model_b dtype: string - name: conversation_a list: - name: role dtype: string - name: content dtype: string - name: conversation_b list: - name: role dtype: string - name: content dtype: string - name: language dtype: string - name: image dtype: image - name: turn dtype: int32 - name: anony dtype: bool - name: winner dtype: string - name: tstamp dtype: int32 - name: judge dtype: string splits: - name: test num_bytes: 39227303456.13 num_examples: 13070 download_size: 15359156748 dataset_size: 39227303456.13 - config_name: battle_5_29 features: - name: question_id dtype: string - name: model_a dtype: string - name: model_b dtype: string - name: conversation_a list: - name: role dtype: string - name: content dtype: string - name: conversation_b list: - name: role dtype: string - name: content dtype: string - name: language dtype: string - name: image dtype: image - name: turn dtype: int32 - name: anony dtype: bool - name: winner dtype: string - name: tstamp dtype: int32 - name: judge dtype: string splits: - name: test num_bytes: 26549445231.573 num_examples: 8847 download_size: 11520256673 dataset_size: 26549445231.573 - config_name: chat features: - name: question_id dtype: string - name: model dtype: string - name: conversation list: - name: role dtype: string - name: content dtype: string - name: language dtype: string - name: image dtype: image - name: turn dtype: int32 - name: domain dtype: string - name: tstamp dtype: int32 splits: - name: test num_bytes: 76283030751.608 num_examples: 34577 download_size: 28317275024 dataset_size: 76283030751.608 - config_name: chat_and_battle_image features: - name: question_id dtype: string - name: model dtype: string - name: conversation list: - name: role dtype: string - name: content dtype: string - name: language dtype: string - name: image dtype: image - name: turn dtype: int32 - name: source dtype: string splits: - name: train num_bytes: 10500475382.445 num_examples: 3977 download_size: 7732811345 dataset_size: 10500475382.445 - config_name: chat_image features: - name: question_id dtype: string - name: model dtype: string - name: conversation list: - name: role dtype: string - name: content dtype: string - name: language dtype: string - name: image dtype: image - name: turn dtype: int32 - name: domain dtype: string - name: tstamp dtype: int32 splits: - name: train num_bytes: 123011255696.48 num_examples: 55745 download_size: 42601616538 dataset_size: 123011255696.48 - config_name: keep_bad_only features: - name: question_id dtype: string - name: model dtype: string - name: conversation list: - name: role dtype: string - name: content dtype: string - name: language dtype: string - name: image dtype: image - name: turn dtype: int32 splits: - name: test num_bytes: 4760442474.92 num_examples: 1654 download_size: 3093490423 dataset_size: 4760442474.92 - config_name: release_100_as_bench features: - name: question_id dtype: string - name: model dtype: string - name: conversation list: - name: role dtype: string - name: content dtype: string - name: language dtype: string - name: image dtype: image - name: turn dtype: int32 splits: - name: test num_bytes: 306531348.0 num_examples: 144 - name: val num_bytes: 75199805.0 num_examples: 52 download_size: 492304000 dataset_size: 381731153.0 - config_name: release_100_as_bench_battle features: - name: question_id dtype: string - name: model_a dtype: string - name: model_b dtype: string - name: conversation_a list: - name: role dtype: string - name: content dtype: string - name: conversation_b list: - name: role dtype: string - name: content dtype: string - name: language dtype: string - name: image dtype: image - name: turn dtype: int32 - name: anony dtype: bool - name: winner dtype: string - name: tstamp dtype: int32 - name: judge dtype: string splits: - name: precompute_gpt4v_vote num_bytes: 8584763789.0 num_examples: 4032 - name: woprecompute_user_vote num_bytes: 168025531.0 num_examples: 73 - name: precompute_evaluator_vote num_bytes: 8584863881.0 num_examples: 4032 download_size: 906902218 dataset_size: 17337653201.0 - config_name: taxonmy features: - name: question_id dtype: string - name: model_a dtype: string - name: model_b dtype: string - name: conversation_a list: - name: role dtype: string - name: content dtype: string - name: conversation_b list: - name: role dtype: string - name: content dtype: string - name: language dtype: string - name: image dtype: image - name: turn dtype: int32 - name: anony dtype: bool - name: winner dtype: string - name: tstamp dtype: int32 - name: judge dtype: string - name: question_category dtype: string - name: question_subcategory dtype: string - name: image_domain dtype: string - name: image_subdomain dtype: string splits: - name: test_with_taxnomy num_bytes: 13170968746.43 num_examples: 5695 - name: test_with_taxnomy_100 num_bytes: 182934614.0 num_examples: 100 download_size: 8261937043 dataset_size: 13353903360.43 - config_name: taxonomy_battle_5_29 features: - name: question_id dtype: string - name: model_a dtype: string - name: model_b dtype: string - name: conversation_a list: - name: role dtype: string - name: content dtype: string - name: conversation_b list: - name: role dtype: string - name: content dtype: string - name: language dtype: string - name: image dtype: image - name: turn dtype: int32 - name: anony dtype: bool - name: winner dtype: string - name: tstamp dtype: int32 - name: judge dtype: string - name: question_category dtype: string - name: question_subcategory dtype: string - name: image_domain dtype: string - name: image_subdomain dtype: string splits: - name: test_with_taxonomy num_bytes: 17273443740.424 num_examples: 8076 download_size: 10659233517 dataset_size: 17273443740.424 configs: - config_name: battle data_files: - split: test path: battle/test-* - config_name: battle_2024_08_21 data_files: - split: test path: battle_2024_08_21/test-* - config_name: battle_2024_08_21_raw data_files: - split: test path: battle_2024_08_21_raw/test-* - config_name: battle_5_29 data_files: - split: test path: battle_5_29/test-* - config_name: chat data_files: - split: test path: chat/test-* - config_name: chat_and_battle_image data_files: - split: train path: chat_and_battle_image/train-* - config_name: chat_image data_files: - split: train path: chat_image/train-* - config_name: keep_bad_only data_files: - split: test path: keep_bad_only/test-* - config_name: release_100_as_bench data_files: - split: test path: release_100_as_bench/test-* - split: val path: release_100_as_bench/val-* - config_name: release_100_as_bench_battle data_files: - split: precompute_gpt4v_vote path: release_100_as_bench_battle/precompute_gpt4v_vote-* - split: woprecompute_user_vote path: release_100_as_bench_battle/woprecompute_user_vote-* - split: precompute_evaluator_vote path: release_100_as_bench_battle/precompute_evaluator_vote-* - config_name: taxonmy data_files: - split: test_with_taxnomy path: taxonmy/test_with_taxnomy-* - split: test_with_taxnomy_100 path: taxonmy/test_with_taxnomy_100-* - config_name: taxonomy_battle_5_29 data_files: - split: test_with_taxonomy path: taxonomy_battle_5_29/test_with_taxonomy-* --- # Dataset Card for Dataset Name <!-- Provide a quick summary of the dataset. --> This dataset card aims to be a base template for new datasets. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md?plain=1). ## Dataset Details ### Dataset Description <!-- Provide a longer summary of what this dataset is. --> - **Curated by:** [More Information Needed] - **Funded by [optional]:** [More Information Needed] - **Shared by [optional]:** [More Information Needed] - **Language(s) (NLP):** [More Information Needed] - **License:** [More Information Needed] ### Dataset Sources [optional] <!-- Provide the basic links for the dataset. --> - **Repository:** [More Information Needed] - **Paper [optional]:** [More Information Needed] - **Demo [optional]:** [More Information Needed] ## Uses <!-- Address questions around how the dataset is intended to be used. --> ### Direct Use <!-- This section describes suitable use cases for the dataset. --> [More Information Needed] ### Out-of-Scope Use <!-- This section addresses misuse, malicious use, and uses that the dataset will not work well for. --> [More Information Needed] ## Dataset Structure <!-- This section provides a description of the dataset fields, and additional information about the dataset structure such as criteria used to create the splits, relationships between data points, etc. --> [More Information Needed] ## Dataset Creation ### Curation Rationale <!-- Motivation for the creation of this dataset. --> [More Information Needed] ### Source Data <!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). --> #### Data Collection and Processing <!-- This section describes the data collection and processing process such as data selection criteria, filtering and normalization methods, tools and libraries used, etc. --> [More Information Needed] #### Who are the source data producers? <!-- This section describes the people or systems who originally created the data. It should also include self-reported demographic or identity information for the source data creators if this information is available. --> [More Information Needed] ### Annotations [optional] <!-- If the dataset contains annotations which are not part of the initial data collection, use this section to describe them. --> #### Annotation process <!-- This section describes the annotation process such as annotation tools used in the process, the amount of data annotated, annotation guidelines provided to the annotators, interannotator statistics, annotation validation, etc. --> [More Information Needed] #### Who are the annotators? <!-- This section describes the people or systems who created the annotations. --> [More Information Needed] #### Personal and Sensitive Information <!-- State whether the dataset contains data that might be considered personal, sensitive, or private (e.g., data that reveals addresses, uniquely identifiable names or aliases, racial or ethnic origins, sexual orientations, religious beliefs, political opinions, financial or health data, etc.). If efforts were made to anonymize the data, describe the anonymization process. --> [More Information Needed] ## Bias, Risks, and Limitations <!-- This section is meant to convey both technical and sociotechnical limitations. --> [More Information Needed] ### Recommendations <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. --> Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations. ## Citation [optional] <!-- If there is a paper or blog post introducing the dataset, the APA and Bibtex information for that should go in this section. --> **BibTeX:** [More Information Needed] **APA:** [More Information Needed] ## Glossary [optional] <!-- If relevant, include terms and calculations in this section that can help readers understand the dataset or dataset card. --> [More Information Needed] ## More Information [optional] [More Information Needed] ## Dataset Card Authors [optional] [More Information Needed] ## Dataset Card Contact [More Information Needed]
提供机构:
WildVision
原始信息汇总

数据集概述

数据集配置详情

配置名称:battle

  • 特征:
    • question_id: 字符串
    • model_a: 字符串
    • model_b: 字符串
    • conversation_a: 列表,包含 rolecontent,均为字符串
    • conversation_b: 列表,包含 rolecontent,均为字符串
    • language: 字符串
    • image: 图像
    • turn: 整数
    • anony: 布尔值
    • winner: 字符串
    • tstamp: 整数
    • judge: 字符串
    • domain: 字符串
  • 分割:
    • test: 字节数 18605192639.8,样本数 6200
  • 下载大小: 8818061879 字节
  • 数据集大小: 18605192639.8 字节

配置名称:battle_5_29

  • 特征:
    • question_id: 字符串
    • model_a: 字符串
    • model_b: 字符串
    • conversation_a: 列表,包含 rolecontent,均为字符串
    • conversation_b: 列表,包含 rolecontent,均为字符串
    • language: 字符串
    • image: 图像
    • turn: 整数
    • anony: 布尔值
    • winner: 字符串
    • tstamp: 整数
    • judge: 字符串
  • 分割:
    • test: 字节数 26549445231.573,样本数 8847
  • 下载大小: 11520256673 字节
  • 数据集大小: 26549445231.573 字节

配置名称:chat

  • 特征:
    • question_id: 字符串
    • model: 字符串
    • conversation: 列表,包含 rolecontent,均为字符串
    • language: 字符串
    • image: 图像
    • turn: 整数
    • domain: 字符串
    • tstamp: 整数
  • 分割:
    • test: 字节数 76283030751.608,样本数 34577
  • 下载大小: 28317275024 字节
  • 数据集大小: 76283030751.608 字节

配置名称:chat_and_battle_image

  • 特征:
    • question_id: 字符串
    • model: 字符串
    • conversation: 列表,包含 rolecontent,均为字符串
    • language: 字符串
    • image: 图像
    • turn: 整数
    • source: 字符串
  • 分割:
    • train: 字节数 10500475382.445,样本数 3977
  • 下载大小: 7732811345 字节
  • 数据集大小: 10500475382.445 字节

配置名称:keep_bad_only

  • 特征:
    • question_id: 字符串
    • model: 字符串
    • conversation: 列表,包含 rolecontent,均为字符串
    • language: 字符串
    • image: 图像
    • turn: 整数
  • 分割:
    • test: 字节数 4760442474.92,样本数 1654
  • 下载大小: 3093490423 字节
  • 数据集大小: 4760442474.92 字节

配置名称:release_100_as_bench

  • 特征:
    • question_id: 字符串
    • model: 字符串
    • conversation: 列表,包含 rolecontent,均为字符串
    • language: 字符串
    • image: 图像
    • turn: 整数
  • 分割:
    • test: 字节数 306531348.0,样本数 144
    • val: 字节数 75199805.0,样本数 52
  • 下载大小: 492304000 字节
  • 数据集大小: 381731153.0 字节

配置名称:release_100_as_bench_battle

  • 特征:
    • question_id: 字符串
    • model_a: 字符串
    • model_b: 字符串
    • conversation_a: 列表,包含 rolecontent,均为字符串
    • conversation_b: 列表,包含 rolecontent,均为字符串
    • language: 字符串
    • image: 图像
    • turn: 整数
    • anony: 布尔值
    • winner: 字符串
    • tstamp: 整数
    • judge: 字符串
  • 分割:
    • precompute_gpt4v_vote: 字节数 8584763789.0,样本数 4032
    • woprecompute_user_vote: 字节数 168025531.0,样本数 73
    • precompute_evaluator_vote: 字节数 8584863881.0,样本数 4032
  • 下载大小: 906902218 字节
  • 数据集大小: 17337653201.0 字节

配置名称:taxonmy

  • 特征:
    • question_id: 字符串
    • model_a: 字符串
    • model_b: 字符串
    • conversation_a: 列表,包含 rolecontent,均为字符串
    • conversation_b: 列表,包含 rolecontent,均为字符串
    • language: 字符串
    • image: 图像
    • turn: 整数
    • anony: 布尔值
    • winner: 字符串
    • tstamp: 整数
    • judge: 字符串
    • question_category: 字符串
    • question_subcategory: 字符串
    • image_domain: 字符串
    • image_subdomain: 字符串
  • 分割:
    • test_with_taxnomy: 字节数 13170968746.43,样本数 5695
    • test_with_taxnomy_100: 字节数 182934614.0,样本数 100
  • 下载大小: 8261937043 字节
  • 数据集大小: 13353903360.43 字节

配置名称:taxonomy_battle_5_29

  • 特征:
    • question_id: 字符串
    • model_a: 字符串
    • model_b: 字符串
    • conversation_a: 列表,包含 rolecontent,均为字符串
    • conversation_b: 列表,包含 rolecontent,均为字符串
    • language: 字符串
    • image: 图像
    • turn: 整数
    • anony: 布尔值
    • winner: 字符串
    • tstamp: 整数
    • judge: 字符串
    • question_category: 字符串
    • question_subcategory: 字符串
    • image_domain: 字符串
    • image_subdomain: 字符串
  • 分割:
    • test_with_taxonomy: 字节数 17273443740.424,样本数 8076
  • 下载大小: 10659233517 字节
  • 数据集大小: 17273443740.424 字节

数据文件配置

配置名称:battle

  • 数据文件:
    • test: battle/test-*

配置名称:battle_5_29

  • 数据文件:
    • test: battle_5_29/test-*

配置名称:chat

  • 数据文件:
    • test: chat/test-*

配置名称:chat_and_battle_image

  • 数据文件:
    • train: chat_and_battle_image/train-*

配置名称:keep_bad_only

  • 数据文件:
    • test: keep_bad_only/test-*

配置名称:release_100_as_bench

  • 数据文件:
    • test: release_100_as_bench/test-*
    • val: release_100_as_bench/val-*

配置名称:release_100_as_bench_battle

  • 数据文件:
    • precompute_gpt4v_vote: release_100_as_bench_battle/precompute_gpt4v_vote-*
    • woprecompute_user_vote: release_100_as_bench_battle/woprecompute_user_vote-*
    • precompute_evaluator_vote: release_100_as_bench_battle/precompute_evaluator_vote-*

配置名称:taxonmy

  • 数据文件:
    • test_with_taxnomy: taxonmy/test_with_taxnomy-*
    • test_with_taxnomy_100: taxonmy/test_with_taxnomy_100-*

配置名称:taxonomy_battle_5_29

  • 数据文件:
    • test_with_taxonomy: taxonomy_battle_5_29/test_with_taxonomy-*
搜集汇总
数据集介绍
main_image_url
构建方式
在视觉语言模型评估领域,WildVision/wildvision-internal-data数据集通过系统化采集多轮对话与图像交互数据构建而成。其核心机制涉及组织不同模型对同一视觉问题生成响应,并引入人工或自动化评判机制标注优胜方。数据采集过程涵盖多样化视觉场景与语言类型,通过时间戳记录与匿名化处理确保数据追踪与公平性,最终形成结构化的模型对战与对话记录。
特点
该数据集以多模态对话与模型对战为核心特色,深度融合图像内容与文本交互。其结构设计包含对话轮次、语言类型、图像域及细粒度分类标签,支持对模型性能进行多维度剖析。数据规模庞大且持续更新,涵盖从通用对话到专项评估的多种配置,为视觉语言模型的鲁棒性、公平性及领域适应性研究提供了丰富且层次分明的实验素材。
使用方法
研究人员可通过加载特定配置直接访问数据集,例如利用对战配置进行模型对比评估,或使用对话配置分析单模型生成质量。数据集支持基于图像域、问题类别等标签进行子集筛选,便于开展针对性实验。典型应用包括视觉问答基准测试、多模态对话系统优化以及模型偏差分析,其预计算的评判结果亦可作为自动化评估的参考标准。
背景与挑战
背景概述
在人工智能领域,多模态大模型的评估与优化是当前研究的核心议题。WildVision/wildvision-internal-data数据集由WildVision团队构建,旨在系统性地评估视觉-语言模型在对话与图像理解任务中的性能。该数据集收录了丰富的多轮对话记录与图像数据,通过精心设计的“对战”配置,使不同模型在相同问题下生成响应,并由人工或自动化评估者判定优劣。其创建反映了研究界对模型鲁棒性、泛化能力及人类偏好对齐的深度关切,为推进多模态智能体的实用化奠定了数据基础。
当前挑战
该数据集致力于解决多模态对话模型评估中的核心挑战,即如何客观、全面地衡量模型在复杂视觉-语言交互任务中的表现。具体挑战包括:设计公平且多样化的评估场景以覆盖广泛领域,确保评估标准的一致性以减少主观偏差,以及处理多轮对话中上下文依赖性与图像语义理解的交织难题。在构建过程中,数据收集面临规模与质量的平衡,需整合海量异构数据并保证标注的准确性;同时,匿名化处理与胜者判定机制的建立也增加了工程复杂度,对数据集的可靠性与可扩展性提出了较高要求。
常用场景
经典使用场景
在视觉语言模型评估领域,WildVision/wildvision-internal-data数据集以其丰富的多模态对话记录和模型对战数据,为研究者提供了经典的使用场景。该数据集通过记录不同模型在图像对话任务中的表现,并标注胜出模型,使得研究者能够系统性地比较各类视觉语言模型的性能差异。这种对战式评估框架,不仅涵盖了文本与图像的交互,还引入了匿名化处理和多样化领域划分,为模型能力的横向对比奠定了坚实基础。
衍生相关工作
围绕该数据集,已衍生出多项经典研究工作,主要集中在自动化评估框架与模型能力细粒度分析方向。部分研究利用其对战数据训练轻量级评估模型,以替代高成本的人工标注;另有工作基于数据集的领域分类字段,深入探究模型在不同图像类型和问题类别下的性能波动。这些衍生成果不仅丰富了视觉语言模型的评估生态,也为后续构建更高效、更全面的基准测试体系提供了方法论启示。
数据集最近研究
最新研究方向
在视觉语言模型评估领域,WildVision数据集凭借其多模态对话与对战结构,正推动着模型性能评估的前沿探索。该数据集整合了图像与文本交互,通过匿名对战机制记录不同模型的响应,为研究者提供了细粒度的性能对比数据。当前研究聚焦于利用该数据集开发自动化评估框架,以降低人工标注成本,同时探索模型在跨语言、跨领域任务中的泛化能力。随着多模态人工智能的快速发展,该数据集在促进模型公平比较、揭示潜在偏差方面具有重要影响,为构建更稳健、透明的视觉语言系统奠定了数据基础。
以上内容由遇见数据集搜集并总结生成
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作