VisionArena-Battle
Source: ModelScope community. Updated 2025-11-12; indexed 2025-04-26.
Download link: https://modelscope.cn/datasets/lmarena-ai/VisionArena-Battle
Description:

# VisionArena-Battle: 30K Real-World Image Conversations with Pairwise Preference Votes
30K single- and multi-turn chats between users and two anonymous VLMs, with preference votes collected on [Chatbot Arena](https://lmarena.ai/).
**WARNING:** Images may contain inappropriate content.

## Dataset Details
* 30K conversations
* 19 VLMs
* 90 languages
* ~23k unique images
* Question Category Tags (Captioning, OCR, Entity Recognition, Coding, Homework, Diagram, Humor, Creative Writing, Refusal)
### Dataset Description
30,000 conversations where users interact with two anonymized VLMs, along with preference votes indicating which response they prefer.
VisionArena-Battle is a dataset of preference annotations collected through the open-source platform [Chatbot Arena](https://lmarena.ai/), where users engage in anonymous side-by-side chats with pairs of large language or vision-language models. Users provide preference votes for responses, which are aggregated using the Bradley-Terry model to compute [leaderboard rankings](https://lmarena.ai/?leaderboard).
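As a rough illustration of how pairwise votes can be aggregated with a Bradley-Terry model, the sketch below fits per-model strengths with the classic minorization-maximization update. It is a minimal example under stated assumptions, not the leaderboard's actual pipeline: the function names `fit_bradley_terry` and `win_probability` are hypothetical, ties are split as half a win for each side, and no style control or confidence intervals are computed.

```python
from collections import defaultdict


def fit_bradley_terry(battles, iters=200, tol=1e-9):
    """Estimate Bradley-Terry strengths from (model_a, model_b, winner) triples.

    Assumption of this sketch: "tie" and "tie (bothbad)" both count as
    half a win for each model.
    """
    wins = defaultdict(float)   # wins[m]: total (possibly fractional) wins of m
    games = defaultdict(float)  # games[(i, j)]: battles between i and j
    models = set()
    for a, b, winner in battles:
        if a == b:
            continue  # a model paired with itself carries no ranking signal
        models.update((a, b))
        games[tuple(sorted((a, b)))] += 1.0
        if winner == "model_a":
            wins[a] += 1.0
        elif winner == "model_b":
            wins[b] += 1.0
        else:
            wins[a] += 0.5
            wins[b] += 0.5

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        denom = defaultdict(float)
        for (i, j), n in games.items():
            d = n / (strength[i] + strength[j])
            denom[i] += d
            denom[j] += d
        new = {m: wins[m] / denom[m] for m in models}
        total = sum(new.values())
        new = {m: v / total for m, v in new.items()}  # normalize to sum to 1
        delta = max(abs(new[m] - strength[m]) for m in models)
        strength = new
        if delta < tol:
            break
    return strength


def win_probability(strength, i, j):
    """Modelled probability that model i is preferred over model j."""
    return strength[i] / (strength[i] + strength[j])
```

Given strengths fitted this way, sorting models by strength yields a ranking, and `win_probability` gives the modelled chance that one model is preferred over another.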
The dataset includes conversations from February 2024 to September 2024. Users explicitly agree to have their conversations shared before chatting. We apply [NSFW](https://learn.microsoft.com/en-us/azure/ai-services/content-moderator/image-moderation-api), [CSAM](https://www.microsoft.com/en-us/photodna?oneroute=true), PII ([text](https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-pii-detection), [images](https://cloud.google.com/sensitive-data-protection/docs/redacting-sensitive-data-images)), and face ([1](https://cloud.google.com/vision/docs/detecting-faces), [2](https://github.com/ageitgey/face_recognition)) detectors to remove inappropriate images, personally identifiable images or text, and images with human faces. These detectors are not perfect, so such images may still exist in the dataset.
### Relevant Links
- **Repository:** https://github.com/lm-sys/FastChat
- **Paper:** https://arxiv.org/abs/2412.08687
- **Contribute your vote!** https://lmarena.ai/
## Dataset Structure
* **model_a**, **model_b** - identities of the models shown on the left and right. These are revealed to the user only after they vote.
* **images** - the image for the conversation (note: every conversation contains a single image)
* **conversation_a**, **conversation_b** - conversation with each model
* **winner** - which response the user preferred: one of "model_a", "model_b", "tie", or "tie (bothbad)"
* **judge** - hash of user id, based on IP address
* **categories** - category labels (note: a question can belong to multiple categories)
* **num_turns** - number of conversation turns
* **tstamp** - timestamp of when the conversation took place
* **conv_metadata** - metadata used for the style features used to compute the [style control leaderboard](https://blog.lmarena.ai/blog/2024/style-control/). This includes conversation length, number of markdown headers, number of lists, number of bolded words, etc.
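The fields above can be combined to get a quick per-model tally of outcomes. The sketch below assumes each record is a plain dictionary carrying the documented `model_a`, `model_b`, and `winner` fields; the function name `summarize_votes` and the model names in the example are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_votes(records):
    """Count wins, losses, and ties per model from battle records."""
    summary = defaultdict(Counter)
    for rec in records:
        a, b, winner = rec["model_a"], rec["model_b"], rec["winner"]
        if winner == "model_a":
            summary[a]["win"] += 1
            summary[b]["loss"] += 1
        elif winner == "model_b":
            summary[b]["win"] += 1
            summary[a]["loss"] += 1
        elif winner.startswith("tie"):
            # Covers both "tie" and "tie (bothbad)".
            summary[a]["tie"] += 1
            summary[b]["tie"] += 1
        else:
            raise ValueError(f"unexpected winner value: {winner!r}")
    return {model: dict(counts) for model, counts in summary.items()}


if __name__ == "__main__":
    # Hypothetical records for illustration only; not taken from the dataset.
    example = [
        {"model_a": "model-x", "model_b": "model-y", "winner": "model_a"},
        {"model_a": "model-y", "model_b": "model-x", "winner": "tie (bothbad)"},
    ]
    for model, counts in sorted(summarize_votes(example).items()):
        print(model, counts)
```

Raw counts like these ignore which opponents each model faced, so they are best read as a sanity check rather than a ranking.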
## Bias, Risks, and Limitations
This benchmark is designed to measure human preferences rather than explicitly evaluate factual accuracy.
This dataset contains a large number of STEM-related questions, OCR tasks, and general problems such as captioning. It contains fewer questions that relate to specialized domains outside of STEM.
**If you find your face or personal information in this dataset and wish to have it removed, or if you find hateful or inappropriate content,** please contact us at lmarena.ai@gmail.com or lisabdunlap@berkeley.edu.
**BibTeX:**
```
@article{chou2024visionarena,
title={VisionArena: 230K Real World User-VLM Conversations with Preference Labels},
author={Christopher Chou and Lisa Dunlap and Koki Mashita and Krishna Mandal and Trevor Darrell and Ion Stoica and Joseph E. Gonzalez and Wei-Lin Chiang},
year={2024},
eprint={2412.08687},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2412.08687},
}
```
## LMArena VisionArena dataset License Agreement
This Agreement contains the terms and conditions that govern your access and use of the LMArena VisionArena dataset (as defined above). You may not use the LMArena VisionArena dataset if you do not accept this Agreement. By clicking to accept, accessing the LMArena VisionArena dataset, or both, you hereby agree to the terms of the Agreement. If you are agreeing to be bound by the Agreement on behalf of your employer or another entity, you represent and warrant that you have full legal authority to bind your employer or such entity to this Agreement. If you do not have the requisite authority, you may not accept the Agreement or access the LMArena VisionArena dataset on behalf of your employer or another entity.
* Safety and Moderation: This dataset contains unsafe conversations that may be perceived as offensive or unsettling. Users should apply appropriate filters and safety measures before utilizing this dataset for training dialogue agents.
* Non-Endorsement: The views and opinions depicted in this dataset do not reflect the perspectives of the researchers or affiliated institutions engaged in the data collection process.
* Legal Compliance: You are mandated to use it in adherence with all pertinent laws and regulations.
* Model Specific Terms: When leveraging direct outputs of a specific model, users must adhere to its corresponding terms of use.
* Non-Identification: You must not attempt to identify the identities of individuals or infer any sensitive personal data encompassed in this dataset.
* Prohibited Transfers: You should not distribute, copy, disclose, assign, sublicense, embed, host, or otherwise transfer the dataset to any third party.
* Right to Request Deletion: At any time, we may require you to delete all copies of the conversation dataset (in whole or in part) in your possession and control. You will promptly comply with any and all such requests. Upon our request, you shall provide us with written confirmation of your compliance with such requirement.
* Termination: We may, at any time, for any reason or for no reason, terminate this Agreement, effective immediately upon notice to you. Upon termination, the license granted to you hereunder will immediately terminate, and you will immediately stop using the LMArena VisionArena dataset and destroy all copies of the LMArena VisionArena dataset and related materials in your possession or control.
* Limitation of Liability: IN NO EVENT WILL WE BE LIABLE FOR ANY CONSEQUENTIAL, INCIDENTAL, EXEMPLARY, PUNITIVE, SPECIAL, OR INDIRECT DAMAGES (INCLUDING DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION, OR LOSS OF INFORMATION) ARISING OUT OF OR RELATING TO THIS AGREEMENT OR ITS SUBJECT MATTER, EVEN IF WE HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
* Subject to your compliance with the terms and conditions of this Agreement, we grant to you a limited, non-exclusive, non-transferable, non-sublicensable license to use the LMArena VisionArena dataset, including the conversation data and annotations, to research, develop, and improve software, algorithms, machine learning models, techniques, and technologies for both research and commercial purposes.

Provider: maas
Created: 2025-04-21



