
vision-arena-bench-v0.1

Source: ModelScope community. Updated 2025-11-28. Indexed 2025-04-26.
Download link:
https://modelscope.cn/datasets/lmarena-ai/vision-arena-bench-v0.1
Resource description:
![Vision Arena Questions](vision_arena_questions_fig.png)

# VisionArena-Bench: An automatic eval pipeline to estimate model preference rankings

An automatic benchmark of 500 diverse user prompts that can be used to cheaply approximate [Chatbot Arena](https://lmarena.ai/) model rankings, using a VLM as a judge.

### Dataset Sources

- **Repository:** https://github.com/lm-sys/FastChat
- **Paper:** https://arxiv.org/abs/2412.08687
- **Automatic Evaluation Code:** Coming Soon!

## Dataset Structure

- question_id: The unique hash representing the id of the question
- cluster_name: The name of the topic cluster that this question is from
- turns: The content with the question prompt
- images: A list of images of size one (single-image) which correspond to the question in the column `turns`

## Bias, Risks, and Limitations

This benchmark is designed to measure human preferences rather than explicitly evaluate factual accuracy.

This dataset contains a large number of STEM-related questions, OCR tasks, and general problems like captioning. It contains fewer questions that relate to specialized domains outside of STEM.

**If you find your face or personal information in this dataset and wish to have it removed, or if you find hateful or inappropriate content,** please contact us at lmarena.ai@gmail.com or lisabdunlap@berkeley.edu.

**BibTeX:**

```
@misc{chou2024visionarena,
      title={VisionArena: 230K Real World User-VLM Conversations with Preference Labels},
      author={Christopher Chou and Lisa Dunlap and Koki Mashita and Krishna Mandal and Trevor Darrell and Ion Stoica and Joseph E. Gonzalez and Wei-Lin Chiang},
      year={2024},
      eprint={2412.08687},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2412.08687},
}
```

## LMArena VisionArena dataset License Agreement

This Agreement contains the terms and conditions that govern your access and use of the LMArena VisionArena dataset (as defined above). You may not use the LMArena VisionArena dataset if you do not accept this Agreement. By clicking to accept, accessing the LMArena VisionArena dataset, or both, you hereby agree to the terms of the Agreement. If you are agreeing to be bound by the Agreement on behalf of your employer or another entity, you represent and warrant that you have full legal authority to bind your employer or such entity to this Agreement. If you do not have the requisite authority, you may not accept the Agreement or access the LMArena VisionArena dataset on behalf of your employer or another entity.

* Safety and Moderation: This dataset contains unsafe conversations that may be perceived as offensive or unsettling. User should apply appropriate filters and safety measures before utilizing this dataset for training dialogue agents.
* Non-Endorsement: The views and opinions depicted in this dataset do not reflect the perspectives of the researchers or affiliated institutions engaged in the data collection process.
* Legal Compliance: You are mandated to use it in adherence with all pertinent laws and regulations.
* Model Specific Terms: When leveraging direct outputs of a specific model, users must adhere to its corresponding terms of use.
* Non-Identification: You must not attempt to identify the identities of individuals or infer any sensitive personal data encompassed in this dataset.
* Prohibited Transfers: You should not distribute, copy, disclose, assign, sublicense, embed, host, or otherwise transfer the dataset to any third party.
* Right to Request Deletion: At any time, we may require you to delete all copies of the conversation dataset (in whole or in part) in your possession and control. You will promptly comply with any and all such requests. Upon our request, you shall provide us with written confirmation of your compliance with such requirement.
* Termination: We may, at any time, for any reason or for no reason, terminate this Agreement, effective immediately upon notice to you. Upon termination, the license granted to you hereunder will immediately terminate, and you will immediately stop using the LMArena VisionArena dataset and destroy all copies of the LMArena VisionArena dataset and related materials in your possession or control.
* Limitation of Liability: IN NO EVENT WILL WE BE LIABLE FOR ANY CONSEQUENTIAL, INCIDENTAL, EXEMPLARY, PUNITIVE, SPECIAL, OR INDIRECT DAMAGES (INCLUDING DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION, OR LOSS OF INFORMATION) ARISING OUT OF OR RELATING TO THIS AGREEMENT OR ITS SUBJECT MATTER, EVEN IF WE HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
* Subject to your compliance with the terms and conditions of this Agreement, we grant to you, a limited, non-exclusive, non-transferable, non-sublicensable license to use the LMArena VisionArena dataset, including the conversation data and annotations, to research, develop, and improve software, algorithms, machine learning models, techniques, and technologies for both research and commercial purposes.
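
## Example: checking records against the documented fields

The four fields listed under Dataset Structure can be sanity-checked with a small script before running any evaluation. The following is a minimal sketch, not the authors' tooling: the helper names `check_record` and `summarize` are hypothetical, the toy records are made up for illustration, and the only properties assumed are the ones the card states (four named fields, unique `question_id`, exactly one image per question).

```python
from collections import Counter
from typing import Any, Iterable

# Field names as listed in the dataset card.
REQUIRED_FIELDS = ("question_id", "cluster_name", "turns", "images")


def check_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    images = record.get("images")
    # The card describes `images` as a list of size one (single-image).
    if images is not None and len(images) != 1:
        problems.append(f"expected exactly one image, got {len(images)}")
    return problems


def summarize(records: Iterable[dict[str, Any]]) -> tuple[Counter, list[str]]:
    """Count questions per topic cluster and collect per-record problems."""
    per_cluster: Counter = Counter()
    problems: list[str] = []
    seen_ids: set = set()
    for record in records:
        problems.extend(check_record(record))
        qid = record.get("question_id")
        if qid is not None:
            # The card describes `question_id` as a unique hash.
            if qid in seen_ids:
                problems.append(f"duplicate question_id: {qid}")
            seen_ids.add(qid)
        per_cluster[record.get("cluster_name")] += 1
    return per_cluster, problems


# Toy records, invented for illustration only (not taken from the dataset).
toy_records = [
    {"question_id": "a1", "cluster_name": "OCR", "turns": "What does the sign say?", "images": ["img0"]},
    {"question_id": "b2", "cluster_name": "STEM", "turns": "Solve the equation shown.", "images": ["img1"]},
    {"question_id": "a1", "cluster_name": "OCR", "turns": "Read the label.", "images": []},
]

per_cluster, problems = summarize(toy_records)
print(dict(per_cluster))
print(problems)
```

The same functions accept any iterable of dict-like rows, so they could be pointed at the real records once the dataset has been downloaded under the license terms above; how the rows are loaded (library, split name) is left open here because the card does not specify it.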

Provider:
maas
Created:
2025-04-21
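Summary
Background overview
VisionArena-Bench is an automatic evaluation benchmark of 500 diverse user prompts that approximates Chatbot Arena model preference rankings by using a vision-language model as the judge. The dataset is intended to measure human preference rather than factual accuracy, and covers STEM questions, OCR tasks, and image captioning.
The summary above was collected and generated by 遇见数据集.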