arena-human-preference-140k

Name: arena-human-preference-140k
Creator: maas
Published: 2025-11-27 16:42:56
License: No description provided

ModelScope community · Updated 2025-11-27 · Indexed 2025-08-02

Download link:

https://modelscope.cn/datasets/lmarena-ai/arena-human-preference-140k

Resource description:

### Overview

This dataset contains user votes collected in the text-only category. Each row represents a single vote judging two models (`model_a` and `model_b`) on a user conversation, along with the full conversation history and metadata. Key fields include:

- `id`: Unique feedback ID of each vote/row.
- `evaluation_session_id`: Unique ID of each evaluation session, which can contain multiple separate votes/evaluations.
- `evaluation_order`: Evaluation order of the current vote.
- `winner`: Battle result containing either `model_a`, `model_b`, `tie`, or `both_bad`.
- `conversation_a` / `conversation_b`: Full conversation of the current evaluation order.
- `full_conversation`: The entire conversation, including context prompts and answers from all previous evaluation orders. Note that after each vote new models are sampled, thus the responding models vary across the full context.
- `conv_metadata`: Aggregated markdown and token counts for style control.
- `category_tag`: Annotation tags including the categories math, creative writing, hard prompts, and instruction following.
- `is_code`: Whether the conversation involves code.

### License

User prompts are licensed under CC-BY-4.0, and model outputs are governed by the terms of use set by the respective model providers.
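As a rough illustration of how the vote fields described in the overview fit together, the sketch below tallies decisive wins per model and groups votes by evaluation session. It runs on a handful of made-up rows rather than the real data: the model names and values are hypothetical, and the presence of `model_a` / `model_b` columns holding model names is an assumption based on the description, not a confirmed schema.

```python
from collections import Counter, defaultdict

# Hypothetical rows that mimic the fields described in the overview.
# Model names and values are invented for illustration only.
rows = [
    {"id": "v1", "evaluation_session_id": "s1", "evaluation_order": 1,
     "model_a": "model-x", "model_b": "model-y", "winner": "model_a"},
    {"id": "v2", "evaluation_session_id": "s1", "evaluation_order": 2,
     "model_a": "model-y", "model_b": "model-z", "winner": "tie"},
    {"id": "v3", "evaluation_session_id": "s2", "evaluation_order": 1,
     "model_a": "model-z", "model_b": "model-x", "winner": "model_b"},
    {"id": "v4", "evaluation_session_id": "s3", "evaluation_order": 1,
     "model_a": "model-x", "model_b": "model-y", "winner": "both_bad"},
]


def tally_wins(votes):
    """Count decisive wins per model; tie and both_bad votes add no win."""
    wins = Counter()
    for vote in votes:
        if vote["winner"] in ("model_a", "model_b"):
            # The winner field names the side; look up that side's model.
            wins[vote[vote["winner"]]] += 1
    return wins


def votes_by_session(votes):
    """Group votes by session, sorted by evaluation_order within each."""
    sessions = defaultdict(list)
    for vote in votes:
        sessions[vote["evaluation_session_id"]].append(vote)
    for session_votes in sessions.values():
        session_votes.sort(key=lambda v: v["evaluation_order"])
    return dict(sessions)


wins = tally_wins(rows)
sessions = votes_by_session(rows)
outcome_counts = Counter(v["winner"] for v in rows)

print(wins)
print({sid: [v["id"] for v in vs] for sid, vs in sessions.items()})
print(outcome_counts)
```

Because new models are sampled after each vote, a per-session grouping like this is one way to see which model pairs appeared at each `evaluation_order`. Whether `tie` and `both_bad` votes should count toward a ranking is a design choice left open here.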

Provider:

maas

Created:

2025-08-01

Collection summary

Dataset introduction