chatbot_arena_conversations

Name: chatbot_arena_conversations
Creator: maas
Published: 2026-05-23 18:40:56
License: No description provided

ModelScope community · Updated 2026-05-23 · Indexed 2025-03-08

Download link:

https://modelscope.cn/datasets/lmsys/chatbot_arena_conversations

Resource description:

## Chatbot Arena Conversations Dataset

This dataset contains 33K cleaned conversations with pairwise human preferences. It was collected from 13K unique IP addresses on the [Chatbot Arena](https://lmsys.org/blog/2023-05-03-arena/) from April to June 2023. Each sample includes a question ID, two model names, their full conversation text in OpenAI API JSON format, the user vote, the anonymized user ID, the detected language tag, the OpenAI moderation API tag, the additional toxic tag, and the timestamp.

To ensure the safe release of data, we have made our best efforts to remove all conversations that contain personally identifiable information (PII). User consent is obtained through the "Terms of use" section on the data collection website. In addition, we have included the OpenAI moderation API output to flag inappropriate conversations. However, we have chosen to keep unsafe conversations intact so that researchers can study the safety-related questions associated with LLM usage in real-world scenarios as well as the OpenAI moderation process. As an example, we included additional toxic tags generated by our own toxic tagger, which was trained by fine-tuning T5 and RoBERTa on manually labeled data.

**Basic Statistics**

| Key | Value |
| --- | --- |
| # Conversations | 33,000 |
| # Models | 20 |
| # Users | 13,383 |
| # Languages | 96 |
| Avg. # Turns per Sample | 1.2 |
| Avg. # Tokens per Prompt | 52.3 |
| Avg. # Tokens per Response | 189.5 |

## Uniqueness and Potential Usage

Compared to existing human preference datasets like [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) and [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1), this dataset:

- Contains the outputs of 20 LLMs, including stronger LLMs such as GPT-4 and Claude-v1. It also contains many failure cases of these state-of-the-art models.
- Contains unrestricted conversations from over 13K users in the wild.

We believe it will help the AI research community answer important questions around topics like:

- Characteristics and distributions of real-world user prompts
- Training instruction-following models
- Improving and evaluating LLM evaluation methods
- Model selection and request dispatching algorithms
- AI safety and content moderation

## Disclaimers and Terms

- **This dataset contains conversations that may be considered unsafe, offensive, or upsetting.** It is not intended for training dialogue agents without applying appropriate filtering measures. We are not responsible for any outputs of the models trained on this dataset.
- Statements or opinions made in this dataset do not reflect the views of researchers or institutions involved in the data collection effort.
- Users of this data are responsible for ensuring its appropriate use, which includes abiding by any applicable laws and regulations.
- Users of this data should adhere to the terms of use for a specific model when using its direct outputs.
- Users of this data agree to not attempt to determine the identity of individuals in this dataset.

## Visualization and Elo Rating Calculation

This Colab [notebook](https://colab.research.google.com/drive/1J2Wf7sxc9SVmGnSX_lImhT246pxNVZip?usp=sharing) provides some visualizations and shows how to compute Elo ratings with the dataset.

## License

The user prompts are licensed under CC-BY-4.0, while the model outputs are licensed under CC-BY-NC-4.0.

## Citation

```
@misc{zheng2023judging,
      title={Judging LLM-as-a-judge with MT-Bench and Chatbot Arena},
      author={Lianmin Zheng and Wei-Lin Chiang and Ying Sheng and Siyuan Zhuang and Zhanghao Wu and Yonghao Zhuang and Zi Lin and Zhuohan Li and Dacheng Li and Eric P. Xing and Hao Zhang and Joseph E. Gonzalez and Ion Stoica},
      year={2023},
      eprint={2306.05685},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
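As a rough illustration of the Elo rating calculation mentioned in the Visualization section, the sketch below applies the standard online Elo update to a sequence of pairwise votes. It is not taken from the linked notebook. The record keys `model_a`, `model_b`, and `winner`, the vote labels, and the constants (`k=32`, `scale=400`, initial rating 1000) are assumptions chosen for illustration, and the model names are placeholders.

```python
from collections import defaultdict


def compute_elo(battles, k=32, scale=400, base=10, init_rating=1000):
    """Online Elo update over an ordered sequence of pairwise battles.

    Each battle is a dict with the (assumed) keys "model_a", "model_b",
    and "winner". A winner of "model_a" or "model_b" is a decisive vote;
    any other label is treated as a tie.
    """
    ratings = defaultdict(lambda: float(init_rating))
    for battle in battles:
        a, b = battle["model_a"], battle["model_b"]
        ra, rb = ratings[a], ratings[b]
        # Expected score of model A under the logistic Elo model.
        expected_a = 1.0 / (1.0 + base ** ((rb - ra) / scale))
        if battle["winner"] == "model_a":
            score_a = 1.0
        elif battle["winner"] == "model_b":
            score_a = 0.0
        else:
            score_a = 0.5
        # The two updates are equal and opposite, so total rating is conserved.
        delta = k * (score_a - expected_a)
        ratings[a] = ra + delta
        ratings[b] = rb - delta
    return dict(ratings)


# Toy votes with placeholder model names (not real data).
toy_battles = [
    {"model_a": "model-x", "model_b": "model-y", "winner": "model_a"},
    {"model_a": "model-y", "model_b": "model-z", "winner": "tie"},
]
toy_ratings = compute_elo(toy_battles)
```

Because online Elo depends on the order of the votes, results computed this way can shift if the records are shuffled; sorting by timestamp first is one reasonable choice.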

This dataset contains 33K cleaned dialogues paired with human preferences. Data was collected from 13K unique IP addresses on Chatbot Arena between April and June 2023. Each sample includes a question ID, two model names, their full dialogue texts in OpenAI API JSON format, user votes, anonymous user IDs, detected language tags, OpenAI Moderation API labels, additional toxicity tags, and timestamps.
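To make the per-sample fields listed above concrete, here is a minimal sketch of one record as a Python dataclass, together with a filter that drops samples flagged by either tagger. All field names and types are hypothetical simplifications (for instance, the moderation output is reduced to a single boolean), so check the actual column names on the dataset page before relying on them. The only format assumption is that each conversation is a list of `{"role", "content"}` messages, as in the OpenAI chat format.

```python
from dataclasses import dataclass
from typing import Dict, List

# One chat message in the OpenAI chat format: {"role": ..., "content": ...}.
Message = Dict[str, str]


@dataclass
class ArenaSample:
    """Hypothetical, simplified view of one sample; names are assumptions."""
    question_id: str
    model_a: str
    model_b: str
    conversation_a: List[Message]
    conversation_b: List[Message]
    winner: str                # the user vote
    user_id: str               # anonymized user ID
    language: str              # detected language tag
    moderation_flagged: bool   # simplified OpenAI moderation API tag
    toxic: bool                # simplified additional toxic tag
    timestamp: float


def keep_unflagged(samples: List[ArenaSample]) -> List[ArenaSample]:
    """Return only the samples that neither tagger flagged."""
    return [s for s in samples if not (s.moderation_flagged or s.toxic)]
```

A filter like this is only a starting point for the "appropriate filtering measures" the disclaimers call for; automated tags can miss unsafe content, so additional review may still be needed.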

Provider:

maas

Created:

2025-11-19

Collection Summary

Dataset Introduction