
chatbot_arena_conversations

ModelScope community | Updated: 2026-05-23 | Indexed: 2025-03-08
Download link:
https://modelscope.cn/datasets/lmsys/chatbot_arena_conversations
Resource description:
## Chatbot Arena Conversations Dataset

This dataset contains 33K cleaned conversations with pairwise human preferences. It was collected from 13K unique IP addresses on the [Chatbot Arena](https://lmsys.org/blog/2023-05-03-arena/) from April to June 2023. Each sample includes a question ID, two model names, their full conversation text in OpenAI API JSON format, the user vote, the anonymized user ID, the detected language tag, the OpenAI moderation API tag, the additional toxic tag, and the timestamp.

To ensure the safe release of data, we have made our best efforts to remove all conversations that contain personally identifiable information (PII). User consent is obtained through the "Terms of use" section on the data collection website.

In addition, we have included the OpenAI moderation API output to flag inappropriate conversations. However, we have chosen to keep unsafe conversations intact so that researchers can study the safety-related questions associated with LLM usage in real-world scenarios as well as the OpenAI moderation process. As an example, we included additional toxic tags generated by our own toxic tagger, which was trained by fine-tuning T5 and RoBERTa on manually labeled data.

**Basic Statistics**

| Key | Value |
| --- | --- |
| # Conversations | 33,000 |
| # Models | 20 |
| # Users | 13,383 |
| # Languages | 96 |
| Avg. # Turns per Sample | 1.2 |
| Avg. # Tokens per Prompt | 52.3 |
| Avg. # Tokens per Response | 189.5 |

## Uniqueness and Potential Usage

Compared to existing human preference datasets like [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) and [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1), this dataset:

- Contains the outputs of 20 LLMs, including stronger LLMs such as GPT-4 and Claude-v1. It also contains many failure cases of these state-of-the-art models.
- Contains unrestricted conversations from over 13K users in the wild.

We believe it will help the AI research community answer important questions around topics like:

- Characteristics and distributions of real-world user prompts
- Training instruction-following models
- Improving and evaluating LLM evaluation methods
- Model selection and request dispatching algorithms
- AI safety and content moderation

## Disclaimers and Terms

- **This dataset contains conversations that may be considered unsafe, offensive, or upsetting.** It is not intended for training dialogue agents without applying appropriate filtering measures. We are not responsible for any outputs of the models trained on this dataset.
- Statements or opinions made in this dataset do not reflect the views of researchers or institutions involved in the data collection effort.
- Users of this data are responsible for ensuring its appropriate use, which includes abiding by any applicable laws and regulations.
- Users of this data should adhere to the terms of use for a specific model when using its direct outputs.
- Users of this data agree to not attempt to determine the identity of individuals in this dataset.

## Visualization and Elo Rating Calculation

This Colab [notebook](https://colab.research.google.com/drive/1J2Wf7sxc9SVmGnSX_lImhT246pxNVZip?usp=sharing) provides some visualizations and shows how to compute Elo ratings with the dataset.

## License

The user prompts are licensed under CC-BY-4.0, while the model outputs are licensed under CC-BY-NC-4.0.

## Citation

```
@misc{zheng2023judging,
      title={Judging LLM-as-a-judge with MT-Bench and Chatbot Arena},
      author={Lianmin Zheng and Wei-Lin Chiang and Ying Sheng and Siyuan Zhuang and Zhanghao Wu and Yonghao Zhuang and Zi Lin and Zhuohan Li and Dacheng Li and Eric P. Xing and Hao Zhang and Joseph E. Gonzalez and Ion Stoica},
      year={2023},
      eprint={2306.05685},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
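As a rough illustration of the Elo rating calculation mentioned in the Visualization and Elo Rating Calculation section, the sketch below applies the standard online Elo update to a list of pairwise battles. It is not the notebook's code. The record keys `model_a`, `model_b`, and `winner`, the vote values, and the constants `k`, `scale`, and `init_rating` are assumptions chosen for illustration; check the actual schema before relying on them.

```python
from collections import defaultdict


def compute_elo(battles, k=32.0, scale=400.0, init_rating=1000.0):
    """Standard online Elo over pairwise battles.

    Each battle is a dict with the (assumed) keys "model_a", "model_b",
    and "winner". A winner of "model_a" or "model_b" counts as a win for
    that side; any other value is treated as a tie.
    """
    ratings = defaultdict(lambda: init_rating)
    for battle in battles:
        a, b = battle["model_a"], battle["model_b"]
        ra, rb = ratings[a], ratings[b]
        # Expected score of model A under the logistic Elo model.
        expected_a = 1.0 / (1.0 + 10.0 ** ((rb - ra) / scale))
        if battle["winner"] == "model_a":
            score_a = 1.0
        elif battle["winner"] == "model_b":
            score_a = 0.0
        else:
            score_a = 0.5
        # The update is zero-sum: what A gains, B loses.
        delta = k * (score_a - expected_a)
        ratings[a] = ra + delta
        ratings[b] = rb - delta
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical battles, not taken from the dataset.
    toy_battles = [
        {"model_a": "model-x", "model_b": "model-y", "winner": "model_a"},
        {"model_a": "model-y", "model_b": "model-z", "winner": "tie"},
        {"model_a": "model-z", "model_b": "model-x", "winner": "model_b"},
    ]
    for name, rating in sorted(compute_elo(toy_battles).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

Online Elo depends on the order in which battles are processed, so ratings computed this way can shift if the records are shuffled; sorting by timestamp or averaging over several shuffles are common ways to reduce that sensitivity.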

This dataset contains 33K cleaned dialogues paired with human preferences. Data was collected from 13K unique IP addresses on Chatbot Arena between April and June 2023. Each sample includes a question ID, two model names, their full dialogue texts in OpenAI API JSON format, user votes, anonymous user IDs, detected language tags, OpenAI Moderation API labels, additional toxicity tags, and timestamps.
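Because each sample carries OpenAI Moderation API labels and additional toxicity tags, one plausible preprocessing step is to drop flagged samples before using the data for training. The sketch below shows the idea on in-memory records. The key names `openai_moderation` and `toxic_chat_tag`, and the assumption that each holds a `flagged` boolean (directly, or per tagger model), are hypothetical here; inspect a real record to confirm the layout.

```python
def is_flagged(record):
    """Return True if any moderation or toxicity tag marks the record.

    Assumed layout (hypothetical, verify against real data):
      record["openai_moderation"] -> {"flagged": bool, ...}
      record["toxic_chat_tag"]    -> {"<tagger name>": {"flagged": bool, ...}, ...}
    """
    moderation = record.get("openai_moderation") or {}
    if isinstance(moderation, dict) and moderation.get("flagged"):
        return True
    toxic = record.get("toxic_chat_tag") or {}
    if isinstance(toxic, dict):
        return any(
            isinstance(tag, dict) and bool(tag.get("flagged"))
            for tag in toxic.values()
        )
    return False


def keep_unflagged(records):
    """Keep only records that no tagger has flagged."""
    return [r for r in records if not is_flagged(r)]


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    samples = [
        {"question_id": "q1", "openai_moderation": {"flagged": False},
         "toxic_chat_tag": {"tagger-1": {"flagged": False}}},
        {"question_id": "q2", "openai_moderation": {"flagged": True},
         "toxic_chat_tag": {"tagger-1": {"flagged": False}}},
        {"question_id": "q3", "openai_moderation": {"flagged": False},
         "toxic_chat_tag": {"tagger-1": {"flagged": True}}},
    ]
    kept = keep_unflagged(samples)
    print([r["question_id"] for r in kept])
```

Treating a flag from any single tagger as disqualifying is a conservative choice; a study of the moderation process itself would instead keep the flagged samples and compare the taggers.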
Provider:
maas
Created:
2025-11-19
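Collected summary
Dataset introduction
Background and challenges
Background overview
This dataset contains 33K cleaned conversations from 13K unique IP addresses interacting on Chatbot Arena. It covers the outputs of 20 LLMs together with user preferences, and suits uses such as LLM evaluation and safety research. Notably, it includes failure cases of advanced models and unrestricted conversations from real users, making it a rich resource for AI research.
The above content was collected and summarized by 遇见数据集.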