llm-jp-chatbot-arena-conversations
Source: ModelScope community. Updated 2025-12-04; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/llm-jp/llm-jp-chatbot-arena-conversations
Description:
## LLM-jp Chatbot Arena Conversations Dataset
This dataset contains approximately 1,000 conversations with pairwise human preferences, most of which are in Japanese.
The data was collected during the trial phase of the LLM-jp Chatbot Arena (January–February 2025), where users compared responses from two different models in a head-to-head format.
Each sample includes a question ID, the names of the two models, their conversation transcripts, the user's vote, an anonymized user ID, a detected language tag, OpenAI moderation API output, and a timestamp.
To ensure a safe public release, we made our best effort to remove all conversations containing personally identifiable information (PII).
User consent was obtained via the "Terms of Use" on the data collection site.
We also provide the output of the OpenAI moderation API to help identify potentially inappropriate content.
However, we have retained conversations flagged as unsafe to support research on safety concerns in real-world LLM use and the effectiveness of moderation systems.
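Because flagged conversations are kept in the release, users who want a filtered subset need to separate them on their own. The following is a minimal sketch of one way to do that, not the maintainers' procedure. It assumes each record is a dict whose moderation output sits under a hypothetical key `openai_moderation` and carries a boolean `flagged` entry, as the OpenAI moderation API returns per result. The actual column name and nesting in this dataset may differ, so check the schema before relying on it.

```python
from typing import Any, Iterable


def split_by_moderation(
    records: Iterable[dict[str, Any]],
    moderation_key: str = "openai_moderation",  # hypothetical field name
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition records into (unflagged, flagged) lists.

    Records with no moderation output are treated as unflagged here;
    that is a choice made for this sketch, not a property of the dataset.
    """
    unflagged: list[dict[str, Any]] = []
    flagged: list[dict[str, Any]] = []
    for rec in records:
        result = rec.get(moderation_key) or {}
        if result.get("flagged", False):
            flagged.append(rec)
        else:
            unflagged.append(rec)
    return unflagged, flagged


# Toy records for illustration only; they are not taken from the dataset.
sample = [
    {"question_id": "q1", "openai_moderation": {"flagged": False}},
    {"question_id": "q2", "openai_moderation": {"flagged": True}},
    {"question_id": "q3"},
]
safe, unsafe = split_by_moderation(sample)
```

Keeping both partitions, rather than discarding the flagged one, lets the same function serve safety research as well as filtered training or evaluation.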
## Basic Statistics
| Metric | Value |
|:-------------|:----|
| # of Samples | 990 |
| # of Models | 10 |
| # of Judges | 200 |
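Counts like those in the table could be recomputed from the records with a short script. The sketch below is illustrative only: the field names `model_a`, `model_b`, `judge`, and `winner` are assumptions standing in for the model names, anonymized user ID, and vote described above, and may not match the dataset's actual columns.

```python
from collections import Counter
from typing import Any, Iterable


def basic_statistics(records: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Count samples, distinct models, distinct judges, and votes per outcome."""
    models: set[str] = set()
    judges: set[str] = set()
    votes: Counter[str] = Counter()
    n_samples = 0
    for rec in records:
        n_samples += 1
        # Field names below are hypothetical; adjust to the real schema.
        models.update((rec["model_a"], rec["model_b"]))
        judges.add(rec["judge"])
        votes[rec["winner"]] += 1
    return {
        "samples": n_samples,
        "models": len(models),
        "judges": len(judges),
        "votes": dict(votes),
    }


# Toy records for illustration only; they are not taken from the dataset.
toy = [
    {"model_a": "m1", "model_b": "m2", "judge": "u1", "winner": "model_a"},
    {"model_a": "m2", "model_b": "m3", "judge": "u1", "winner": "tie"},
    {"model_a": "m1", "model_b": "m3", "judge": "u2", "winner": "model_a"},
]
stats = basic_statistics(toy)
```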
## Disclaimers
- **This dataset includes conversations that may contain sensitive, offensive, or potentially upsetting content.** It is provided to support research on language model behavior, safety, and robustness. When using this dataset for training or evaluation, we strongly encourage the application of appropriate safety measures and content filtering.
- Statements and opinions expressed in the dataset do not represent the views of the researchers or affiliated institutions involved in its creation.
## License
User prompts are licensed under CC BY 4.0, while model outputs are subject to their respective licenses.
## Citation
```
@misc{llm-jp-chatbot-arena-conversations-dataset,
author = {LLM-jp},
title = {LLM-jp Chatbot Arena Conversations Dataset},
year = {2025},
url = {https://huggingface.co/datasets/llm-jp/llm-jp-chatbot-arena-conversations},
}
```
Provider: maas
Created: 2025-11-24



