arena-preferences
Source: ModelScope community · Updated 2025-10-09 · Indexed 2025-03-22
Download link:
https://modelscope.cn/datasets/mlabonne/arena-preferences
Description:
# ⚔️ Arena Preferences
This is a preference dataset based on [lmsys/chatbot_arena_conversations](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations).
It contains multi-turn conversations (up to 11 turns) and original samples in 39 different languages (no translation).
- Chosen answers are those where GPT-4 was the winner (33k => 2,868 samples)
- Duplicates were removed (13 samples)
- GPTisms were removed (166 samples)
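
The three filtering steps above can be sketched roughly as follows. This is a minimal illustration and not the author's actual pipeline: it assumes rows shaped like the source dataset (`model_a`, `model_b`, `winner`, `conversation_a`, `conversation_b`), and the `GPTISMS` phrase list, the `gpt-4` name prefix check, and the helper names are hypothetical, since the card does not say which phrases or model variants were matched.

```python
# Hypothetical phrase list: the dataset card does not specify what counted as a GPTism.
GPTISMS = ("as an ai language model", "i cannot fulfill")


def is_gpt4(model_name):
    """Assumption: GPT-4 variants are identified by a 'gpt-4' name prefix."""
    return model_name.lower().startswith("gpt-4")


def build_preferences(rows):
    """Keep battles won by GPT-4, drop duplicates, then drop GPTism-laden answers."""
    seen = set()
    samples = []
    for row in rows:
        winner = row["winner"]
        if winner not in ("model_a", "model_b"):
            continue  # ties have no preferred answer
        if not is_gpt4(row[winner]):
            continue  # the winning side was not GPT-4
        loser = "model_b" if winner == "model_a" else "model_a"
        chosen = row["conversation_" + winner[-1]]
        rejected = row["conversation_" + loser[-1]]

        # Deduplicate on the full chosen conversation (roles and contents).
        key = tuple((m["role"], m["content"]) for m in chosen)
        if key in seen:
            continue
        seen.add(key)

        # Filter chosen answers that contain any of the assumed GPTism phrases.
        answer_text = " ".join(
            m["content"] for m in chosen if m["role"] == "assistant"
        ).lower()
        if any(phrase in answer_text for phrase in GPTISMS):
            continue

        samples.append({"chosen": chosen, "rejected": rejected})
    return samples
```

Deduplicating before the GPTism filter is an arbitrary choice in this sketch; the card lists the steps without stating the order in which they were applied.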
## 📊 Plots
Here is a breakdown of the four most represented languages, plus an "other" bin, in the dataset.

Here's the distribution of the number of turns in the conversations.
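
Since the plots themselves are images, the statistics behind them could be recomputed along these lines. This is only a sketch: the function names are hypothetical, the language labels are assumed to be available as a list of strings, and a "turn" is assumed to mean one user message in a conversation stored as a list of `{"role", "content"}` dictionaries.

```python
from collections import Counter


def language_breakdown(languages, top_n=4):
    """Count the top_n most frequent languages and fold the rest into 'other'."""
    counts = Counter(languages)
    top = counts.most_common(top_n)
    other = sum(counts.values()) - sum(n for _, n in top)
    breakdown = dict(top)
    if other:
        breakdown["other"] = other
    return breakdown


def turn_distribution(conversations):
    """Map number of turns to number of conversations.

    Assumption: one turn corresponds to one user message.
    """
    turns = Counter(
        sum(1 for message in conversation if message["role"] == "user")
        for conversation in conversations
    )
    return dict(sorted(turns.items()))
```

Either dictionary can be passed to a plotting library of choice to produce a bar chart comparable to the ones described here.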

Provider:
maas
Created:
2025-03-18



