
arena-preferences

Source: ModelScope community. Updated 2025-10-09. Indexed 2025-03-22.
Download link:
https://modelscope.cn/datasets/mlabonne/arena-preferences
Description:
# ⚔️ Arena Preferences

This is a preference dataset based on [lmsys/chatbot_arena_conversations](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations). It contains multi-turn conversations (up to 11 turns) and original, untranslated samples in 39 different languages.

- Chosen answers are answers where GPT-4 was the winner (33k => 2,868 samples)
- Duplicates were removed (13 samples)
- GPTisms were removed (166 samples)

## 📊 Plots

Here's a breakdown of the four most represented languages, plus an "other" bin, in the dataset.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/IDyos4l_oN5RjtRliAtLx.png)

Here's the distribution of the number of turns in the conversations.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/2lKccVxSFujRF-_fjW7NS.png)
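
The three filtering steps listed in the description (keep battles GPT-4 won, drop duplicates, drop GPTisms) can be sketched roughly as below. This is a minimal illustration, not the author's actual pipeline: the record fields (`model_a`, `model_b`, `winner`, `conversation_a`, `conversation_b`) are assumed to follow the source dataset's layout, the exact model name `"gpt-4"` is an assumption, and `GPTISM_MARKERS` is a hypothetical placeholder because the card does not say which phrases were filtered.

```python
# Hypothetical marker list; the dataset card does not publish the real one.
GPTISM_MARKERS = ("as an ai language model",)


def build_preferences(battles, target="gpt-4", markers=GPTISM_MARKERS):
    """Turn arena battles into chosen/rejected pairs where `target` won."""
    seen = set()
    pairs = []
    for battle in battles:
        winner = battle["winner"]
        if winner not in ("model_a", "model_b"):
            continue  # ties carry no preference signal
        if battle[winner] != target:
            continue  # keep only battles won by the target model
        side = winner[-1]  # "a" or "b"
        other = "b" if side == "a" else "a"
        chosen = battle["conversation_" + side]
        rejected = battle["conversation_" + other]

        # Drop exact duplicates of an already-seen chosen conversation.
        key = tuple((m["role"], m["content"]) for m in chosen)
        if key in seen:
            continue
        seen.add(key)

        # Drop samples whose chosen assistant turns contain a marker phrase.
        text = " ".join(
            m["content"] for m in chosen if m["role"] == "assistant"
        ).lower()
        if any(marker in text for marker in markers):
            continue

        pairs.append({"chosen": chosen, "rejected": rejected})
    return pairs
```

Each conversation is assumed to be a list of `{"role": ..., "content": ...}` messages, so the function works on plain Python lists and needs no external libraries. Real GPTism filtering would likely need a longer phrase list or a different heuristic.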

Provider:
maas
Created:
2025-03-18