
OpenDataArena/ODA-Mixture-500k

Source: Hugging Face. Updated: 2026-04-27. Indexed: 2026-01-03.
Download link:
https://hf-mirror.com/datasets/OpenDataArena/ODA-Mixture-500k
Description:
ODA-Mixture-500k is a large-scale, general-purpose post-training dataset built from top-performing open corpora, which were selected via the OpenDataArena leaderboard and then refined through deduplication and benchmark decontamination. The dataset contains approximately 500K samples covering the Math, Code, Reasoning, and General domains. Each sample follows the format Problem → Solution (reasoning trace) → Final answer. The goal is to maximize general-purpose performance gains across these domains using a curated set of ~500K samples.
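The description names deduplication and benchmark decontamination but does not say how either step is implemented. The following is a minimal sketch of one common approach, not the dataset authors' pipeline: exact-match deduplication on normalized problem text, followed by removal of any sample that shares a word n-gram with a benchmark question. The field names `problem`, `solution`, and `answer`, the function names, and the default n-gram length are all assumptions made for illustration.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def ngrams(text: str, n: int) -> set:
    """Return the set of word n-grams of the normalized text."""
    words = normalize(text).split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def dedup_and_decontaminate(samples, benchmark_questions, n=8):
    """Drop duplicate problems, then drop problems overlapping a benchmark question.

    `samples` is a list of dicts with a hypothetical "problem" field;
    `benchmark_questions` is a list of strings. Both shapes are assumptions.
    """
    # Collect every n-gram that appears in any benchmark question.
    benchmark_grams = set()
    for question in benchmark_questions:
        benchmark_grams |= ngrams(question, n)

    seen = set()
    kept = []
    for sample in samples:
        key = normalize(sample["problem"])
        if key in seen:
            continue  # exact duplicate after normalization
        seen.add(key)
        if ngrams(sample["problem"], n) & benchmark_grams:
            continue  # shares at least one n-gram with a benchmark question
        kept.append(sample)
    return kept
```

A smaller `n` removes more samples at the cost of more false positives. Production pipelines often add fuzzy or embedding-based matching, which this sketch omits.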
Provider:
OpenDataArena