OpenDataArena/ODA-Mixture-500k
Source: Hugging Face | Updated: 2026-04-27 | Indexed: 2026-01-03
Download link:
https://hf-mirror.com/datasets/OpenDataArena/ODA-Mixture-500k
Description:
ODA-Mixture-500k is a large-scale, general-purpose post-training dataset curated from top-performing open corpora, selected via the OpenDataArena leaderboard and refined through deduplication and benchmark decontamination. It contains approximately 500K samples covering the Math, Code, Reasoning, and General domains. Each sample follows the format Problem → Solution (reasoning trace) → Final answer. The goal is to maximize general-purpose performance gains across these domains with a curated set of ~500K samples.
Provider:
OpenDataArena
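
The description mentions deduplication and benchmark decontamination without saying how they are done. The sketch below shows one common way such filtering is implemented: exact-match deduplication on normalized problem text, followed by dropping any sample that shares a word n-gram with a benchmark question. It is not the OpenDataArena pipeline; the function names, the field names `problem`, `solution`, and `answer`, and the n-gram length are all assumptions made for illustration.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and keep only word tokens so trivially different strings match."""
    return " ".join(re.findall(r"\w+", text.lower()))


def ngrams(text: str, n: int) -> set[str]:
    """Return the set of word n-grams of the normalized text."""
    tokens = normalize(text).split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def dedup_and_decontaminate(samples, benchmark_questions, n=8):
    """Hypothetical filter: drop duplicate problems and benchmark overlaps.

    `samples` is a list of dicts with an assumed "problem" field.
    """
    # Collect every n-gram that appears in any benchmark question.
    benchmark_grams = set()
    for question in benchmark_questions:
        benchmark_grams |= ngrams(question, n)

    seen = set()
    kept = []
    for sample in samples:
        key = normalize(sample["problem"])
        if key in seen:
            continue  # duplicate of an earlier problem after normalization
        if ngrams(sample["problem"], n) & benchmark_grams:
            continue  # shares at least one n-gram with a benchmark question
        seen.add(key)
        kept.append(sample)
    return kept


if __name__ == "__main__":
    samples = [
        {"problem": "What is 2 + 2?", "solution": "2 + 2 = 4", "answer": "4"},
        {"problem": "what is 2 + 2", "solution": "Add them.", "answer": "4"},
        {"problem": "Reverse a linked list in place.", "solution": "...", "answer": "..."},
    ]
    benchmark = ["Compute the sum of the first ten positive integers."]
    for s in dedup_and_decontaminate(samples, benchmark, n=5):
        print(s["problem"])
```

Exact matching on normalized text only catches near-verbatim repeats; a real pipeline would likely add fuzzy or embedding-based matching, which this sketch leaves out.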



