O1-OPEN/OpenO1-SFT-Ultra
Source: Hugging Face. Updated 2025-03-06; indexed 2024-12-21.
Download link:
https://hf-mirror.com/datasets/O1-OPEN/OpenO1-SFT-Ultra
Description:
The openo1-sft-ultra-35m-data dataset contains 35 million data points. It was synthesized from existing open-source datasets using the openo1-qwen-sft model. Each data point was then annotated by the qwen-2.5-72b-instruct model for difficulty, quality, and question type, and only data points with both difficulty and quality ≥8 were retained. Each record includes a data ID, the original query, a long chain-of-thought (COT) response with a detailed reasoning process, the data source, question difficulty, data quality, the ground-truth answer, question length, response length, a topic category (Math, Code, Reasoning), an answer-type annotation, and any test cases included with code problems. Data sources include WebInstructFull, homework, infinity-instruct, math-stack-exchange, MathInstruct, and mcq.
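The retention rule in the description (keep a data point only when both difficulty and quality are ≥8) can be illustrated with a minimal sketch. The field names `difficulty` and `quality`, the sample records, and the function name `keep_high_scoring` are hypothetical and chosen for illustration only; the dataset's actual column names may differ, and this is not the maintainers' filtering code.

```python
from typing import Iterable, Iterator

MIN_SCORE = 8  # threshold stated in the dataset description


def keep_high_scoring(records: Iterable[dict], min_score: int = MIN_SCORE) -> Iterator[dict]:
    """Yield only records whose difficulty AND quality both reach min_score."""
    for record in records:
        # "difficulty" and "quality" are assumed field names, not confirmed columns.
        if record["difficulty"] >= min_score and record["quality"] >= min_score:
            yield record


# Made-up records, used only to exercise the filter.
sample = [
    {"id": "a", "difficulty": 9, "quality": 8},   # kept: both scores >= 8
    {"id": "b", "difficulty": 7, "quality": 10},  # dropped: difficulty below 8
    {"id": "c", "difficulty": 8, "quality": 6},   # dropped: quality below 8
]

kept = [r["id"] for r in keep_high_scoring(sample)]
print(kept)  # ['a']
```

Writing the filter as a generator keeps memory use flat, which matters if the same check is run over a stream of millions of records rather than a small in-memory list.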
Provider:
O1-OPEN



