reasoning-v1-20m

Name: reasoning-v1-20m
Creator: maas
Published: 2026-01-06 16:27:15
License: No description available

ModelScope community · Updated 2026-01-06 · Indexed 2025-03-22

Download link:

https://modelscope.cn/datasets/glaiveai/reasoning-v1-20m

Resource overview:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/637d41b2bb031d2afee723ae/v0tl4UPoPIp-d0mQGLlX6.png)

We are excited to release a synthetic reasoning dataset containing more than 22 million general reasoning questions and responses generated using [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B). While there have been multiple efforts to build open reasoning datasets for math and code tasks, we noticed a lack of large datasets containing reasoning traces for diverse non-code, non-math topics such as social and natural sciences, education, creative writing, and general conversations, which is why we decided to release this dataset.

*Note: We have not verified the reasoning traces and answers in this dataset for accuracy.*

**Dataset details:**

*Total number of rows:* 22.2 million<br>
*Total number of tokens:* 35.8 billion

The dataset can be used to fine-tune smaller, more efficient models with SFT so that they mimic the reasoning capabilities of larger models like DeepSeek-R1.

**Response format:**

```
<think>
-- reasoning trace --
</think>
-- answer --
```

**Loading the dataset:**

```python
from datasets import load_dataset

ds = load_dataset("glaiveai/reasoning-v1-20m", split="train")
```
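Because each response wraps its reasoning trace in `<think>` tags ahead of the answer, it can be handy to separate the two parts before training or inspection. The following is a minimal sketch under stated assumptions, not part of the dataset's own tooling: the helper `split_response` and the `ParsedResponse` type are hypothetical names introduced here, and the fallback for responses without a `<think>` block is an assumption about how malformed rows might be handled.

```python
import re
from typing import NamedTuple


class ParsedResponse(NamedTuple):
    """Hypothetical container for the two parts of a response."""
    reasoning: str
    answer: str


# Matches the first <think>...</think> block and everything after it.
_THINK_RE = re.compile(r"<think>(.*?)</think>(.*)", re.DOTALL)


def split_response(text: str) -> ParsedResponse:
    """Split a response string into (reasoning trace, answer).

    Assumption: if no <think> block is present, the whole text is
    treated as the answer and the reasoning trace is left empty.
    """
    match = _THINK_RE.search(text)
    if match is None:
        return ParsedResponse(reasoning="", answer=text.strip())
    return ParsedResponse(match.group(1).strip(), match.group(2).strip())


example = "<think>\n2 + 2 is 4.\n</think>\nThe answer is 4."
parsed = split_response(example)
print(parsed.reasoning)  # 2 + 2 is 4.
print(parsed.answer)     # The answer is 4.
```

The function works on plain strings, so it does not depend on the dataset's column names; check `ds.column_names` to see which field holds the response before applying it. Given the dataset's size (22.2 million rows), passing `streaming=True` to `load_dataset` lets you iterate over rows without downloading everything first.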

Provider:

maas

Created:

2025-03-31

Collection summary

Dataset introduction