
allenai/big-reasoning-traces

Source: Hugging Face. Updated: 2025-04-01. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/allenai/big-reasoning-traces
Resource description:
allenai/big-reasoning-traces is a large-scale dataset of reasoning traces intended for midtraining/annealing experiments. It contains approximately 2.5 billion tokens, as measured with the OLMo 2 tokenizer. It is compiled from several source datasets, including GeneralThought-430K, OpenThoughts-114k, and OpenR1-Math-220k. Each configuration has a single training split: the DeepSeek configuration has 676,665 examples, and the DeepSeek_debug configuration has 300 examples.
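As a rough illustration of the figures above, the sketch below estimates the average trace length and shows one way the small debug configuration might be loaded. It assumes that the roughly 2.5 billion tokens refer to the DeepSeek configuration, that the Hugging Face `datasets` library is installed, and that network access is available for the loading step. The helper names `approx_tokens_per_example` and `load_debug_split` are hypothetical and not part of the dataset or any library.

```python
# Figures quoted in the description above; the token count is approximate.
TOTAL_TOKENS = 2_500_000_000
TRAIN_EXAMPLES = {"DeepSeek": 676_665, "DeepSeek_debug": 300}


def approx_tokens_per_example(total_tokens: int, num_examples: int) -> float:
    """Return the mean number of tokens per example (hypothetical helper)."""
    if num_examples <= 0:
        raise ValueError("num_examples must be positive")
    return total_tokens / num_examples


def load_debug_split():
    """Load the 300-example debug configuration (hypothetical helper).

    Assumes the `datasets` library and network access; the import is kept
    local so the arithmetic above works without either.
    """
    from datasets import load_dataset

    return load_dataset(
        "allenai/big-reasoning-traces", "DeepSeek_debug", split="train"
    )


if __name__ == "__main__":
    # Assumption: the ~2.5B token total applies to the DeepSeek configuration.
    mean_len = approx_tokens_per_example(TOTAL_TOKENS, TRAIN_EXAMPLES["DeepSeek"])
    print(f"about {mean_len:.0f} tokens per example")
```

Under that assumption the average comes out at a few thousand tokens per example, which is consistent with long-form reasoning traces. The listing does not state the column schema, so inspecting the debug split first is a cheap way to check it before downloading the full configuration.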
Provided by:
allenai