SVECTOR-CORPORATION/ThinkChain-20M
收藏Hugging Face2025-03-27 更新2025-04-12 收录
下载链接:
https://hf-mirror.com/datasets/SVECTOR-CORPORATION/ThinkChain-20M
下载链接
链接失效反馈官方服务:
资源简介:
SVECTOR-CORPORATION/ThinkChain-20M是一个合成推理数据集,包含超过2200万条使用Spec-T1生成的通用推理问题和回答。该数据集覆盖了社会科学、自然科学、教育、创意写作和一般对话等多个非代码/数学领域的主题。数据集的总行数超过2200万行,总标记数达到3580亿个。可以使用该数据集通过监督微调(SFT)来精细调整更小、更高效的模型,模拟大型模型如Spec-T1的推理能力。
The SVECTOR-CORPORATION/ThinkChain-20M is a synthetic reasoning dataset containing over 22 million general reasoning questions and responses generated using Spec-T1. It covers a wide range of non-code/math topics such as social and natural sciences, education, creative writing, and general conversations. The dataset has a total of over 22 million rows and 35.8 billion tokens. It can be used to fine-tune smaller, more efficient models to mimic the reasoning capabilities of larger models like Spec-T1 through supervised fine-tuning (SFT).
提供机构:
SVECTOR-CORPORATION



