codelion/synth-1B
Source: Hugging Face. Updated: 2025-11-11. Indexed: 2025-11-15.
Download link:
https://hf-mirror.com/datasets/codelion/synth-1B
Description:
The synth-1B dataset is a sequential sample of the PleIAs/SYNTH dataset: it takes the first 999,997,890 tokens in order. It contains 822,230 documents, each made up of four fields: query, query_seed_text, synthetic_reasoning, and synthetic_answer. The four fields are concatenated with double newlines, so each document forms a single training example.
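As a rough illustration of the concatenation described above, the following Python sketch joins the four fields with double newlines. The field order follows the order listed in the description; the helper name `build_example`, the sample record, and the handling of missing fields are assumptions made for illustration, not the dataset author's actual preprocessing code.

```python
# Field names as listed in the description; the order is assumed to match.
FIELDS = ("query", "query_seed_text", "synthetic_reasoning", "synthetic_answer")


def build_example(record: dict) -> str:
    """Join the four fields with a double newline (hypothetical helper).

    Missing or None fields are treated as empty strings here, which is an
    assumption; the original pipeline may handle them differently.
    """
    return "\n\n".join(str(record.get(name) or "") for name in FIELDS)


# Made-up record, used only to show the resulting layout.
record = {
    "query": "What is 2 + 2?",
    "query_seed_text": "Basic arithmetic.",
    "synthetic_reasoning": "Adding 2 and 2 gives 4.",
    "synthetic_answer": "4",
}
text = build_example(record)

# Average document length implied by the figures in the description.
TOTAL_TOKENS = 999_997_890
TOTAL_DOCS = 822_230
avg_tokens_per_doc = TOTAL_TOKENS / TOTAL_DOCS  # roughly 1,216 tokens
```

With the Hugging Face `datasets` library, the data could presumably be loaded with `load_dataset("codelion/synth-1B")`; whether the published columns hold the four separate fields or the already concatenated text is not stated here, so check the dataset card before relying on either layout.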
Provider:
codelion



