arekborucki/OpenMementos

Source: Hugging Face | Updated: 2026-04-22 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/arekborucki/OpenMementos
Description:
OpenMementos-228K is a dataset of 228,557 reasoning traces annotated with block segmentation and compressed summaries (mementos), derived from OpenThoughts-v3. Memento is a framework for teaching language models to manage their own context during long-form reasoning. The dataset teaches models to segment their reasoning into blocks, compress each block into a dense summary (a memento), and continue reasoning from the mementos alone. It includes two subsets: default (training-ready data) and full (which also contains the pipeline components). The reported statistics cover the number of examples, domain distribution, block and summary statistics, and summary quality. The traces are drawn from multiple subsets of OpenThoughts-v3. The data pipeline consists of sentence splitting, boundary scoring, block segmentation, summary generation, and iterative refinement. The dataset can be used to fine-tune models for memento-style generation.
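
The pipeline stages named above can be illustrated with a toy sketch. This is not the dataset's actual pipeline: the cue-word boundary score, the threshold, and the "summary" that simply truncates a block's last sentence are all hypothetical stand-ins chosen so the example runs without a model, and the iterative refinement stage is left out. It only shows the shape of the data flow: sentences, then scored boundaries, then blocks, then one memento per block.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    sentences: List[str]  # the raw reasoning sentences in this block
    memento: str          # the compressed summary standing in for the block


def split_sentences(trace: str) -> List[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    parts = re.split(r"(?<=[.!?])\s+", trace.strip())
    return [p for p in parts if p]


# Hypothetical cue words that often open a new line of reasoning.
BOUNDARY_CUES = ("so", "now", "next", "therefore", "wait", "alternatively")


def boundary_score(sentence: str) -> float:
    # Toy score: 1.0 if the sentence starts with a cue word, else 0.0.
    words = sentence.split()
    first = words[0].lower().strip(",.:;") if words else ""
    return 1.0 if first in BOUNDARY_CUES else 0.0


def segment(sentences: List[str], threshold: float = 0.5) -> List[List[str]]:
    # Start a new block whenever the boundary score reaches the threshold.
    blocks: List[List[str]] = []
    current: List[str] = []
    for s in sentences:
        if current and boundary_score(s) >= threshold:
            blocks.append(current)
            current = []
        current.append(s)
    if current:
        blocks.append(current)
    return blocks


def summarize(block: List[str], max_words: int = 12) -> str:
    # Placeholder summarizer: keep the first words of the block's last sentence.
    return " ".join(block[-1].split()[:max_words])


def build_mementos(trace: str) -> List[Block]:
    return [Block(b, summarize(b)) for b in segment(split_sentences(trace))]


def context_after(blocks: List[Block], k: int) -> List[str]:
    # What the model would keep after finishing k blocks: mementos only.
    return [b.memento for b in blocks[:k]]


trace = "Let x be 3. Then x squared is 9. Now add 1 to get 10. So the answer is 10."
blocks = build_mementos(trace)
for b in blocks:
    print(len(b.sentences), "sentence(s) ->", b.memento)
```

In a real setting the scorer and summarizer would be learned or model-driven; the point of `context_after` is that earlier blocks are replaced by their mementos when reasoning continues.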
Provider:
arekborucki