arekborucki/OpenMementos

Name: arekborucki/OpenMementos
Creator: arekborucki
Published: 2026-04-22 12:53:36
License: No description available

Source: Hugging Face
Updated: 2026-04-22
Indexed: 2026-04-26

Download link:

https://hf-mirror.com/datasets/arekborucki/OpenMementos

Description:

OpenMementos-228K is a dataset of 228,557 reasoning traces derived from OpenThoughts-v3 and annotated with block segmentation and compressed summaries (mementos). Memento is a framework for teaching language models to manage their own context during long-form reasoning: the model segments its reasoning into blocks, compresses each block into a dense summary (a memento), and continues reasoning from the mementos alone. The dataset has two subsets: default (training-ready data) and full (which also includes the pipeline components). Reported statistics cover the number of examples, the domain distribution, block and summary statistics, and summary quality. The source data comes from multiple subsets of OpenThoughts-v3. The data pipeline consists of sentence splitting, boundary scoring, block segmentation, summary generation, and iterative refinement. The dataset can be used to fine-tune models for memento-style generation.
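The pipeline stages named above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the regex sentence splitter, the cue-word boundary score, the threshold, and the truncating `summarize` placeholder are hypothetical stand-ins, not the method used to build OpenMementos, and the iterative refinement stage is omitted.

```python
import re

# Hypothetical cue words that often open a new line of reasoning.
CUES = ("so", "now", "next", "wait", "therefore", "alternatively")


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def boundary_score(next_sentence):
    """Toy boundary score: 1.0 if the following sentence opens with a cue word."""
    words = next_sentence.split()
    first = words[0].lower().strip(",.;:") if words else ""
    return 1.0 if first in CUES else 0.0


def segment_blocks(sentences, threshold=0.5):
    """Close the current block wherever the boundary score reaches the threshold."""
    blocks, current = [], []
    for i, sentence in enumerate(sentences):
        current.append(sentence)
        is_last = i == len(sentences) - 1
        if is_last or boundary_score(sentences[i + 1]) >= threshold:
            blocks.append(" ".join(current))
            current = []
    return blocks


def summarize(block, max_words=8):
    """Placeholder summary: the block's last sentence, truncated to max_words.

    A real pipeline would call a summarization model here (assumption).
    """
    words = split_sentences(block)[-1].split()
    return " ".join(words[:max_words])


def build_context(question, mementos):
    """Context a model would continue from: the question plus mementos only."""
    return "\n".join([question] + [f"[memento] {m}" for m in mementos])


trace = "Let x be 3. Then x squared is 9. Now add 1 to get 10. So the answer is 10."
blocks = segment_blocks(split_sentences(trace))
mementos = [summarize(b) for b in blocks]
context = build_context("What is x squared plus 1 when x is 3?", mementos)
```

In this sketch the full block text is discarded once its memento exists, so `context` grows with the number of blocks rather than with the length of the raw trace, which is the context-management idea the description attributes to Memento.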

Provider:

arekborucki
