unlearning-cleanslate/fsid-curated-nemotron-9b

Name: unlearning-cleanslate/fsid-curated-nemotron-9b
Creator: unlearning-cleanslate
Published: 2026-04-28 21:46:31
License: not specified

Source: Hugging Face (updated 2026-04-28, indexed 2026-05-03)

Download link:

https://hf-mirror.com/datasets/unlearning-cleanslate/fsid-curated-nemotron-9b

Description:

This dataset provides several configurations for evaluating how language models memorize and forget text content. The forget configuration contains request ID, content title, prefix, suffix, memorized fraction, and rule name, and is used to analyze a model's memorization behavior within specific windows. The forget_pool configuration contains lyrics, content creator, year, and memorized fraction, so it possibly relates to memorization analysis of music or other textual content. The retain configuration contains text and rule name, and is used to assess how well a model retains text. The retain_pool configuration contains text length, window statistics, memorized fraction, and detailed per-window information such as log probabilities and target tokens, for in-depth analysis of memorization performance. Dataset splits include baseline, bm25_10B, bm25_6T, and igm_10B, which correspond to different experimental setups.
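The configurations and splits named in the description can be enumerated and loaded roughly as follows. This is a minimal sketch, not the publisher's own tooling. It assumes the configuration and split names match the description exactly, that every configuration offers every split (not confirmed by the listing), and that the memorization column is called `memorized_fraction` (a hypothetical name inferred from the description). The helper names `config_split_grid`, `load_one`, and `filter_by_memorization` are illustrative only. `load_one` relies on the Hugging Face `datasets` library and network access.

```python
from itertools import product

# Repository ID as given in the listing.
REPO_ID = "unlearning-cleanslate/fsid-curated-nemotron-9b"

# Names taken from the description; the actual repository may differ.
CONFIGS = ["forget", "forget_pool", "retain", "retain_pool"]
SPLITS = ["baseline", "bm25_10B", "bm25_6T", "igm_10B"]


def config_split_grid(configs=CONFIGS, splits=SPLITS):
    """Return every (config, split) pair named in the description."""
    return list(product(configs, splits))


def load_one(config, split, repo_id=REPO_ID):
    """Load one configuration/split pair from the Hugging Face Hub.

    Requires `pip install datasets` and network access. The import is
    lazy so the other helpers work without the library installed.
    """
    from datasets import load_dataset

    return load_dataset(repo_id, name=config, split=split)


def filter_by_memorization(rows, threshold, key="memorized_fraction"):
    """Keep rows whose memorization value is at least `threshold`.

    `key` is an assumed column name; rows missing it are treated as 0.0.
    """
    return [row for row in rows if row.get(key, 0.0) >= threshold]
```

For example, `filter_by_memorization(load_one("forget", "baseline"), 0.5)` would keep the forget examples whose memorized fraction is at least 0.5, provided the column name assumption holds. Inspecting the `features` attribute of a loaded dataset is the safest way to confirm the real column names first.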

Provider:

unlearning-cleanslate
