unlearning-cleanslate/fsid-curated-nemotron-9b
Source: Hugging Face. Updated 2026-04-28; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/fsid-curated-nemotron-9b
Description:
This dataset provides several configurations for evaluating how language models memorize and forget text. The forget configuration has features for request ID, content title, prefix, suffix, memorized fraction, and rule name, and is used to analyze memorization within specific windows. The forget_pool configuration includes lyrics, content creator, year, and memorized fraction, so it may cover music or other textual content. The retain configuration contains text and rule name, for assessing what the model retains. The retain_pool configuration adds text length, window statistics, memorized fraction, and per-window details such as log probabilities and target tokens, for closer analysis of memorization. Splits include baseline, bm25_10B, bm25_6T, and igm_10B, corresponding to different experimental setups.
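The configurations and splits named in the description could be loaded with the Hugging Face `datasets` library roughly as follows. This is a minimal sketch, not an official loader: the helper names `load_args` and `load` are hypothetical, and it assumes every configuration exposes every listed split, which should be checked against the repository before relying on it.

```python
# Names below are taken from the description; which splits each
# configuration actually exposes is an assumption to verify on the Hub.
REPO_ID = "unlearning-cleanslate/fsid-curated-nemotron-9b"
CONFIGS = ("forget", "forget_pool", "retain", "retain_pool")
SPLITS = ("baseline", "bm25_10B", "bm25_6T", "igm_10B")


def load_args(config: str, split: str) -> dict:
    """Validate the names and build keyword arguments for datasets.load_dataset."""
    if config not in CONFIGS:
        raise ValueError(f"unknown configuration: {config!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return {"path": REPO_ID, "name": config, "split": split}


def load(config: str, split: str):
    """Download one configuration/split (needs network access and the datasets package)."""
    # Imported lazily so the validation helper above works without the library.
    from datasets import load_dataset
    return load_dataset(**load_args(config, split))
```

Calling `load("forget", "baseline")` should return the baseline split of the forget configuration if that split exists there; inspecting the returned object's `features` attribute is a quick way to confirm the column names, which the description gives only informally.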
Provided by:
unlearning-cleanslate



