unlearning-cleanslate/eval-04-gemma-3-12b-simnpo-baseline-target-100-checkpoint-2838
Source: Hugging Face. Updated 2026-04-29; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/eval-04-gemma-3-12b-simnpo-baseline-target-100-checkpoint-2838
Description:
This dataset contains text-analysis data for evaluating how well a language model has memorized windows of text. Per-text features include the text length in characters, the number of windows, the number of memorized windows, the memorized fraction, the coverage, and summary statistics of the p_z values (maximum, mean, minimum, and standard deviation). For the best window, the dataset records its index, probability, seed, target text, and character positions. The evaluation settings are also stored: model name, window size, stride, and evaluation threshold. Each individual window carries its index, start and end characters, memorization status, log probability, number of target tokens, p_z value, seed, target text, list of target log probabilities, and list of target ranks. Metadata fields cover content ID, title, creators, and year. The data consists of a single training split with 4,663 examples and a total size of approximately 2.7 GB.
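The relationship between the per-window records and the per-text summary features can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the dataset's documented method: the key names (`start_char`, `end_char`, `p_z`), the rule that a window counts as memorized when its p_z meets the threshold, the reading of coverage as the fraction of characters that fall inside at least one memorized window, and the use of the population standard deviation.

```python
from statistics import mean, pstdev


def summarize_windows(windows, text_length_chars, threshold):
    """Aggregate per-window records into per-text summary statistics.

    Assumptions (not confirmed by the dataset card): a window is
    "memorized" when p_z >= threshold, and coverage is the share of
    characters lying inside at least one memorized window.
    """
    if not windows or text_length_chars <= 0:
        raise ValueError("need at least one window and a positive text length")

    p_values = [w["p_z"] for w in windows]
    memorized = [w for w in windows if w["p_z"] >= threshold]

    # Mark each character covered by a memorized window; overlapping
    # windows are counted once.
    covered = [False] * text_length_chars
    for w in memorized:
        for i in range(w["start_char"], min(w["end_char"], text_length_chars)):
            covered[i] = True

    return {
        "n_windows": len(windows),
        "n_memorized": len(memorized),
        "memorized_fraction": len(memorized) / len(windows),
        "coverage": sum(covered) / text_length_chars,
        "max_p_z": max(p_values),
        "mean_p_z": mean(p_values),
        "min_p_z": min(p_values),
        "std_p_z": pstdev(p_values),  # population std; the dataset may differ
    }


# Hypothetical windows of 10 characters with a stride of 5.
example_windows = [
    {"start_char": 0, "end_char": 10, "p_z": 0.9},
    {"start_char": 5, "end_char": 15, "p_z": 0.8},
    {"start_char": 10, "end_char": 20, "p_z": 0.1},
    {"start_char": 15, "end_char": 25, "p_z": 0.05},
]
summary = summarize_windows(example_windows, text_length_chars=25, threshold=0.5)
print(summary)
```

In this made-up example the two memorized windows overlap, so they cover characters 0 through 14 once rather than twice. Comparing values recomputed this way against the stored summary fields would be one way to check which definitions the dataset actually uses.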
Provider:
unlearning-cleanslate



