unlearning-cleanslate/eval-simnpo_olmo-3-1025-7b_20260428_064411-debug-post-olmo
Source: Hugging Face. Updated 2026-04-28; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/eval-simnpo_olmo-3-1025-7b_20260428_064411-debug-post-olmo
Resource summary:
This dataset contains 4,663 training examples (about 2.74 GB in total) for evaluating text memorization or language model performance. Each example records the text length in characters, the number of windows, the number of memorized windows, the memorized fraction, and coverage, along with summary statistics of the per-window p_z values (max, mean, median, min, and standard deviation). For the best window it stores the index, probability, seed, target text, and start and end character positions. The evaluation parameters are also recorded: evaluation model, window size, stride, and evaluation threshold. Each example carries a detailed list of windows, where every window has a start and end character, an index, an is_memorized flag, a log probability, the number of target tokens, a p_z value, a seed, the target text, a list of target log probabilities, and a list of target ranks. Content ID, title, creators, and year are provided as metadata.
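The relationship between the per-window list and the example-level summary fields can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the key names (`index`, `p_z`, `is_memorized`) and the output names are hypothetical stand-ins for the actual column names, the window values are made up, and the population standard deviation is an arbitrary choice because the listing does not say which variant the dataset uses. Coverage is left out because its definition is not given here.

```python
from statistics import mean, median, pstdev


def summarize_windows(windows):
    """Aggregate per-window records into example-level summary fields.

    `windows` is a list of dicts with hypothetical keys "index", "p_z",
    and "is_memorized"; the real column names may differ.
    """
    if not windows:
        raise ValueError("need at least one window")

    p_values = [w["p_z"] for w in windows]
    n_memorized = sum(1 for w in windows if w["is_memorized"])
    # Assumption: the "best" window is the one with the highest p_z.
    best = max(windows, key=lambda w: w["p_z"])

    return {
        "n_windows": len(windows),
        "n_memorized_windows": n_memorized,
        "memorized_fraction": n_memorized / len(windows),
        "max_p_z": max(p_values),
        "mean_p_z": mean(p_values),
        "median_p_z": median(p_values),
        "min_p_z": min(p_values),
        # Assumption: population standard deviation.
        "std_p_z": pstdev(p_values),
        "best_window_index": best["index"],
    }


# Made-up windows for illustration only; not taken from the dataset.
example_windows = [
    {"index": 0, "p_z": 0.02, "is_memorized": False},
    {"index": 1, "p_z": 0.80, "is_memorized": True},
    {"index": 2, "p_z": 0.10, "is_memorized": False},
    {"index": 3, "p_z": 0.60, "is_memorized": True},
]

summary = summarize_windows(example_windows)
print(summary)
```

Recomputing the summary fields this way is one possible sanity check on a downloaded example, provided the actual column names and the dataset's definitions are confirmed first.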
Provider:
unlearning-cleanslate



