unlearning-cleanslate/eval-olmo-3-7b-simnpo-gentle-bm25-6t
Source: Hugging Face. Updated 2026-04-29; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/eval-olmo-3-7b-simnpo-gentle-bm25-6t
Resource description:
This dataset contains per-text memorization features for analyzing and evaluating how much of each text is memorized. For each text it records the text length, the number of windows, the number of memorized windows, the memorized fraction, the coverage, and summary statistics over window probabilities (max, mean, median, min, and standard deviation). It also records the best window's index together with its probability, seed, target, and start and end character offsets, plus the evaluation settings: evaluation model, window size, stride, and evaluation threshold. A per-window list gives further detail (end character, index, whether the window is memorized, log probability, number of target tokens, and so on), and each entry carries a content ID, title, creators, and year. The dataset has a single training split of 2,737,925,176 bytes (about 2.74 GB) with 4,663 examples.
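As a rough illustration of how the per-text summary features could relate to the per-window values, here is a minimal sketch in Python. It is not the dataset authors' code. The function name, the dictionary keys, and the rule that a window counts as memorized when its probability is at or above the threshold are all assumptions made for this example, as is the use of the population standard deviation. Coverage is left out because the description does not say how it is defined.

```python
from statistics import mean, median, pstdev


def summarize_windows(window_probs, threshold):
    """Summarize per-window probabilities for one text.

    Assumption (not confirmed by the dataset description): a window is
    "memorized" when its probability is >= threshold.
    All key names below are hypothetical, chosen for readability.
    """
    if not window_probs:
        raise ValueError("window_probs must contain at least one value")

    memorized_flags = [p >= threshold for p in window_probs]
    n_windows = len(window_probs)
    n_memorized = sum(memorized_flags)

    # Index of the highest-probability window (first one on ties).
    best_idx = max(range(n_windows), key=lambda i: window_probs[i])

    return {
        "n_windows": n_windows,
        "n_memorized": n_memorized,
        "memorized_fraction": n_memorized / n_windows,
        "max_prob": max(window_probs),
        "mean_prob": mean(window_probs),
        "median_prob": median(window_probs),
        "min_prob": min(window_probs),
        # Population standard deviation is an assumption; the source
        # does not say whether population or sample std is used.
        "std_prob": pstdev(window_probs),
        "best_window_idx": best_idx,
    }


if __name__ == "__main__":
    # Made-up probabilities, purely for demonstration.
    summary = summarize_windows([0.1, 0.9, 0.5, 0.7], threshold=0.6)
    for key, value in summary.items():
        print(f"{key}: {value}")
```

With these made-up inputs, two of the four windows meet the threshold, so the memorized fraction is 0.5 and the best window is the one at index 1.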
Provider:
unlearning-cleanslate



