unlearning-cleanslate/eval-olmo-3-7b-simnpo-gentle-bm25-10b
Source: Hugging Face. Updated 2026-04-29; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/eval-olmo-3-7b-simnpo-gentle-bm25-10b
Description:
This dataset is intended for evaluating memorization behavior in language models. Each record holds summary statistics for a text fragment: text length, number of windows, the number or proportion of memorized windows, coverage, and probability distribution metrics (maximum, mean, median, minimum, and standard deviation). It also records details of the best window (index, probability, seed, target text, and start and end character positions), content metadata (ID, title, creators, year), and per-window attributes (memorization status, log probability, number of target tokens, target probabilities, and rank lists). The dataset is designed for analyzing how much text a language model has memorized and the probabilities it assigns to that text, and is suited to model evaluation and research tasks.
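To make the summary fields described above more concrete, here is a minimal Python sketch of how such per-text statistics could be derived from per-window log probabilities. Everything in it is an assumption made for illustration: the `Window` record, the field names in the returned dictionary, the 0.5 memorization threshold, and the definition of coverage as the share of characters inside at least one memorized window. None of these are confirmed by the dataset itself.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class Window:
    # Hypothetical per-window record; not the dataset's actual schema.
    start_char: int   # inclusive character offset in the source text
    end_char: int     # exclusive character offset in the source text
    log_prob: float   # log probability the model assigns to the target


def summarize(windows, text_length, threshold=0.5):
    """Aggregate per-window probabilities into per-text summary statistics.

    The threshold and the coverage definition are assumptions.
    """
    if not windows:
        raise ValueError("at least one window is required")

    probs = [math.exp(w.log_prob) for w in windows]
    memorized = [p >= threshold for p in probs]

    # Assumed coverage: fraction of characters that fall inside
    # at least one window flagged as memorized.
    covered = set()
    for w, is_mem in zip(windows, memorized):
        if is_mem:
            covered.update(range(w.start_char, w.end_char))

    best = max(range(len(windows)), key=lambda i: probs[i])

    return {
        "n_windows": len(windows),
        "n_memorized": sum(memorized),
        "coverage": len(covered) / text_length,
        "max_prob": max(probs),
        "mean_prob": statistics.fmean(probs),
        "median_prob": statistics.median(probs),
        "min_prob": min(probs),
        "std_prob": statistics.pstdev(probs),
        "best_window_index": best,
    }


example = [
    Window(0, 10, math.log(0.9)),
    Window(10, 20, math.log(0.1)),
    Window(20, 30, math.log(0.2)),
]
summary = summarize(example, text_length=40)
```

In this toy input only the first window exceeds the assumed threshold, so it is the only one counted as memorized and the only one contributing to coverage. The real dataset may define these quantities differently.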
Provider:
unlearning-cleanslate



