unlearning-cleanslate/eval-01-llama-3_1-8b-simnpo-gentle-baseline-target-100-checkpoint-861

Source: Hugging Face. Updated 2026-04-29. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/eval-01-llama-3_1-8b-simnpo-gentle-baseline-target-100-checkpoint-861
Description:
This dataset is designed for analyzing text memorization behavior. Each text is evaluated with a sliding-window method, and the results are recorded in detail. Per-text fields include the character length, number of windows, number of memorized windows, memorized fraction, coverage, and summary statistics over the window probabilities (maximum, mean, median, minimum, and standard deviation). For the best window, the dataset records the index, probability, seed text, target text, and start and end character positions. The evaluation parameters are also recorded: evaluation model, window size, stride, and evaluation threshold. Each per-window record contains the end character position, index, memorization status, log probability, number of target tokens, probability, seed text, start character position, target text, list of target log probabilities, and list of target ranks. Content metadata includes the content ID, title, creators, and year. The dataset has a training split only, with 4,663 examples and a total size of approximately 2.67 GB.
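To make the per-text summary fields concrete, here is a minimal Python sketch of how such values could be derived from a list of per-window records. It is an illustration, not the dataset's actual pipeline: the key names (`prob`, `start_char`, `end_char`), the rule that a window counts as memorized when its probability meets the threshold, the definition of coverage as the share of characters inside at least one memorized window, and the use of the population standard deviation are all assumptions.

```python
from statistics import mean, median, pstdev


def summarize_windows(windows, text_length, threshold=0.5):
    """Aggregate per-window probabilities into per-text summary fields.

    All key names are hypothetical and may not match the dataset's schema.
    """
    if not windows or text_length <= 0:
        return None

    probs = [w["prob"] for w in windows]
    # Assumption: a window is "memorized" when its probability meets the threshold.
    memorized = [w for w in windows if w["prob"] >= threshold]

    # Assumption: coverage is the share of characters that fall inside
    # at least one memorized window (end position treated as exclusive).
    covered = [False] * text_length
    for w in memorized:
        start = max(0, w["start_char"])
        end = min(text_length, w["end_char"])
        for i in range(start, end):
            covered[i] = True

    # Index of the window with the highest probability (first one on ties).
    best_index = max(range(len(windows)), key=lambda i: probs[i])

    return {
        "n_windows": len(windows),
        "n_memorized": len(memorized),
        "memorized_fraction": len(memorized) / len(windows),
        "coverage": sum(covered) / text_length,
        "max_prob": max(probs),
        "mean_prob": mean(probs),
        "median_prob": median(probs),
        "min_prob": min(probs),
        "std_prob": pstdev(probs),  # population standard deviation (assumed)
        "best_window_index": best_index,
    }


# Example with three overlapping windows over a 40-character text.
example_windows = [
    {"start_char": 0, "end_char": 10, "prob": 0.9},
    {"start_char": 5, "end_char": 15, "prob": 0.2},
    {"start_char": 10, "end_char": 20, "prob": 0.7},
]
summary = summarize_windows(example_windows, text_length=40, threshold=0.5)
```

Marking characters in a boolean array keeps overlapping memorized windows from being counted twice, which matters whenever the stride is smaller than the window size.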
Provider:
unlearning-cleanslate