unlearning-cleanslate/fsid-curated-nemotron-9b-target-100
Source: Hugging Face | Updated: 2026-04-28 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/fsid-curated-nemotron-9b-target-100
Description:
The dataset has four configurations: forget, forget_pool, retain, and retain_pool. Each configuration has its own features and splits. The forget configuration has the features request_id, content_id, content_title, window_idx, prefix, suffix, memorized_fraction, and rule_name, with the splits baseline, bm25_10B, bm25_6T, and igm_10B. The forget_pool configuration has the features content_id, content_title, content_creators, content_year, lyrics, memorized_fraction, max_p_z, num_windows, memorized_windows, source_dataset, and pool_bin, with a single train split. The retain configuration has the features text and rule_name, with splits similar to those of forget. The retain_pool configuration has a large number of features related to text analysis and memorization, with a single train split. The dataset's specific purpose and content are not stated directly, but the features and configurations suggest that it involves text memorization and analysis.
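The configuration and split layout described above can be sketched in Python as follows. This is a minimal, unofficial sketch: the helper names `load_args` and `load` are hypothetical, the retain split names are assumed to match those of forget (the description only says they are "similar"), and the actual download step assumes the Hugging Face `datasets` package is installed and the repository is reachable.

```python
# Split names as listed in the description above.
FORGET_SPLITS = ("baseline", "bm25_10B", "bm25_6T", "igm_10B")

# Configuration -> available splits.
CONFIG_SPLITS = {
    "forget": FORGET_SPLITS,
    "forget_pool": ("train",),
    # Assumption: the description says only that retain's splits are
    # "similar to forget", so the same names are reused here.
    "retain": FORGET_SPLITS,
    "retain_pool": ("train",),
}

REPO_ID = "unlearning-cleanslate/fsid-curated-nemotron-9b-target-100"


def load_args(config, split):
    """Validate a (config, split) pair and build load_dataset keyword arguments."""
    if config not in CONFIG_SPLITS:
        raise ValueError(f"unknown configuration: {config!r}")
    if split not in CONFIG_SPLITS[config]:
        raise ValueError(f"configuration {config!r} has no split {split!r}")
    return {"path": REPO_ID, "name": config, "split": split}


def load(config, split):
    """Download one split; needs network access and the `datasets` package."""
    from datasets import load_dataset  # imported lazily so validation works offline

    return load_dataset(**load_args(config, split))
```

For example, `load("forget", "baseline")` would request the baseline split of the forget configuration, while `load_args("forget_pool", "baseline")` raises a `ValueError` because forget_pool is listed with a train split only.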
Provider:
unlearning-cleanslate



