unlearning-cleanslate/cleanslate_dataset

Name: unlearning-cleanslate/cleanslate_dataset
Creator: unlearning-cleanslate
Published: 2026-04-17 09:51:43
License: No description available

Source: Hugging Face (updated 2026-04-17, indexed 2026-04-26)

Download link:

https://hf-mirror.com/datasets/unlearning-cleanslate/cleanslate_dataset

Resource description:

---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
- config_name: qa_benchmark
  data_files:
  - split: train
    path: qa_benchmark/train-*
dataset_info:
  config_name: qa_benchmark
  features:
  - name: content_id
    dtype: string
  - name: content_title
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: train
    num_bytes: 2252421
    num_examples: 12088
  download_size: 939939
  dataset_size: 2252421
---

# CleanSlate Dataset

Core content corpus for the CleanSlate memorization evaluation framework.

## Schema

| Column | Type | Description |
|---|---|---|
| `content_id` | string | Stable hash ID for the content item |
| `content_title` | string | Title of the content item |
| `content_creators` | string | Artist / author |
| `content_year` | int64 | Release / publication year |
| `reference_target` | string | Full text of the content |

The table above describes the `default` config, which has a single `train` split. The card's metadata also declares a `qa_benchmark` config, likewise with a single `train` split (12,088 examples), containing the columns `content_id`, `content_title`, `question`, and `answer`.

## Usage

```python
from datasets import load_dataset

ds = load_dataset("unlearning-cleanslate/cleanslate_dataset", split="train")
```

## Related

- Previous schema (with memorization metadata, QA pairs, and cluster IDs): [`cleanslate_dataset_deprecated`](https://huggingface.co/datasets/unlearning-cleanslate/cleanslate_dataset_deprecated).
- Framework: [github.com/akhatua2/CleanSlate](https://github.com/akhatua2/CleanSlate)
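The corpus and the QA pairs both carry a `content_id` column, so QA pairs can in principle be attached to the full texts they refer to. The sketch below is a minimal illustration, assuming `content_id` identifies the same item in both configs and that rows arrive as plain dicts. The helper name `attach_qa_pairs`, the added `qa_pairs` field, and the toy records are hypothetical and are not part of the dataset or the CleanSlate framework.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def attach_qa_pairs(
    corpus_rows: Iterable[Dict[str, Any]],
    qa_rows: Iterable[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Hypothetical helper: group QA pairs by content_id and attach them
    to the matching corpus items.

    Assumes corpus rows have a `content_id` key and QA rows have
    `content_id`, `question`, and `answer` keys, mirroring the column
    names listed in the dataset card.
    """
    # Index QA pairs by the shared content_id key.
    qa_by_id: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in qa_rows:
        qa_by_id[row["content_id"]].append(
            {"question": row["question"], "answer": row["answer"]}
        )

    # Copy each corpus row (the input is not mutated) and add its QA pairs;
    # items without any QA pairs get an empty list.
    joined = []
    for item in corpus_rows:
        merged = dict(item)
        merged["qa_pairs"] = qa_by_id.get(item["content_id"], [])
        joined.append(merged)
    return joined


if __name__ == "__main__":
    # Toy records for illustration only; not taken from the dataset.
    corpus = [
        {"content_id": "a1", "content_title": "Example Song", "reference_target": "<full text>"},
        {"content_id": "b2", "content_title": "Example Book", "reference_target": "<full text>"},
    ]
    qa = [
        {"content_id": "a1", "content_title": "Example Song", "question": "Q1?", "answer": "A1"},
        {"content_id": "a1", "content_title": "Example Song", "question": "Q2?", "answer": "A2"},
    ]
    for item in attach_qa_pairs(corpus, qa):
        # Prints each title with its number of attached QA pairs.
        print(item["content_title"], len(item["qa_pairs"]))
```

With the real data, the same function should accept the loaded splits directly, since iterating a `datasets.Dataset` yields one dict per row: for example `load_dataset("unlearning-cleanslate/cleanslate_dataset", split="train")` for the corpus and `load_dataset("unlearning-cleanslate/cleanslate_dataset", "qa_benchmark", split="train")` for the QA pairs. A dict index keeps the join linear in the number of rows rather than scanning the QA list once per corpus item.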

Provided by:

unlearning-cleanslate
