unlearning-cleanslate/cleanslate_dataset
Source: Hugging Face. Updated 2026-04-17; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/cleanslate_dataset
Description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
- config_name: qa_benchmark
  data_files:
  - split: train
    path: qa_benchmark/train-*
dataset_info:
  config_name: qa_benchmark
  features:
  - name: content_id
    dtype: string
  - name: content_title
    dtype: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: train
    num_bytes: 2252421
    num_examples: 12088
  download_size: 939939
  dataset_size: 2252421
---
# CleanSlate Dataset
Core content corpus for the CleanSlate memorization evaluation framework.
## Schema
| Column | Type | Description |
|---|---|---|
| `content_id` | string | Stable hash ID for the content item |
| `content_title` | string | Title |
| `content_creators` | string | Artist / author |
| `content_year` | int64 | Release / publication year |
| `reference_target` | string | Full text of the content |
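The table above describes the `default` config, which has a single `train` split. The card metadata also declares a `qa_benchmark` config with columns `content_id`, `content_title`, `question`, and `answer` (12,088 examples in its `train` split).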
## Usage
```python
from datasets import load_dataset
ds = load_dataset("unlearning-cleanslate/cleanslate_dataset", split="train")
```
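The card metadata also lists a `qa_benchmark` config whose rows carry a `content_id`. A minimal sketch of grouping those QA pairs under the matching corpus rows follows. It assumes the `content_id` values in `qa_benchmark` correspond to those in the `default` config, which the card does not state explicitly. The helper name `attach_questions` and the sample rows are made up for illustration and are not part of the dataset or the CleanSlate framework.

```python
from collections import defaultdict


def attach_questions(content_rows, qa_rows):
    """Group QA pairs under the content row that shares their content_id.

    Hypothetical helper: assumes both inputs are iterables of dicts that
    use the column names shown in the dataset card.
    """
    qa_by_id = defaultdict(list)
    for qa in qa_rows:
        qa_by_id[qa["content_id"]].append((qa["question"], qa["answer"]))
    # Rows without any QA pair get an empty list rather than being dropped.
    return [
        {**row, "qa_pairs": qa_by_id.get(row["content_id"], [])}
        for row in content_rows
    ]


# Made-up rows that only mimic the documented columns.
content_rows = [
    {"content_id": "abc123", "content_title": "Example Title",
     "content_creators": "Example Author", "content_year": 2001,
     "reference_target": "Full text goes here."},
    {"content_id": "def456", "content_title": "Another Title",
     "content_creators": "Another Author", "content_year": 1999,
     "reference_target": "More text."},
]
qa_rows = [
    {"content_id": "abc123", "content_title": "Example Title",
     "question": "Who wrote it?", "answer": "Example Author"},
]

joined = attach_questions(content_rows, qa_rows)
print(joined[0]["qa_pairs"])
```

With the real data, the two inputs could be the `ds` object above and the result of `load_dataset("unlearning-cleanslate/cleanslate_dataset", "qa_benchmark", split="train")`, since iterating a `datasets.Dataset` yields one dict per row.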
## Related
- Previous schema (with memorization metadata, QA pairs, and cluster IDs): [`cleanslate_dataset_deprecated`](https://huggingface.co/datasets/unlearning-cleanslate/cleanslate_dataset_deprecated).
- Framework: [github.com/akhatua2/CleanSlate](https://github.com/akhatua2/CleanSlate)
Provider:
unlearning-cleanslate



