Chia-Mu-Lab/rleak-openthoughts-shot-pool-200
Source: Hugging Face. Updated 2026-04-21; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/Chia-Mu-Lab/rleak-openthoughts-shot-pool-200
Description:
---
license: other
language:
- en
task_categories:
- text-generation
tags:
- reasoning
- chain-of-thought
- distillation
- in-context-learning
size_categories:
- n<1K
source_datasets:
- open-thoughts/OpenThoughts-114k
configs:
- config_name: r1_native
  data_files:
  - split: train
    path: "r1_native/train-*"
- config_name: qwen3_8b
  data_files:
  - split: train
    path: "qwen3_8b/train-*"
---
# Chia-Mu-Lab/rleak-openthoughts-shot-pool-200
An in-context-learning (ICL) shot pool derived from OpenThoughts-114k, built for the head-to-head comparison of RLeak and Trace Inversion (experiment `2026-04-22_trace_inversion_comparison`).
Each config holds the same 200 problems: OpenThoughts-114k shuffled with seed=7, indices `[2000, 2200)`. This slice is disjoint from the 2k eval slice at `[0, 2000)`. Configs differ only in the source of the `trace` and `answer` columns.
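The slicing described above can be sketched as follows. This is a minimal illustration, not the build script: `shuffled_slice` is a hypothetical helper, `random.Random` stands in for whatever shuffle the real script uses, and `N_ROWS` is a placeholder size. It will therefore not reproduce the actual row selection; it only shows why the two half-open ranges cannot overlap.

```python
import random


def shuffled_slice(n_rows: int, seed: int, start: int, stop: int) -> list[int]:
    """Return the original row indices that fall in [start, stop) after a seeded shuffle."""
    order = list(range(n_rows))
    # Assumption: a stand-in shuffle; the real script may permute rows differently.
    random.Random(seed).shuffle(order)
    return order[start:stop]


N_ROWS = 114_000  # placeholder; the true OpenThoughts-114k row count may differ

eval_rows = shuffled_slice(N_ROWS, seed=7, start=0, stop=2000)
shot_rows = shuffled_slice(N_ROWS, seed=7, start=2000, stop=2200)

# Same seed gives the same permutation, so adjacent half-open slices share no rows.
assert len(shot_rows) == 200
assert set(eval_rows).isdisjoint(shot_rows)
```

Because both slices are taken from one permutation, disjointness follows from the index ranges alone and does not depend on the shuffle routine.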
## Columns
| field | description |
|---|---|
| `id` | stable row id, identical across configs |
| `dataset_row_idx` | original OpenThoughts-114k row index |
| `x` | problem text (user turn) |
| `trace` | reasoning trace from this config's source |
| `answer` | final answer from this config's source |
| `raw` | raw model output (surrogate configs only; null for `r1_native`) |
| `surrogate` | provenance tag matching the config name |
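
Since `id` is identical across configs, rows from two configs can be paired by it. The sketch below assumes the Hugging Face `datasets` library for loading; `load_config` and `check_aligned` are hypothetical helper names, and the two rows at the bottom are made-up toy data, not taken from the dataset.

```python
from typing import Iterable, Mapping

REPO_ID = "Chia-Mu-Lab/rleak-openthoughts-shot-pool-200"


def load_config(name: str):
    """Load one config's train split from the Hub (needs `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the check below runs offline

    return load_dataset(REPO_ID, name, split="train")


def check_aligned(native: Iterable[Mapping], surrogate: Iterable[Mapping]) -> int:
    """Pair rows by `id` and verify the shared fields match; return the row count."""
    a = {row["id"]: row for row in native}
    b = {row["id"]: row for row in surrogate}
    if a.keys() != b.keys():
        raise ValueError("configs do not share the same set of ids")
    for key, row in a.items():
        for field in ("dataset_row_idx", "x"):
            if row[field] != b[key][field]:
                raise ValueError(f"row {key!r} differs in {field!r}")
    return len(a)


# Toy rows for illustration only; real values come from the dataset.
native = [{"id": "p0", "dataset_row_idx": 17, "x": "What is 2 + 2?",
           "trace": "2 + 2 = 4", "answer": "4", "raw": None, "surrogate": "r1_native"}]
surrogate = [{"id": "p0", "dataset_row_idx": 17, "x": "What is 2 + 2?",
              "trace": "Add the numbers.", "answer": "4",
              "raw": "Add the numbers. 4", "surrogate": "qwen3_8b"}]

assert check_aligned(native, surrogate) == 1
```

With network access, the same check would be `check_aligned(load_config("r1_native"), load_config("qwen3_8b"))`.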
## Configs
- **`r1_native`** — source: `shot_pool_200.jsonl`
- **`qwen3_8b`** — source: `shot_pool_200.qwen3_8b.jsonl`
## Reproduce
See `scripts/build_openthoughts_shot_pool.py` (Option A: R1-native) and `scripts/capture_surrogate_traces.py` (Option B: victim-family surrogate greedy) in the RLeak repo.
Provider:
Chia-Mu-Lab



