botcoinmoney/domain-agnostic-causal-reasoning-tuning
收藏Hugging Face2026-04-08 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/botcoinmoney/domain-agnostic-causal-reasoning-tuning
下载链接
链接失效反馈官方服务:
资源简介:
---
license: apache-2.0
task_categories:
- question-answering
- text-generation
language:
- en
tags:
- multi-hop-reasoning
- document-reasoning
- synthetic-data
- causal-reasoning
- fine-tuning
- reasoning-traces
- RLVR
- distillation
size_categories:
- 1K<n<10K
pretty_name: "Domain-Agnostic Causal Reasoning Tuning Dataset"
dataset_info:
- config_name: dpo_reasoning_bookend
features:
- name: challenge_id
dtype: string
- name: challenge_domain
dtype: string
- name: prompt
dtype: string
- name: chosen
dtype: string
- name: rejected
dtype: string
splits:
- name: train
num_examples: 9
num_bytes: 590496
- name: validation
num_examples: 1
num_bytes: 58337
- config_name: dpo_reasoning_sequential
features:
- name: challenge_id
dtype: string
- name: challenge_domain
dtype: string
- name: prompt
dtype: string
- name: chosen
dtype: string
- name: rejected
dtype: string
splits:
- name: train
num_examples: 9
num_bytes: 590721
- name: validation
num_examples: 1
num_bytes: 58337
- config_name: grpo_reasoning
features:
- name: challenge_id
dtype: string
- name: challenge_domain
dtype: string
- name: prompt
dtype: string
- name: response
dtype: string
- name: reward
dtype: float64
splits:
- name: train
num_examples: 610
num_bytes: 52487599
- name: validation
num_examples: 36
num_bytes: 3104326
- name: test
num_examples: 33
num_bytes: 2798334
- config_name: grpo_reasoning_v2
features:
- name: challenge_id
dtype: string
- name: challenge_domain
dtype: string
- name: prompt
dtype: string
- name: response
dtype: string
- name: reward
dtype: float64
splits:
- name: train
num_examples: 4421
num_bytes: 230534313
- name: validation
num_examples: 250
num_bytes: 12970183
- name: test
num_examples: 242
num_bytes: 12612053
- config_name: prm_reasoning
features:
- name: challenge_id
dtype: string
- name: challenge_domain
dtype: string
- name: prompt
dtype: string
- name: response
dtype: string
- name: reward
dtype: float64
splits:
- name: train
num_examples: 34
num_bytes: 2270099
- name: validation
num_examples: 3
num_bytes: 187911
- name: test
num_examples: 1
num_bytes: 37126
- config_name: sft_reasoning
features:
- name: challenge_id
dtype: string
- name: challenge_domain
dtype: string
- name: prompt
dtype: string
- name: response
dtype: string
splits:
- name: train
num_examples: 610
num_bytes: 52487599
- name: validation
num_examples: 36
num_bytes: 3104326
- name: test
num_examples: 33
num_bytes: 2798334
- config_name: sft_reasoning_v2
features:
- name: challenge_id
dtype: string
- name: challenge_domain
dtype: string
- name: prompt
dtype: string
- name: response
dtype: string
splits:
- name: train
num_examples: 4421
num_bytes: 230534313
- name: validation
num_examples: 250
num_bytes: 12970183
- name: test
num_examples: 242
num_bytes: 12612053
---
# Domain-Agnostic Causal Reasoning Tuning Dataset
Training data for fine-tuning language models on multi-hop document reasoning. Each example is a graded reasoning trace produced by a frontier AI agent solving a procedurally generated challenge from the Botcoin proof-of-inference network.
The traces contain no real domain knowledge. Entities are fictional, numbers are random, and documents are generated deterministically from 128-bit seeds. The reasoning structure is what matters: multi-hop evidence chaining, numerical computation, conflict resolution, and constrained artifact construction.
Fine-tuning Qwen 2.5 7B on 4,421 of these traces (the `sft_reasoning_v2` config) more than doubles accuracy on real arXiv papers: 18.9% to 40.0% on DACR-Bench.
## Dataset Configurations
This dataset contains 7 configurations, each formatted for a different training objective:
| Config | Format | Train | Val | Test | Description |
|--------|--------|-------|-----|------|-------------|
| **`sft_reasoning_v2`** | SFT | 4,421 | 250 | 242 | Supervised fine-tuning traces (v2, used in paper) |
| **`grpo_reasoning_v2`** | GRPO/RLVR | 4,421 | 250 | 242 | Same traces with scalar reward signal |
| `sft_reasoning` | SFT | 610 | 36 | 33 | Earlier v1 traces (smaller corpus) |
| `grpo_reasoning` | GRPO/RLVR | 610 | 36 | 33 | v1 traces with reward signal |
| `dpo_reasoning_bookend` | DPO | 9 | 1 | - | Chosen/rejected pairs (bookend sampling) |
| `dpo_reasoning_sequential` | DPO | 9 | 1 | - | Chosen/rejected pairs (sequential sampling) |
| `prm_reasoning` | PRM | 34 | 3 | 1 | Process reward model supervision |
**For reproducing the paper results, use `sft_reasoning_v2` (train split, 4,421 examples).**
## Quick Start
```python
from datasets import load_dataset
# Load the config used in the paper
ds = load_dataset(
"botcoinmoney/domain-agnostic-causal-reasoning-tuning",
"sft_reasoning_v2",
split="train"
)
print(f"{len(ds)} training examples")
print(ds[0].keys())
# dict_keys(['challenge_id', 'challenge_domain', 'prompt', 'response'])
```
## Data Fields
### SFT configs (`sft_reasoning`, `sft_reasoning_v2`)
| Field | Type | Description |
|-------|------|-------------|
| `challenge_id` | string | Unique identifier for the source challenge |
| `challenge_domain` | string | Domain (e.g., quantum_physics, computational_biology) |
| `prompt` | string | Full challenge: synthetic document + 10 questions + 8 constraints |
| `response` | string | Solver's structured response with answers, citations, reasoning, and constrained artifact |
### GRPO configs (`grpo_reasoning`, `grpo_reasoning_v2`)
Same as SFT plus:
| Field | Type | Description |
|-------|------|-------------|
| `reward` | float | Composite quality score (0.0 to 1.0) from deterministic constraint verification |
### DPO configs (`dpo_reasoning_bookend`, `dpo_reasoning_sequential`)
| Field | Type | Description |
|-------|------|-------------|
| `prompt` | string | Full challenge prompt |
| `chosen` | string | Higher-quality solver response |
| `rejected` | string | Lower-quality solver response |
## Trace Provenance
Traces were collected from the Botcoin proof-of-inference mining network. Three frontier model families contributed:
| Model | Answer Accuracy | Trace Quality | Notes |
|-------|----------------|---------------|-------|
| GPT-5.4 | 90% | 0.60 | Highest accuracy, less detailed traces |
| Claude Haiku | 67% | 0.70 | Lower accuracy, more structured reasoning |
| Codex | varies | varies | Strong at numerical computation |
Multi-model diversity is deliberate. Different architectures produce different reasoning strategies, error modes, and solution structures. Training on the combined corpus outperforms any single-model subset.
## Domains
The v2 corpus spans four synthetic domains:
- **Corporate financial analysis** (companies with quarterly revenues, employee counts, market sectors)
- **Quantum physics** (error-correcting codes with qubit counts, measurement rounds, noise models)
- **Computational biology** (metabolic modeling with flux distributions, enzyme kinetics, pathway analysis)
- **Single-cell RNA imputation** (scRNA-seq methods with gene counts, dropout rates, imputation algorithms)
No real-world data appears in any trace. All entities, attributes, and values are procedurally generated.
## Challenge Structure
Each challenge consists of:
- A synthetic document (1,500 to 3,000 words) generated deterministically from a 128-bit seed
- 10 multi-hop questions with declarative answer logic (filter, aggregate, reduce, extract, chain)
- 8 constraints the solver's artifact must satisfy (word count, string inclusions, forbidden letter, acrostic, prime number, equation)
- 1 silent trap: a wrong value planted early in the document, with the correct value appearing later
## Related Resources
| Resource | Link |
|----------|------|
| Paper | [huggingface.co/botcoinmoney/dacr-paper](https://huggingface.co/botcoinmoney/dacr-paper) |
| DACR-Bench (benchmark) | [github.com/botcoinmoney/dacr-bench](https://github.com/botcoinmoney/dacr-bench) |
| Training and evaluation code | [github.com/botcoinmoney/synthetic-to-real-reasoning](https://github.com/botcoinmoney/synthetic-to-real-reasoning) |
| Evaluation results | [huggingface.co/datasets/botcoinmoney/dacr-bench-results](https://huggingface.co/datasets/botcoinmoney/dacr-bench-results) |
## Citation
```bibtex
@article{botoshi2026synthetic,
title={Synthetic Traces, Real Reasoning: How Procedurally Generated Document Challenges Transfer to Real-World Scientific Papers},
author={Botoshi},
year={2026},
note={Available at \url{https://huggingface.co/botcoinmoney/dacr-paper}}
}
```
## License
Apache-2.0
提供机构:
botcoinmoney



