Name: botcoinmoney/domain-agnostic-causal-reasoning-tuning
Creator: botcoinmoney
Published: 2026-04-08 12:57:53
License: Apache-2.0

Download link:

https://hf-mirror.com/datasets/botcoinmoney/domain-agnostic-causal-reasoning-tuning

Resource description:

---
license: apache-2.0
task_categories:
- question-answering
- text-generation
language:
- en
tags:
- multi-hop-reasoning
- document-reasoning
- synthetic-data
- causal-reasoning
- fine-tuning
- reasoning-traces
- RLVR
- distillation
size_categories:
- 1K<n<10K
pretty_name: "Domain-Agnostic Causal Reasoning Tuning Dataset"
dataset_info:
- config_name: dpo_reasoning_bookend
  features:
  - name: challenge_id
    dtype: string
  - name: challenge_domain
    dtype: string
  - name: prompt
    dtype: string
  - name: chosen
    dtype: string
  - name: rejected
    dtype: string
  splits:
  - name: train
    num_examples: 9
    num_bytes: 590496
  - name: validation
    num_examples: 1
    num_bytes: 58337
- config_name: dpo_reasoning_sequential
  features:
  - name: challenge_id
    dtype: string
  - name: challenge_domain
    dtype: string
  - name: prompt
    dtype: string
  - name: chosen
    dtype: string
  - name: rejected
    dtype: string
  splits:
  - name: train
    num_examples: 9
    num_bytes: 590721
  - name: validation
    num_examples: 1
    num_bytes: 58337
- config_name: grpo_reasoning
  features:
  - name: challenge_id
    dtype: string
  - name: challenge_domain
    dtype: string
  - name: prompt
    dtype: string
  - name: response
    dtype: string
  - name: reward
    dtype: float64
  splits:
  - name: train
    num_examples: 610
    num_bytes: 52487599
  - name: validation
    num_examples: 36
    num_bytes: 3104326
  - name: test
    num_examples: 33
    num_bytes: 2798334
- config_name: grpo_reasoning_v2
  features:
  - name: challenge_id
    dtype: string
  - name: challenge_domain
    dtype: string
  - name: prompt
    dtype: string
  - name: response
    dtype: string
  - name: reward
    dtype: float64
  splits:
  - name: train
    num_examples: 4421
    num_bytes: 230534313
  - name: validation
    num_examples: 250
    num_bytes: 12970183
  - name: test
    num_examples: 242
    num_bytes: 12612053
- config_name: prm_reasoning
  features:
  - name: challenge_id
    dtype: string
  - name: challenge_domain
    dtype: string
  - name: prompt
    dtype: string
  - name: response
    dtype: string
  - name: reward
    dtype: float64
  splits:
  - name: train
    num_examples: 34
    num_bytes: 2270099
  - name: validation
    num_examples: 3
    num_bytes: 187911
  - name: test
    num_examples: 1
    num_bytes: 37126
- config_name: sft_reasoning
  features:
  - name: challenge_id
    dtype: string
  - name: challenge_domain
    dtype: string
  - name: prompt
    dtype: string
  - name: response
    dtype: string
  splits:
  - name: train
    num_examples: 610
    num_bytes: 52487599
  - name: validation
    num_examples: 36
    num_bytes: 3104326
  - name: test
    num_examples: 33
    num_bytes: 2798334
- config_name: sft_reasoning_v2
  features:
  - name: challenge_id
    dtype: string
  - name: challenge_domain
    dtype: string
  - name: prompt
    dtype: string
  - name: response
    dtype: string
  splits:
  - name: train
    num_examples: 4421
    num_bytes: 230534313
  - name: validation
    num_examples: 250
    num_bytes: 12970183
  - name: test
    num_examples: 242
    num_bytes: 12612053
---

# Domain-Agnostic Causal Reasoning Tuning Dataset

Training data for fine-tuning language models on multi-hop document reasoning. Each example is a graded reasoning trace produced by a frontier AI agent solving a procedurally generated challenge from the Botcoin proof-of-inference network.

The traces contain no real domain knowledge. Entities are fictional, numbers are random, and documents are generated deterministically from 128-bit seeds. The reasoning structure is what matters: multi-hop evidence chaining, numerical computation, conflict resolution, and constrained artifact construction.

Fine-tuning Qwen 2.5 7B on 4,421 of these traces (the `sft_reasoning_v2` config) more than doubles accuracy on real arXiv papers: 18.9% to 40.0% on DACR-Bench.
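To make "generated deterministically from 128-bit seeds" concrete, here is a minimal sketch of seed-driven generation. It assumes Python's standard `random.Random` as the generator; the dataset card does not describe the actual Botcoin generator, and `make_rng`, `generate_entity`, and the entity fields are hypothetical names used only for illustration.

```python
import random
import secrets


def make_rng(seed_hex: str) -> random.Random:
    """Build a private RNG from a seed given as hex (at most 128 bits)."""
    seed = int(seed_hex, 16)
    if not 0 <= seed < 2**128:
        raise ValueError("seed must fit in 128 bits")
    return random.Random(seed)


def generate_entity(seed_hex: str) -> dict:
    """Hypothetical generator: a fictional company with random figures."""
    rng = make_rng(seed_hex)
    # Alternate consonants and vowels to get a pronounceable made-up name.
    name = "".join(
        rng.choice("BCDFGKLMNPRSTVZ") + rng.choice("aeiou") for _ in range(3)
    )
    return {
        "name": name,
        "quarterly_revenue": [rng.randint(10, 500) for _ in range(4)],
        "employees": rng.randint(50, 5000),
    }


seed_hex = secrets.token_hex(16)  # 16 random bytes = 128 bits
first = generate_entity(seed_hex)
second = generate_entity(seed_hex)
print(first == second)  # the same seed reproduces the same entity
```

Because each call builds its own RNG from the seed instead of touching global random state, a challenge could in principle be regenerated from its seed alone, which is the property the description above relies on.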
## Dataset Configurations

This dataset contains 7 configurations, each formatted for a different training objective:

| Config | Format | Train | Val | Test | Description |
|--------|--------|-------|-----|------|-------------|
| **`sft_reasoning_v2`** | SFT | 4,421 | 250 | 242 | Supervised fine-tuning traces (v2, used in paper) |
| **`grpo_reasoning_v2`** | GRPO/RLVR | 4,421 | 250 | 242 | Same traces with scalar reward signal |
| `sft_reasoning` | SFT | 610 | 36 | 33 | Earlier v1 traces (smaller corpus) |
| `grpo_reasoning` | GRPO/RLVR | 610 | 36 | 33 | v1 traces with reward signal |
| `dpo_reasoning_bookend` | DPO | 9 | 1 | - | Chosen/rejected pairs (bookend sampling) |
| `dpo_reasoning_sequential` | DPO | 9 | 1 | - | Chosen/rejected pairs (sequential sampling) |
| `prm_reasoning` | PRM | 34 | 3 | 1 | Process reward model supervision |

**For reproducing the paper results, use `sft_reasoning_v2` (train split, 4,421 examples).**

## Quick Start

```python
from datasets import load_dataset

# Load the config used in the paper
ds = load_dataset(
    "botcoinmoney/domain-agnostic-causal-reasoning-tuning",
    "sft_reasoning_v2",
    split="train"
)

print(f"{len(ds)} training examples")
print(ds[0].keys())
# dict_keys(['challenge_id', 'challenge_domain', 'prompt', 'response'])
```

## Data Fields

### SFT configs (`sft_reasoning`, `sft_reasoning_v2`)

| Field | Type | Description |
|-------|------|-------------|
| `challenge_id` | string | Unique identifier for the source challenge |
| `challenge_domain` | string | Domain (e.g., quantum_physics, computational_biology) |
| `prompt` | string | Full challenge: synthetic document + 10 questions + 8 constraints |
| `response` | string | Solver's structured response with answers, citations, reasoning, and constrained artifact |

### GRPO configs (`grpo_reasoning`, `grpo_reasoning_v2`)

Same as SFT plus:

| Field | Type | Description |
|-------|------|-------------|
| `reward` | float | Composite quality score (0.0 to 1.0) from deterministic constraint verification |

### DPO configs (`dpo_reasoning_bookend`, `dpo_reasoning_sequential`)

| Field | Type | Description |
|-------|------|-------------|
| `prompt` | string | Full challenge prompt |
| `chosen` | string | Higher-quality solver response |
| `rejected` | string | Lower-quality solver response |

## Trace Provenance

Traces were collected from the Botcoin proof-of-inference mining network. Three frontier model families contributed:

| Model | Answer Accuracy | Trace Quality | Notes |
|-------|----------------|---------------|-------|
| GPT-5.4 | 90% | 0.60 | Highest accuracy, less detailed traces |
| Claude Haiku | 67% | 0.70 | Lower accuracy, more structured reasoning |
| Codex | varies | varies | Strong at numerical computation |

Multi-model diversity is deliberate. Different architectures produce different reasoning strategies, error modes, and solution structures. Training on the combined corpus outperforms any single-model subset.

## Domains

The v2 corpus spans four synthetic domains:

- **Corporate financial analysis** (companies with quarterly revenues, employee counts, market sectors)
- **Quantum physics** (error-correcting codes with qubit counts, measurement rounds, noise models)
- **Computational biology** (metabolic modeling with flux distributions, enzyme kinetics, pathway analysis)
- **Single-cell RNA imputation** (scRNA-seq methods with gene counts, dropout rates, imputation algorithms)

No real-world data appears in any trace. All entities, attributes, and values are procedurally generated.
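Since every row carries a `challenge_domain` and the GRPO configs add a `reward` in the 0.0 to 1.0 range, one plausible preprocessing step is to keep only higher-reward traces and check how they distribute across domains. The sketch below runs on a few hand-written rows shaped like the GRPO schema; the row values, the `0.7` threshold, and the helper names `keep_high_reward` and `domain_counts` are illustrative assumptions, not part of the dataset or the paper's recipe.

```python
from collections import Counter

# Hypothetical rows shaped like the GRPO configs (all values are made up).
rows = [
    {"challenge_id": "c1", "challenge_domain": "quantum_physics",
     "prompt": "...", "response": "...", "reward": 0.92},
    {"challenge_id": "c2", "challenge_domain": "computational_biology",
     "prompt": "...", "response": "...", "reward": 0.41},
    {"challenge_id": "c3", "challenge_domain": "quantum_physics",
     "prompt": "...", "response": "...", "reward": 0.75},
    {"challenge_id": "c4", "challenge_domain": "computational_biology",
     "prompt": "...", "response": "...", "reward": 0.70},
]


def keep_high_reward(rows, threshold=0.7):
    """Return the rows whose reward is at or above the threshold."""
    return [r for r in rows if r["reward"] >= threshold]


def domain_counts(rows):
    """Count rows per challenge_domain."""
    return Counter(r["challenge_domain"] for r in rows)


kept = keep_high_reward(rows)
counts = domain_counts(kept)
print([r["challenge_id"] for r in kept])
print(dict(counts))
```

On a split loaded with `load_dataset`, the same predicate could be passed to `Dataset.filter`, for example `ds.filter(lambda r: r["reward"] >= 0.7)`. Checking the per-domain counts afterwards is a cheap way to see whether a threshold has skewed the domain mix.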
## Challenge Structure

Each challenge consists of:

- A synthetic document (1,500 to 3,000 words) generated deterministically from a 128-bit seed
- 10 multi-hop questions with declarative answer logic (filter, aggregate, reduce, extract, chain)
- 8 constraints the solver's artifact must satisfy (word count, string inclusions, forbidden letter, acrostic, prime number, equation)
- 1 silent trap: a wrong value planted early in the document, with the correct value appearing later

## Related Resources

| Resource | Link |
|----------|------|
| Paper | [huggingface.co/botcoinmoney/dacr-paper](https://huggingface.co/botcoinmoney/dacr-paper) |
| DACR-Bench (benchmark) | [github.com/botcoinmoney/dacr-bench](https://github.com/botcoinmoney/dacr-bench) |
| Training and evaluation code | [github.com/botcoinmoney/synthetic-to-real-reasoning](https://github.com/botcoinmoney/synthetic-to-real-reasoning) |
| Evaluation results | [huggingface.co/datasets/botcoinmoney/dacr-bench-results](https://huggingface.co/datasets/botcoinmoney/dacr-bench-results) |

## Citation

```bibtex
@article{botoshi2026synthetic,
  title={Synthetic Traces, Real Reasoning: How Procedurally Generated Document Challenges Transfer to Real-World Scientific Papers},
  author={Botoshi},
  year={2026},
  note={Available at \url{https://huggingface.co/botcoinmoney/dacr-paper}}
}
```

## License

Apache-2.0

Use cases: