MonumentalSystems/polymath-reasoning-v1
收藏Hugging Face2026-04-21 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/MonumentalSystems/polymath-reasoning-v1
下载链接
链接失效反馈官方服务:
资源简介:
---
license: cc-by-4.0
language:
- en
task_categories:
- text-generation
- question-answering
tags:
- reasoning
- chain-of-thought
- cot
- thinking-tags
- philosophy
- ethics
- synthetic
- dialogue
- counterfactual
- logic
- distillation
size_categories:
- n<1K
pretty_name: Polymath Reasoning Corpus v1
configs:
- config_name: default
data_files:
- split: train
path: train.parquet
---
# Polymath Reasoning Corpus v1
A multi-domain reasoning corpus that pairs explicit `<thinking>` chain-of-thought
with substantive output across **eleven distinct cognitive formats** — chain-of-thought
debates, multi-figure salons, counterfactual perspective pieces, logic problems
with worked solutions, multi-step reasoning chains, math/science/code explanations,
and challenging Q&A.
The unifying angle is *how experts actually reason* across disciplines, not just *what they
conclude*. Most CoT datasets show one path to one answer. This one shows **strategic
reasoning** (what to concede, what to attack, what to leave unsaid), **collaborative
synthesis** (3–4 thinkers reaching insights none could alone), and **counterfactual
framing** (a 1700s natural philosopher encountering CRISPR for the first time).
## At a glance
| | |
|---|---|
| Rows | 377 |
| Words | 820,159 |
| `<thinking>` blocks | 1,122 |
| Categories | 11 |
| Generator | GLM-4.7-flash via Z.AI |
| License | CC-BY-4.0 |
### Category breakdown
| Category | Rows | Format |
|---|---:|---|
| `cot_debate` | 95 | Two-figure debate with `<thinking>...</thinking>` before each spoken turn (6–7 exchanges) |
| `salon` | 65 | 3–4 historical figures collaboratively exploring a question; multi-phase synthesis |
| `science_explanation` | 46 | 700–1200-word deep-dive across 35 topics in bio/physics/chem/geo/neuro |
| `perspective` | 31 | A historical figure encounters a modern concept (CRISPR, LLMs, gravitational waves...) |
| `qa_pair` | 30 | Challenging cross-domain question with detailed reasoned answer |
| `reasoning_chain` | 30 | 8–12 explicit reasoning steps with named alternatives and assumption-testing |
| `logic_problem` | 23 | Knights/knaves, induction, probability, set theory — full worked solutions with multi-method verification |
| `math_explanation` | 20 | Undergraduate/graduate math with full derivations + worked examples |
| `code_walkthrough` | 18 | Working Python/Rust with stepwise explanation + complexity analysis |
| `synthesis` | 18 | Cross-domain essays joining 3 disciplines (e.g., thermodynamics × economics × evolution) |
| `debate` | 1 | Plain figure-vs-figure debate (legacy format) |
## Schema
Each row contains:
| Field | Type | Description |
|---|---|---|
| `id` | string | Source filename (without extension) |
| `category` | string | One of the 11 categories above |
| `text` | string | Full content |
| `speakers` | list[string] | Distinct named speakers (for dialogue formats) |
| `n_speakers` | int | `len(speakers)` |
| `n_thinking_blocks` | int | Count of `<thinking>...</thinking>` blocks |
| `char_count` | int | Character count |
| `word_count` | int | Word count |
| `source_model` | string | `"GLM-4.7-flash"` |
| `source_provider` | string | `"Z.AI"` |
## Purpose and scope
Reasoning datasets released after DeepSeek-R1 (Jan 2025) cluster heavily in math,
coding, and verifiable science — where there's a single right answer and a clean
reward signal. This corpus deliberately covers **underexplored cognitive territory**:
- **Ethics & philosophy** — a category Bespoke Labs called out as missing in their 2025
reasoning datasets competition; no winner came from this space
- **Counterfactual reasoning** — perspective pieces are literally counterfactual ("what
would Aristotle make of CRISPR?")
- **Strategic reasoning** — the `<thinking>` blocks in CoT debates include rhetorical
choices (what to concede vs. attack) alongside object-level analysis
- **Collaborative synthesis** — multi-figure salons demonstrate cross-domain idea
combination, not just isolated problem-solving
The CoT debate and salon formats are intended to support **reasoning distillation**
into smaller models, providing structured cognitive exemplars for the kind of
strategic + analytical reasoning that's hard to elicit with simple problem prompts.
## Dataset creation method
All content was generated via the [text-pipeline](https://github.com/MonumentalSystems)
synthetic generator (`synth_debates.py`) calling **GLM-4.7-flash** through Z.AI.
For each category, a prompt template requires the model to:
1. Produce a substantial output (typically 700–2000 words)
2. Embed reasoning *before* outputs in `<thinking>...</thinking>` tags (CoT debates only)
3. Reference *specific* prior work, named experiments, or canonical results — not vague allusion
4. Pressure-test the conclusion (alternative paths, assumptions, what would change the answer)
The 84-figure pool of historical thinkers spans 6th-century-BCE China to 20th-century
physics (Aristotle, Hypatia, Maxwell, Noether, Mirzakhani, Wu, Du Fu, Murasaki Shikibu,
Frederick Douglass, Rachel Carson, Kolmogorov, ...), with a deliberate effort to include
non-Western and historically underrepresented contributors.
Topics, figures, and synthesis-domain triples are drawn from finite curated pools
(50–86 topics depending on category, 84 figures, 18 domain triples) and combined
via a shuffled cycle so within a single run no combination repeats before all are seen.
Topics drawn from **86 cross-domain prompts** (covering metaphysics, ethics, complexity,
language/cognition, technology, history, aesthetics, and underexplored angles like
"whether emergence is real or just a description of our ignorance"), **35 deep-science
topics**, **30 challenging questions**, **23 logic problems**, **20 math topics**, and
**18 code topics**. Full topic lists and prompt templates are in the source script.
Generation parameters: `temperature=0.8`, `max_tokens=8192`, `thinking: {"type": "disabled"}`
on Z.AI (which is itself a reasoning model — disabled to prevent the model's internal
reasoning from consuming the output budget; the `<thinking>` blocks in the dataset are
explicit content the model writes per the prompt template).
Cleaning: the byte-level cleaner (`cleaner_v2.ByteCleaner`) ran on every file,
preserving `<thinking>` tags, LaTeX (`$...$`, `\frac{}{}`), code fences (` ``` `),
em-dashes, smart quotes, and Python indentation. No ASCII whitelist is applied —
this is a byte-level corpus.
## Example uses
- **Reasoning distillation** — fine-tune a small model on `cot_debate` rows where each
exchange shows `<thinking>` then spoken response, teaching the small model to "think
before speaking" with explicit strategic content
- **Ethics & philosophy reasoning evaluation** — build benchmarks from `salon` and
`perspective` rows where there is no single correct answer but reasoning quality varies
- **Counterfactual reasoning training** — `perspective` rows are concrete counterfactual
exercises (constraint: reason from a specific historical epistemic frame about a
modern phenomenon)
- **Multi-agent dialogue training** — `salon` rows show how 3–4 distinct voices can
collaboratively reach a synthesis, useful for multi-agent / debate-based RLAIF
- **Logic & math instruction** — `logic_problem` and `math_explanation` rows include
multi-method verification (every problem solved by at least two independent
approaches), useful for self-consistency training
## Sample structure
A typical `cot_debate` row contains pairs like:
```
Ada Lovelace: <thinking>It is always a relief to speak with someone who commands such
intellectual rigor; Dr. Wu does not suffer fools or loose abstractions. She speaks of
symmetries and physical laws, yet she is too quick to dismiss the structural architecture
of sound. To connect them, I must bridge the gap between the discrete nature of numbers
and the continuous flow of a melody. I will use the specific example of the Differences
Engine to show how her "continuous" universe is actually built from discrete
steps.</thinking>
"The notion that mathematics and music belong to entirely separate kingdoms—the one
cold and logical, the other passionate and imprecise—is a failure of imagination,
Madam Wu. ..."
```
A typical `logic_problem` row begins with the problem statement and walks through
the full solution with rule citations (e.g., "by modus tollens", "by induction
hypothesis") followed by independent verification.
## Limitations and biases
- **All synthetic** — no human-written content. Subject to whatever biases GLM-4.7-flash
carries; in particular, the model's depiction of historical figures is its
reconstruction, not their actual writing. Treat the dialogue as *plausible-style*, not
*attested*.
- **Single generator** — all rows from one model family. A more robust corpus would mix
generators; this is a v1 from one provider.
- **English-only** despite the multicultural figure pool — Du Fu speaks in English
prose, Murasaki Shikibu writes in modern English, etc. The historical-frame
authenticity is stylistic, not linguistic.
- **No ground-truth answers** for `salon`, `perspective`, `cot_debate`, `qa_pair` —
these are reasoning *demonstrations*, not evaluation tasks with correct labels.
- **Content filter incidents** — 1 of ~136 generations was rejected by Z.AI's content
filter (Alexander Hamilton vs. Rachel Carson on a debate topic) and is absent from
the dataset.
- **5 of ~850 `[Internal: ...]` markers** in older CoT files (regenerated to
`<thinking>` tags) had model formatting failures with no closing bracket and were
left as-is rather than risk corrupting the surrounding content.
- **Western philosophy overweight** — despite the deliberate inclusion of non-Western
figures (Zhuangzi, Nagarjuna, Al-Biruni, Brahmagupta, Murasaki Shikibu, Du Fu, Ibn
Khaldun, Hypatia, Avicenna, Lao Tzu, Hildegard von Bingen, Omar Khayyam), the topic
pool and idiom of debate skew Western/analytical.
- **No multimodal content** — text-only.
## License
[CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/). Free for research,
commercial use, and redistribution with attribution.
## Citation
```bibtex
@misc{polymath_reasoning_v1_2026,
title = {Polymath Reasoning Corpus v1},
author = {Monumental Systems},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/MonumentalSystems/polymath-reasoning-v1}
}
```
提供机构:
MonumentalSystems



