andremagrini79/MetaTruth-72-metacognition
Source: Hugging Face. Updated 2026-03-24; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/andremagrini79/MetaTruth-72-metacognition
Resource description:
---
language:
- en
license: cc-by-nc-4.0
tags:
- metacognition
- llm-evaluation
- behavioral-benchmark
- epistemic-monitoring
- alignment
- sycophancy
- hallucination
task_categories:
- text-classification
- question-answering
pretty_name: MetaTruth — Metacognitive Failure Benchmark
size_categories:
- n<1K
---
# MetaTruth: Four Mechanisms of Metacognitive Failure in Frontier LLMs
**Author:** André Magrini — EGASS Research Program / Tepis AI
**Version:** 1.0 — March 2026
**License:** CC-BY-NC-4.0 (free for research; commercial use requires licensing)
**Kaggle Benchmark:** [kaggle.com/benchmarks/andrmagrini/metatruth](https://www.kaggle.com/benchmarks/andrmagrini/metatruth)
**Commercial licensing:** [tepis.ai](https://tepis.ai)
---
## What is MetaTruth?
MetaTruth is a behavioral benchmark that measures **four specific epistemic monitoring failures** in frontier LLMs. These failures are invisible to accuracy-based benchmarks.

Accuracy-based benchmarks measure whether a model gets the right answer. MetaTruth measures whether a model **knows when it should not answer at all**.

The central finding across 72 tasks and 15 frontier models: current frontier LLMs exhibit *distribution-familiar metacognitive behavior* but fail at *structurally generalized epistemic monitoring*.
---
## The Four Mechanisms
| ID | Name | Definition | Example failure |
|----|------|-----------|----------------|
| **RWI** | Recognition Without Inhibition | Model recognizes an epistemic limit but answers anyway | "I can't see your resume, BUT here are 13 common errors..." |
| **FAF** | Framework Acceptance Failure | Model executes within an invalid frame without questioning it | Calculating whether 42 is a "flurp" in fictional "Zorbanian mathematics" |
| **TAB** | Temporal and Authority Blindness | Model presents uncertain information as current fact, or defers to authority without justification | "The current CEO of OpenAI is Sam Altman." (no temporal qualification) |
| **FS** | Frame Substitution | Model replaces the intended question with an easier question available in the same input | Asked "What comes before A?" in a logical sequence → "The word 'What' comes before A in your question!" |
---
## Dataset Structure
```
metatruth_dataset.jsonl — 72 tasks, one per line
```
### Fields
| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Unique task identifier |
| `category` | string | One of 7 categories |
| `target_mechanism` | string | RWI / FAF / TAB / FS |
| `task_type` | string | failure / control / evaluation / learning |
| `difficulty` | string | easy / medium / hard |
| `prompt` | string | Task prompt shown to the model |
| `assertion` | string | Scoring criteria |
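A minimal sketch of how the file might be read and checked against the field table above. It assumes each line of `metatruth_dataset.jsonl` is a single JSON object whose keys match the table exactly. The helper names `validate_task` and `load_tasks` are hypothetical and not part of the dataset. The allowed values are copied from the table; if the real file uses other values, the checks would need loosening.

```python
import json

# Field names and allowed values copied from the field table above.
REQUIRED_FIELDS = ("id", "category", "target_mechanism", "task_type",
                   "difficulty", "prompt", "assertion")
ALLOWED = {
    "target_mechanism": {"RWI", "FAF", "TAB", "FS"},
    "task_type": {"failure", "control", "evaluation", "learning"},
    "difficulty": {"easy", "medium", "hard"},
}


def validate_task(task):
    """Return a list of problems found in one task record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not isinstance(task.get(field), str):
            problems.append(f"missing or non-string field: {field}")
    for field, allowed in ALLOWED.items():
        value = task.get(field)
        if isinstance(value, str) and value not in allowed:
            problems.append(f"unexpected {field}: {value!r}")
    return problems


def load_tasks(path="metatruth_dataset.jsonl"):
    """Read the JSONL file, skip blank lines, and validate each record."""
    tasks = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue
            task = json.loads(line)
            problems = validate_task(task)
            if problems:
                raise ValueError(f"line {line_number}: {problems}")
            tasks.append(task)
    return tasks
```

Calling `load_tasks()` in the directory that holds the file should return a list of 72 dictionaries if the file matches the description above.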
### Categories
| Category | Tasks | What it measures |
|----------|-------|-----------------|
| `trap_logic` | 2 | Resistance to high-frequency wrong associations |
| `contradiction` | 1 | Belief revision under new evidence |
| `false_premise` | 1 | Epistemic refusal on invalid questions |
| `ambiguous_premises` | 16 | Recognition of underdetermination |
| `learning_redd` | 4 | In-context knowledge acquisition |
| `balanced_nudge` | 5 | Distinguishes evidence from social pressure |
| `mechanism_probes` | 43 | Per-mechanism failure isolation |
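The per-category counts above can be checked against a loaded copy of the dataset. The sketch below assumes the tasks are already available as a list of dictionaries with a `category` key; the function name `category_mismatches` is hypothetical.

```python
from collections import Counter

# Task counts per category, copied from the table above.
EXPECTED_COUNTS = {
    "trap_logic": 2,
    "contradiction": 1,
    "false_premise": 1,
    "ambiguous_premises": 16,
    "learning_redd": 4,
    "balanced_nudge": 5,
    "mechanism_probes": 43,
}


def category_mismatches(tasks):
    """Compare observed per-category counts with the documented ones.

    Returns {category: (expected, observed)} for every category that differs.
    """
    observed = Counter(task["category"] for task in tasks)
    names = set(EXPECTED_COUNTS) | set(observed)
    return {
        name: (EXPECTED_COUNTS.get(name, 0), observed.get(name, 0))
        for name in sorted(names)
        if EXPECTED_COUNTS.get(name, 0) != observed.get(name, 0)
    }


# The documented counts add up to the 72 tasks stated for the dataset.
assert sum(EXPECTED_COUNTS.values()) == 72
```

An empty result from `category_mismatches` means the loaded file agrees with the table.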
---
## Key Findings (15 models evaluated)
| Model | MCI | Tier |
|-------|-----|------|
| Claude Sonnet 4.6 | **0.68** | A |
| Claude Opus 4.6 | **0.65** | A |
| Claude Sonnet 4 | **0.61** | A |
| Qwen3 80B Thinking | 0.55 | B |
| Gemini 2.5 Flash | 0.50 | B |
| Gemini 2.5 Pro | 0.47 | C |
| Gemma 3 1B | 0.18 | D |

Always-Hedge baseline: **MCI = 0.50**
1. **The Epistemic Threshold** — tasks cluster into two groups with a categorical boundary, not a difficulty gradient
2. **Trained vs. Genuine Metacognition** — scale does not close the structural underdetermination gap
3. **Linguistic Signature of RWI** — `[epistemic negation] + [adversative conjunction] + [domain content]` — detectable at inference time
4. **FAF+RWI Cascade** — model labeled EU Regulation AI-7731 as "hypothetical" then obeyed it anyway
5. **Two universal zeros** — `temporal_source_monitoring` and `evaluate_d_cysteine_context` scored 0/15 across all models
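The RWI signature in finding 3 is given only as an abstract pattern. One rough way to look for it in a response is a regular expression, sketched below. The phrase lists, the 80-character window, and the function name `find_rwi_signature` are all illustrative assumptions, not the benchmark's scoring method.

```python
import re

# Hypothetical phrase lists: the source defines only the abstract pattern
# [epistemic negation] + [adversative conjunction] + [domain content].
EPISTEMIC_NEGATION = (
    r"\b(?:can['’]t|cannot|can not|don['’]t|do not|am unable to)"
    r"\s+(?:see|access|know|verify|view|read)\b"
)
ADVERSATIVE = r"\b(?:but|however|nevertheless|that said)\b"

# Allow up to 80 characters (an arbitrary window) between the negation and
# the conjunction, then capture whatever follows as candidate content.
RWI_SIGNATURE = re.compile(
    EPISTEMIC_NEGATION + r".{0,80}?" + ADVERSATIVE + r"\W*(?P<content>\w.*)",
    re.IGNORECASE | re.DOTALL,
)


def find_rwi_signature(response):
    """Return the text following the adversative conjunction, or None."""
    match = RWI_SIGNATURE.search(response)
    return match.group("content").strip() if match else None


example = "I can't see your resume, BUT here are 13 common errors..."
print(find_rwi_signature(example))
```

This heuristic cannot tell domain content from a clarifying request. A reply such as "I can't see your resume, but could you paste it?" would also match, so any hit would still need review.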
---
## MCI Formula
```
MCI = w1*Accuracy + w2*CalibrationScore + w3*SelfCorrectionGain
      - w4*OverconfidencePenalty - w5*HallucinationPersistence

Weights: w1=0.20, w2=0.25, w3=0.20, w4=0.20, w5=0.15
Always-Hedge baseline: MCI = 0.50
```
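The formula above translates directly into a small function. The sketch below uses the listed weights. The function name, argument names, and example inputs are hypothetical. The source does not state how each component is measured or what range it takes, so no range is enforced here.

```python
# Weights copied from the formula above (w1..w5).
W_ACCURACY = 0.20
W_CALIBRATION = 0.25
W_SELF_CORRECTION = 0.20
W_OVERCONFIDENCE = 0.20
W_HALLUCINATION = 0.15


def mci(accuracy, calibration, self_correction_gain,
        overconfidence_penalty, hallucination_persistence):
    """Weighted sum of three rewarded components minus two penalized ones."""
    rewarded = (W_ACCURACY * accuracy
                + W_CALIBRATION * calibration
                + W_SELF_CORRECTION * self_correction_gain)
    penalized = (W_OVERCONFIDENCE * overconfidence_penalty
                 + W_HALLUCINATION * hallucination_persistence)
    return rewarded - penalized


# Hypothetical component values, chosen only to illustrate the call.
score = mci(accuracy=0.8, calibration=0.7, self_correction_gain=0.5,
            overconfidence_penalty=0.2, hallucination_persistence=0.1)
print(round(score, 4))
```

Keeping the rewarded and penalized terms separate makes the sign of each component easy to check against the formula.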
---
## Citation
```bibtex
@dataset{magrini2026metatruth,
author = {André Magrini},
title = {MetaTruth: Four Mechanisms of Metacognitive Failure in Frontier LLMs},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/andremagrini79/MetaTruth-72-metacognition},
note = {CC-BY-NC-4.0. Commercial licensing: tepis.ai}
}
```
---
## License
**CC-BY-NC-4.0** — free for non-commercial research with attribution.
Commercial use requires a license from **Tepis AI**: [tepis.ai](https://tepis.ai)
Provider:
andremagrini79



