synthiumjp/metacognitive-monitoring-battery

Name: synthiumjp/metacognitive-monitoring-battery
Creator: synthiumjp
Published: 2026-04-20 13:21:09
License: CC-BY-4.0 (data); MIT (analysis code)

Source: Hugging Face. Updated 2026-04-20; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/synthiumjp/metacognitive-monitoring-battery

Resource description:

---
license: cc-by-4.0
task_categories:
- text-classification
language:
- en
tags:
- benchmark
- nelson-narens
- cognitive-science
- metacognition
- llm-evaluation
- signal-detection-theory
- cognitive
pretty_name: Metacognitive Monitoring Battery
size_categories:
- 10K<n<100K
configs:
- config_name: responses
  data_files:
  - split: train
    path: responses.csv
- config_name: leaderboard
  data_files:
  - split: train
    path: leaderboard.csv
default_config_name: responses
---

# Metacognitive Monitoring Battery

A cross-domain behavioural assay of monitoring-control coupling in LLMs, grounded in the Nelson and Narens (1990) metacognitive framework.

**Paper:** [The Metacognitive Monitoring Battery: A Cross-Domain Benchmark for LLM Self-Monitoring](https://huggingface.co/papers/2604.15702)

**Code:** [github.com/synthiumjp/metacognitive-monitoring-battery](https://github.com/synthiumjp/metacognitive-monitoring-battery)

**Author:** Jon-Paul Cacioli (Independent Researcher, Melbourne, Australia)

## Overview

The battery comprises **524 items** across **six cognitive domains**, each grounded in an established experimental paradigm. After every forced-choice response, dual probes adapted from Koriat and Goldsmith (1996) ask the model to KEEP or WITHDRAW its answer and to BET or decline.

Applied to **20 frontier LLMs** (524 items × 20 models = 10,480 evaluations), the battery discriminates three behavioural profiles consistent with the Nelson-Narens monitoring-control architecture:

- **Profile A (Blanket Confidence):** KEEP on 95%+ of items regardless of correctness
- **Profile B (Blanket Withdrawal):** WITHDRAW on 91-99% of items (DeepSeek R1 only)
- **Profile C (Selective Sensitivity):** Withdraw delta of 15%+ (coupled monitoring-control)

## Tracks

| Track | Domain | Items | Paradigm |
|---|---|---|---|
| T1 | Learning | 98 | Overhypothesis induction (Kemp et al., 2007) |
| T2 | Metacognition | 90 | SDT calibration (Green & Swets, 1966) |
| T3 | Social Cognition | 116 | Mutual exclusivity & pragmatics (Markman & Wachtel, 1988) |
| T4 | Attention | 60 | Biased competition (Desimone & Duncan, 1995) |
| T5 | Executive Function | 88 | Weber's Law & flexibility (Dehaene, 2003; Diamond, 2013) |
| T6 | Prospective Regulation | 72 | Help-seeking (Metcalfe & Kornell, 2005) |

## Key columns

- `model`: Canonical model name (20 frontier LLMs)
- `track`: T1 through T6
- `correct`: Whether the forced-choice answer matched ground truth
- `keep_withdraw`: KEEP (commit to answer) or WITHDRAW (retract)
- `bet_nobet`: BET (high confidence) or NO_BET (low confidence)
- `item_type`: Track-specific condition label
- `path_choice`: T6 only: ANSWER_DIRECTLY, REQUEST_HINT, or DECLINE

## Usage

```python
from datasets import load_dataset

# "responses" is the default config; it is named explicitly here for clarity.
ds = load_dataset("synthiumjp/metacognitive-monitoring-battery", "responses")
df = ds["train"].to_pandas()

# Compute the withdraw delta (WD) for Sonnet on T2:
# KEEP rate on correct items minus KEEP rate on incorrect items.
sonnet_t2 = df[(df["model"] == "Claude Sonnet 4.6") & (df["track"] == "T2")]
# Works whether `correct` is stored as booleans or as "True"/"False" strings.
is_correct = sonnet_t2["correct"].astype(str).str.lower() == "true"
keep = sonnet_t2["keep_withdraw"] == "KEEP"
keep_c = keep[is_correct].mean() * 100
keep_i = keep[~is_correct].mean() * 100
print(f"Sonnet T2 WD = {keep_c - keep_i:+.1f}%")  # +14.3%
```

## Citation

```bibtex
@article{cacioli2026mmb,
  title={The Metacognitive Monitoring Battery: A Cross-Domain Benchmark for LLM Self-Monitoring},
  author={Cacioli, Jon-Paul},
  journal={arXiv preprint arXiv:2604.15702},
  year={2026}
}
```

## License

CC-BY-4.0 (data). MIT (analysis code in the GitHub repository).
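
## Worked example: withdraw delta by model and track

The withdraw delta in the Usage section (KEEP rate on correct items minus KEEP rate on incorrect items) can be computed for every model and track in one pass. The following is a minimal sketch using only the Python standard library, not the author's analysis code. The function name `withdraw_delta` and the `toy_rows` data are hypothetical and made up for illustration. The sketch assumes each row is a dict carrying the `model`, `track`, `correct`, and `keep_withdraw` columns described under Key columns.

```python
from collections import defaultdict


def withdraw_delta(rows):
    """Return {(model, track): KEEP% on correct minus KEEP% on incorrect}.

    `rows` is any iterable of dicts with the keys `model`, `track`,
    `correct` and `keep_withdraw` (for example, rows from csv.DictReader).
    """
    # counts[(model, track)][is_correct] = [number of KEEPs, number of items]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for row in rows:
        # Accept booleans as well as "True"/"False" strings.
        is_correct = str(row["correct"]).strip().lower() == "true"
        cell = counts[(row["model"], row["track"])][is_correct]
        cell[0] += row["keep_withdraw"] == "KEEP"
        cell[1] += 1

    deltas = {}
    for key, by_outcome in counts.items():
        (keep_c, n_c), (keep_i, n_i) = by_outcome[True], by_outcome[False]
        if n_c == 0 or n_i == 0:
            # The delta is undefined unless both outcomes occur.
            deltas[key] = None
        else:
            deltas[key] = 100 * (keep_c / n_c - keep_i / n_i)
    return deltas


# Made-up rows for illustration only; they are not taken from the dataset.
toy_rows = [
    {"model": "toy-selective", "track": "T2", "correct": "True", "keep_withdraw": "KEEP"},
    {"model": "toy-selective", "track": "T2", "correct": "True", "keep_withdraw": "KEEP"},
    {"model": "toy-selective", "track": "T2", "correct": "False", "keep_withdraw": "WITHDRAW"},
    {"model": "toy-selective", "track": "T2", "correct": "False", "keep_withdraw": "KEEP"},
    {"model": "toy-blanket", "track": "T2", "correct": "True", "keep_withdraw": "KEEP"},
    {"model": "toy-blanket", "track": "T2", "correct": "True", "keep_withdraw": "KEEP"},
    {"model": "toy-blanket", "track": "T2", "correct": "False", "keep_withdraw": "KEEP"},
    {"model": "toy-blanket", "track": "T2", "correct": "False", "keep_withdraw": "KEEP"},
]

toy_deltas = withdraw_delta(toy_rows)
for (model, track), delta in sorted(toy_deltas.items()):
    print(f"{model} {track} WD = {delta:+.1f}")
```

On the toy rows, the model that withdraws one of its two wrong answers gets a delta of +50.0, while the model that keeps everything gets +0.0. To run the same function on the real data, one option is to download `responses.csv` from the dataset repository and pass `csv.DictReader(open("responses.csv", newline="", encoding="utf-8"))` as `rows`; this assumes the file is available locally under that name.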

Provided by:

synthiumjp
