
synthiumjp/metacognitive-monitoring-battery

Source: Hugging Face | Updated: 2026-04-20 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/synthiumjp/metacognitive-monitoring-battery
Description:
---
license: cc-by-4.0
task_categories:
- text-classification
language:
- en
tags:
- benchmark
- nelson-narens
- cognitive-science
- metacognition
- llm-evaluation
- signal-detection-theory
- cognitive
pretty_name: Metacognitive Monitoring Battery
size_categories:
- 10K<n<100K
configs:
- config_name: responses
  data_files:
  - split: train
    path: responses.csv
- config_name: leaderboard
  data_files:
  - split: train
    path: leaderboard.csv
default_config_name: responses
---

# Metacognitive Monitoring Battery

A cross-domain behavioural assay of monitoring-control coupling in LLMs, grounded in the Nelson and Narens (1990) metacognitive framework.

**Paper:** [The Metacognitive Monitoring Battery: A Cross-Domain Benchmark for LLM Self-Monitoring](https://huggingface.co/papers/2604.15702)

**Code:** [github.com/synthiumjp/metacognitive-monitoring-battery](https://github.com/synthiumjp/metacognitive-monitoring-battery)

**Author:** Jon-Paul Cacioli (Independent Researcher, Melbourne, Australia)

## Overview

The battery comprises **524 items** across **six cognitive domains**, each grounded in an established experimental paradigm. After every forced-choice response, dual probes adapted from Koriat and Goldsmith (1996) ask the model whether to KEEP or WITHDRAW its answer and whether to BET on it or decline.
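Each response therefore yields three binary signals: correctness, KEEP/WITHDRAW, and BET/NO_BET. The sketch below shows one way to summarise them per model and track with pandas. It is illustrative only: `monitoring_summary` is a hypothetical helper, the input is a toy DataFrame rather than the released data, and it assumes the column names and labels listed under "Key columns". The withdraw delta it reports follows the card's Usage example: the KEEP rate on correctly answered items minus the KEEP rate on incorrectly answered items, in percentage points.

```python
import pandas as pd


def monitoring_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise dual-probe responses per (model, track).

    Hypothetical helper. Assumes the columns described under "Key columns":
    `correct` (True/False, as booleans or strings), `keep_withdraw`
    (KEEP/WITHDRAW) and `bet_nobet` (BET/NO_BET).
    """
    d = df.copy()
    # Normalise `correct` whether the loader produced booleans or strings.
    d["is_correct"] = d["correct"].astype(str).str.strip().str.lower() == "true"
    d["kept"] = d["keep_withdraw"] == "KEEP"
    d["bet"] = d["bet_nobet"] == "BET"

    rows = []
    for (model, track), g in d.groupby(["model", "track"]):
        # KEEP rates (in %) split by correctness; NaN if a subset is empty.
        keep_correct = g.loc[g["is_correct"], "kept"].mean() * 100
        keep_incorrect = g.loc[~g["is_correct"], "kept"].mean() * 100
        rows.append(
            {
                "model": model,
                "track": track,
                "n": len(g),
                "accuracy_pct": g["is_correct"].mean() * 100,
                "keep_pct": g["kept"].mean() * 100,
                "bet_pct": g["bet"].mean() * 100,
                # Withdraw delta in percentage points, as in the Usage example.
                "withdraw_delta": keep_correct - keep_incorrect,
            }
        )
    return pd.DataFrame(rows)


# Toy data (not from the dataset): one model, four T2 items.
toy = pd.DataFrame(
    {
        "model": ["model-x"] * 4,
        "track": ["T2"] * 4,
        "correct": ["True", "True", "False", "False"],
        "keep_withdraw": ["KEEP", "KEEP", "KEEP", "WITHDRAW"],
        "bet_nobet": ["BET", "BET", "NO_BET", "NO_BET"],
    }
)
print(monitoring_summary(toy))
```

On the toy data, the model keeps both correct answers and one of two incorrect answers, so its withdraw delta is 100 - 50 = 50 points. With the real data, the same function can be applied to the `df` built in the Usage example to get one row per model and track.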
Applied to **20 frontier LLMs** (10,480 evaluations, i.e. 524 items × 20 models), the battery discriminates three behavioural profiles consistent with the Nelson-Narens monitoring-control architecture:

- **Profile A (Blanket Confidence):** KEEP on 95%+ of items regardless of correctness
- **Profile B (Blanket Withdrawal):** WITHDRAW on 91-99% of items (DeepSeek R1 only)
- **Profile C (Selective Sensitivity):** withdraw delta (KEEP rate on correct items minus KEEP rate on incorrect items) of 15%+, indicating coupled monitoring and control

## Tracks

| Track | Domain | Items | Paradigm |
|---|---|---|---|
| T1 | Learning | 98 | Overhypothesis induction (Kemp et al., 2007) |
| T2 | Metacognition | 90 | SDT calibration (Green & Swets, 1966) |
| T3 | Social Cognition | 116 | Mutual exclusivity & pragmatics (Markman & Wachtel, 1988) |
| T4 | Attention | 60 | Biased competition (Desimone & Duncan, 1995) |
| T5 | Executive Function | 88 | Weber's Law & flexibility (Dehaene, 2003; Diamond, 2013) |
| T6 | Prospective Regulation | 72 | Help-seeking (Metcalfe & Kornell, 2005) |

## Key columns

- `model`: canonical model name (one of the 20 frontier LLMs)
- `track`: `T1` through `T6`
- `correct`: whether the forced-choice answer matched the ground truth
- `keep_withdraw`: `KEEP` (commit to the answer) or `WITHDRAW` (retract it)
- `bet_nobet`: `BET` (high confidence) or `NO_BET` (low confidence)
- `item_type`: track-specific condition label
- `path_choice`: T6 only; `ANSWER_DIRECTLY`, `REQUEST_HINT`, or `DECLINE`

## Usage

```python
from datasets import load_dataset

# Load the default "responses" config (a single "train" split).
ds = load_dataset("synthiumjp/metacognitive-monitoring-battery")
df = ds["train"].to_pandas()

# Withdraw delta (WD) for Claude Sonnet 4.6 on T2:
# KEEP rate on correct items minus KEEP rate on incorrect items.
sonnet_t2 = df[(df["model"] == "Claude Sonnet 4.6") & (df["track"] == "T2")]

# Compare `correct` as a string so this works whether the CSV loader
# yields booleans or the literal strings "True"/"False".
is_correct = sonnet_t2["correct"].astype(str) == "True"
correct = sonnet_t2[is_correct]
incorrect = sonnet_t2[~is_correct]

keep_c = (correct["keep_withdraw"] == "KEEP").mean() * 100
keep_i = (incorrect["keep_withdraw"] == "KEEP").mean() * 100
print(f"Sonnet T2 WD = {keep_c - keep_i:+.1f}%")  # +14.3%
```

## Citation

```bibtex
@article{cacioli2026mmb,
  title={The Metacognitive Monitoring Battery: A Cross-Domain Benchmark for LLM Self-Monitoring},
  author={Cacioli, Jon-Paul},
  journal={arXiv preprint arXiv:2604.15702},
  year={2026}
}
```

## License

Data: CC-BY-4.0. Analysis code in the GitHub repository: MIT.
Provided by: synthiumjp