BrachioLab/toulmin_errors

Name: BrachioLab/toulmin_errors
Creator: BrachioLab
Published: 2026-04-30 02:59:19
License: No description available

Source: Hugging Face. Updated 2026-04-30; indexed 2026-05-03.

Download link:

https://hf-mirror.com/datasets/BrachioLab/toulmin_errors

Description:

A two-tier benchmark for evaluating dimension-typed error localization in AI-generated scientific arguments, organized around four Toulmin dimensions: Grounds (premises/facts), Warrant (inferential step), Qualifier (scope/certainty), and Rebuttal (competing evidence). Its configurations fall into three types: controlled corruption, real-trace corruption, and omission. Controlled corruption (Tier 1) covers scifact, pubmedqa, and arxivml, and provides clean signal-noise separation for measuring detection, isolation, and bleeding. Real-trace corruption (Tier 2) covers scify and codescientist, and tests whether scorer behavior generalizes when traces carry pre-existing noise from two structurally different agents. The omission configurations simulate missing caveats or pieces of evidence, testing whether scorers can detect absent content. The dataset contains 1,250 cases in total, each with documented fields and structure.
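The tier layout described above can be sketched in Python as follows. This is a minimal sketch, not the dataset's official loader: the source names come from the description, but the exact configuration names on the Hub are not stated here, so `list_configs` queries them rather than assuming them. The helper names (`tier_of`, `list_configs`, `load_config`) are hypothetical, and the two Hub helpers need the `datasets` library plus network access.

```python
from typing import Optional

REPO_ID = "BrachioLab/toulmin_errors"

# Source names per tier, taken from the description above.
# Assumption: the actual Hub config names may differ (e.g. carry suffixes).
TIERS = {
    "tier1_controlled": ("scifact", "pubmedqa", "arxivml"),
    "tier2_real_trace": ("scify", "codescientist"),
}

# The four Toulmin dimensions the benchmark is organized around.
TOULMIN_DIMENSIONS = ("grounds", "warrant", "qualifier", "rebuttal")


def tier_of(source: str) -> Optional[str]:
    """Return the tier label for a source name, or None if it is not listed."""
    for tier, sources in TIERS.items():
        if source in sources:
            return tier
    return None


def list_configs(repo_id: str = REPO_ID):
    """Ask the Hub which configurations the dataset actually exposes."""
    from datasets import get_dataset_config_names  # imported lazily

    return get_dataset_config_names(repo_id)


def load_config(config_name: str, repo_id: str = REPO_ID):
    """Load one configuration; config_name should come from list_configs()."""
    from datasets import load_dataset  # imported lazily

    return load_dataset(repo_id, config_name)
```

Calling `list_configs()` first and passing one of its results to `load_config` avoids hard-coding configuration names that this page does not confirm.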

Provided by:

BrachioLab
