
BrachioLab/toulmin_errors

Source: Hugging Face. Updated: 2026-04-30. Indexed: 2026-05-03.
Download link:
https://hf-mirror.com/datasets/BrachioLab/toulmin_errors
Description:

This dataset is a two-tier benchmark for evaluating dimension-typed error localization in AI-generated scientific arguments. It is organized around four Toulmin dimensions: Grounds (premises/facts), Warrant (inferential step), Qualifier (scope/certainty), and Rebuttal (competing evidence). Its configurations fall into three types: controlled corruption, real-trace corruption, and omission. Controlled corruption (Tier 1) covers scifact, pubmedqa, and arxivml, and provides clean signal-noise separation for measuring detection, isolation, and bleeding. Real-trace corruption (Tier 2) covers scify and codescientist, and tests whether scorer behavior generalizes when traces carry pre-existing noise from two structurally different agents. Omission configurations simulate missing caveats or pieces of evidence, testing whether scorers can detect absent content. The dataset contains 1,250 cases in total, each with documented fields and structure.
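As a rough illustration of how the configurations named above group into tiers, the following minimal sketch maps each listed name to its tier. The dictionary keys, the `tier_of` helper, and the `load_config` wrapper are hypothetical names introduced here, and treating the listed names as the exact Hugging Face config identifiers is an assumption; the omission configurations are not named in the description, so they are left out.

```python
from typing import Optional

# Configuration names as listed in the description. Whether these match the
# exact config identifiers on the Hugging Face Hub is an assumption.
TIER_CONFIGS = {
    "tier1_controlled": ["scifact", "pubmedqa", "arxivml"],
    "tier2_real_trace": ["scify", "codescientist"],
}

# The four Toulmin dimensions the benchmark is organized around.
TOULMIN_DIMENSIONS = ("grounds", "warrant", "qualifier", "rebuttal")


def tier_of(config_name: str) -> Optional[str]:
    """Return the tier key for a config name, or None if it is not listed."""
    for tier, configs in TIER_CONFIGS.items():
        if config_name.lower() in configs:
            return tier
    return None


def load_config(config_name: str):
    """Hypothetical wrapper: needs the `datasets` package and network access."""
    # Imported lazily so the rest of the sketch runs without the dependency.
    from datasets import load_dataset

    return load_dataset("BrachioLab/toulmin_errors", config_name)
```

With this mapping, `tier_of("scifact")` returns the Tier 1 key and an unlisted name returns `None`; `load_config` is only a convenience around the standard `load_dataset(path, name)` call and is not verified against the actual repository layout.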
Provider:
BrachioLab