
facebook/principia-bench

Source: Hugging Face. Updated 2025-12-18. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/facebook/principia-bench
Description:
Principia Bench is a benchmark designed to evaluate a language model's ability to derive mathematical objects from STEM-related problem statements. Each instance contains a problem statement and a ground-truth answer. The problem statements are drawn from four benchmarks (RealMath, SuperGPQA with answer choices removed, Physics, and ARB) and are filtered to retain only instances where (1) the answer is expressed as a mathematical object, (2) the problem statement consists of a single question, and (3) the problem statement is self-contained. Principia Bench includes 2241 instances, all of which require deriving mathematical objects. The benchmark can be used specifically to test the ability to derive mathematical objects or, more broadly, to evaluate reasoning capability.
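To make the instance layout concrete, here is a minimal Python sketch of scoring a solver against (problem statement, ground-truth answer) pairs. Everything in it is an assumption for illustration: the field names `problem` and `answer`, the helpers `normalize` and `accuracy`, the toy instances, and the dummy solver are all hypothetical and not taken from the dataset. The whitespace-insensitive string match is only a placeholder. Judging whether two mathematical objects are equal would in practice need something stronger, such as symbolic equivalence checking.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Instance:
    # Hypothetical field names; the real dataset columns may be named differently.
    problem: str
    answer: str


def normalize(text: str) -> str:
    """Crude normalization: remove all whitespace and lowercase the result."""
    return "".join(text.split()).lower()


def accuracy(instances: Iterable[Instance], solve: Callable[[str], str]) -> float:
    """Fraction of instances whose prediction equals the ground truth after normalization."""
    items = list(instances)
    if not items:
        return 0.0
    correct = sum(normalize(solve(it.problem)) == normalize(it.answer) for it in items)
    return correct / len(items)


# Toy instances invented for illustration; they are not drawn from the benchmark.
toy = [
    Instance("Differentiate x^2 with respect to x.", "2x"),
    Instance("Integrate 1 dx.", "x + C"),
]


def dummy_solver(problem: str) -> str:
    # Stand-in for a call to a language model.
    return "2 x" if "Differentiate" in problem else "x"


score = accuracy(toy, dummy_solver)
print(f"accuracy = {score:.2f}")
```

The second toy instance shows why plain string matching is too weak here: a prediction that differs from the reference only in form, or that drops a constant of integration, is scored as a miss with no partial credit.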
Provider:
facebook