PRBench
Source: ModelScope community. Updated: 2025-12-05. Indexed: 2025-11-22.
Download link:
https://modelscope.cn/datasets/ScaleAI/PRBench
Description:
# PRBench: Large-Scale Expert Rubrics for Evaluating High-Stakes Professional Reasoning
PRBench consists of:
* 1,100 expert-authored conversations across Finance and Legal domains
* 19,356 expert-curated rubric criteria (10–30 per task)
* Coverage of 114 countries, 47 U.S. jurisdictions, and 25 total professional topics
* Hard subsets (Finance-300, Legal-250) representing the most challenging tasks
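As a quick sanity check on the figures above, the following minimal sketch computes the average number of rubric criteria per task and the share of tasks that fall in the hard subsets. It assumes that each conversation counts as one task and that the numeric suffix of each hard subset name (300, 250) is its task count; neither point is stated explicitly here, so treat the result as illustrative.

```python
# Figures quoted in the dataset description above.
TOTAL_TASKS = 1_100
TOTAL_CRITERIA = 19_356
MIN_PER_TASK, MAX_PER_TASK = 10, 30

# Assumption: the numeric suffix of each hard subset name is its task count.
HARD_SUBSET_SIZES = {"Finance-300": 300, "Legal-250": 250}

# Average rubric length, and the fraction of all tasks marked as hard.
mean_criteria = TOTAL_CRITERIA / TOTAL_TASKS
hard_share = sum(HARD_SUBSET_SIZES.values()) / TOTAL_TASKS

print(f"mean criteria per task: {mean_criteria:.1f}")
print(f"mean within stated range: {MIN_PER_TASK <= mean_criteria <= MAX_PER_TASK}")
print(f"hard-subset share of tasks: {hard_share:.0%}")
```

Under these assumptions the average comes to about 17.6 criteria per task, which sits inside the stated 10–30 range.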
This release includes the Finance, Finance-Hard, Legal, and Legal-Hard subsets, along with our evaluation code at https://github.com/scaleapi/PRBench.
For full details, see the release page: https://scale.com/research/prbench

Explore our data using our visualizer at: https://prbench-explorer.vercel.app/

Provider:
maas
Created:
2025-11-14



