
PRBench

Source: ModelScope community. Updated 2025-12-05. Indexed 2025-11-22.
Download link:
https://modelscope.cn/datasets/ScaleAI/PRBench
Description:
# PRBench: Large-Scale Expert Rubrics for Evaluating High-Stakes Professional Reasoning

PRBench consists of:

* 1,100 expert-authored conversations across Finance and Legal domains
* 19,356 expert-curated rubric criteria (10–30 per task)
* Coverage of 114 countries, 47 U.S. jurisdictions, and 25 total professional topics
* Hard subsets (Finance-300, Legal-250) representing the most challenging tasks

We release the Finance, Finance-Hard, Legal and Legal-Hard subsets in this release, along with our evaluation code at https://github.com/scaleapi/PRBench.

See the release for full details at: https://scale.com/research/prbench

![image](https://cdn-uploads.huggingface.co/production/uploads/617b009a7b4dce0224d6b8fb/9Gip9LfXUECzRMUy1kkyc.png)

Explore our data using our visualizer at: https://prbench-explorer.vercel.app/

![image](https://cdn-uploads.huggingface.co/production/uploads/617b009a7b4dce0224d6b8fb/LwsxCpmOrJCyI_SwD4e9I.png)
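
As a quick sanity check on the headline figures in the description, the short sketch below works out the average number of rubric criteria per task and the share of tasks in the hard subsets. It uses only the numbers quoted in the description. The constant and function names are illustrative, and reading the "300" and "250" in the subset names as task counts is an assumption rather than something the description states.

```python
# Figures quoted in the PRBench description.
TOTAL_TASKS = 1_100          # expert-authored conversations
TOTAL_CRITERIA = 19_356      # expert-curated rubric criteria
CRITERIA_RANGE = (10, 30)    # stated per-task range

# ASSUMPTION: the numeric suffix of each hard subset name is its task count.
HARD_SUBSET_SIZES = {"Finance-300": 300, "Legal-250": 250}


def mean_criteria_per_task(total_criteria: int, total_tasks: int) -> float:
    """Average number of rubric criteria attached to one task."""
    return total_criteria / total_tasks


def hard_fraction(hard_sizes: dict[str, int], total_tasks: int) -> float:
    """Share of all tasks that fall in the hard subsets (under the assumption above)."""
    return sum(hard_sizes.values()) / total_tasks


mean = mean_criteria_per_task(TOTAL_CRITERIA, TOTAL_TASKS)
low, high = CRITERIA_RANGE

print(f"mean criteria per task: {mean:.1f}")
print(f"within stated range: {low <= mean <= high}")
print(f"hard-subset share: {hard_fraction(HARD_SUBSET_SIZES, TOTAL_TASKS):.0%}")
```

The mean falls inside the stated 10–30 range, so the totals are at least internally consistent. Actual per-task counts would need to be read from the released data.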

Provider:
maas
Created:
2025-11-14