launch/ExpertLongBench

Name: launch/ExpertLongBench
Creator: launch
Published: 2025-07-30 18:57:40
License: No description available

Hugging Face, updated 2025-07-30, indexed 2025-05-31

Download link:

https://hf-mirror.com/datasets/launch/ExpertLongBench

Description:

ExpertLongBench is a multi-domain benchmark for evaluating the expert-level performance of language models on long-form, structured tasks. Its tasks simulate real-world expert workflows across professional domains; each requires an output of more than 5,000 tokens and is guided by rubrics defined or validated by domain experts. The dataset contains the publicly released long-form structured tasks.

Provider:

launch
