launch/ExpertLongBench
Source: Hugging Face. Updated: 2025-07-30. Indexed: 2025-05-31.
Download link:
https://hf-mirror.com/datasets/launch/ExpertLongBench
Resource description:
ExpertLongBench is a multi-domain benchmark for evaluating the expert-level performance of language models on long-form, structured tasks. Its tasks simulate real-world expert workflows across a range of professional domains; each requires an output of more than 5,000 tokens and is guided by rubrics defined or validated by domain experts. The dataset contains the publicly released long-form structured tasks.
Provided by:
launch



