RouterEval
Source: arXiv, indexed 2025-09-30
Download link:
https://github.com/MilkThink-Lab/RouterEval
Resource overview:
RouterEval is a benchmark designed specifically for research on LLM routers. It contains over 200 million performance records collected from more than 8,500 large language models (LLMs) across 12 popular LLM evaluation domains, including knowledge-based question answering, commonsense reasoning, semantic understanding, mathematical reasoning, and instruction following. The dataset description notes that the lack of a comprehensive open-source benchmark for routing LLMs has constrained router development. RouterEval aims to support evaluating the performance of routing LLMs at this scale.
Provided by:
MilkThink Lab



