withmartian/routerbench
Source: Hugging Face. Updated 2024-03-27; indexed 2024-06-22.
Download link:
https://hf-mirror.com/datasets/withmartian/routerbench
Resource description:
---
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- code
pretty_name: RouterBench
size_categories:
- 10K<n<100K
---
RouterBench is a dataset comprising over 30,000 prompts and the responses from 11 different LLMs, with the prompts taken from standard benchmarks such as MBPP, GSM-8k, Winogrande, Hellaswag, MMLU, MT-Bench, and more.
The data includes the prompt, the model response, the estimated cost of that response, and a performance score indicating whether the model answered correctly. Every prompt has a correct answer against which the LLM generation is compared. These datasets are designed to be used with Martian's [routerbench](https://github.com/withmartian/alt-routing-methods/tree/public-productionize) package for training and evaluating various model routing methods.
There are two versions of the dataset: one with 5-shot generation and one with 0-shot results. Both can be used with the `routerbench` package, individually or in combination.
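
To make the record structure described above concrete, here is a minimal sketch of how prompt-level cost and performance data could be summarized per model and used as a simple routing baseline. The field names (`prompt_id`, `model`, `cost`, `performance`), the toy values, and the helper functions are all hypothetical; they are not the dataset's actual schema and do not use the `routerbench` package API. The "oracle" router shown here is a common upper-bound baseline chosen for illustration, not necessarily a method the package implements.

```python
from collections import defaultdict

# Toy records standing in for dataset rows.
# ASSUMPTION: these field names and values are illustrative only.
records = [
    {"prompt_id": "q1", "model": "model_a", "cost": 0.002, "performance": 1.0},
    {"prompt_id": "q1", "model": "model_b", "cost": 0.0002, "performance": 1.0},
    {"prompt_id": "q2", "model": "model_a", "cost": 0.003, "performance": 1.0},
    {"prompt_id": "q2", "model": "model_b", "cost": 0.0003, "performance": 0.0},
]


def per_model_summary(rows):
    """Return mean cost and mean performance for each model."""
    totals = defaultdict(lambda: {"n": 0, "cost": 0.0, "performance": 0.0})
    for row in rows:
        entry = totals[row["model"]]
        entry["n"] += 1
        entry["cost"] += row["cost"]
        entry["performance"] += row["performance"]
    return {
        model: {
            "mean_cost": entry["cost"] / entry["n"],
            "mean_performance": entry["performance"] / entry["n"],
        }
        for model, entry in totals.items()
    }


def oracle_route(rows):
    """For each prompt, pick the highest-scoring response, breaking ties by lower cost."""
    best = {}
    for row in rows:
        current = best.get(row["prompt_id"])
        rank = (-row["performance"], row["cost"])
        if current is None or rank < (-current["performance"], current["cost"]):
            best[row["prompt_id"]] = row
    return best


summary = per_model_summary(records)
routes = oracle_route(records)
for prompt_id, row in sorted(routes.items()):
    print(prompt_id, "->", row["model"])
```

With these toy values the oracle sends `q1` to the cheaper model (both answer correctly) and `q2` to the more expensive one (only it answers correctly), which is the cost/quality trade-off a learned router would try to approximate without seeing the scores in advance.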
Provided by:
withmartian
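Summary of original information
RouterBench dataset overview
Task categories
- Text generation
- Question answering
Language
- English
Tags
- Code
Dataset name
- RouterBench
Dataset size
- 10K<n<100K
Dataset description
The RouterBench dataset contains over 30,000 prompts and the responses to them from 11 different large language models (LLMs). The prompts come from standard benchmarks such as MBPP, GSM-8k, Winogrande, Hellaswag, MMLU, MT-Bench, and more. The data includes the prompt, the model response, the estimated cost of that response, and a performance score indicating whether the model answered correctly. Every prompt has a correct answer against which the LLM generation is compared. These datasets are designed to be used with Martian's routerbench package for training and evaluating various model routing methods.
Dataset versions
- 5-shot generation version
- 0-shot results version
Both versions can be used with the routerbench package, individually or in combination.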



