CostOfPass/benchmark
Source: Hugging Face. Updated 2025-04-23; indexed 2025-04-26.
Download link:
https://hf-mirror.com/datasets/CostOfPass/benchmark
Overview:
Cost-of-Pass is an economic framework for evaluating the performance of language models. This dataset contains the benchmark records evaluated in the paper. It has two parts: full_records, which holds the raw records of model runs, and metric_records, which holds evaluations of those records under specific metrics. Records are organized by model name, task name, and inference-time method, and each record includes details such as the model name, task name, input index, and answer.
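The organization described above (records keyed by model name, task name, and inference-time method) can be illustrated with a minimal Python sketch. The field names and the sample rows below are assumptions made for illustration; they are not taken from the dataset's actual schema, which should be checked on the dataset page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical field names, chosen to mirror the details listed in the
    # overview; the real column names in the dataset may differ.
    model_name: str
    task_name: str
    inference_time_method: str
    input_idx: int
    answer: str


def group_records(records):
    """Group records by (model name, task name, inference-time method)."""
    groups = defaultdict(list)
    for rec in records:
        key = (rec.model_name, rec.task_name, rec.inference_time_method)
        groups[key].append(rec)
    return dict(groups)


# Made-up placeholder rows for illustration only; not real dataset contents.
sample = [
    Record("model-a", "task-x", "method-1", 0, "42"),
    Record("model-a", "task-x", "method-1", 1, "7"),
    Record("model-b", "task-x", "method-1", 0, "41"),
]

grouped = group_records(sample)
for key, recs in grouped.items():
    print(key, [r.input_idx for r in recs])
```

Grouping on the three-part key keeps all runs of one model on one task under one method together, which is the unit the overview describes the records as being organized by.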
Provider:
CostOfPass



