categories-benchmark-eval

Name: categories-benchmark-eval
Creator: maas
Published: 2025-12-05 12:04:39
License: no description available

ModelScope community, updated 2025-12-05, indexed 2025-05-03

Download link:

https://modelscope.cn/datasets/lmarena-ai/categories-benchmark-eval


Resource description:

Within each bench there are:

- data folder
  - files contain the candidate models' labels in the field `"category_tag"`.
- ground truth files
  - contain ground truth labels by larger models in the field `"label"`
  - vote files contain labels calculated from the votes of `claude-3-7-sonnet`, `deepseek-r1`, `gemini-2.0-pro`, and `gpt-4.5`
    - `vote_2`: true if >= 2 models vote true
    - `vote_3`: true if >= 3 models vote true
    - individual votes are specified in the field `"votes"`

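The `vote_2` and `vote_3` thresholds described above can be sketched in a few lines of Python. This is only an illustration: the helper name `derive_vote_labels` is hypothetical, and the assumption that `"votes"` maps each model name to a boolean is a guess, since the description does not specify the exact schema of that field.

```python
from typing import Dict, Mapping


def derive_vote_labels(votes: Mapping[str, bool]) -> Dict[str, bool]:
    """Derive threshold labels from individual model votes.

    Assumption: `votes` maps a model name to True/False. The real
    layout of the "votes" field in the dataset may differ.
    """
    # Count how many models voted true.
    true_count = sum(1 for vote in votes.values() if vote)
    return {
        "vote_2": true_count >= 2,  # true if >= 2 models vote true
        "vote_3": true_count >= 3,  # true if >= 3 models vote true
    }


# Made-up example record, not taken from the dataset.
example_votes = {
    "claude-3-7-sonnet": True,
    "deepseek-r1": True,
    "gemini-2.0-pro": False,
    "gpt-4.5": False,
}

labels = derive_vote_labels(example_votes)
print(labels)  # {'vote_2': True, 'vote_3': False}
```

With two of the four models voting true, `vote_2` holds while `vote_3` does not, so `vote_3` is the stricter of the two labels.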

Provider:

maas

Created:

2025-04-21

Collection summary

Dataset introduction