Cooolder/SCOPE-70K
Source: Hugging Face. Updated 2025-12-18; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Cooolder/SCOPE-70K
Summary:
SCOPE-70K is a benchmark dataset for evaluating large language models across multiple domains and difficulty levels. It contains 68,653 questions drawn from five established benchmarks (Chinese Gaokao, GPQA, LPFQA, MMLU-Pro, and R-Bench), together with evaluation results for 13 state-of-the-art models, all of which were evaluated through the OpenRouter API. The questions are in English and Chinese, and the task types are multiple choice and short answer. The data is split into a training set (62,127 questions), a test set (3,276 questions), and an anchor set (3,250 questions), and each evaluated model has its own configuration. Data fields include the question ID, question text, ground-truth answer, model response, correctness, reasoning mode, and others.
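The three split sizes add up to the stated total (62,127 + 3,276 + 3,250 = 68,653). The minimal sketch below checks that sum and shows how one model's accuracy could be computed from records that carry a correctness flag. The field names `question_id` and `is_correct` are hypothetical stand-ins for the question-ID and correctness fields described above; the actual column names in each model configuration may differ, so check the dataset card before reusing this.

```python
from dataclasses import dataclass

# Split sizes as stated in the dataset description.
SPLIT_SIZES = {"train": 62_127, "test": 3_276, "anchor": 3_250}
TOTAL_QUESTIONS = 68_653

# Sanity check: the splits should partition the full question set.
assert sum(SPLIT_SIZES.values()) == TOTAL_QUESTIONS


@dataclass
class Record:
    # Hypothetical field names; the real dataset columns may be named differently.
    question_id: str
    is_correct: bool


def accuracy(records: list[Record]) -> float:
    """Fraction of records answered correctly (0.0 for an empty list)."""
    if not records:
        return 0.0
    # True counts as 1 and False as 0 when summed.
    return sum(r.is_correct for r in records) / len(records)


if __name__ == "__main__":
    # Toy records standing in for one model's configuration, not real dataset rows.
    toy = [Record("q1", True), Record("q2", False), Record("q3", True), Record("q4", True)]
    print(f"accuracy = {accuracy(toy):.2f}")  # prints "accuracy = 0.75"
```

Because each model has its own configuration, per-model accuracy would be computed separately over each configuration's records and then compared across models.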
Provided by:
Cooolder



