JaneDing2025/LLMEval
Source: Hugging Face. Updated 2025-11-04; indexed 2025-11-15.
Download link:
https://hf-mirror.com/datasets/JaneDing2025/LLMEval
Resource description:
This dataset collects evaluation results for three models (GPT-5, Gemini-2.5-Pro, and Claude-4.5-Sonnet) on several benchmarks, including MMLU-Pro, GPQA, MATH-500, and MMMU-Pro. Each benchmark comes with detailed split information and is standardized to a shared set of feature fields: sample index, input text, answer, each model's response, and whether that response is correct. The dataset also records the models' generation settings and their accuracy on each benchmark.
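As a rough illustration of how the per-sample correctness field relates to a per-model accuracy figure, here is a minimal sketch. The field names (`index`, `input`, `answer`, `model`, `response`, `correct`) and the toy records are assumptions made up for this example; the dataset's actual column names and layout should be checked on its Hugging Face page.

```python
from collections import defaultdict

# Toy records invented for illustration only. Field names and values are
# hypothetical and do not come from the dataset itself.
records = [
    {"index": 0, "input": "placeholder question 0", "answer": "B",
     "model": "model_x", "response": "B", "correct": True},
    {"index": 1, "input": "placeholder question 1", "answer": "C",
     "model": "model_x", "response": "A", "correct": False},
    {"index": 0, "input": "placeholder question 0", "answer": "B",
     "model": "model_y", "response": "B", "correct": True},
    {"index": 1, "input": "placeholder question 1", "answer": "C",
     "model": "model_y", "response": "C", "correct": True},
]


def accuracy_by_model(rows):
    """Return the fraction of correct responses for each model."""
    totals = defaultdict(int)  # number of samples seen per model
    hits = defaultdict(int)    # number of correct responses per model
    for row in rows:
        totals[row["model"]] += 1
        hits[row["model"]] += int(row["correct"])
    return {model: hits[model] / totals[model] for model in totals}


print(accuracy_by_model(records))
```

Grouping by a benchmark field as well (if the real data has one) would give the per-benchmark accuracy that the description mentions.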
Provided by:
JaneDing2025



