team-suzuki/hle-extract-qwen3235ba22b-20250815
Source: Hugging Face | Updated: 2025-08-18 | Indexed: 2025-09-13
Download link:
https://hf-mirror.com/datasets/team-suzuki/hle-extract-qwen3235ba22b-20250815
Description:
This dataset contains the complete Human-Level Evaluation (HLE) benchmark together with detailed evaluation results for the Qwen/Qwen3-235B-A22B model. It merges the original team-suzuki/hle-extract dataset with the model's responses and human judgments. Of the 120 questions, 103 have been evaluated and 17 have not. Accuracy is broken down by category, showing performance across subject areas. The dataset is provided in multiple formats to suit different research uses.
Provider:
team-suzuki
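
As a rough illustration of how the counts in the description fit together, and how the data might be fetched, here is a minimal Python sketch. It assumes the Hugging Face `datasets` library is installed and that the repository loads with its default configuration; neither is confirmed by this listing. The helper names `evaluation_coverage` and `load_hle_results` are hypothetical, and no split or column names are assumed.

```python
DATASET_ID = "team-suzuki/hle-extract-qwen3235ba22b-20250815"

# Counts quoted in the description above.
TOTAL_QUESTIONS = 120
EVALUATED = 103
UNEVALUATED = 17


def evaluation_coverage(evaluated: int, total: int) -> float:
    """Return the fraction of questions that have been evaluated."""
    if total <= 0:
        raise ValueError("total must be positive")
    return evaluated / total


def load_hle_results(dataset_id: str = DATASET_ID):
    """Load the dataset from the Hugging Face Hub (needs network access).

    Assumption: the default configuration is loadable as-is.
    """
    # Imported lazily so the coverage helper works without the library.
    from datasets import load_dataset

    return load_dataset(dataset_id)


if __name__ == "__main__":
    # Sanity check: evaluated and unevaluated counts add up to the total.
    assert EVALUATED + UNEVALUATED == TOTAL_QUESTIONS
    print(f"coverage: {evaluation_coverage(EVALUATED, TOTAL_QUESTIONS):.1%}")
    # Uncomment to download and inspect the available splits and columns:
    # print(load_hle_results())
```

Printing the loaded object first is a cautious way to discover the actual splits and fields before writing any analysis against them.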



