llm-compe-2025-kato/step2-evaluated-dataset-Qwen3-14B

Name: llm-compe-2025-kato/step2-evaluated-dataset-Qwen3-14B
Creator: llm-compe-2025-kato
Published: 2025-08-20 14:38:34
License: No description available

Source: Hugging Face | Updated: 2025-08-20 | Indexed: 2025-09-13

Download link:

https://hf-mirror.com/datasets/llm-compe-2025-kato/step2-evaluated-dataset-Qwen3-14B

Summary:

The Complete Evaluation Dataset (Rubric + LogP) contains chain-of-thought explanations that were evaluated with both a comprehensive rubric assessment and a LogP evaluation. It is derived from llm-compe-2025-kato/step2-evaluated-dataset-Qwen3-14B and contains 156 samples: 135 were successfully scored with the rubric, and rubric evaluation failed for the remaining 21. The evaluator model is Qwen/Qwen3-32B. The dataset card describes the rubric and LogP statistics, the evaluation methods, and the dataset structure in detail.
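As a minimal sketch, assuming the Hugging Face `datasets` library is installed, the dataset can be loaded by its repository ID, and the reported sample counts can be checked for consistency. The helper names `load_evaluated_dataset` and `rubric_summary` are hypothetical. Selecting the hf-mirror.com mirror through the `HF_ENDPOINT` environment variable is an assumption about your setup, not something this listing documents. Because the column layout is not described here, the sketch does not access any fields.

```python
import os

# Repository ID as given on this listing page.
REPO_ID = "llm-compe-2025-kato/step2-evaluated-dataset-Qwen3-14B"

# Assumption: the hf-mirror.com mirror is selected through the HF_ENDPOINT
# environment variable, which huggingface_hub reads when it is first imported.
MIRROR_ENDPOINT = "https://hf-mirror.com"


def load_evaluated_dataset(use_mirror: bool = False):
    """Hypothetical helper: download the dataset with `datasets` (needs network access)."""
    if use_mirror:
        # Must be set before `datasets` (and thus huggingface_hub) is first imported.
        os.environ.setdefault("HF_ENDPOINT", MIRROR_ENDPOINT)
    # Imported lazily so the rest of this module works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID)


def rubric_summary(total: int, passed: int) -> dict:
    """Hypothetical helper: derive the failure count and pass rate from sample counts."""
    if total <= 0 or not 0 <= passed <= total:
        raise ValueError("need total > 0 and 0 <= passed <= total")
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "pass_rate": passed / total,
    }


# Counts reported in the dataset description: 156 samples, 135 rubric successes.
summary = rubric_summary(total=156, passed=135)
print(f"{summary['failed']} failed, pass rate {summary['pass_rate']:.1%}")
# prints "21 failed, pass rate 86.5%"
```

Calling `load_evaluated_dataset(use_mirror=True)` downloads whatever splits the repository publishes. Inspect the returned object before relying on specific column names. The derived failure count of 21 matches the figure stated in the summary.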

Provided by:

llm-compe-2025-kato