llm-compe-2025-kato/step2-evaluated-dataset-Qwen3-14B
Source: Hugging Face | Updated: 2025-08-20 | Indexed: 2025-09-13
Download link:
https://hf-mirror.com/datasets/llm-compe-2025-kato/step2-evaluated-dataset-Qwen3-14B
Description:
The Complete Evaluation Dataset (Rubric + LogP) contains chain-of-thought explanations that have been scored with both a comprehensive rubric assessment and a LogP evaluation. It is sourced from llm-compe-2025-kato/step2-evaluated-dataset-Qwen3-14B and holds 156 samples in total: the rubric evaluation succeeded for 135 of them and failed for the remaining 21. The evaluation model is Qwen/Qwen3-32B. The dataset card also documents the rubric and LogP statistics, the evaluation methods, and the dataset structure.
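As a quick sanity check on the counts quoted above, the following Python sketch recomputes the rubric success rate and shows one plausible way to fetch the data. It assumes the Hugging Face `datasets` library is installed and that the repository is publicly readable. The helper names `rubric_success_rate` and `load_evaluated_dataset` are hypothetical and introduced only for illustration. Split and column names are not given in the description, so the sketch does not assume any.

```python
REPO_ID = "llm-compe-2025-kato/step2-evaluated-dataset-Qwen3-14B"

# Counts quoted in the description.
TOTAL_SAMPLES = 156
RUBRIC_OK = 135
RUBRIC_FAILED = 21


def rubric_success_rate(ok: int, total: int) -> float:
    """Return the fraction of samples whose rubric evaluation succeeded."""
    if total <= 0:
        raise ValueError("total must be positive")
    return ok / total


def load_evaluated_dataset(repo_id: str = REPO_ID):
    """Download the dataset from the Hugging Face Hub (needs network access).

    The import is done lazily so the arithmetic above runs even when
    `datasets` is not installed. No split or column names are assumed,
    so the call returns whatever splits the repository defines.
    """
    from datasets import load_dataset

    return load_dataset(repo_id)


if __name__ == "__main__":
    # The two reported outcomes should account for every sample.
    assert RUBRIC_OK + RUBRIC_FAILED == TOTAL_SAMPLES
    rate = rubric_success_rate(RUBRIC_OK, TOTAL_SAMPLES)
    print(f"rubric success rate: {rate:.1%}")
```

With the reported counts, 135 of 156 samples corresponds to a rubric success rate of about 86.5%. After calling `load_evaluated_dataset()`, printing the returned object lists the actual splits and columns, which is the safest way to discover the schema before filtering out the 21 failed samples.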
Provider:
llm-compe-2025-kato



