shisa-ai/eval-IFBench-results

Name: shisa-ai/eval-IFBench-results
Creator: shisa-ai
Published: 2026-01-06 17:39:08
License: Not specified

Source: Hugging Face
Updated: 2026-01-06
Indexed: 2025-12-20

Download link:

https://hf-mirror.com/datasets/shisa-ai/eval-IFBench-results

Summary:

This dataset contains evaluation results for various language models on IFBench, a challenging benchmark for precise instruction following. The dataset is organized by model name: each model folder contains the model's responses file (responses_{model-name}.jsonl), a strict evaluation results file (eval_results_strict.jsonl), and a loose evaluation results file (eval_results_loose.jsonl). For each prompt, the evaluation results files record the model's response, whether all instructions were followed, a list indicating whether each individual instruction was followed, and the list of instruction IDs. The dataset also provides detailed guidelines for using the results and contributing new ones, along with citation information.
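
As a rough illustration of how these files might be consumed, the Python sketch below reads one model folder and computes two summary numbers for both the strict and loose results. The first is prompt-level accuracy, the share of prompts where all instructions were followed. The second is instruction-level accuracy, the share of individual instructions followed across all prompts. This is a sketch under stated assumptions, not the dataset's official tooling. The JSON keys follow_all_instructions and follow_instruction_list are borrowed from the IFEval-style output format and are not confirmed by this listing. The helper names (load_jsonl, score, score_model_dir, responses_path) are hypothetical. The sketch also assumes each folder name matches the {model-name} used in its responses file.

```python
import json
from pathlib import Path

# ASSUMPTION: per-record keys borrowed from the IFEval-style output format;
# check them against a real eval_results_*.jsonl file before relying on them.
ALL_KEY = "follow_all_instructions"   # bool: every instruction in the prompt followed
EACH_KEY = "follow_instruction_list"  # list[bool]: one flag per instruction


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def score(records):
    """Return (prompt-level accuracy, instruction-level accuracy)."""
    if not records:
        return 0.0, 0.0
    # Fraction of prompts whose instructions were all followed.
    prompt_acc = sum(bool(r[ALL_KEY]) for r in records) / len(records)
    # Flatten per-instruction flags across every prompt.
    flags = [bool(x) for r in records for x in r[EACH_KEY]]
    inst_acc = sum(flags) / len(flags) if flags else 0.0
    return prompt_acc, inst_acc


def responses_path(model_dir):
    """Hypothetical helper: assumes the folder name equals {model-name}."""
    model_dir = Path(model_dir)
    return model_dir / f"responses_{model_dir.name}.jsonl"


def score_model_dir(model_dir):
    """Score the strict and loose result files inside one model folder."""
    model_dir = Path(model_dir)
    return {
        mode: score(load_jsonl(model_dir / f"eval_results_{mode}.jsonl"))
        for mode in ("strict", "loose")
    }
```

For example, after downloading the dataset, score_model_dir("eval-IFBench-results/<model-folder>") would return a dict mapping "strict" and "loose" to (prompt-level, instruction-level) accuracy pairs, where <model-folder> stands in for whichever model directory you want to inspect. Keeping the two metrics separate mirrors the common IFEval-style reporting convention. A single missed instruction fails the whole prompt at the prompt level but costs only one flag at the instruction level.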

Provided by:

shisa-ai
