shisa-ai/eval-IFBench-results
Source: Hugging Face. Updated: 2026-01-06. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/shisa-ai/eval-IFBench-results
Resource overview:
This dataset contains evaluation results for various language models on IFBench, a challenging benchmark for precise instruction following. The dataset is organized by model name. Each model folder contains the model's responses file (responses_{model-name}.jsonl), a strict evaluation results file (eval_results_strict.jsonl), and a loose evaluation results file (eval_results_loose.jsonl). Each record in the evaluation results files includes the model's response to the prompt, whether all instructions were followed, a list indicating whether each individual instruction was followed, and a list of instruction IDs. The dataset also provides detailed guidelines for using the results and contributing new ones, along with relevant citation information.
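The per-record fields described above are enough to compute prompt-level and instruction-level accuracy for one model. The following is a minimal sketch, not the dataset's own tooling. The JSON key names `follow_all_instructions` and `follow_instruction_list` are assumptions modeled on the description (they are not confirmed by this page), and the sample records are made up purely for illustration.

```python
import json
from typing import Iterable


def summarize(lines: Iterable[str]) -> dict:
    """Aggregate accuracy from JSONL evaluation records.

    Assumed (unverified) keys per record:
      follow_all_instructions: bool, True if every instruction was followed
      follow_instruction_list: list of bool, one flag per instruction
    """
    prompts = 0
    prompts_ok = 0
    instructions = 0
    instructions_ok = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        record = json.loads(line)
        flags = record["follow_instruction_list"]
        prompts += 1
        prompts_ok += bool(record["follow_all_instructions"])
        instructions += len(flags)
        instructions_ok += sum(bool(flag) for flag in flags)
    return {
        "prompts": prompts,
        "prompt_level_accuracy": prompts_ok / prompts if prompts else 0.0,
        "instruction_level_accuracy": (
            instructions_ok / instructions if instructions else 0.0
        ),
    }


# Made-up records for illustration only; real files may carry more fields.
sample_lines = [
    json.dumps({"follow_all_instructions": True,
                "follow_instruction_list": [True, True]}),
    json.dumps({"follow_all_instructions": False,
                "follow_instruction_list": [True, False]}),
]
summary = summarize(sample_lines)
print(summary)
```

Because a file object iterates line by line, the same function could be applied to a downloaded file with something like `with open("<model-folder>/eval_results_strict.jsonl", encoding="utf-8") as f: summarize(f)`, where `<model-folder>` stands for whichever model directory is being inspected. Running it on both the strict and loose files would give the two scores side by side.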
Provided by:
shisa-ai



