
shisa-ai/eval-IFBench-results

Hugging Face | Updated 2026-01-06 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/shisa-ai/eval-IFBench-results
Description:
This dataset contains evaluation results for various language models on IFBench, a challenging benchmark for precise instruction following. The dataset is organized by model name. Each model folder contains the model's responses file (responses_{model-name}.jsonl), a strict evaluation results file (eval_results_strict.jsonl), and a loose evaluation results file (eval_results_loose.jsonl). Each record in the evaluation results files holds the model's response to a prompt, a flag for whether all instructions were followed, a per-instruction list of whether each instruction was followed, and a list of instruction IDs. The dataset also provides detailed guidelines for using and contributing results, along with citation information.
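As a rough illustration of how the evaluation results files described above might be consumed, the sketch below computes prompt-level and instruction-level accuracy from JSONL records. The field names `follow_all_instructions`, `follow_instruction_list`, and `instruction_id_list` are assumptions and are not confirmed by this listing, so check them against an actual file before relying on them. The sample records and instruction IDs are invented for the example.

```python
import json
from typing import Iterable


def summarize(lines: Iterable[str]) -> dict:
    """Compute prompt-level and instruction-level accuracy from JSONL lines.

    Assumed (unverified) record fields:
      follow_all_instructions : bool, True if every instruction was followed
      follow_instruction_list : list of bool, one flag per instruction
    """
    prompts = prompts_ok = instructions = instructions_ok = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        prompts += 1
        prompts_ok += bool(record["follow_all_instructions"])
        flags = record["follow_instruction_list"]
        instructions += len(flags)
        instructions_ok += sum(bool(flag) for flag in flags)
    return {
        "prompt_accuracy": prompts_ok / prompts if prompts else 0.0,
        "instruction_accuracy": (
            instructions_ok / instructions if instructions else 0.0
        ),
    }


# Hypothetical records, for illustration only.
sample_lines = [
    json.dumps({
        "follow_all_instructions": True,
        "follow_instruction_list": [True, True],
        "instruction_id_list": ["example:instruction_a", "example:instruction_b"],
    }),
    json.dumps({
        "follow_all_instructions": False,
        "follow_instruction_list": [True, False],
        "instruction_id_list": ["example:instruction_a", "example:instruction_c"],
    }),
]

summary = summarize(sample_lines)
print(summary)
```

To run this on a downloaded file, an open file handle can be passed in place of the sample list, for example `summarize(open("eval_results_strict.jsonl", encoding="utf-8"))`. Running it on both the strict and loose files for the same model would give the two sets of scores side by side.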
Provider:
shisa-ai