WildIFEval
Source: arXiv. Indexed 2025-09-30.
Download link:
https://huggingface.co/datasets/gililior/wild-if-eval
Description:
WildIFEval is a large-scale collection of about 12,000 real user instructions with diverse, multi-constraint conditions, designed to evaluate how well large language models (LLMs) follow real-world multi-constraint instructions. It contains 11,813 real-world constrained generation tasks, each annotated with its list of constraints. The data was curated in a three-stage process: filtering, task curation, and constraint decomposition. In total, the dataset covers 29,874 unique constraints, and the evaluation task is to follow instructions that carry multiple constraints.
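To make the "task plus decomposed constraints" structure concrete, here is a minimal Python sketch of how one record and a per-response score might be represented. The field names (`task`, `constraints`), the example instruction, and the scoring rule (fraction of constraints judged satisfied) are illustrative assumptions, not the dataset's confirmed schema or the authors' metric.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConstrainedTask:
    """One instruction with its decomposed constraints (field names are hypothetical)."""
    task: str
    constraints: List[str]


def constraint_score(judgments: List[bool]) -> float:
    """Return the fraction of constraints judged satisfied; 0.0 if there are none."""
    if not judgments:
        return 0.0
    return sum(judgments) / len(judgments)


# A made-up record for illustration; it is not taken from the dataset.
example = ConstrainedTask(
    task="Write a product description for a reusable water bottle.",
    constraints=[
        "Keep it under 50 words.",
        "Use a playful tone.",
        "Mention that it is BPA-free.",
    ],
)

# Hypothetical per-constraint verdicts, e.g. from a human or an LLM judge,
# one boolean per entry in example.constraints.
judgments = [True, False, True]
score = constraint_score(judgments)
```

Scoring each constraint separately, rather than the response as a whole, shows which kinds of constraints a model tends to miss.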
Provided by:
Authors of the paper



