yale-nlp/ReIFE

Name: yale-nlp/ReIFE
Creator: yale-nlp
Published: 2024-10-10 04:12:08
License: apache-2.0

Source: Hugging Face. Updated: 2024-10-10. Indexed: 2025-04-12.

Download link:

https://hf-mirror.com/datasets/yale-nlp/ReIFE


Description:

---
license: apache-2.0
configs:
- config_name: src
  data_files:
  - split: llmbar_natural
    path: "src_llmbar_natural.json"
  - split: llmbar_adversarial
    path: "src_llmbar_adversarial.json"
  - split: mtbench
    path: "src_mtbench.json"
  - split: instrusum
    path: "src_instrusum.json"
- config_name: predictions
  data_files:
  - split: llmbar_natural
    path: "llmbar_natural.jsonl"
  - split: llmbar_adversarial
    path: "llmbar_adversarial.jsonl"
  - split: mtbench
    path: "mtbench.jsonl"
  - split: instrusum
    path: "instrusum.jsonl"
---

# ReIFE

This dataset contains the collection of evaluation results for our work ["ReIFE: Re-evaluating Instruction-Following Evaluation"](https://arxiv.org/abs/2410.07069).

It contains two subsets: `src` and `predictions`. The `src` subset contains the source datasets used to evaluate LLM-evaluators. The `predictions` subset contains the evaluation results of the LLM-evaluators.

The source datasets come from the following previous works (please cite them if you use the datasets):

- [LLMBar](https://github.com/princeton-nlp/LLMBar?tab=readme-ov-file#hugging-face-datasets)
- [MTBench](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge#datasets)
- [InstruSum](https://github.com/yale-nlp/InstruSum?tab=readme-ov-file#benchmark-dataset)

The `predictions` subset contains the evaluation results of 450 LLM-evaluators, consisting of 25 base LLMs and 18 evaluation protocols. The evaluation results are in JSONL format. Each line is a JSON object containing the evaluation results of one LLM-evaluator on one dataset.

Please visit our GitHub repo for more details, including dataset analysis: https://github.com/yale-nlp/ReIFE
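
The two configurations and their splits can be read with the Hugging Face `datasets` library. The following is a minimal sketch, not an official loader: the helper names `data_file`, `iter_jsonl`, and `load_split` are hypothetical, the file names are copied from the YAML header of the dataset card, and the record keys in the usage note are placeholders because the card does not document the JSON fields.

```python
import json
from typing import Iterator

# Config -> split -> file name, copied from the dataset card's YAML header.
DATA_FILES = {
    "src": {
        "llmbar_natural": "src_llmbar_natural.json",
        "llmbar_adversarial": "src_llmbar_adversarial.json",
        "mtbench": "src_mtbench.json",
        "instrusum": "src_instrusum.json",
    },
    "predictions": {
        "llmbar_natural": "llmbar_natural.jsonl",
        "llmbar_adversarial": "llmbar_adversarial.jsonl",
        "mtbench": "mtbench.jsonl",
        "instrusum": "instrusum.jsonl",
    },
}


def data_file(config: str, split: str) -> str:
    """Return the file backing a given config/split (hypothetical helper)."""
    return DATA_FILES[config][split]


def iter_jsonl(text: str) -> Iterator[dict]:
    """Yield one JSON object per non-empty line of JSONL text."""
    for line in text.splitlines():
        line = line.strip()
        if line:
            yield json.loads(line)


def load_split(config: str, split: str):
    """Load one split from the Hub (needs the `datasets` package and network access)."""
    from datasets import load_dataset  # imported lazily so the helpers above work offline

    return load_dataset("yale-nlp/ReIFE", config, split=split)
```

For example, `load_split("src", "mtbench")` would fetch the MTBench source split, while `iter_jsonl` can parse a locally downloaded predictions file such as `mtbench.jsonl` line by line. Inspect the keys of the first record before relying on any field name.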

Provider:

yale-nlp
