
yale-nlp/ReIFE

Source: Hugging Face. Updated 2024-10-10. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/yale-nlp/ReIFE
Description:
---
license: apache-2.0
configs:
- config_name: src
  data_files:
  - split: llmbar_natural
    path: "src_llmbar_natural.json"
  - split: llmbar_adversarial
    path: "src_llmbar_adversarial.json"
  - split: mtbench
    path: "src_mtbench.json"
  - split: instrusum
    path: "src_instrusum.json"
- config_name: predictions
  data_files:
  - split: llmbar_natural
    path: "llmbar_natural.jsonl"
  - split: llmbar_adversarial
    path: "llmbar_adversarial.jsonl"
  - split: mtbench
    path: "mtbench.jsonl"
  - split: instrusum
    path: "instrusum.jsonl"
---

# ReIFE

This dataset contains the evaluation result collection for our work ["ReIFE: Re-evaluating Instruction-Following Evaluation"](https://arxiv.org/abs/2410.07069).

It contains two subsets: `src` and `predictions`. The `src` subset contains the source datasets for evaluating LLM-evaluators. The `predictions` subset contains the evaluation results of the LLM-evaluators.

The source datasets are from the following previous works (please cite them if you use the datasets):

- [LLMBar](https://github.com/princeton-nlp/LLMBar?tab=readme-ov-file#hugging-face-datasets)
- [MTBench](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge#datasets)
- [InstruSum](https://github.com/yale-nlp/InstruSum?tab=readme-ov-file#benchmark-dataset)

The `predictions` subset contains the evaluation results of the 450 LLM-evaluators, consisting of 25 base LLMs and 18 evaluation protocols. The evaluation results are in the JSONL format. Each line is a JSON object containing the evaluation results of an LLM-evaluator on a dataset.

Please visit our GitHub repo for more details including dataset analysis: https://github.com/yale-nlp/ReIFE
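
The subset and split layout declared in the card's front matter can be sketched in Python as follows. This is a minimal, unofficial illustration: `REIFE_FILES`, `read_jsonl`, and `load_reife_split` are hypothetical helper names, not part of the ReIFE repository. The sketch assumes a file from the `predictions` subset has already been downloaded locally, and it makes no assumption about the field names inside each JSON object, since the card does not list them. The `load_reife_split` helper additionally assumes the third-party `datasets` library is installed and that the Hub is reachable.

```python
import json
from pathlib import Path

# Config -> split -> file name, copied from the card's YAML front matter.
REIFE_FILES = {
    "src": {
        "llmbar_natural": "src_llmbar_natural.json",
        "llmbar_adversarial": "src_llmbar_adversarial.json",
        "mtbench": "src_mtbench.json",
        "instrusum": "src_instrusum.json",
    },
    "predictions": {
        "llmbar_natural": "llmbar_natural.jsonl",
        "llmbar_adversarial": "llmbar_adversarial.jsonl",
        "mtbench": "mtbench.jsonl",
        "instrusum": "instrusum.jsonl",
    },
}


def read_jsonl(path):
    """Return one parsed JSON value per non-empty line of a JSONL file."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records


def load_reife_split(config, split):
    """Load one split through the Hugging Face `datasets` library (needs network)."""
    if split not in REIFE_FILES[config]:
        raise KeyError(f"unknown split {split!r} for config {config!r}")
    # Imported lazily so the rest of this sketch runs without the dependency.
    from datasets import load_dataset
    return load_dataset("yale-nlp/ReIFE", config, split=split)
```

For example, `read_jsonl("mtbench.jsonl")` would return a list with one entry per LLM-evaluator result on the MTBench split, assuming that file sits in the working directory.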
Provider:
yale-nlp