
flowaicom/HaluEval

Source: Hugging Face | Updated: 2024-09-14 | Indexed: 2025-04-08
Download link:
https://hf-mirror.com/datasets/flowaicom/HaluEval
Description:

This dataset contains the HaluEval subset of HaluBench, created by Patronus AI and available from [PatronusAI/HaluBench](https://huggingface.co/datasets/PatronusAI/HaluBench). The dataset was originally published in the paper _[HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models](https://arxiv.org/abs/2305.11747)_. It includes passages, questions, and answers along with evaluation labels that judge whether these answers are supported by the context provided in the document. During preprocessing, the original hallucination labels were mapped to 0 (indicating failure or hallucination) and 1 (indicating pass or no hallucination). The evaluation criteria and rubric are aligned with those used in the paper _[Lynx: An Open Source Hallucination Evaluation Model](https://arxiv.org/abs/2407.08488)_.
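The label mapping described above can be illustrated with a minimal sketch. It assumes the raw labels are the strings `PASS` and `FAIL`; the description does not state the original label values, so `LABEL_TO_SCORE`, `to_binary_label`, and `hallucination_rate` are hypothetical names used only for illustration, not the preprocessing code used to build the dataset.

```python
# Assumed raw label strings; the dataset description does not list the
# original values, so adjust this table to match the actual data.
LABEL_TO_SCORE = {"FAIL": 0, "PASS": 1}


def to_binary_label(raw_label: str) -> int:
    """Map a raw label to 0 (failure / hallucination) or 1 (pass / no hallucination)."""
    key = raw_label.strip().upper()
    if key not in LABEL_TO_SCORE:
        raise ValueError(f"Unexpected label: {raw_label!r}")
    return LABEL_TO_SCORE[key]


def hallucination_rate(scores: list[int]) -> float:
    """Return the fraction of examples scored 0 (hallucinated)."""
    if not scores:
        raise ValueError("scores must not be empty")
    return scores.count(0) / len(scores)


# Hypothetical raw labels for four examples.
raw_labels = ["PASS", "FAIL", "pass", "FAIL"]
scores = [to_binary_label(label) for label in raw_labels]
print(scores, hallucination_rate(scores))
```

The dataset itself can likely be fetched with `datasets.load_dataset("flowaicom/HaluEval")` from the Hugging Face `datasets` library; the split and column names are not given here, so inspect the loaded object before relying on any particular field.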
Provider:
flowaicom