llm-semantic-router/longcontext-haldetect

Hugging Face, updated 2026-01-09, indexed 2026-02-07
Download link:
https://hf-mirror.com/datasets/llm-semantic-router/longcontext-haldetect
Description:
A synthetic benchmark for evaluating hallucination detection models on long documents (8K-24K tokens). It is designed to test models that can handle contexts beyond the typical 8K-token limit. The dataset contains 3,366 samples, of which 49.9% are hallucinated and 50.1% are supported. Samples are drawn from NarrativeQA (stories and movie scripts), GovReport (government reports), and QuALITY (articles and stories). Hallucination types include Evident Baseless Info, Evident Conflict, and Subtle Baseless Info. The dataset addresses three gaps: standard benchmarks are too short, 8K-context models truncate long documents and lose key context, and long-context evaluation is needed. The generation pipeline consists of source filtering, answer generation, hallucination injection, and span annotation. The data is in JSON format, with fields for ID, prompt, answer, labels, and other information.
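As a rough illustration of the JSON format described above, the following Python sketch parses two made-up records and tallies hallucinated versus supported answers. The key names (`id`, `prompt`, `answer`, `labels`), the span layout (`start`, `end`, `label` as character offsets into the answer), and the rule that an empty `labels` list means "supported" are all assumptions for illustration; the actual schema should be checked against the dataset itself.

```python
import json
from collections import Counter

# Hypothetical records: the key names and the span layout are assumptions,
# not the dataset's confirmed schema. The contents are invented examples.
SAMPLE_JSON = """
[
  {"id": "ex-1",
   "prompt": "Context ... Question ...",
   "answer": "The report was issued in 2010.",
   "labels": [{"start": 25, "end": 29, "label": "Evident Conflict"}]},
  {"id": "ex-2",
   "prompt": "Context ... Question ...",
   "answer": "The hero returns home.",
   "labels": []}
]
"""


def is_hallucinated(record):
    """Assumed rule: a record is hallucinated if it has at least one span."""
    return len(record["labels"]) > 0


def summarize(records):
    """Count hallucinated vs. supported records and tally span types."""
    counts = Counter(
        "hallucinated" if is_hallucinated(r) else "supported" for r in records
    )
    span_types = Counter(
        span["label"] for r in records for span in r["labels"]
    )
    return counts, span_types


def span_text(record, span):
    """Return the answer substring covered by an annotated span."""
    return record["answer"][span["start"]:span["end"]]


records = json.loads(SAMPLE_JSON)
counts, span_types = summarize(records)
print(dict(counts))
print(dict(span_types))
print(span_text(records[0], records[0]["labels"][0]))
```

Run over the full dataset, the same tally would be one way to check the stated 49.9% / 50.1% split, provided the real field names are substituted for the assumed ones.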
Provider:
llm-semantic-router