llm-semantic-router/longcontext-haldetect

Name: llm-semantic-router/longcontext-haldetect
Creator: llm-semantic-router
Published: 2026-01-09 02:42:04
License: Not specified

Source: Hugging Face. Updated 2026-01-09. Indexed 2026-02-07.

Download link:

https://hf-mirror.com/datasets/llm-semantic-router/longcontext-haldetect

Description:

A synthetic benchmark dataset for evaluating hallucination detection models on long documents (8K-24K tokens). It is designed specifically to test models that can handle contexts beyond the typical 8K token limit. The dataset contains 3,366 samples, of which 49.9% are hallucinated and 50.1% are supported. Samples are drawn from NarrativeQA (stories and movie scripts), GovReport (government reports), and QuALITY (articles and stories). Hallucination types are Evident Baseless Info, Evident Conflict, and Subtle Baseless Info. The dataset addresses three gaps: standard benchmarks are too short, 8K models truncate long documents and lose key context, and long-context detectors need a dedicated evaluation set. The generation pipeline has four stages: source filtering, answer generation, hallucination injection, and span annotation. The data format is JSON, with fields including ID, prompt, answer, and labels.
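As a rough illustration of how records in this kind of JSON format might be inspected, the sketch below tallies the hallucinated/supported balance and pulls out annotated spans. The exact schema is not given in this listing, so the field names ("id", "prompt", "answer", "labels"), the span keys ("start", "end", "label"), the rule that an empty "labels" list means "supported", and the two sample records are all assumptions made for the example, not the dataset's confirmed layout.

```python
import json
from collections import Counter

# Hypothetical records. Field names and span layout are assumptions for
# illustration only; check the actual dataset files for the real schema.
SAMPLE_JSON = """
[
  {"id": "ex-1", "prompt": "<long document> <question>",
   "answer": "A fully supported answer.", "labels": []},
  {"id": "ex-2", "prompt": "<long document> <question>",
   "answer": "Answer with an invented detail.",
   "labels": [{"start": 12, "end": 30, "label": "Evident Baseless Info"}]}
]
"""


def is_hallucinated(record):
    """Assumed convention: a record with any annotated span is hallucinated."""
    return len(record.get("labels", [])) > 0


def label_balance(records):
    """Return the fraction of hallucinated and supported records."""
    counts = Counter(
        "hallucinated" if is_hallucinated(r) else "supported" for r in records
    )
    total = len(records)
    if total == 0:
        return {}
    return {name: counts[name] / total for name in ("hallucinated", "supported")}


def hallucinated_spans(record):
    """Return (label, text) pairs, slicing the answer by character offsets."""
    answer = record["answer"]
    return [
        (span["label"], answer[span["start"]:span["end"]])
        for span in record.get("labels", [])
    ]


records = json.loads(SAMPLE_JSON)
balance = label_balance(records)
spans = hallucinated_spans(records[1])
```

On the full dataset, the same tally should come out close to the 49.9% / 50.1% split stated above, provided the assumed labeling convention matches the real one.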

Provided by:

llm-semantic-router
