SealQA
Source: arXiv. Indexed: 2025-09-30
Download link:
https://huggingface.co/datasets/vtllms/sealqa
Overview:
SealQA is a challenging benchmark for evaluating search-augmented language models on fact-seeking questions. It tests a model's reasoning and factual accuracy when web search results are conflicting or noisy. The dataset comes in three variants, Seal-0, Seal-Hard, and LongSeal, each designed to probe a different aspect of reasoning and factual accuracy. Questions average 31 tokens in length and span multiple domains, including science, sports, entertainment, politics, and history. The core task is to assess how well, and how robustly, language models reason over noisy search results.
Provided by:
NLP researchers involved in the study and their colleagues.
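Collected Summary
Background and Challenges
Background Overview
SealQA is a challenging benchmark for evaluating search-augmented language models on fact-seeking questions, aimed specifically at scenarios where web search results are conflicting or noisy. It comprises three variants (Seal-0, Seal-Hard, and LongSeal), covers questions from multiple domains, and is intended to test models' reasoning ability, factual accuracy, and robustness.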
The content above was collected and summarized by 遇见数据集.



