From Answer to Origin: Page Number Grounding in Document-Level Question Answering
Source: DataCite Commons | Updated: 2025-09-28 | Indexed: 2026-05-03
Download link:
http://dataverse.jpl.nasa.gov/citation?persistentId=doi:10.48577/jpl.7EMHTW
Description:
We introduce PageLocQA, a retrieval-augmented generation (RAG) framework that integrates agentic reasoning and source attribution for question answering over long technical documents. Unlike existing approaches that either lack source grounding or rely solely on top-k context retrieval, PageLocQA dynamically selects tools based on question type and performs multi-step reasoning to generate semantically accurate answers with page-level citations. We construct a domain-specific evaluation dataset from the MODTRAN 6 User Manual, comprising expert-verified question–answer pairs and their associated source page numbers. Evaluated against state-of-the-art baselines including GPT-3.5, Gemini-1.5, ChatQA-1.5, and SelfRAG, our method achieves the highest BERTScore F1 (88.8%) and page attribution accuracy (95.0%). Ablation studies show that PageLocQA significantly outperforms traditional LLM and RAG pipelines in both answer quality and response speed. Our results highlight the importance of structured retrieval and attribution in building transparent and trustworthy scientific QA systems.
Provider:
Root
Created:
2025-09-28



