summary-of-a-haystack
Dataset Overview
Basic Information
- License: Apache 2.0
- Task category: Summarization
- Language: English
- Dataset name: SummHay
Data Structure
The dataset contains 10 Haystacks (5 in the conversation domain and 5 in the news domain). Each example has the following format:

```json
{
    "topic_id": "ObjectId()",
    "topic": "",
    "topic_metadata": {"participants": []}, // domain-specific information
    "subtopics": [
        {
            "subtopic_id": "ObjectId()",
            "subtopic_name": "",
            "subtopic": "",
            "insights": [
                {
                    "insight_id": "ObjectId()",
                    "insight_name": "",
                    "insight": ""
                }
            ],
            "query": "question reformulation of the subtopic",
            "retriever": {
                "retriever_method": {
                    "document_id": "0|1"
                }
            },
            "summaries": {
                "summarization_method_xyz": ["line1", "line2", "line3"],
                "{retriever}-{llm_summarizer}": ["line1", "line2", "line3"],
                "summarization_method_abc": ["line1", "line2", "line3"]
            },
            "eval_summaries": {
                "summarization_method_xyz": [
                    {
                        "insight_id": "",
                        "coverage": "NO_COVERAGE|PARTIAL_COVERAGE|FULL_COVERAGE",
                        "bullet_id": "line_number"
                    }
                ]
            }
        }
    ],
    "documents": [
        {
            "document_id": "ObjectId()",
            "document_text": "",
            "document_metadata": [], // domain-specific information
            "insights_included": [] // list of the insight_ids included
        }
    ]
}
```
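As a rough illustration of how a record in this format might be traversed, the following Python sketch tallies coverage labels per summarization method and maps each insight to the documents that contain it. The record `sample_haystack` and the helpers `coverage_counts` and `documents_per_insight` are hypothetical and not part of the dataset or its tooling. All values are made up, and the integer `bullet_id` values are an assumption about how line numbers are stored.

```python
from collections import Counter

# Hypothetical miniature record following the schema above; every value is invented.
sample_haystack = {
    "topic_id": "t1",
    "topic": "example topic",
    "topic_metadata": {"participants": []},
    "subtopics": [
        {
            "subtopic_id": "s1",
            "subtopic_name": "example subtopic",
            "subtopic": "",
            "insights": [
                {"insight_id": "i1", "insight_name": "", "insight": "first insight"},
                {"insight_id": "i2", "insight_name": "", "insight": "second insight"},
            ],
            "query": "example query",
            "summaries": {"summarization_method_xyz": ["line1", "line2"]},
            "eval_summaries": {
                "summarization_method_xyz": [
                    # bullet_id as an integer line index is an assumption.
                    {"insight_id": "i1", "coverage": "FULL_COVERAGE", "bullet_id": 0},
                    {"insight_id": "i2", "coverage": "NO_COVERAGE", "bullet_id": 1},
                ]
            },
        }
    ],
    "documents": [
        {"document_id": "d1", "document_text": "", "document_metadata": [],
         "insights_included": ["i1"]},
        {"document_id": "d2", "document_text": "", "document_metadata": [],
         "insights_included": ["i1", "i2"]},
    ],
}


def coverage_counts(haystack):
    """Count coverage labels per summarization method across all subtopics."""
    counts = {}
    for subtopic in haystack["subtopics"]:
        for method, judgments in subtopic.get("eval_summaries", {}).items():
            counter = counts.setdefault(method, Counter())
            for judgment in judgments:
                counter[judgment["coverage"]] += 1
    return counts


def documents_per_insight(haystack):
    """Map each insight_id to the document_ids whose insights_included lists it."""
    index = {}
    for document in haystack["documents"]:
        for insight_id in document["insights_included"]:
            index.setdefault(insight_id, []).append(document["document_id"])
    return index


if __name__ == "__main__":
    print(coverage_counts(sample_haystack))
    print(documents_per_insight(sample_haystack))
```

For a real Haystack file, the same helpers could be applied to the dictionary returned by `json.load`, provided the file is plain JSON without the `//` annotations shown in the schema.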
Citation
```plaintext
@article{laban2024SummHay,
  title={Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems},
  author={Laban, Philippe and Fabbri, Alexander R and Xiong, Caiming and Wu, Chien-Sheng},
  journal={arXiv preprint arXiv:https://arxiv.org/pdf/2407.01370},
  year={2024}
}
```




