sapienzanlp/LiteraryQA
Source: Hugging Face. Updated: 2026-01-13. Indexed: 2026-01-03.
Download link:
https://hf-mirror.com/datasets/sapienzanlp/LiteraryQA
Description:
LiteraryQA is a long-context question-answering benchmark focusing on literary works. Derived from NarrativeQA, it addresses issues with the raw text of books, the crowdsourced QAs, and the metrics used to evaluate systems. The dataset includes books from Project Gutenberg, with a script for downloading and preprocessing the texts. The data format includes document ID, Gutenberg ID, dataset split, book title, full text, summary, QA pairs, and metadata (e.g., author, publication date, genre tags). QA pairs contain questions, reference answers, and flags indicating if questions or answers were modified. The dataset is designed for NLP tasks, particularly long-document narrative QA.
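The record layout described above can be sketched as a set of Python dataclasses. This is only an illustration of the listed fields: every class name, field name, and type below is an assumption made for readability and may differ from the actual column names in the dataset, and the sample values are placeholders rather than real data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QAPair:
    """One question with its reference answers (field names are hypothetical)."""
    question: str
    answers: List[str]
    is_question_modified: bool = False  # flag: question was edited
    is_answer_modified: bool = False    # flag: an answer was edited


@dataclass
class Metadata:
    """Book-level metadata (field names are hypothetical)."""
    author: str
    publication_date: str
    genre_tags: List[str] = field(default_factory=list)


@dataclass
class LiteraryQARecord:
    """One book with its summary and QA pairs (field names are hypothetical)."""
    document_id: str
    gutenberg_id: str
    split: str
    title: str
    text: str
    summary: str
    qas: List[QAPair]
    metadata: Metadata


def modified_pairs(record: LiteraryQARecord) -> List[QAPair]:
    """Return the QA pairs whose question or answer is flagged as modified."""
    return [
        qa for qa in record.qas
        if qa.is_question_modified or qa.is_answer_modified
    ]


# Placeholder record, not taken from the dataset.
example = LiteraryQARecord(
    document_id="doc-0",
    gutenberg_id="0",
    split="train",
    title="Placeholder Title",
    text="Full book text goes here.",
    summary="Summary goes here.",
    qas=[
        QAPair(question="Q1?", answers=["A1"]),
        QAPair(question="Q2?", answers=["A2"], is_answer_modified=True),
    ],
    metadata=Metadata(author="Unknown", publication_date="n/a"),
)

flagged = modified_pairs(example)
```

Keeping the modification flags on each QA pair, as the description suggests, makes it straightforward to separate edited pairs from those left unchanged.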
Provider:
sapienzanlp



