
sam234990/ADC

Source: Hugging Face. Updated: 2026-04-28. Indexed: 2026-05-03.
Download link:
https://hf-mirror.com/datasets/sam234990/ADC
Resource description:
---
license: odc-by
task_categories:
- question-answering
language:
- en
pretty_name: ADC
size_categories:
- 1K<n<10K
configs:
- config_name: questions
  default: true
  data_files:
  - split: train
    path: questions.jsonl
- config_name: corpus
  data_files:
  - split: train
    path: corpus.jsonl
---

# ADC: Abstractive Document Comprehension

ADC is a dataset and benchmark for document-level, synthesis-heavy question answering in retrieval-augmented generation systems. It evaluates whether a system can read long documents, organize evidence, and produce grounded abstractive answers, rather than only retrieve short facts.

## Dataset Summary

- **869 questions** in total
- **5 task types**: `Single-Sum`, `Pair-Comp`, `Multi-Comp`, `Enum`, and `Temp`
- **2 source domains**: academic documents and news documents
- **3 retrieval scopes** used in the benchmark: `Simple`, `Middle`, and `Hard`

## Dataset Files

- `corpus.jsonl`: source documents used for retrieval and evidence grounding
- `questions.jsonl`: abstractive question-answer pairs with topic-set style answers and benchmark metadata

## Task Types

- **Single-Sum**: summarize a single document into a compact grounded answer
- **Pair-Comp**: compare two documents, methods, entities, or events
- **Multi-Comp**: synthesize comparisons across multiple targets
- **Enum**: enumerate key items, themes, findings, or contributions
- **Temp**: reconstruct temporally evolving events over a time window

## Data Sources

ADC is constructed from publicly available sources, including arXiv, OpenReview, and news articles collected through `mediastack.com`.

## License

This dataset is released under the Open Data Commons Attribution License (`ODC-By`). The `ODC-By` license applies to the dataset annotations, organization, metadata, and benchmark construction. Original source documents remain subject to their respective licenses and terms of use. Users are responsible for complying with the original licenses and source-specific usage terms when using the source content.

## Citation

Citation information will be added when the paper is released.
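
As a rough illustration of working with the two JSON Lines files listed under Dataset Files, the sketch below reads a `.jsonl` file with the Python standard library and tallies records per task type. The card does not document the record schema, so the field names `question` and `task_type` are hypothetical, and the sample rows are stand-ins written to a temporary file rather than real ADC data.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

# Task type labels as listed in the dataset card.
TASK_TYPES = ["Single-Sum", "Pair-Comp", "Multi-Comp", "Enum", "Temp"]


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_by(records, field):
    """Count records by the value of `field`; missing values count as None."""
    return Counter(record.get(field) for record in records)


# Tiny stand-in for questions.jsonl. The field names "question" and
# "task_type" are hypothetical; inspect the real file for its schema.
sample_rows = [
    {"question": "Summarize the paper.", "task_type": "Single-Sum"},
    {"question": "Compare the two methods.", "task_type": "Pair-Comp"},
    {"question": "List the key findings.", "task_type": "Enum"},
    {"question": "Summarize the article.", "task_type": "Single-Sum"},
]

with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "questions_sample.jsonl"
    sample_path.write_text(
        "\n".join(json.dumps(row) for row in sample_rows) + "\n",
        encoding="utf-8",
    )
    counts = count_by(read_jsonl(sample_path), "task_type")

# Report every task type from the card, including those absent in the sample.
for task in TASK_TYPES:
    print(f"{task}: {counts.get(task, 0)}")
```

To run this against the real data, point `read_jsonl` at a downloaded `questions.jsonl` and replace `task_type` with whatever field actually carries the task label. With the Hugging Face `datasets` library installed, the config names declared in the card metadata should also be loadable with `load_dataset("sam234990/ADC", "questions", split="train")` or `load_dataset("sam234990/ADC", "corpus", split="train")`, though that path is not exercised here.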
Provided by:
sam234990