sam234990/ADC
收藏Hugging Face2026-04-28 更新2026-05-03 收录
下载链接:
https://hf-mirror.com/datasets/sam234990/ADC
下载链接
链接失效反馈官方服务:
资源简介:
---
license: odc-by
task_categories:
- question-answering
language:
- en
pretty_name: ADC
size_categories:
- 1K<n<10K
configs:
- config_name: questions
default: true
data_files:
- split: train
path: questions.jsonl
- config_name: corpus
data_files:
- split: train
path: corpus.jsonl
---
# ADC: Abstractive Document Comprehension
ADC is a dataset and benchmark for document-level, synthesis-heavy question answering in retrieval-augmented generation systems.
It evaluates whether a system can read long documents, organize evidence, and produce grounded abstractive answers, rather than only retrieve short facts.
## Dataset Summary
- **869 questions** in total
- **5 task types**: `Single-Sum`, `Pair-Comp`, `Multi-Comp`, `Enum`, and `Temp`
- **2 source domains**: academic documents and news documents
- **3 retrieval scopes** used in the benchmark: `Simple`, `Middle`, and `Hard`
## Dataset Files
- `corpus.jsonl`: source documents used for retrieval and evidence grounding
- `questions.jsonl`: abstractive question-answer pairs with topic-set style answers and benchmark metadata
## Task Types
- **Single-Sum**: summarize a single document into a compact grounded answer
- **Pair-Comp**: compare two documents, methods, entities, or events
- **Multi-Comp**: synthesize comparisons across multiple targets
- **Enum**: enumerate key items, themes, findings, or contributions
- **Temp**: reconstruct temporally evolving events over a time window
## Data Sources
ADC is constructed from publicly available sources, including arXiv, OpenReview, and news articles collected through `mediastack.com`.
## License
This dataset is released under the Open Data Commons Attribution License (`ODC-By`).
The `ODC-By` license applies to the dataset annotations, organization, metadata, and benchmark construction.
Original source documents remain subject to their respective licenses and terms of use.
Users are responsible for complying with the original licenses and source-specific usage terms when using the source content.
## Citation
Citation information will be added when the paper is released.
提供机构:
sam234990



