KRLabsOrg/acl-verbatim-spans
Source: Hugging Face. Updated 2026-04-22; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/KRLabsOrg/acl-verbatim-spans
Description:
The ACL-Verbatim Span Dataset supports query-conditioned extractive evidence selection over papers from the ACL Anthology. The release combines a gold test benchmark with manual span annotations and a larger silver training set produced from synthetic questions, retrieval, and LLM-based span annotation. It is intended for systems that, given a question and a retrieved paper chunk, must identify the supporting evidence verbatim in the chunk. Typical uses include training token classifiers for semantic highlighting; evaluating span extractors and evidence selectors; comparing LLM teachers, token-level students, and sentence-selection baselines; and studying paragraph-scale evidence extraction in scientific text. The dataset ships in two configs: canonical, intended for analysis, evaluation, and downstream reuse, and encoder, intended for direct encoder (token-classification) training with Hugging Face `transformers`.
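To make the token-classification use concrete, here is a minimal sketch of how verbatim evidence spans might be turned into per-token highlight labels. It is an illustration only: the `label_tokens` helper, the character-offset span format, the whitespace tokenization, and the example chunk are all assumptions made for this sketch, not the dataset's documented schema or the encoder config's actual preprocessing.

```python
import re
from typing import List, Tuple


def label_tokens(chunk: str, spans: List[Tuple[int, int]]) -> List[Tuple[str, int]]:
    """Label each whitespace-delimited token 1 if it overlaps an evidence span, else 0.

    `spans` holds (start, end) character offsets into `chunk`, end-exclusive.
    This span format is an assumption for illustration.
    """
    labeled = []
    for match in re.finditer(r"\S+", chunk):
        tok_start, tok_end = match.span()
        # Two half-open intervals overlap when each starts before the other ends.
        inside = any(tok_start < s_end and s_start < tok_end for s_start, s_end in spans)
        labeled.append((match.group(), int(inside)))
    return labeled


# Made-up chunk and evidence string, not taken from the dataset.
chunk = "We train on 10k examples. The model reaches 91.2 F1 on the test set."
evidence = "The model reaches 91.2 F1"

# Because the evidence is verbatim, its offsets can be found by exact string search.
start = chunk.index(evidence)
spans = [(start, start + len(evidence))]

labels = label_tokens(chunk, spans)
highlighted = [token for token, flag in labels if flag == 1]
print(highlighted)
```

A real pipeline would more likely use a subword tokenizer with offset mappings instead of whitespace splitting, but the overlap test between token offsets and span offsets would stay the same.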
Provided by:
KRLabsOrg



