WikiAtomic
Source: arXiv (indexed 2025-09-30)
Download link:
https://huggingface.com/datasets/wikipedia
Description:
WikiAtomic contains 200 high-quality articles sourced from Wikipedia, each decomposed into atomic sentences to support open-ended question answering. The dataset lets researchers analyze in depth how large language models draw on contextual knowledge versus parametric knowledge. It comprises 10,000 atomic sentences in total: 200 articles with an average of 50 atomic sentences each. The supported task type is open-ended question answering.
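The reported scale is a simple product: 200 articles × an average of 50 atomic sentences = 10,000 atomic sentences. The sketch below shows one possible way to represent such a corpus and recompute those figures. The `Article` record, its field names, and the `corpus_stats` helper are hypothetical illustrations, not WikiAtomic's published schema, and the toy corpus is synthetic filler sized to match the reported numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical record: one Wikipedia article split into atomic sentences.

    Field names are illustrative assumptions, not the dataset's released format.
    """
    title: str
    atomic_sentences: list[str] = field(default_factory=list)


def corpus_stats(articles: list[Article]) -> dict[str, float]:
    """Return the article count, total atomic sentences, and mean sentences per article."""
    counts = [len(a.atomic_sentences) for a in articles]
    total = sum(counts)
    return {
        "articles": len(articles),
        "atomic_sentences": total,
        # Guard against an empty corpus to avoid division by zero.
        "mean_per_article": total / len(counts) if counts else 0.0,
    }


# Synthetic toy corpus sized to the reported figures (200 articles x 50 sentences).
toy = [
    Article(title=f"article_{i}", atomic_sentences=[f"fact {j}" for j in range(50)])
    for i in range(200)
]
print(corpus_stats(toy))
```

Because the average is defined as total sentences divided by article count, any corpus of 200 articles averaging 50 atomic sentences yields exactly 10,000, even if individual articles vary in length.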



