
WikiAtomic

Source: arXiv, indexed 2025-09-30
Download link:
https://huggingface.com/datasets/wikipedia
Description:
WikiAtomic contains 200 high-quality articles sourced from Wikipedia, each decomposed into atomic sentences for use in open-ended question answering. The dataset also lets researchers analyze in depth how large language models use contextual and parametric knowledge. In terms of scale, it comprises 10,000 atomic sentences: 200 articles with an average of 50 atomic sentences per article. The supported task type is open-ended question answering.
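The scale figures above (200 articles, an average of 50 atomic sentences each, 10,000 in total) can be checked with a few lines of code once the data is loaded. The sketch below is only illustrative: the `Article` record, its field names, and the toy entries are assumptions, since the listing does not describe the dataset's actual schema.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Article:
    """Hypothetical record: one article and its atomic sentences.

    The field names are assumptions; the real WikiAtomic schema may differ.
    """
    title: str
    atomic_sentences: list[str] = field(default_factory=list)


def corpus_stats(articles: list[Article]) -> dict[str, float]:
    """Return article count, total atomic sentences, and mean per article."""
    counts = [len(a.atomic_sentences) for a in articles]
    return {
        "articles": len(articles),
        "atomic_sentences": sum(counts),
        "mean_per_article": mean(counts) if counts else 0.0,
    }


# Toy stand-in data, not real WikiAtomic content.
toy = [
    Article("Example A", ["Fact 1.", "Fact 2.", "Fact 3."]),
    Article("Example B", ["Fact 1."]),
]
stats = corpus_stats(toy)
print(stats)
```

Run on the full dataset, the same function would be expected to report 200 articles and 10,000 atomic sentences if the figures in the description hold.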