NFCorpus
Source: arXiv, indexed 2025-09-30
Download link:
https://www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/
Description:
NFCorpus is characterized by a long average document length, which poses a challenge for in-context retrieval methods constrained by GPU memory. It is used to evaluate the effectiveness of retrieval-augmented generation (RAG) techniques. Its main limitation is that long documents increase computational resource usage. For Research Question 2 (RQ#2), a 20% subset of the dataset was used; for Research Question 1 (RQ#1), the full dataset was used. In both cases, the goal is to assess retrieval performance.
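The description does not say how the 20% subset for RQ#2 was drawn. The following is a minimal sketch that assumes the subset is a uniform random sample of documents taken with a fixed seed. It measures the average document length in whitespace tokens and draws a reproducible 20% subset. The helper names `average_doc_length` and `sample_subset`, the seed, and the toy corpus are all hypothetical. Loading NFCorpus from the download link is not shown; `docs` is assumed to map document IDs to document text.

```python
import random
from statistics import mean


def average_doc_length(docs: dict[str, str]) -> float:
    """Mean whitespace-token count per document (a rough proxy for length)."""
    return mean(len(text.split()) for text in docs.values())


def sample_subset(docs: dict[str, str], fraction: float = 0.2, seed: int = 0) -> dict[str, str]:
    """Draw a reproducible random subset containing `fraction` of the documents.

    Hypothetical helper: assumes the subset is sampled over documents,
    which the source does not confirm.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ids = sorted(docs)  # sort so the sample depends only on the seed, not on dict order
    k = max(1, round(len(ids) * fraction))
    rng = random.Random(seed)  # private RNG so the global random state is untouched
    chosen = rng.sample(ids, k)
    return {doc_id: docs[doc_id] for doc_id in chosen}


if __name__ == "__main__":
    # Hypothetical toy corpus, not NFCorpus data: document i has i + 1 tokens.
    toy = {f"d{i}": "word " * (i + 1) for i in range(10)}
    print(average_doc_length(toy))  # 5.5
    print(sorted(sample_subset(toy, fraction=0.2, seed=42)))
```

Sorting the IDs before sampling keeps the subset identical across runs for a given seed. Whether the original work sampled documents, queries, or query-document pairs is not stated, so the sampling unit here is an assumption.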



