InPars-v2
Source: arXiv | Updated: 2023-05-27 | Indexed: 2024-06-21
Download link:
https://github.com/zetaalphavector/inPars/tree/master/legacy/inpars-v2
Description:
InPars-v2 is a dataset created jointly by Neural Intelligence Company of Brazil and the School of Electrical and Computer Engineering, University of São Paulo, Brazil. It consists of synthetic query-document pairs for information retrieval, generated with large language models. To build it, 100,000 documents were sampled from each dataset in the BEIR benchmark, and one synthetic query was generated for each sampled document with the open-source GPT-J model. A powerful reranker was then used to keep approximately 10,000 high-quality query-document pairs. The dataset is used mainly to train and evaluate information retrieval models, in particular to improve their performance on the BEIR benchmark.
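The sample, generate, and filter steps described above can be sketched as follows. This is only an illustration under stated assumptions: `build_synthetic_pairs`, `toy_generate_query`, and `toy_score_pair` are hypothetical names that do not come from the InPars repository, the toy functions merely stand in for GPT-J and the reranker, and keeping the top-scoring pairs is an assumed reading of how the reranker is applied.

```python
import random
from typing import Callable, List, Tuple


def build_synthetic_pairs(
    documents: List[str],
    generate_query: Callable[[str], str],
    score_pair: Callable[[str, str], float],
    sample_size: int = 100_000,
    keep_top: int = 10_000,
    seed: int = 0,
) -> List[Tuple[str, str, float]]:
    """Sample documents, generate one query per document, keep the best pairs.

    Hypothetical helper: generate_query stands in for an LLM such as GPT-J,
    and score_pair stands in for a reranker relevance score.
    """
    rng = random.Random(seed)
    # Step 1: sample documents (all of them if the corpus is small).
    sampled = rng.sample(documents, min(sample_size, len(documents)))
    # Step 2: one synthetic query per sampled document, scored right away.
    scored = []
    for doc in sampled:
        query = generate_query(doc)
        scored.append((query, doc, score_pair(query, doc)))
    # Step 3 (assumed): keep only the highest-scoring pairs.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:keep_top]


def toy_generate_query(doc: str) -> str:
    # Placeholder for the LLM: reuse the first three words of the document.
    return " ".join(doc.split()[:3])


def toy_score_pair(query: str, doc: str) -> float:
    # Placeholder for the reranker: fraction of document words seen in the query.
    doc_words = set(doc.split())
    return len(set(query.split()) & doc_words) / len(doc_words)


corpus = [f"document number {i} about topic {i % 5}" for i in range(50)]
pairs = build_synthetic_pairs(
    corpus, toy_generate_query, toy_score_pair, sample_size=20, keep_top=5
)
for query, doc, score in pairs:
    print(f"{score:.2f}\t{query}\t{doc}")
```

In a real run the two callables would wrap an actual generator and reranker, and the default sizes mirror the 100,000 and 10,000 figures quoted in the description.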
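Provided by:
Neural Intelligence Company of Brazil and the School of Electrical and Computer Engineering, University of São Paulo, Brazil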
Created:
2023-01-05



