Polish Information Retrieval Benchmark (PIRB)
Source: arXiv. Updated: 2024-03-11. Indexed: 2024-06-21.
Download link:
https://huggingface.co/spaces/sdadas/pirb
Resource overview:
The Polish Information Retrieval Benchmark (PIRB) is a comprehensive evaluation framework comprising 41 text information retrieval tasks across domains such as medicine, law, and business. It combines existing datasets with 10 new, previously unpublished ones to cover a wider range of topics and requirements. PIRB aims to advance Polish-language information retrieval through broad evaluation, with a focus on model generalization and zero-shot performance. The accompanying work also introduces a three-step training process for building efficient language-specific retrievers: knowledge distillation, supervised fine-tuning, and the construction of sparse-dense hybrid retrievers.
Provided by:
National Information Processing Institute
Created:
2024-02-21



