BIRCO
Source: arXiv; updated 2024-04-04; indexed 2024-06-21
Download link:
https://github.com/BIRCO-benchmark/BIRCO
Overview:
BIRCO is a benchmark for evaluating how well large language models (LLMs) perform on information retrieval tasks with complex objectives. It comprises five open-source datasets spanning computer science, debate, literature, and biomedicine. Queries in these datasets carry multifaceted task objectives, testing a model's ability to handle complex user search needs. Using a modular framework, BIRCO examines factors that may affect LLM performance on retrieval tasks and identifies simple baselines that perform strongly relative to existing methods. The benchmark highlights the need for retrieval methods that go beyond similarity-based matching in order to serve complex user intents.
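The overview contrasts BIRCO with similarity-based retrieval. As a minimal illustration of what such a baseline does (this is not BIRCO's code, models, or evaluation protocol), the sketch below ranks documents by cosine similarity between bag-of-words vectors. The function names `bow`, `cosine`, and `rank_by_similarity` and the toy documents are hypothetical; real systems would typically use learned embeddings instead of raw term counts.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Lower-cased bag-of-words term counts (a toy stand-in for an embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(query: str, docs: list[str]) -> list[int]:
    """Return document indices ordered by descending similarity to the query."""
    q = bow(query)
    scores = [cosine(q, bow(d)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy corpus, not taken from any BIRCO dataset.
    docs = [
        "a novel about a lost dog",
        "dog training tips for owners",
        "a mystery novel set in Paris",
    ]
    print(rank_by_similarity("a novel about a dog", docs))             # [0, 2, 1]
    # The negation is ignored: surface overlap still puts doc 0 first.
    print(rank_by_similarity("a novel that is not about a dog", docs))  # [0, 2, 1]
```

The second query shows the limitation the benchmark targets: because the score only measures term overlap, the constraint "not about a dog" has no effect, and the document the user wanted to exclude is still ranked first.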
Provided by:
Emerging Intelligence Lab, University of California, San Diego
Created:
2024-02-22



