jfkback/crumb
Hugging Face · Updated 2025-09-11 · Indexed 2025-10-25
Download link:
https://hf-mirror.com/datasets/jfkback/crumb
Resource description:
CRUMB is a diverse, realistic benchmark for evaluating information retrieval models on complex, multi-aspect search tasks. It comprises eight carefully curated retrieval tasks spanning domains such as legal QA, clinical trials, code, and scientific papers. Unlike many existing evaluation collections and benchmarks, each task is complex: its queries are natural and carry multiple constraints or requirements. Documents are provided in a unified Markdown format, with contextualized chunking to preserve document structure. The dataset comes in two versions: a passage (chunked) version for standard retrievers and a full-document version for long-context models. Each task includes a development set to support tuning and few-shot prompting approaches.
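To illustrate what "contextualized chunking" of Markdown documents can look like, here is a minimal sketch. It is not CRUMB's actual preprocessing, which this description does not specify; the function name `contextualized_chunks` and the " > " heading-path prefix are assumptions made for illustration. The idea is that each chunk carries the titles of its enclosing headings, so a passage retriever still sees where the text sat in the document.

```python
import re

# ATX-style Markdown headings: one to six '#' characters, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def contextualized_chunks(markdown: str) -> list[str]:
    """Split Markdown at headings and prefix each chunk with its heading path.

    Hypothetical helper for illustration only; not the CRUMB pipeline.
    """
    path: list[tuple[int, str]] = []  # (level, title) of enclosing headings
    body: list[str] = []              # lines collected for the current chunk
    chunks: list[str] = []

    def flush() -> None:
        # Emit the collected body, if any, with its heading context on top.
        text = "\n".join(body).strip()
        if text:
            context = " > ".join(title for _, title in path)
            chunks.append(f"{context}\n\n{text}" if context else text)
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or a deeper level before adding this one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks
```

For example, a document with a top-level "Trial" heading and an "Eligibility" subsection yields a chunk that begins with "Trial > Eligibility", followed by the subsection text. This sketch does not handle heading-like lines inside fenced code blocks, which a production chunker would need to skip.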
Provided by:
jfkback



