Magic Mushroom
Source: arXiv | Updated: 2025-06-04 | Indexed: 2025-11-28
Download link:
https://drive.google.com/file/d/1aP5kyPuk4L-L_uoI6T9UhxuTyt8oMqjT/view
Overview:
Magic Mushroom is a controllable testbed for evaluating the robustness of Large Language Models (LLMs) under complex retrieval noise. The dataset comprises 7,468 single-hop and 3,925 multi-hop question-answer pairs, each paired with gold documents and multiple types of noisy documents, which makes it more complex than prior benchmarks. Researchers can flexibly configure the combination of retrieval noise to match a specific research objective or application scenario, enabling highly controlled evaluation settings.
Provided by:
School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
Created:
2025-06-04



