HotpotQA
Indexed from arXiv on 2025-09-30
Download link:
https://hotpotqa.github.io/
Description:
This dataset is used as a benchmark for evaluating ReAct agents, which work through a task by interleaving multiple large language model (LLM) calls with actions such as web search. It is also used to mirror the experimental setup described in the original ReAct agent framework paper. The task is question answering.
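The interleaving of LLM calls and actions described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `react_loop`, `fake_llm`, `fake_search`, the `Search[...]` / `Finish[...]` action strings, and the toy question and corpus are hypothetical and are not taken from HotpotQA or from any particular agent library. A real evaluation would replace the two stubs with an actual LLM client and a real search tool.

```python
from typing import Callable, Optional


def react_loop(question: str,
               llm: Callable[[str], str],
               search: Callable[[str], str],
               max_steps: int = 5) -> Optional[str]:
    """Alternate LLM calls with search actions until the LLM emits Finish[...]."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # One LLM call per step: the model sees the full transcript so far.
        step = llm(transcript).strip()
        transcript += step + "\n"
        if step.startswith("Finish[") and step.endswith("]"):
            # The model decided it can answer; return the text inside Finish[...].
            return step[len("Finish["):-1]
        if step.startswith("Search[") and step.endswith("]"):
            # Run the action and feed its result back as an observation.
            observation = search(step[len("Search["):-1])
            transcript += f"Observation: {observation}\n"
    # No answer within the step budget.
    return None


# Hypothetical stand-ins for a real LLM and a real search tool.
TOY_CORPUS = {"toy entity": "The toy entity lives in Toyland."}


def fake_llm(transcript: str) -> str:
    # Scripted behaviour: search first, answer once an observation is present.
    if "Observation:" in transcript:
        return "Finish[Toyland]"
    return "Search[toy entity]"


def fake_search(query: str) -> str:
    return TOY_CORPUS.get(query, "No results.")


answer = react_loop("Where does the toy entity live?", fake_llm, fake_search)
print(answer)
```

Scoring a question answering benchmark with a loop like this would amount to comparing the returned string against the reference answer for each question; the step budget (`max_steps` here) bounds how many LLM calls a single question may consume.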



