wangyuwei111/simpleqa-verified
Source: Hugging Face | Updated: 2025-12-16 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/wangyuwei111/simpleqa-verified
Description:
SimpleQA Verified is a 1,000-prompt factuality benchmark designed by Google DeepMind and Google Research for reliably evaluating the short-form factuality and parametric knowledge of Large Language Models (LLMs). Each example contains the following fields: the original index (original_index), the question (problem), the gold answer (answer), topic (topic) and answer type (answer_type) classifications, two additional metadata fields (multi_step and requires_reasoning), and a list of URLs supporting the gold answer (urls). The dataset aims to give the research community a more precise instrument for tracking genuine progress in factuality and to foster the development of more trustworthy AI systems.
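The record layout described above can be sketched as a small Python data class. This is only an illustration: the field names come from the description, but the Python types, the class name `SimpleQAVerifiedExample`, the helper `parse_example`, and the sample row are all assumptions made for this sketch and are not taken from the dataset itself.

```python
from dataclasses import dataclass, fields


@dataclass
class SimpleQAVerifiedExample:
    """One benchmark record. Field names follow the description; types are assumed."""
    original_index: int       # index of the item in the original source set (assumed int)
    problem: str              # the question posed to the model
    answer: str               # the gold answer
    topic: str                # topic classification
    answer_type: str          # answer type classification
    multi_step: bool          # metadata flag (assumed boolean)
    requires_reasoning: bool  # metadata flag (assumed boolean)
    urls: list[str]           # URLs supporting the gold answer (assumed list of strings)


# Field names in declaration order, used to check incoming rows.
EXPECTED_FIELDS = [f.name for f in fields(SimpleQAVerifiedExample)]


def parse_example(row: dict) -> SimpleQAVerifiedExample:
    """Build a record from a dict, failing loudly if any expected field is absent."""
    missing = [name for name in EXPECTED_FIELDS if name not in row]
    if missing:
        raise ValueError(f"row is missing fields: {missing}")
    return SimpleQAVerifiedExample(**{name: row[name] for name in EXPECTED_FIELDS})


# Hypothetical row for illustration only; it is not an actual dataset entry.
sample_row = {
    "original_index": 0,
    "problem": "What is the capital of France?",
    "answer": "Paris",
    "topic": "Geography",
    "answer_type": "Place",
    "multi_step": False,
    "requires_reasoning": False,
    "urls": ["https://example.com/source"],
}

example = parse_example(sample_row)
print(example.problem, "->", example.answer)
```

Checking for missing fields before constructing the record makes a schema mismatch visible immediately, which is useful if the actual column types or names differ from the assumptions made here.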
Provider:
wangyuwei111



