SimpleQA-Verified

ModelScope Community. Updated: 2025-12-05. Indexed: 2025-12-06.
Download link:
https://modelscope.cn/datasets/codelion/SimpleQA-Verified
Description:
SimpleQA Verified is a 1,000-prompt benchmark for reliably evaluating Large Language Models (LLMs) on short-form factuality and parametric knowledge. The authors, from Google DeepMind and Google Research, address various limitations of SimpleQA, originally designed by Wei et al. (2024) at OpenAI, including noisy and incorrect labels, topical biases, and question redundancy. SimpleQA Verified was created to provide the research community with a more precise instrument to track genuine progress in factuality, discourage overfitting to benchmark artifacts, and ultimately foster the development of more trustworthy AI systems.

Citation: Lukas Haas, Gal Yona, Giovanni D'Antonio, Sasha Goldshtein, and Dipanjan Das. SimpleQA Verified: A Reliable Factuality Benchmark to Measure Parametric Knowledge. Google DeepMind, Google Research, 2025. https://arxiv.org/abs/2509.07968

Provider:
maas
Created:
2025-10-22