ArtificialAnalysis/AA-Omniscience-Public
Hugging Face · Updated 2026-01-14 · Indexed 2026-01-03
Download link:
https://hf-mirror.com/datasets/ArtificialAnalysis/AA-Omniscience-Public
Description:
AA-Omniscience-Public contains 600 questions across a wide range of domains, used to test a model’s knowledge and hallucination tendencies. It is a 10% subset of the full 6,000-question dataset, sampled so that model performance on it was representative of the full dataset at the time of release. Questions are created by an automated question generation agent, then filtered and revised so that they are difficult, unambiguous, not reliant on specific sources, and have precise answers. The dataset is designed to measure a model’s ability to recall factual information accurately across domains and to abstain correctly when its knowledge is insufficient. A grading model classifies each answer, which is intended to keep the evaluation accurate.
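As a rough illustration of how grader classifications could be turned into summary numbers, the sketch below tallies per-answer labels into simple rates. The label names (`correct`, `incorrect`, `abstained`) and the `summarize` helper are assumptions made for this example; the description above does not specify the grader's actual categories or the official scoring formula.

```python
from collections import Counter

# Hypothetical grade labels; the dataset's real grader categories may differ.
CORRECT, INCORRECT, ABSTAINED = "correct", "incorrect", "abstained"
KNOWN_LABELS = {CORRECT, INCORRECT, ABSTAINED}


def summarize(grades):
    """Tally grader labels and return simple, illustrative rates.

    This is not the official metric, only a plain per-label share.
    """
    total = len(grades)
    if total == 0:
        raise ValueError("no grades to summarize")

    counts = Counter(grades)
    unknown = set(counts) - KNOWN_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")

    return {
        "accuracy": counts[CORRECT] / total,
        "incorrect_rate": counts[INCORRECT] / total,
        "abstention_rate": counts[ABSTAINED] / total,
    }


# Four made-up grades, purely for demonstration.
example = summarize(["correct", "incorrect", "abstained", "correct"])
print(example)
```

Keeping incorrect answers and abstentions in separate buckets reflects the stated goal of the dataset: a model that declines to answer when unsure should be distinguishable from one that answers wrongly.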
Provider:
ArtificialAnalysis



