
Chinese-SimpleQA

ModelScope community · Updated 2026-05-16 · Indexed 2025-03-15
Download link:
https://modelscope.cn/datasets/AI-ModelScope/Chinese-SimpleQA
Resource overview:
# Overview

<p align="center">
🌐 <a href="https://openstellarteam.github.io/ChineseSimpleQA/" target="_blank">Website</a> • 🤗 <a href="https://huggingface.co/datasets/OpenStellarTeam/Chinese-SimpleQA" target="_blank">Hugging Face</a> • ⏬ <a href="#data" target="_blank">Data</a> • 📃 <a href="https://huggingface.co/datasets/OpenStellarTeam/Chinese-SimpleQA" target="_blank">Paper</a> • 📊 <a href="http://47.109.32.164/" target="_blank">Leaderboard</a>
</p>

**Chinese SimpleQA** is the first comprehensive Chinese benchmark for evaluating the factuality of language models when answering short questions. It has five main properties: Chinese, Diverse, High-quality, Static, and Easy-to-evaluate. Specifically, our benchmark covers **6 major topics** with **99 diverse subtopics**.

Please visit our [website](https://openstellarteam.github.io/ChineseSimpleQA/) or check our [paper](https://arxiv.org/abs/2411.07140) for more details.

## 💫 Introduction

* Generative hallucination remains an unsolved problem in the field of artificial intelligence (AI). To measure the factual correctness of language models, OpenAI recently released and open-sourced a test set called SimpleQA. We have also been following the field of factuality, which currently suffers from outdated data, inaccurate evaluation, and incomplete coverage. For example, the knowledge evaluation sets in wide use today are still CommonSenseQA, CMMLU, and C-Eval, all of which are based on multiple-choice questions. **To further promote research in the Chinese community on the factual correctness of models, we propose Chinese SimpleQA**, which consists of 3000 high-quality questions spanning 6 major topics, ranging from humanities to science and engineering. The main features of Chinese SimpleQA are as follows:
  * 🀄**Chinese:** Chinese SimpleQA focuses on the Chinese language and provides a comprehensive evaluation of the factuality of existing LLMs in Chinese.
  * 🍀**Diverse:** Chinese SimpleQA covers 6 topics (i.e., “Chinese Culture”, “Humanities”, “Engineering, Technology, and Applied Sciences”, “Life, Art, and Culture”, “Society”, and “Natural Science”). These topics include 99 fine-grained subtopics in total, which demonstrates the diversity of Chinese SimpleQA.
  * ⚡**High-quality:** We conduct a comprehensive and rigorous quality control process to ensure the quality and accuracy of Chinese SimpleQA.
  * 💡**Static:** Following SimpleQA, to preserve the evergreen property of Chinese SimpleQA, no reference answer changes over time.
  * 🗂️**Easy-to-evaluate:** Following SimpleQA, because the questions and answers are very short, grading is fast to run with existing LLMs (e.g., via the OpenAI API).

- Based on Chinese SimpleQA, we have conducted a comprehensive evaluation of the factual capabilities of existing LLMs. We also maintain a comprehensive leaderboard.
- In short, we hope that Chinese SimpleQA helps developers gain a deeper understanding of the factual correctness of their models in Chinese, provides a cornerstone for their algorithm research, and helps promote the growth of Chinese foundation models.

## 📊 Leaderboard

See: [📊](http://47.109.32.164/)

## ⚖️ Evals

Please visit our [GitHub page](https://github.com/OpenStellarTeam/ChineseSimpleQA).

## Citation

Please cite our paper if you use our dataset.
```
@misc{he2024chinesesimpleqachinesefactuality,
  title={Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models},
  author={Yancheng He and Shilong Li and Jiaheng Liu and Yingshui Tan and Weixun Wang and Hui Huang and Xingyuan Bu and Hangyu Guo and Chengwei Hu and Boren Zheng and Zhuoran Lin and Xuepeng Liu and Dekai Sun and Shirong Lin and Zhicheng Zheng and Xiaoyong Zhu and Wenbo Su and Bo Zheng},
  year={2024},
  eprint={2411.07140},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2411.07140},
}
```
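The "Easy-to-evaluate" property means that each model answer receives a short grade from an LLM judge, so results can be aggregated with a few lines of code. The following is a minimal sketch of that aggregation step only, not the official evaluation script (see the GitHub page for that). The record keys `topic` and `grade`, the three grade labels (borrowed from the SimpleQA convention), and the function name `summarize` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Assumed grade labels, following the SimpleQA convention; the official
# Chinese SimpleQA scripts may use different names.
GRADES = ("CORRECT", "INCORRECT", "NOT_ATTEMPTED")


def summarize(records):
    """Aggregate graded answers per topic.

    Each record is a dict with the hypothetical keys "topic" and "grade".
    """
    counts = defaultdict(Counter)
    for rec in records:
        if rec["grade"] not in GRADES:
            raise ValueError(f"unknown grade: {rec['grade']!r}")
        counts[rec["topic"]][rec["grade"]] += 1

    summary = {}
    for topic, c in counts.items():
        total = sum(c.values())
        attempted = c["CORRECT"] + c["INCORRECT"]
        summary[topic] = {
            "total": total,
            # Share of all questions answered correctly.
            "correct_rate": c["CORRECT"] / total,
            # Share of attempted questions answered correctly.
            "accuracy_given_attempted": (
                c["CORRECT"] / attempted if attempted else 0.0
            ),
        }
    return summary


# Toy graded records, invented for this example (not real benchmark data).
graded = [
    {"topic": "Chinese Culture", "grade": "CORRECT"},
    {"topic": "Chinese Culture", "grade": "INCORRECT"},
    {"topic": "Natural Science", "grade": "CORRECT"},
    {"topic": "Natural Science", "grade": "NOT_ATTEMPTED"},
]
result = summarize(graded)
for topic, stats in result.items():
    print(topic, stats)
```

Reporting accuracy on attempted questions separately from the overall correct rate distinguishes a model that declines to answer from one that answers wrongly, which matters when the goal is to measure hallucination.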

Provider:
maas
Created:
2025-03-10
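Background overview
Chinese-SimpleQA is the first comprehensive Chinese benchmark dataset for evaluating the factuality of language models when answering short questions. It has five properties: Chinese, diverse, high-quality, static, and easy to evaluate. The dataset covers 6 major topics and 99 subtopics and contains 3000 high-quality questions ranging from humanities to science and engineering, supporting the Chinese community's research on the factual correctness of models.
The summary above was collected and generated by 遇见数据集.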