AcademicBrowse

Name: AcademicBrowse
Creator: maas
Published: 2025-12-05 16:39:03
License: No description available

ModelScope community. Updated 2025-12-05; indexed 2025-06-28.

Download link:

https://modelscope.cn/datasets/PKU-DS-LAB/AcademicBrowse

Resource description:

# Welcome to ScholarSearch created by PKU-DS-LAB!

## Dataset Description

ScholarSearch is the first dataset specifically designed to evaluate the complex information retrieval capabilities of Large Language Models (LLMs) in academic research. Key characteristics of ScholarSearch include:

- **Academic Practicality**: Questions are based on real academic learning and research environments, avoiding misleading the models.
- **High Difficulty**: Answers often require at least three deep searches to derive, making them challenging for single models.
- **Concise Evaluation**: Answers are unique, with clear sources and brief explanations, facilitating audit and verification.
- **Broad Coverage**: The dataset spans at least 12 different academic disciplines, including Computer Science, Literature, Biology, Political Science, Economics, Mathematics, Demography, History of Science and Technology, Chemistry, Sociology, Public Health, and Physics.

The dataset consists of 223 meticulously curated questions in Chinese, each accompanied by an answer, explanation, and domain. It was created by a team of undergraduate and graduate students from various faculties at Peking University, ensuring the questions reflect genuine academic search scenarios.

## Dataset Structure

Each entry in the dataset contains the following fields:

- **question**: The academic query or problem.
- **answer**: The correct answer to the question.
- **explanation**: A brief explanation or justification for the answer, including sources.
- **domain**: The academic discipline or field to which the question belongs.

The dataset is provided as a JSON file containing a list of entries.
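The structure described above (a JSON list whose entries carry `question`, `answer`, `explanation`, and `domain`) can be read with the Python standard library alone. The following is a minimal sketch rather than an official loader: the helper names `load_entries` and `count_by_domain` are hypothetical, and the file name of the JSON file is not stated in this card, so the path is left to the caller.

```python
import json
from collections import Counter
from pathlib import Path

# The four fields documented in the dataset card.
REQUIRED_FIELDS = ("question", "answer", "explanation", "domain")


def load_entries(path):
    """Read a JSON list of entries and check each one has the documented fields.

    `path` is whatever local file the dataset was downloaded to; the actual
    file name is an assumption left to the caller.
    """
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(entries, list):
        raise ValueError("expected a JSON list of entries")
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index} is not a JSON object")
        missing = [field for field in REQUIRED_FIELDS if field not in entry]
        if missing:
            raise ValueError(f"entry {index} is missing fields: {missing}")
    return entries


def count_by_domain(entries):
    """Return a Counter mapping each domain label to its number of questions."""
    return Counter(entry["domain"] for entry in entries)
```

Calling `load_entries(path)` on the downloaded file and passing the result to `count_by_domain` gives a quick view of how the questions are distributed across disciplines. The questions are in Chinese, so the file is read as UTF-8 explicitly.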
## Experiment Results

| **Model** | **All (%)** | **Science & Engineering (%)** | **Social Sciences & Humanities (%)** |
|-----------|:-------------:|:-------------------------------:|:--------------------------------------:|
| gpt-4o-search-preview | 18.83 | 18.64 | 19.05 |
| gpt-4o-mini-search-preview | 10.31 | 10.17 | 10.48 |
| deepseek-r1-0528 | 8.52 | 5.08 | 12.38 |
| gpt-4.1 | 7.17 | 5.93 | 8.57 |
| gpt-4o-2024-11-20 | 3.59 | 1.69 | 5.71 |
| gpt-4o-mini | 2.24 | 0.85 | 3.81 |

*The judge model for all experiments is GPT-4o-mini.*

## Citation Information

Paper Link: https://arxiv.org/abs/2506.13784

## Additional Information

- This project was funded by Grant 624B2005.
- We would like to thank the following individuals for their contributions to problem-solving and evaluation: Xun Zhao, Zizhuo Fu, Yuqian Zhan, Xinhao Ji, Jiarui Sun, Junhao Zhang, Shengfan Wang, Ziteng Lu, Yumeng Song, Ziyan Yang, Hongjiao Wang, Shan Zhang, Huahui Lin, Junhong Liu, Zhengyang Wang, Yuchen Lu, Yanxi Xu.

## Team Members

Led by: Tong Yang; Yuhan Wu

Core Contributors: Junting Zhou; Wang Li; Yiyan Liao; Nengyuan Zhang; Tingjia Miao; Zhihui Qi

## Dataset Card Contact

For more details, please contact: yangtong@pku.edu.cn

Provider: maas

Created: 2025-06-19
