openscilm_queries
Source: ModelScope community (魔搭社区). Updated 2025-12-05; indexed 2025-08-16.
Download link:
https://modelscope.cn/datasets/allenai/openscilm_queries
Description:
## Literature Synthesis Queries
This dataset contains 50k real-world literature synthesis queries from [our public demo](https://openscilm.allen.ai/).
### Dataset Summary
This dataset contains real-world literature synthesis questions collected from users of a scientific question-answering system.
Each entry includes:
- The user’s query (in natural language) (`query`)
- The subject of the question (e.g., computer science, medicine, engineering) (`subject`)
- The query intent (e.g., Literature Understanding, Paper Finding, Ideation, Other) (`query intent`)
The questions reflect realistic information needs from researchers and students, and often require retrieving, synthesizing, or summarizing scientific literature.
The dataset is intended for research on retrieval-augmented generation, scientific question answering, and user intent classification.
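As a minimal sketch of how the three fields described above might be used for intent analysis, the following Python snippet tallies records by `query intent` and `subject` and filters queries by intent. The sample rows, their values, and the helper names `count_by_field` and `filter_by_intent` are hypothetical and for illustration only; only the field names `query`, `subject`, and `query intent` come from this card, and the exact column names in the released files should be checked before relying on them.

```python
from collections import Counter
from typing import Iterable, List, Mapping

# Hypothetical sample records. The values are invented for illustration and
# are NOT taken from the dataset; only the three field names come from the card.
SAMPLE_ROWS = [
    {"query": "What are recent approaches to retrieval-augmented generation?",
     "subject": "computer science",
     "query intent": "Literature Understanding"},
    {"query": "Find papers on CRISPR off-target effects.",
     "subject": "medicine",
     "query intent": "Paper Finding"},
    {"query": "Suggest open problems in battery thermal management.",
     "subject": "engineering",
     "query intent": "Ideation"},
    {"query": "Find surveys on graph neural networks.",
     "subject": "computer science",
     "query intent": "Paper Finding"},
]


def count_by_field(rows: Iterable[Mapping[str, str]], field: str) -> Counter:
    """Count how many rows carry each distinct value of `field`."""
    return Counter(row[field] for row in rows)


def filter_by_intent(rows: Iterable[Mapping[str, str]], intent: str) -> List[str]:
    """Return the query text of every row whose intent equals `intent`."""
    return [row["query"] for row in rows if row["query intent"] == intent]


intent_counts = count_by_field(SAMPLE_ROWS, "query intent")
subject_counts = count_by_field(SAMPLE_ROWS, "subject")
paper_finding = filter_by_intent(SAMPLE_ROWS, "Paper Finding")

print(intent_counts.most_common())
print(subject_counts.most_common())
print(paper_finding)
```

The same helpers should work on any iterable of dict-like rows, so they could be applied to the full dataset once it is loaded, assuming the rows expose these field names.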
### License
This dataset is licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/legalcode.en). It is intended for research and educational use in accordance with [Ai2's Responsible Use Guidelines](https://allenai.org/responsible-use).
### Citation
```
@article{openscholar,
title={{OpenScholar}: Synthesizing Scientific Literature with Retrieval-Augmented Language Models},
author={Asai, Akari and He*, Jacqueline and Shao*, Rulin and Shi, Weijia and Singh, Amanpreet and Chang, Joseph Chee and Lo, Kyle and Soldaini, Luca and Feldman, Sergey and D’arcy, Mike and Wadden, David and Latzke, Matt and Sparks, Jenna and Hwang, Jena D. and Kishore, Varsha and Tian, Minyang and Ji, Pan and Liu, Shengyan and Tong, Hao and Wu, Bohao and Xiong, Yanyu and Zettlemoyer, Luke and Weld, Dan and Neubig, Graham and Downey, Doug and Yih, Wen-tau and Koh, Pang Wei and Hajishirzi, Hannaneh},
journal={arXiv},
year={2024},
}
```
Provider: maas
Created: 2025-08-13



