SciArena
Source: ModelScope community · Updated 2025-11-27 · Indexed 2025-05-17
Download link: https://modelscope.cn/datasets/yale-nlp/SciArena
Description:
<h1 align="center">
SciArena: A New Platform for Evaluating Foundation Models in Scientific Literature Tasks
</h1>
<p align="center">
📝 <a href="https://allenai.org/blog/sciarena">Blog</a>
🌐 <a href="https://sciarena.allen.ai">SciArena Platform</a>
💻 <a href="https://github.com/yale-nlp/SciArena">Code</a>
📰 <a href="https://huggingface.co/papers/2507.01001">Paper</a>
</p>
We present SciArena, an open and collaborative platform for evaluating foundation models on scientific literature tasks. Unlike traditional benchmarks for scientific literature understanding and synthesis, SciArena engages the research community directly, following the Chatbot Arena evaluation approach of community voting on model comparisons. By leveraging collective intelligence, SciArena offers a community-driven evaluation of model performance on open-ended scientific tasks that demand literature-grounded, long-form responses.
## 📰 News
- **2025-07-01**: We are excited to release the SciArena paper, evaluation platform, dataset, and evaluation code!
## 🚀 Quickstart
**Dataset example fields**:
```json
{
  "id":            // Unique ID of the voting example
  "question":      // Question collected from a user
  "responseA":     // Response from Model A
  "responseB":     // Response from Model B
  "modelA":        // Name of Model A
  "modelB":        // Name of Model B
  "vote":          // The user's vote
  "citation_a":    // Citations for Model A's response; the 'concise_authors' feature corresponds to the citation as it appears in the response
  "citation_b":    // Citations for Model B's response; the 'concise_authors' feature corresponds to the citation as it appears in the response
  "question_type": // Type of the question
  "subject":       // Subject area of the question
}
```
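As a rough illustration of how these fields fit together, the sketch below tallies per-model win rates from a handful of records. The records, the model names, and the `win_rates` helper are hypothetical, and the encoding of `vote` is not specified above: the sketch assumes the values `"A"`, `"B"`, and `"tie"`, which may differ from the released data.

```python
from collections import defaultdict

# Hypothetical records that follow the field layout above; all values are made up.
# Assumption: "vote" is one of "A", "B", or "tie" (the real encoding may differ).
records = [
    {"id": "ex1", "modelA": "model-x", "modelB": "model-y", "vote": "A"},
    {"id": "ex2", "modelA": "model-y", "modelB": "model-x", "vote": "A"},
    {"id": "ex3", "modelA": "model-x", "modelB": "model-z", "vote": "tie"},
    {"id": "ex4", "modelA": "model-z", "modelB": "model-y", "vote": "B"},
]


def win_rates(rows):
    """Return each model's share of decisive (non-tie) comparisons that it won."""
    wins = defaultdict(int)
    decisive = defaultdict(int)
    for row in rows:
        if row["vote"] not in ("A", "B"):
            continue  # skip ties and any unrecognized vote value
        winner = row["modelA"] if row["vote"] == "A" else row["modelB"]
        for model in (row["modelA"], row["modelB"]):
            decisive[model] += 1
        wins[winner] += 1
    return {model: wins[model] / decisive[model] for model in decisive}


rates = win_rates(records)
for model, rate in sorted(rates.items()):
    print(f"{model}: {rate:.2f}")
```

Raw win rates ignore how strong each opponent was, so they are only a first look at the votes, not a ranking method.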
We also provide a version with examples, along with the accompanying **paper bank** (the set of retrieved papers used as model input), available at https://huggingface.co/datasets/yale-nlp/SciArena-with-paperbank
## 📖 Citation
```
@inproceedings{zhao2025sciarena,
title = {SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks},
author = {Yilun Zhao and Kaiyan Zhang and Tiansheng Hu and Sihong Wu and Ronan Le Bras and Taira Anderson and Jonathan Bragg and Joseph Chee Chang and Jesse Dodge and Matt Latzke and Yixin Liu and Charles McGrady and Xiangru Tang and Zihang Wang and Chen Zhao and Hannaneh Hajishirzi and Doug Downey and Arman Cohan},
booktitle = {Proceedings of the NeurIPS 2025 Datasets \& Benchmarks Track},
year = {2025},
url = {https://neurips.cc/virtual/2025/poster/121564},
note = {Available at arXiv:2507.01001},
}
```
Provider: maas
Created: 2025-05-16



