SciArena-with-paperbank
ModelScope community. Updated: 2025-12-05. Indexed: 2025-08-30.
Download link:
https://modelscope.cn/datasets/yale-nlp/SciArena-with-paperbank
Description:
<h1 align="center">
SciArena: A New Platform for Evaluating Foundation Models in Scientific Literature Tasks
</h1>
<p align="center">
📝 <a href="https://allenai.org/blog/sciarena">Blog</a>
🌐 <a href="https://sciarena.allen.ai">SciArena Platform</a>
💻 <a href="https://github.com/yale-nlp/SciArena">Code</a>
📰 <a href="https://huggingface.co/papers/2507.01001">Paper</a>
</p>
We present SciArena, an open and collaborative platform for evaluating foundation models on scientific literature tasks. Unlike traditional benchmarks for scientific literature understanding and synthesis, SciArena engages the research community directly, following the Chatbot Arena evaluation approach of community voting on model comparisons. By leveraging collective intelligence, SciArena offers a community-driven evaluation of model performance on open-ended scientific tasks that demand literature-grounded, long-form responses.
## 📰 News
- **2025-07-01**: We are excited to release the SciArena paper, evaluation platform, dataset, and evaluation code!
## 🚀 Quickstart
- The `train` split contains all examples in `yale-nlp/SciArena`.
- The `test` split contains the examples in `yale-nlp/SciArena-Eval`, a set of 2,000 examples randomly sampled from `yale-nlp/SciArena` that serves as the meta-evaluation benchmark.
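Because the `test` examples serve as a meta-evaluation benchmark, one natural use is to check how often an automatic judge's preference matches the recorded human vote. The following is a minimal sketch, assuming the judge's outputs and the human votes are directly comparable labels. The function name `judge_agreement` and the labels `"A"` and `"B"` are hypothetical and not taken from the dataset.

```python
from typing import Sequence


def judge_agreement(human_votes: Sequence[str], judge_votes: Sequence[str]) -> float:
    """Return the fraction of examples where the judge's label equals the human vote."""
    if len(human_votes) != len(judge_votes):
        raise ValueError("human_votes and judge_votes must have the same length")
    if not human_votes:
        raise ValueError("at least one example is required")
    matches = sum(h == j for h, j in zip(human_votes, judge_votes))
    return matches / len(human_votes)


# Hypothetical labels for illustration only; inspect the real values of the
# `vote` field before comparing them against a judge's output.
human = ["A", "B", "A", "B"]
judge = ["A", "B", "B", "B"]
print(f"Agreement: {judge_agreement(human, judge):.2f}")
```

Plain accuracy is only one possible agreement measure; it is used here because it needs no assumptions beyond label equality.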
**Dataset Example Features**:
```json
{
  "id": // Unique ID for the voting example
  "question": // Question collected from users
  "responseA": // Response from Model A
  "responseB": // Response from Model B
  "modelA": // Model A name
  "modelB": // Model B name
  "vote": // User vote
  "citation_a": // Citations for the response from Model A; the 'concise_authors' feature corresponds to the citation as it appears in the response
  "citation_b": // Citations for the response from Model B; the 'concise_authors' feature corresponds to the citation as it appears in the response
  "question_type": // The type of question
  "subject": // The subject
  "paper_bank": // (New) Collection of top-n retrieved papers for context
}
```
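Before relying on these fields downstream, it can help to confirm that a record actually carries all of them. The following is a minimal sketch: the field names are copied from the listing above, while the helper `missing_fields` and the placeholder record are hypothetical and only illustrate the check.

```python
from typing import Any, List, Mapping

# Field names copied from the feature listing above.
EXPECTED_FIELDS = (
    "id", "question", "responseA", "responseB", "modelA", "modelB",
    "vote", "citation_a", "citation_b", "question_type", "subject",
    "paper_bank",
)


def missing_fields(example: Mapping[str, Any]) -> List[str]:
    """Return the documented field names that are absent from `example`."""
    return [name for name in EXPECTED_FIELDS if name not in example]


# Hypothetical record with placeholder values, used only to exercise the check.
record = {name: None for name in EXPECTED_FIELDS}
del record["paper_bank"]
print(missing_fields(record))
```

The check looks only at key presence; it makes no assumption about the type or structure of each value.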
### Basic Loading
```python
from datasets import load_dataset
# Load the dataset
dataset = load_dataset("yale-nlp/SciArena-with-paperbank", split="train")
# Check dataset size
print(f"Number of examples: {len(dataset)}")
# View first example
first_example = dataset[0]
print(f"Question: {first_example['question']}")
print(f"Model A: {first_example['modelA']}")
print(f"Model B: {first_example['modelB']}")
print(f"Winner: {first_example['vote']}")
```
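Once examples are loaded, simple tallies over fields such as `subject`, `question_type`, or the model names give a quick picture of the data. The following is a minimal sketch, assuming each example behaves like a plain dict. The helper names and the sample rows are hypothetical placeholders, not real dataset values.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def count_by_field(examples: Iterable[Mapping[str, Any]], field: str) -> Counter:
    """Count how many examples share each value of `field`."""
    return Counter(example[field] for example in examples)


def count_model_appearances(examples: Iterable[Mapping[str, Any]]) -> Counter:
    """Count how often each model appears on either side of a comparison."""
    counts: Counter = Counter()
    for example in examples:
        counts[example["modelA"]] += 1
        counts[example["modelB"]] += 1
    return counts


# Hypothetical rows for illustration; real rows carry the full set of fields.
rows = [
    {"modelA": "model-x", "modelB": "model-y", "subject": "subject-1"},
    {"modelA": "model-y", "modelB": "model-z", "subject": "subject-1"},
    {"modelA": "model-x", "modelB": "model-z", "subject": "subject-2"},
]
print(count_by_field(rows, "subject"))
print(count_model_appearances(rows))
```

The same functions should also accept the `dataset` object from the loading snippet, since iterating over a `datasets.Dataset` yields one dict per row.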
Provider: maas
Created: 2025-08-20



