sqa_reranking_eval
ModelScope Community. Updated 2025-11-27; indexed 2025-05-31.
Download link:
https://modelscope.cn/datasets/allenai/sqa_reranking_eval
Resource overview:
## Dataset Details
A dataset for evaluating retrieval and reranking models or techniques for scientific question answering (QA).
The questions are sourced from:
- Real researchers
- Stack Exchange communities in computing-related domains: computer science, statistics, mathematics, and data science
- Synthetic questions generated by prompting an LLM
Each question is paired with a set of passages (text in Markdown format), each annotated with the Semantic Scholar ID of its source paper and a relevance label from 0 to 3 (higher means more relevant) assigned by GPT-4o.
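The per-question structure described above could be handled along the following lines. This is a minimal sketch: the flat one-row-per-passage layout and the field names (`question`, `passage`, `corpus_id`, `label`) are hypothetical, chosen for illustration, so check the actual dataset schema before relying on them. It groups labeled passages by question and derives the ideal (label-sorted) ranking that a perfect reranker would produce.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Judgment:
    # HYPOTHETICAL schema: one row per (question, passage) pair.
    question: str
    passage: str    # passage text in Markdown
    corpus_id: str  # Semantic Scholar ID of the source paper
    label: int      # GPT-4o relevance label: 0 (least) to 3 (most relevant)


def group_by_question(rows):
    """Collect all labeled passages per question, validating the 0-3 label range."""
    groups = defaultdict(list)
    for row in rows:
        if not 0 <= row.label <= 3:
            raise ValueError(f"label out of range: {row.label}")
        groups[row.question].append(row)
    return dict(groups)


def ideal_ranking(judgments):
    """Sort passages by descending label: the order a perfect reranker would output."""
    return sorted(judgments, key=lambda j: j.label, reverse=True)


# Toy rows with placeholder IDs; not taken from the dataset.
q = "How does dropout work?"
rows = [
    Judgment(q, "Batch norm rescales activations...", "paper-B", 1),
    Judgment(q, "We prune CNN filters...", "paper-C", 0),
    Judgment(q, "Dropout randomly zeroes units...", "paper-A", 3),
]
groups = group_by_question(rows)
print([j.corpus_id for j in ideal_ranking(groups[q])])
# → ['paper-A', 'paper-B', 'paper-C']
```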
Distribution of passage relevance labels across all questions:
- 0 : 78,187
- 1 : 64,785
- 2 : 65,805
- 3 : 8,067
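The label counts listed above can be converted into proportions to gauge class balance. This is plain arithmetic on the reported counts, with no other assumptions.

```python
# Label counts as reported in this card.
label_counts = {0: 78187, 1: 64785, 2: 65805, 3: 8067}

total = sum(label_counts.values())
shares = {label: count / total for label, count in label_counts.items()}

print(f"total labeled passages: {total}")  # → total labeled passages: 216844
for label in sorted(shares):
    print(f"label {label}: {shares[label]:.1%}")
# → label 0: 36.1%
# → label 1: 29.9%
# → label 2: 30.3%
# → label 3: 3.7%
```

Only about 3.7% of passages carry the top label, so metrics that reward placing label-3 passages first depend on a comparatively small set of judgments.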
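Evaluation results for several baseline rerankers (nDCG@10: normalized discounted cumulative gain over the top 10 results; MRR: mean reciprocal rank; higher is better for both):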
| Model (parameters) | Latency (s/query) | nDCG@10 | MRR |
|-------------|:-------------------:|:--------:|:---:|
| bge-reranker-v2-m3 (568M) | 0.14 | 0.913 | 0.973 |
| akariasai/ranker_large (568M) | 0.14 | 0.906 | 0.970 |
| jina-reranker-v2-base (278M) | 0.06 | 0.907 | 0.972 |
| mxbai-rerank-large-v1 (435M) | 0.46 | 0.927 | 0.975 |
| mxbai-rerank-base-v1 (184M) | 0.19 | 0.919 | 0.974 |
| mxbai-rerank-xsmall-v1 (70M) | 0.11 | 0.911 | 0.970 |
| mxbai-rerank-base-v2 (0.5B) | 0.40 | 0.918 | 0.974 |
| mxbai-rerank-large-v2 (1.5B) | 0.70 | 0.911 | 0.975 |
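The card does not state exactly how nDCG@10 and MRR were computed for the table. The sketch below shows one standard way to compute both from graded 0-3 labels, assuming the exponential gain `2**rel - 1` with a `log2(rank + 1)` discount for nDCG, and, as a hypothetical choice, treating labels of 2 or higher as "relevant" for MRR. Each run is the list of gold labels in the order a reranker returned the question's passages.

```python
import math


def dcg_at_k(labels, k):
    """Discounted cumulative gain over the first k ranked items (exponential gain)."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so the top slot is divided by log2(2) = 1
        for rank, rel in enumerate(labels[:k])
    )


def ndcg_at_k(ranked_labels, k=10):
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal (label-sorted) ranking."""
    ideal = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal if ideal > 0 else 0.0


def reciprocal_rank(ranked_labels, min_relevant=2):
    """1 / position of the first passage with label >= min_relevant, or 0 if there is none.

    ASSUMPTION: the threshold of 2 is illustrative; the card does not define it.
    """
    for rank, rel in enumerate(ranked_labels, start=1):
        if rel >= min_relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k=10):
    """Mean nDCG@k and MRR over questions; each run lists gold labels in reranked order."""
    if not runs:
        raise ValueError("need at least one run")
    ndcg = sum(ndcg_at_k(r, k) for r in runs) / len(runs)
    mrr = sum(reciprocal_rank(r) for r in runs) / len(runs)
    return ndcg, mrr
```

For example, `evaluate([[3, 2, 1, 0], [0, 3]])` gives an MRR of 0.75, because the first relevant passage sits at rank 1 for the first question and rank 2 for the second. Linear gain (`rel` instead of `2 ** rel - 1`) is an equally common nDCG variant and yields different absolute scores, so comparisons are only meaningful under a single convention.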
### Dataset Sources
- **Repository:** [ai2-scholarqa-lib](https://github.com/allenai/ai2-scholarqa-lib)
- **Demo:** [Ai2 ScholarQA](https://scholarqa.allen.ai/)
Provided by: maas
Created: 2025-05-27



