trillionlabs/rbridge-mask
Source: Hugging Face · Updated 2026-02-09 · Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/trillionlabs/rbridge-mask
Resource description:
---
license: mit
task_categories:
- text-generation
tags:
- reasoning
- perplexity
- evaluation
- rbridge
pretty_name: "rBridge-Mask: Masked Reasoning Trace Evaluation Dataset"
---
<p align="center">
<picture>
<img src="https://github.com/trillion-labs/gWorld/blob/main/trillion.jpg?raw=true" style="width: 40%;">
</picture>
</p>
<div align="center">
[Paper (arXiv:2509.21013)](https://arxiv.org/abs/2509.21013)
[Code (GitHub)](https://github.com/trillion-labs/rBridge)
</div>
# rBridge-Mask
Evaluation dataset for **rBridge**, a method for predicting LLM reasoning performance using small proxy models. Contains reasoning traces from frontier models with `<span>` tags marking key reasoning steps.
## Overview
Each sample contains a question and a reasoning trace in which important factual or reasoning content is wrapped in `<span>...</span>`. rBridge computes a masked log-likelihood, scoring only the tokens inside tagged regions, to predict downstream reasoning performance at a fraction of the cost.
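The masking idea can be illustrated with a minimal sketch. It is not the rBridge implementation: the helper names `split_spans` and `masked_mean_logprob` are hypothetical, spans are assumed not to be nested, and the token offsets and log-probabilities are toy values. In practice they would come from a proxy model and its tokenizer, for example an offset mapping from a Hugging Face fast tokenizer.

```python
import re
from typing import List, Optional, Tuple

SPAN_TAG = re.compile(r"</?span>")


def split_spans(reasoning: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Strip <span> tags and return (clean_text, tagged [start, end) char ranges).

    Assumes spans are not nested; an unclosed <span> is ignored.
    """
    parts: List[str] = []
    ranges: List[Tuple[int, int]] = []
    pos = 0                          # read position in the tagged string
    clean_len = 0                    # length of the clean text so far
    open_at: Optional[int] = None    # clean-text offset of the open span
    for tag in SPAN_TAG.finditer(reasoning):
        chunk = reasoning[pos:tag.start()]
        parts.append(chunk)
        clean_len += len(chunk)
        if tag.group() == "<span>":
            if open_at is None:
                open_at = clean_len
        elif open_at is not None:
            ranges.append((open_at, clean_len))
            open_at = None
        pos = tag.end()
    parts.append(reasoning[pos:])
    return "".join(parts), ranges


def masked_mean_logprob(
    offsets: List[Tuple[int, int]],
    logprobs: List[float],
    ranges: List[Tuple[int, int]],
) -> float:
    """Average log-probability over tokens that overlap any tagged range."""
    kept = [
        lp
        for (start, end), lp in zip(offsets, logprobs)
        if any(start < r_end and end > r_start for r_start, r_end in ranges)
    ]
    return sum(kept) / len(kept) if kept else float("nan")


# Toy example: only the two tokens covering "Key fact" are scored.
text, ranges = split_spans("Intro. <span>Key fact</span>. Outro.")
offsets = [(0, 6), (6, 10), (10, 15), (15, 16), (16, 23)]  # assumed tokenization
logprobs = [-1.0, -2.0, -4.0, -0.5, -1.5]                  # assumed model scores
score = masked_mean_logprob(offsets, logprobs, ranges)
```

Here a token counts as tagged if it overlaps a span by at least one character. That boundary rule is a design choice of this sketch, and the averaging is only one possible way to aggregate the masked scores.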
## Subsets
| Subset | Samples | Source |
|--------|---------|--------|
| mmlu-pro | 601 | MMLU-Pro |
| mmlu | 1,248 | MMLU |
| cqa | 486 | CommonsenseQA |
| bbh | 322 | BIG-Bench Hard |
| kmmlu | 220 | KMMLU |
| arc | 177 | ARC-Challenge |
| arena-hard | 100 | Arena-Hard |
| math500 | 100 | MATH-500 |
| gpqa | 100 | GPQA |
| aime25 | 9 | AIME 2025 |
| **Total** | **3,363** | |
## Schema
| Column | Type | Description |
|--------|------|-------------|
| `question` | string | Input question/prompt |
| `reasoning` | string | Reasoning trace with `<span>` tags marking key steps |
## Example
```
question: "What is the Dane particle?"
reasoning: "The user is asking about virology. <span>The Dane particle is the complete virion
of Hepatitis B virus (HBV), approximately 42nm in diameter</span>. It was named after
David Dane who first described it in 1970. <span>It consists of an outer lipid envelope
containing HBsAg and an inner nucleocapsid containing HBcAg and the viral DNA</span>."
```
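To get a feel for how much of a trace is tagged, a small helper can count spans and measure their share of the text. This is a sketch for inspection only: `span_stats` is a hypothetical name, the sample row is made up, and spans are assumed not to be nested.

```python
import re

SPAN_BODY = re.compile(r"<span>(.*?)</span>", re.DOTALL)


def span_stats(reasoning: str) -> dict:
    """Count tagged spans and the fraction of (tag-free) characters they cover."""
    spans = SPAN_BODY.findall(reasoning)
    clean = re.sub(r"</?span>", "", reasoning)
    tagged_chars = sum(len(s) for s in spans)
    return {
        "num_spans": len(spans),
        "tagged_chars": tagged_chars,
        "tagged_fraction": tagged_chars / len(clean) if clean else 0.0,
    }


# Made-up row that follows the dataset schema (question, reasoning).
row = {"question": "Toy question?", "reasoning": "A. <span>BB</span> C <span>DDD</span>."}
stats = span_stats(row["reasoning"])
```

Once a subset is loaded with the `datasets` library, the same function can be applied per row, for instance through `Dataset.map`.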
## Usage
```python
from datasets import load_dataset
ds = load_dataset("trillionlabs/rbridge-mask", "gpqa", split="test")
```
Or evaluate directly with the [rBridge CLI](https://github.com/trillion-labs/rBridge):
```bash
python -m rbridge.eval \
--model your-model \
--dataset trillionlabs/rbridge-mask \
--subsets gpqa,mmlu-pro \
--output results.json
```
## Citation
```bibtex
@article{koh2025predicting,
title={Predicting LLM Reasoning Performance with Small Proxy Model},
author={Koh, Woosung and Suk, Juyoung and Han, Sungjun and Yun, Se-Young and Shin, Jamin},
journal={arXiv preprint arXiv:2509.21013},
year={2025}
}
```
Provider:
trillionlabs



