trillionlabs/rbridge-mask

Source: Hugging Face | Updated: 2026-02-09 | Indexed: 2026-04-05
Download link:
https://hf-mirror.com/datasets/trillionlabs/rbridge-mask
Description:
---
license: mit
task_categories:
- text-generation
tags:
- reasoning
- perplexity
- evaluation
- rbridge
pretty_name: "rBridge-Mask: Masked Reasoning Trace Evaluation Dataset"
---

<p align="center">
<picture>
<img src="https://github.com/trillion-labs/gWorld/blob/main/trillion.jpg?raw=true" style="width: 40%;">
</picture>
</p>

<div align="center">

[![Paper](https://img.shields.io/badge/arXiv-2509.21013-b31b1b.svg)](https://arxiv.org/abs/2509.21013)
[![Code](https://img.shields.io/badge/GitHub-trillion--labs/rBridge-blue.svg)](https://github.com/trillion-labs/rBridge)

</div>

# rBridge-Mask

Evaluation dataset for **rBridge**, a method for predicting LLM reasoning performance using small proxy models. Contains reasoning traces from frontier models with `<span>` tags marking key reasoning steps.

## Overview

Each sample contains a question and a reasoning trace where important factual/reasoning content is tagged with `<span>...</span>`. rBridge computes the masked log-likelihood, scoring only the tokens inside tagged regions, to predict downstream reasoning performance at a fraction of the cost.

## Subsets

| Subset | Samples | Source |
|--------|---------|--------|
| mmlu-pro | 601 | MMLU-Pro |
| mmlu | 1,248 | MMLU |
| cqa | 486 | CommonsenseQA |
| bbh | 322 | BIG-Bench Hard |
| kmmlu | 220 | KMMLU |
| arc | 177 | ARC-Challenge |
| arena-hard | 100 | Arena-Hard |
| math500 | 100 | MATH-500 |
| gpqa | 100 | GPQA |
| aime25 | 9 | AIME 2025 |
| **Total** | **3,363** | |

## Schema

| Column | Type | Description |
|--------|------|-------------|
| `question` | string | Input question/prompt |
| `reasoning` | string | Reasoning trace with `<span>` tags marking key steps |

## Example

```
question: "What is the Dane particle?"
reasoning: "The user is asking about virology. <span>The Dane particle is the complete virion of Hepatitis B virus (HBV), approximately 42nm in diameter</span>. It was named after David Dane who first described it in 1970.
<span>It consists of an outer lipid envelope containing HBsAg and an inner nucleocapsid containing HBcAg and the viral DNA</span>."
```

## Usage

```python
from datasets import load_dataset

ds = load_dataset("trillionlabs/rbridge-mask", "gpqa", split="test")
```

Or evaluate directly with the [rBridge CLI](https://github.com/trillion-labs/rBridge):

```bash
python -m rbridge.eval \
  --model your-model \
  --dataset trillionlabs/rbridge-mask \
  --subsets gpqa,mmlu-pro \
  --output results.json
```

## Citation

```bibtex
@article{koh2025predicting,
  title={Predicting LLM Reasoning Performance with Small Proxy Model},
  author={Koh, Woosung and Suk, Juyoung and Han, Sungjun and Yun, Se-Young and Shin, Jamin},
  journal={arXiv preprint arXiv:2509.21013},
  year={2025}
}
```
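
The masked scoring described in the Overview can be illustrated with a small, dependency-free sketch. This is not the rBridge implementation. The helper names (`strip_spans`, `whitespace_offsets`, `token_mask`, `masked_nll`), the whitespace tokenizer, the assumption that spans are not nested, and the choice to average over scored tokens are all assumptions made for the example. In practice the character offsets and per-token log-probabilities would come from the proxy model's own tokenizer and forward pass.

```python
import re
from typing import List, Tuple

# Non-greedy match so that each <span>...</span> pair is captured separately.
# Assumption: spans are not nested.
SPAN_RE = re.compile(r"<span>(.*?)</span>", re.DOTALL)


def strip_spans(reasoning: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Remove <span> tags and return (plain_text, tagged_char_ranges).

    Ranges are half-open [start, end) offsets into the returned plain text.
    """
    parts: List[str] = []
    ranges: List[Tuple[int, int]] = []
    cursor = 0  # position in the tagged input
    length = 0  # length of the plain text built so far
    for match in SPAN_RE.finditer(reasoning):
        before = reasoning[cursor:match.start()]
        inner = match.group(1)
        parts.append(before)
        length += len(before)
        ranges.append((length, length + len(inner)))
        parts.append(inner)
        length += len(inner)
        cursor = match.end()
    parts.append(reasoning[cursor:])
    return "".join(parts), ranges


def whitespace_offsets(text: str) -> List[Tuple[int, int]]:
    """Toy tokenizer (hypothetical stand-in): one token per whitespace-delimited chunk."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def token_mask(offsets: List[Tuple[int, int]],
               ranges: List[Tuple[int, int]]) -> List[bool]:
    """Mark a token True if its character range overlaps any tagged range."""
    return [
        any(start < r_end and end > r_start for r_start, r_end in ranges)
        for start, end in offsets
    ]


def masked_nll(token_logprobs: List[float], mask: List[bool]) -> float:
    """Average negative log-likelihood over masked-in tokens only.

    Averaging is an assumption of this sketch; the dataset card does not
    state how rBridge aggregates the scored tokens.
    """
    scored = [lp for lp, keep in zip(token_logprobs, mask) if keep]
    if not scored:
        raise ValueError("no tokens fall inside a <span> region")
    return -sum(scored) / len(scored)


# Usage with made-up log-probabilities (illustrative values, not model output).
text = "The user asks. <span>Key fact A</span>. Filler. <span>Key fact B</span>."
plain, ranges = strip_spans(text)
mask = token_mask(whitespace_offsets(plain), ranges)
fake_logprobs = [-5.0, -5.0, -5.0, -1.0, -1.0, -1.0, -5.0, -2.0, -2.0, -2.0]
score = masked_nll(fake_logprobs, mask)
```

Tokens that straddle a span boundary (such as `A.` in the toy example) are counted as inside the span here. A stricter variant could require full containment instead.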
Provided by:
trillionlabs