HumanRef-CoT-45k
Source: ModelScope community. Updated 2025-11-27; indexed 2025-11-03.
Download link:
https://modelscope.cn/datasets/IDEA-Research/HumanRef-CoT-45k
Resource description:
<div align=center>
<img src="assets/logo.png" width=300 >
</div>
# 🦖🧠 Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning 🦖🧠
<div align=center>
<p align="center">
<a href="https://bagel-ai.org/">
<img
src="https://img.shields.io/badge/RexThinker-Website-Red?logo=afdian&logoColor=white&color=blue"
alt="RexThinker Website"
/>
</a>
<a href="https://github.com/IDEA-Research/Rex-Thinker/blob/master/paper_temp/rexthinker.pdf">
<img
src="https://img.shields.io/badge/RexThinker-Paper-Red%25red?logo=arxiv&logoColor=red&color=yellow"
alt="RexThinker Paper on arXiv"
/>
</a>
<a href="https://huggingface.co/IDEA-Research/Rex-Thinker-GRPO-7B">
<img
src="https://img.shields.io/badge/RexThinker-Weight-orange?logo=huggingface&logoColor=yellow"
alt="RexThinker weight on Hugging Face"
/>
</a>
<a href="https://huggingface.co/datasets/IDEA-Research/HumanRef-CoT-45k">
<img
src="https://img.shields.io/badge/HumanRefCoT-Data-orange?logo=huggingface&logoColor=yellow"
alt="RexThinker data on Hugging Face"
/>
</a>
</p>
</div>
> We propose Rex-Thinker, a Chain-of-Thought (CoT) reasoning model for object referring that addresses two key challenges: lack of interpretability and inability to reject unmatched expressions. Instead of directly predicting bounding boxes, Rex-Thinker reasons step-by-step over candidate objects to determine which, if any, match a given expression. Rex-Thinker is trained in two stages: supervised fine-tuning to learn structured CoT reasoning, followed by reinforcement learning with GRPO to enhance accuracy, faithfulness, and generalization. Our approach improves both prediction precision and interpretability, while enabling the model to abstain when no suitable object is found. Below is an example of the model's reasoning process:
<p align="center"><img src="assets/teaser_example.jpg" width="95%"></p>
## Method
**Rex-Thinker** reformulates object referring as a **Chain-of-Thought (CoT)** reasoning task to improve both interpretability and reliability. The model follows a structured three-stage reasoning paradigm:
1. **Planning**: Decompose the referring expression into interpretable subgoals.
2. **Action**: Evaluate each candidate object (obtained via an open-vocabulary detector) against these subgoals using step-by-step reasoning.
3. **Summarization**: Aggregate the intermediate results to output the final prediction — or abstain when no object matches.
Each reasoning step is grounded in a specific candidate object region through **Box Hints**, making the process transparent and verifiable.
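The three stages above can be illustrated with a toy sketch. This is not the Rex-Thinker implementation: the `Candidate` structure, the rule of splitting the expression on " and ", and the attribute-set lookup that stands in for the vision-language model are all hypothetical, chosen only to show how planning, per-box action, and summarization with abstention fit together.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), playing the role of a Box Hint


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate region, e.g. one detector output."""
    box: Box
    attributes: FrozenSet[str]  # toy stand-in for what the model sees in the box


def plan(expression: str) -> List[str]:
    """Planning: decompose the expression into subgoals (toy rule: split on ' and ')."""
    return [part.strip() for part in expression.split(" and ") if part.strip()]


def act(candidates: List[Candidate], subgoals: List[str]):
    """Action: check every candidate box against every subgoal and keep a trace."""
    trace = []
    for index, cand in enumerate(candidates, start=1):
        checks: Dict[str, bool] = {goal: goal in cand.attributes for goal in subgoals}
        trace.append((index, cand.box, checks))
    return trace


def summarize(trace) -> List[Box]:
    """Summarization: keep boxes that pass all subgoals; an empty list means abstain."""
    return [box for _, box, checks in trace if all(checks.values())]


def refer(expression: str, candidates: List[Candidate]):
    """Run the three stages and return (matched boxes, reasoning trace)."""
    subgoals = plan(expression)
    trace = act(candidates, subgoals)
    return summarize(trace), trace
```

Because every entry in the trace carries the box it refers to, each intermediate judgment can be checked against the image region it was made about. An expression that no candidate satisfies simply yields an empty result rather than a forced prediction.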
Rex-Thinker is implemented on top of **Qwen2.5-VL** and is trained in two stages:
- **Supervised Fine-Tuning (SFT)**
Cold-start training using GPT-4o-generated CoT traces as supervision.
- **GRPO-based Reinforcement Learning**
Further optimizes reasoning accuracy, generalization, and rejection ability via Group Relative Policy Optimization.
This CoT-based framework enables Rex-Thinker to make faithful, interpretable predictions while generalizing well to out-of-domain referring scenarios.
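The core idea behind the "group relative" part of GRPO can be sketched in a few lines: several responses are sampled for the same prompt, each receives a scalar reward, and each response's advantage is its reward normalized by the mean and standard deviation of its own group. The sketch below shows only that normalization step. The reward values are placeholders, the reward design used for Rex-Thinker is not described here, and the use of the population standard deviation plus a small `eps` is an assumption of this example.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize the rewards of one sampled group: (r_i - mean) / (std + eps).

    `rewards` holds one scalar per sampled response to the same prompt.
    `eps` avoids division by zero when all rewards in the group are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Placeholder rewards for four sampled responses to one referring expression.
example_advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that score above their group's average get a positive advantage and are reinforced, and those below it get a negative one. When every response in a group earns the same reward, all advantages are zero and the group contributes no learning signal.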
<p align="center"><img src="assets/model.jpg" width="95%"></p>
## HumanRef-CoT Dataset 📊
To support Chain-of-Thought (CoT) reasoning in referring expression comprehension, we introduce HumanRef-CoT, a large-scale dataset with 90,824 high-quality step-by-step reasoning annotations. Built on the HumanRef dataset, which focuses on multi-person referring tasks, HumanRef-CoT provides structured CoT traces—including planning, action, and summarization—generated using GPT-4o. These annotations make the model's reasoning process interpretable and verifiable, and serve as training data for both supervised fine-tuning and GRPO-based instruction tuning.
<p align="center"><img src="assets/data_engine.jpg" width="95%"></p>
We open-source a 45k-sample subset of HumanRef-CoT for academic research. You can download the dataset from [Hugging Face](https://huggingface.co/datasets/IDEA-Research/HumanRef-CoT-45K). The dataset is in TSV format and can be visualized with the following script.
### Visualize the dataset
```bash
python tools/visualize_humanref_cot.py \
--img_tsv data/IDEA-Research/HumanRef-CoT-45k/humanref_cot.images.tsv \
--ann_tsv data/IDEA-Research/HumanRef-CoT-45k/humanref_cot.annotations.tsv \
--ann_lineidx data/IDEA-Research/HumanRef-CoT-45k/humanref_cot.annotations.tsv.lineidx \
--num_vis 50 \
--output_dir vis/humanref_cot
```
Note that the current visualization code cannot render the emoji ✅, ❌, and ⚠️ that appear in the dataset.
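For inspecting rows without the visualization script, the sketch below reads a single row of a TSV file through a `.lineidx` file. It assumes the common convention that each line of the `.lineidx` file is the byte offset at which the corresponding TSV row starts. That convention, and all function names here, are assumptions of this example rather than something documented in this card. The column layout of the HumanRef-CoT files is likewise not specified here, so the reader returns raw tab-separated fields without interpreting them.

```python
from typing import List


def load_line_offsets(lineidx_path: str) -> List[int]:
    """Read byte offsets, one integer per line (assumed .lineidx convention)."""
    with open(lineidx_path, "r", encoding="utf-8") as f:
        return [int(line) for line in f if line.strip()]


def build_line_offsets(tsv_path: str) -> List[int]:
    """Recompute row start offsets by scanning the TSV once (useful if no index exists)."""
    offsets = []
    with open(tsv_path, "rb") as f:
        while True:
            position = f.tell()
            if not f.readline():
                break
            offsets.append(position)
    return offsets


def read_tsv_row(tsv_path: str, offsets: List[int], row: int) -> List[str]:
    """Seek directly to row `row` and return its tab-separated fields as strings."""
    with open(tsv_path, "rb") as f:
        f.seek(offsets[row])
        raw = f.readline()
    return raw.decode("utf-8").rstrip("\r\n").split("\t")
```

Seeking by offset avoids loading the whole file into memory, which matters when a field holds a large payload such as an encoded image.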
Provider:
maas
Created:
2025-10-20



