
GThinker-11K

Indexed from arXiv on 2025-09-30
Download link:
https://github.com/jefferyZhan/GThinker
Overview:
GThinker-11K consists of 7,000 high-quality, iteratively annotated reasoning paths and 4,000 carefully selected reinforcement learning samples, for 11,000 samples in total. The data is designed to improve a model's multimodal reasoning across general scenarios, mathematics, and science. It addresses the shortage of general-purpose multimodal reasoning data and supports the two-stage training pipeline of the GThinker model.