TIGER-Lab/ViRL39K

Source: Hugging Face · Updated 2025-04-23 · Indexed 2025-05-31
Download link:
https://hf-mirror.com/datasets/TIGER-Lab/ViRL39K
Description:
ViRL39K is a curated collection of 38,870 verifiable question-answer pairs for training vision-language reasoning models. It combines newly collected problems with problems drawn from existing datasets, all of which were cleaned, reformatted, rephrased, and verified. ViRL39K is the training foundation of VL-Rethinker, a state-of-the-art vision-language reasoning model. The dataset offers high-quality, verifiable content; comprehensive topic coverage, ranging from elementary-school problems to broader STEM and social topics; and fine-grained model-capability annotations that guide query selection when training models of different scales.
Provider: TIGER-Lab