TIGER-Lab/ViRL39K
Source: Hugging Face. Updated 2025-04-23. Indexed 2025-05-31.
Download link:
https://hf-mirror.com/datasets/TIGER-Lab/ViRL39K
Description:
ViRL39K is a curated collection of 38,870 verifiable question-answer pairs for training vision-language reasoning models. It is built from newly collected problems and existing datasets through cleaning, reformatting, rephrasing, and verification. ViRL39K is the training foundation for the state-of-the-art vision-language reasoning model VL-Rethinker. It offers high-quality, verifiable content; comprehensive coverage of topics and categories, from elementary school problems to broader STEM and social topics; and fine-grained model-capability annotations that indicate which queries to use when training models of different scales.
Provided by:
TIGER-Lab
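
The description mentions model-capability annotations that guide which queries to use for models of different scales. A minimal sketch of that kind of selection follows. It assumes each record carries a per-model pass rate; the field names `qid` and `pass_rate`, the sample values, and the helper `select_queries` are all hypothetical and are not taken from the dataset's documented schema.

```python
# Hypothetical records: "qid" and "pass_rate" are illustrative field names,
# not the documented ViRL39K schema.
SAMPLE = [
    {"qid": "q1", "pass_rate": 0.0},   # never solved by the reference model
    {"qid": "q2", "pass_rate": 0.5},
    {"qid": "q3", "pass_rate": 1.0},   # always solved by the reference model
    {"qid": "q4", "pass_rate": 0.25},
]


def select_queries(records, low=0.0, high=1.0):
    """Keep queries whose pass rate lies strictly between low and high.

    Queries that a model always or never solves are dropped, on the
    assumption that they carry little training signal for that model.
    """
    return [r for r in records if low < r["pass_rate"] < high]


if __name__ == "__main__":
    selected = select_queries(SAMPLE)
    print([r["qid"] for r in selected])
```

Tightening `low` and `high` would narrow the selection to a difficulty band suited to a particular model scale; the thresholds here are placeholders.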



