InternVL-Chat-V1-2-SFT-Data
Source: ModelScope community, updated 2025-12-18, indexed 2024-12-28
Download link:
https://modelscope.cn/datasets/OpenGVLab/InternVL-Chat-V1-2-SFT-Data
Resource description:
# Data Card for InternVL-Chat-V1-2-SFT-Data
## Overview
Inspired by LLaVA-NeXT, we adopted a data-efficient SFT strategy to train InternVL-Chat-V1-2, using approximately 1.2M visual instruction tuning samples in total, all of which are fully open-source. At a high level, we build upon ShareGPT-4V and additionally integrate LLaVA-ZH, DVQA, ChartQA, AI2D, DocVQA, GeoQA+, and SynthDoG-EN. Most of the data is the same as that used for LLaVA-NeXT.
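The composition described above can be written down as a small manifest, which may be handy when tracking which sources a training run draws on. This is only a sketch: the variable and function names are hypothetical, the structure is an assumption rather than the dataset's actual layout, and no per-source sample counts are included because this card does not give them.

```python
# Hypothetical manifest of the sources named in the overview.
# Source names come from the card; everything else is illustrative.
BASE_SOURCE = "ShareGPT-4V"
ADDED_SOURCES = [
    "LLaVA-ZH",
    "DVQA",
    "ChartQA",
    "AI2D",
    "DocVQA",
    "GeoQA+",
    "SynthDoG-EN",
]
# "approximately 1.2M" per the card; not an exact count.
APPROX_TOTAL_SAMPLES = 1_200_000


def all_sources():
    """Return the base source followed by the additionally integrated ones."""
    return [BASE_SOURCE, *ADDED_SOURCES]


def is_known_source(name):
    """Case-insensitive check of a source name against the manifest."""
    return name.lower() in {s.lower() for s in all_sources()}


if __name__ == "__main__":
    for source in all_sources():
        print(source)
```

A check such as `is_known_source("chartqa")` could be used to flag records whose source label does not match any of the datasets listed in this card.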
## Citation
If you use this dataset in your research, please consider citing:
```
@article{chen2023internvl,
title={InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks},
author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and Li, Bin and Luo, Ping and Lu, Tong and Qiao, Yu and Dai, Jifeng},
journal={arXiv preprint arXiv:2312.14238},
year={2023}
}
@article{chen2024far,
title={How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites},
author={Chen, Zhe and Wang, Weiyun and Tian, Hao and Ye, Shenglong and Gao, Zhangwei and Cui, Erfei and Tong, Wenwen and Hu, Kongzhi and Luo, Jiapeng and Ma, Zheng and others},
journal={arXiv preprint arXiv:2404.16821},
year={2024}
}
```
Provider:
maas
Created:
2024-12-26



