ShareGPT4V
ModelScope community. Updated: 2026-05-24. Indexed: 2024-05-15.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/ShareGPT4V
Resource description:
# News
**[2024/5/8]** We released **[ShareGPT4Video](https://sharegpt4video.github.io/)**, a large-scale video-caption dataset with **40K** captions annotated by GPT4V and **4.8M** captions annotated by our ShareCaptioner-Video. The videos total **300** hours and **3000** hours, respectively!
# ShareGPT4V 1.2M Dataset Card
## Dataset details
**Dataset type:**
ShareGPT4V Captions 1.2M is a set of multi-modal caption data powered by GPT4-Vision.
It is constructed to enhance modality alignment and fine-grained visual concept perception in Large Multi-Modal Models (LMMs) during both the pre-training and supervised fine-tuning stages. This advancement aims to bring LMMs towards GPT4-Vision capabilities.
* `sharegpt4v_instruct_gpt4-vision_cap100k.json` is generated by GPT4-Vision (ShareGPT4V).
* `share-captioner_coco_lcs_sam_1246k_1107.json` is generated by our Share-Captioner trained on GPT4-Vision-generated data (ShareGPT4V-PT).
* `sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k.json` is curated from `sharegpt4v_instruct_gpt4-vision_cap100k.json` for the supervised fine-tuning stage.
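As a quick way to check a download against the three files listed above, the following minimal sketch counts the records in whichever of them are present in a local directory. The file names come from this card; everything else is an assumption made for illustration: the helper names `load_records` and `summarize` are hypothetical, and the code assumes each file holds a top-level JSON array, which the card does not state.

```python
import json
from pathlib import Path

# File names as listed in the dataset card; the labels paraphrase the card.
CAPTION_FILES = {
    "sharegpt4v_instruct_gpt4-vision_cap100k.json": "GPT4-Vision captions (ShareGPT4V)",
    "share-captioner_coco_lcs_sam_1246k_1107.json": "Share-Captioner captions (ShareGPT4V-PT)",
    "sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k.json": "supervised fine-tuning mix",
}


def load_records(path):
    """Load one annotation file. ASSUMPTION: the top level is a JSON array."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON array in {path}, got {type(data).__name__}")
    return data


def summarize(data_dir):
    """Return {file name: record count} for the listed files found in data_dir."""
    counts = {}
    for name in CAPTION_FILES:
        path = Path(data_dir) / name
        if path.exists():
            counts[name] = len(load_records(path))
    return counts
```

Calling `summarize("path/to/ShareGPT4V")` on a local copy returns only the files that exist, so a partial download does not raise an error. If a file turns out not to be a JSON array, `load_records` fails loudly rather than guessing at the structure.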
**Dataset date:**
ShareGPT4V Captions 1.2M was collected on November 7, 2023.
**Paper or resources for more information:**
[[Project](https://ShareGPT4V.github.io/)] [[Paper](https://huggingface.co/papers/2311.12793)] [[Code](https://github.com/ShareGPT4Omni/ShareGPT4V)]
**License:**
Attribution-NonCommercial 4.0 International
Use of the dataset should also abide by OpenAI's terms of use: https://openai.com/policies/terms-of-use
## Intended use
**Primary intended uses:**
The primary use of ShareGPT4V Captions 1.2M is research on large multimodal models and chatbots.
**Primary intended users:**
The primary intended users of this dataset are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
Provider:
maas
Created:
2024-05-09



