ShareGPT4Video
Source: ModelScope community. Updated 2026-01-08; indexed 2024-06-29.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/ShareGPT4Video
Resource overview:
# ShareGPT4Video 4.8M Dataset Card
## Dataset details
**Dataset type:**
ShareGPT4Video Captions 4.8M is a set of GPT4-Vision-powered multimodal caption data for videos.
It is built to improve modality alignment and fine-grained visual concept perception in Large Video-Language Models (LVLMs) and Text-to-Video Models (T2VMs), with the goal of bringing LVLMs and T2VMs closer to the capabilities of GPT4V and Sora.
* `sharegpt4video_40k.jsonl` is generated by GPT4-Vision (ShareGPT4Video).
* `share-captioner-video_mixkit-pexels-pixabay_4814k_0417.json` is generated by our ShareCaptioner-Video, which is trained on GPT4-Vision-generated video-caption pairs.
* `sharegpt4video_mix181k_vqa-153k_share-cap-28k.json` is curated from `sharegpt4video_instruct_gpt4-vision_cap40k.json` for the supervised fine-tuning stage of LVLMs.
* `llava_v1_5_mix665k_with_video_chatgpt72k_share4video28k.json` replaces 28K detailed-caption-related samples in VideoChatGPT with 28K high-quality captions from ShareGPT4Video. This file is used to validate the effectiveness of high-quality captions with the VideoLLaVA and LLaMA-VID models.
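The card does not document the record schema of these files, so the following is only a minimal Python sketch, assuming that `.jsonl` files hold one JSON object per line and `.json` files hold a single top-level list of records. It also mimics the substitution described for the LLaVA/VideoChatGPT mix: drop the records that a predicate marks as detailed captions and append an equal number of replacement captions. The helper names `load_records` and `swap_subset` and the predicate `is_target` are hypothetical; this is not the authors' preprocessing script.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable, Iterable


def load_records(path: str | Path) -> list[dict]:
    """Load a .jsonl file (one object per line) or a .json file (assumed: one top-level list)."""
    path = Path(path)
    with path.open("r", encoding="utf-8") as f:
        if path.suffix == ".jsonl":
            # JSON Lines: parse each non-empty line independently.
            return [json.loads(line) for line in f if line.strip()]
        # Plain JSON: ASSUMPTION, the top level is a list of records.
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"expected a top-level JSON list in {path}")
    return data


def swap_subset(
    base: list[dict],
    is_target: Callable[[dict], bool],
    replacements: Iterable[dict],
) -> list[dict]:
    """Drop records matching is_target and append the same number of replacements.

    is_target is a hypothetical predicate; the card does not say how the
    detailed-caption samples were identified.
    """
    kept = [r for r in base if not is_target(r)]
    removed = len(base) - len(kept)
    new = list(replacements)
    # Keep the mix size unchanged, as in "replaced 28K ... with 28K".
    if len(new) != removed:
        raise ValueError(f"removed {removed} records but got {len(new)} replacements")
    return kept + new
```

For example, `load_records("sharegpt4video_40k.jsonl")` would return a list of dicts whose keys you would still need to inspect yourself. Both helpers hold everything in memory; for the 4.8M-caption file, a streaming JSON parser may be preferable.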
**Dataset date:**
ShareGPT4Video Captions 4.8M was collected on April 17, 2024.
**Paper or resources for more information:**
[[Project](https://ShareGPT4Video.github.io/)] [[Paper](https://arxiv.org/abs/2406.04325v1)] [[Code](https://github.com/ShareGPT4Omni/ShareGPT4Video)] [[ShareGPT4Video-8B](https://huggingface.co/Lin-Chen/sharegpt4video-8b)]
**License:**
Attribution-NonCommercial 4.0 International
Use of the data must also comply with OpenAI's terms of use: https://openai.com/policies/terms-of-use
## Intended use
**Primary intended uses:**
The primary use of ShareGPT4Video Captions 4.8M is research on large multimodal models and text-to-video models.
**Primary intended users:**
The primary intended users of this dataset are researchers and hobbyists in computer vision, natural language processing, machine learning, AIGC, and artificial intelligence.
## Paper
[arXiv:2406.04325](https://arxiv.org/abs/2406.04325)
Provider: maas
Created: 2024-06-21



