JavisVerse/AV-FineTune

Source: Hugging Face | Updated: 2026-01-03 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/JavisVerse/AV-FineTune
Description:
---
license: apache-2.0
---

## <div align="center"> JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation</div>

<div align="center">

[[`HomePage`](https://javisverse.github.io/JavisGPT-page/)] [[`Paper`](https://arxiv.org/abs/2512.22905)] [[`GitHub`](https://github.com/JavisVerse/JavisGPT)]

</div>

## TL;DR

We introduce **`JavisGPT`**, a multimodal LLM that can understand audiovisual inputs and simultaneously generate synchronized sounding videos in a single unified model. We also curate the **`JavisInst-Omni`** dataset to facilitate instruction tuning for comprehension and generation on sounding videos.

## 📰 News

- **[2025.12.30]** 🚀 We release the training dataset [JavisInst-Omni](https://huggingface.co/datasets/JavisVerse/JavisInst-Omni) to support multimodal instruction tuning on sounding-video comprehension and generation tasks, as well as the [MM-PreTrain](https://huggingface.co/datasets/JavisVerse/MM-PreTrain) and [AV-FineTune](https://huggingface.co/datasets/JavisVerse/AV-FineTune) datasets to enable preliminary multimodal alignment for LLMs.
- **[2025.12.26]** 🔥 We release the code of [JavisGPT](https://arxiv.org/abs/2512.22905), along with the preview [JavisGPT-v0.1-7B-Instruct](https://huggingface.co/JavisVerse/JavisGPT-v0.1-7B-Instruct) checkpoint on Hugging Face. Feel free to play with it!

## The `AV-FineTune` Dataset

### Introduction

`AV-FineTune` is constructed for the second-stage alignment of understanding and generation capabilities on sounding videos. The data is sourced from the [TAVGBench](https://arxiv.org/abs/2404.14381) dataset, and we use diverse prompt templates to curate question-answer pairs that support multimodal alignment. For more details, please refer to our [paper](https://arxiv.org/abs/2512.22905).
### Usage

Download the dataset from [Hugging Face](https://huggingface.co/datasets/JavisVerse/AV-FineTune):

```bash
huggingface-cli download --repo-type dataset JavisVerse/AV-FineTune --local-dir /path/to/AV-FineTune
```

Data sources and QA pairs are organized in the `stage2_av_ft.json` meta file. We also provide the understanding and generation instances separately in `stage2_av_ft_und.json` and `stage2_av_ft_gen.json`, respectively.

However, we cannot release the source data of [TAVGBench](https://arxiv.org/abs/2404.14381) due to policy restrictions. Instead, the video IDs (formatted as `{youtube_id}_{start_time}_{end_time}`) are provided in [`video_ids.txt`](video_ids.txt), and users can refer to [TAVGBench](https://github.com/OpenNLPLab/TAVGBench) to download the raw videos.

## Citation

If you find JavisGPT useful and use it in your project, please kindly cite:

```
@inproceedings{liu2025javisgpt,
  title={JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation},
  author={Kai Liu and Jungang Li and Yuchong Sun and Shengqiong Wu and jianzhang gao and Daoan Zhang and Wei Zhang and Sheng Jin and Sicheng Yu and Geng Zhan and Jiayi Ji and Fan Zhou and Liang Zheng and Shuicheng YAN and Hao Fei and Tat-Seng Chua},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
}
```
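The `{youtube_id}_{start_time}_{end_time}` naming used in `video_ids.txt` (described under Usage) can be split into its three fields with plain Bash before you fetch the raw clips through TAVGBench. The following is a minimal sketch, not tooling shipped with the dataset: the function names `parse_video_id` and `parse_video_ids` are hypothetical, and it assumes one ID per line and that the two time fields contain no underscores. Because YouTube IDs may themselves contain `_`, the fields are peeled off from the right instead of splitting on the first underscore.

```bash
# Hypothetical helpers (not part of AV-FineTune or TAVGBench).
# Assumption: IDs look like {youtube_id}_{start_time}_{end_time}, and the
# time fields contain no "_"; the YouTube ID itself may contain "_".
# No validation is done: malformed IDs are split without warning.

# Print one ID as a tab-separated row: youtube_id, start_time, end_time.
parse_video_id() {
  local vid="$1"
  local end="${vid##*_}"     # text after the last "_"  -> end_time
  local rest="${vid%_*}"     # drop the trailing "_{end_time}"
  local start="${rest##*_}"  # text after the new last "_" -> start_time
  local yt="${rest%_*}"      # drop "_{start_time}" -> youtube_id
  printf '%s\t%s\t%s\n' "$yt" "$start" "$end"
}

# Parse every non-empty line of a video_ids.txt-style file.
parse_video_ids() {
  local line
  # "|| [[ -n $line ]]" also handles a final line without a trailing newline.
  while IFS= read -r line || [[ -n "$line" ]]; do
    line="${line%$'\r'}"       # tolerate Windows (CRLF) line endings
    [[ -z "$line" ]] && continue
    parse_video_id "$line"
  done < "$1"
}
```

For example, `parse_video_ids /path/to/AV-FineTune/video_ids.txt > video_ids.tsv` writes one tab-separated row per clip, which can then be fed to whatever download procedure the TAVGBench repository recommends.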
Provided by:
JavisVerse