JavisVerse/AV-FineTune

Name: JavisVerse/AV-FineTune
Creator: JavisVerse
Published: 2026-01-03 07:54:35
License: apache-2.0

Source: Hugging Face. Updated 2026-01-03; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/JavisVerse/AV-FineTune

Description:

---
license: apache-2.0
---

## <div align="center"> JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation</div>

<div align="center">

[[`HomePage`](https://javisverse.github.io/JavisGPT-page/)] [[`Paper`](https://arxiv.org/abs/2512.22905)] [[`GitHub`](https://github.com/JavisVerse/JavisGPT)]

</div>

## TL;DR

We introduce **`JavisGPT`**, a multimodal LLM that can understand audiovisual inputs and simultaneously generate synchronized sounding videos in a unified model. We also curate the **`JavisInst-Omni`** dataset to facilitate instruction tuning for comprehension and generation on sounding videos.

## 📰 News

- **[2025.12.30]** 🚀 We release the [JavisInst-Omni](https://huggingface.co/datasets/JavisVerse/JavisInst-Omni) training dataset to support multimodal instruction tuning on sounding-video comprehension and generation tasks, as well as the [MM-PreTrain](https://huggingface.co/datasets/JavisVerse/MM-PreTrain) and [AV-FineTune](https://huggingface.co/datasets/JavisVerse/AV-FineTune) datasets to enable preliminary multimodal alignment for LLMs.
- **[2025.12.26]** 🔥 We release the code of [JavisGPT](https://arxiv.org/abs/2512.22905), with the preview [JavisGPT-v0.1-7B-Instruct](https://huggingface.co/JavisVerse/JavisGPT-v0.1-7B-Instruct) checkpoint on Hugging Face. Feel free to play with it!

## The `AV-FineTune` Dataset

### Introduction

`AV-FineTune` is constructed for the second-stage alignment of understanding and generation capabilities on sounding videos. The source data come from the [TAVGBench](https://arxiv.org/abs/2404.14381) dataset, and we use diversified prompt templates to curate question-answer pairs that support multimodal alignment. For more details, please refer to our [paper](https://arxiv.org/abs/2512.22905).
### Usage

Download the dataset from [Hugging Face](https://huggingface.co/datasets/JavisVerse/AV-FineTune):

```bash
huggingface-cli download --repo-type dataset JavisVerse/AV-FineTune --local-dir /path/to/AV-FineTune
```

Data sources and QA pairs are organized in the `stage2_av_ft.json` meta file. We also provide the understanding and generation instances separately in `stage2_av_ft_und.json` and `stage2_av_ft_gen.json`, respectively.

However, we cannot release the source data of [TAVGBench](https://arxiv.org/abs/2404.14381) due to policy issues. Instead, the video IDs (formatted as `{youtube_id}_{start_time}_{end_time}`) are provided in [`video_ids.txt`](video_ids.txt), and users can refer to [TAVGBench](https://github.com/OpenNLPLab/TAVGBench) to download the raw videos.

## Citation

If you find JavisGPT useful and use it in your project, please cite:

```
@inproceedings{liu2025javisgpt,
    title={JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation},
    author={Kai Liu and Jungang Li and Yuchong Sun and Shengqiong Wu and jianzhang gao and Daoan Zhang and Wei Zhang and Sheng Jin and Sicheng Yu and Geng Zhan and Jiayi Ji and Fan Zhou and Liang Zheng and Shuicheng YAN and Hao Fei and Tat-Seng Chua},
    booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
    year={2025},
}
```
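The `{youtube_id}_{start_time}_{end_time}` format used in `video_ids.txt` can be split back into its parts before fetching raw videos. Below is a minimal sketch, not part of the official tooling. It assumes the file holds one ID per line, and `ClipRef`, `parse_video_id`, `read_video_ids`, and `youtube_url` are hypothetical helper names introduced here for illustration. The time fields are kept as raw strings because the dataset card does not state their unit.

```python
from pathlib import Path
from typing import NamedTuple


class ClipRef(NamedTuple):
    """One entry of video_ids.txt (hypothetical helper type, not shipped with the dataset)."""
    youtube_id: str
    start_time: str  # raw text; the unit is not stated in the dataset card
    end_time: str    # raw text; same caveat as start_time


def parse_video_id(video_id: str) -> ClipRef:
    """Split '{youtube_id}_{start_time}_{end_time}' from the right.

    YouTube IDs may themselves contain underscores, so only the last two
    underscores are treated as separators.
    """
    parts = video_id.strip().rsplit("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected video id format: {video_id!r}")
    return ClipRef(*parts)


def read_video_ids(path: str) -> list[ClipRef]:
    """Parse every non-empty line of a file, assuming one video ID per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [parse_video_id(line) for line in lines if line.strip()]


def youtube_url(clip: ClipRef) -> str:
    """Build the standard YouTube watch URL for the clip's source video."""
    return f"https://www.youtube.com/watch?v={clip.youtube_id}"
```

Splitting from the right with `rsplit("_", 2)` matters because a plain `split("_")` would break any YouTube ID that contains an underscore. A typical call would be `read_video_ids("/path/to/AV-FineTune/video_ids.txt")`, using the same local directory as the download command.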

Provided by:

JavisVerse
