Dalision/Omni2Sound_Result

Source: Hugging Face | Updated: 2026-04-24 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/Dalision/Omni2Sound_Result
Description:

---
license: cc-by-nc-4.0
language:
- en
tags:
- audio-generation
- evaluation
- video-to-audio
- text-to-audio
- benchmark-results
task_categories:
- text-to-audio
---

<h1 align="center">Omni2Sound Evaluation Results</h1>

<p align="center">
  <a href="https://arxiv.org/pdf/2601.02731"><img src="https://img.shields.io/badge/arXiv-2601.02731-red"></a>
  <a href="https://omni2sound.github.io/"><img src="https://img.shields.io/badge/Project-Page-blue"></a>
  <a href="https://github.com/omni2sound/Omni2Sound"><img src="https://img.shields.io/badge/GitHub-Code-black"></a>
  <a href="https://huggingface.co/Dalision/Omni2Sound"><img src="https://img.shields.io/badge/HF-Model-yellow"></a>
</p>

<p align="center">
  <b>CVPR 2026 (Highlight)</b>
</p>

## Overview

This repository contains the evaluation results of [Omni2Sound](https://huggingface.co/Dalision/Omni2Sound) on three sub-tasks:

- **VT2A** (Video + Text → Audio)
- **V2A** (Video → Audio)
- **T2A** (Text → Audio)

All results are evaluated on the [VGGSound-Omni benchmark](https://huggingface.co/datasets/Dalision/Omni2Sound_Benchmark) and stored as JSON files for reproducibility.

## Evaluation Setup

**Benchmark**: [Dalision/Omni2Sound_Benchmark](https://huggingface.co/datasets/Dalision/Omni2Sound_Benchmark) (VGGSound-Omni)

**Evaluation Toolkit**: [AV-Benchmark](https://github.com/hkchengrex/av-benchmark), the standardized evaluation toolkit from MMAudio, applied to 8-second clips following prior work.

**Metrics** cover four dimensions:

| Dimension | Metrics |
|---|---|
| Distribution Matching | FAD, FD, FD_PaSST, KL, KL_PaSST |
| Audio Quality | IS, IS_PaSST, PQ (Production Quality) |
| Semantic Alignment | CLAP, MS-CLAP (text-audio), IB / ImageBind (video-audio) |
| Temporal Alignment | DS / Desynchronization Score (Synchformer) |

All baseline models are re-evaluated from their official checkpoints with the same standardized toolkit and under identical video/text conditions, for a fair comparison.

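
Because the results are distributed as JSON files and the metrics above mix lower-is-better and higher-is-better scores, a small helper can make side-by-side comparisons less error-prone. The following is a minimal Python sketch, not part of the Omni2Sound release: it assumes each result file is a flat JSON object whose keys match the metric names in the table (the actual file layout and key names may differ), and the helper names `load_metrics`, `load_all`, and `compare` are hypothetical. The direction sets follow the usual conventions for these metrics: distances, KL divergences, and DS are lower-is-better; IS, PQ, CLAP, MS-CLAP, and IB are higher-is-better.

```python
import json
from pathlib import Path

# Assumed optimization direction for each metric name in the table above.
# Distances, KL divergences and desynchronization: lower is better.
LOWER_IS_BETTER = {"FAD", "FD", "FD_PaSST", "KL", "KL_PaSST", "DS"}
# Inception scores, production quality and alignment scores: higher is better.
HIGHER_IS_BETTER = {"IS", "IS_PaSST", "PQ", "CLAP", "MS-CLAP", "IB"}


def load_metrics(path):
    """Read one result file into {metric_name: float}.

    Hypothetical schema: a flat JSON object such as {"FAD": ..., "CLAP": ...}.
    Non-numeric values (strings, booleans, nested objects) are ignored.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    return {
        key: float(value)
        for key, value in data.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


def load_all(directory):
    """Load every *.json file directly inside `directory`, keyed by file stem."""
    return {p.stem: load_metrics(p) for p in sorted(Path(directory).glob("*.json"))}


def compare(candidate, baseline):
    """Return {metric: bool}, True where `candidate` is strictly better.

    Only metrics present in both runs and with a known direction are compared.
    """
    verdict = {}
    for name in sorted(candidate.keys() & baseline.keys()):
        c, b = candidate[name], baseline[name]
        if name in LOWER_IS_BETTER:
            verdict[name] = c < b
        elif name in HIGHER_IS_BETTER:
            verdict[name] = c > b
    return verdict
```

For example, `compare(load_metrics("ours.json"), load_metrics("baseline.json"))` (file names hypothetical) reports, for each shared metric, whether the first run is strictly better; ties count as not better, and metrics missing from either file are skipped rather than guessed.
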
## Links

- **Model**: [Dalision/Omni2Sound](https://huggingface.co/Dalision/Omni2Sound)
- **Benchmark & Dataset**: [Dalision/Omni2Sound_Benchmark](https://huggingface.co/datasets/Dalision/Omni2Sound_Benchmark)
- **Evaluation Toolkit**: [hkchengrex/av-benchmark](https://github.com/hkchengrex/av-benchmark)
- **Paper**: [arXiv:2601.02731](https://arxiv.org/pdf/2601.02731)
- **Project Page**: [omni2sound.github.io](https://omni2sound.github.io/)
- **Code**: [github.com/omni2sound/Omni2Sound](https://github.com/omni2sound/Omni2Sound)

## Citation

```bibtex
@article{dai2026omni2sound,
  title   = {Omni2Sound: Towards Unified Video-Text-to-Audio Generation},
  author  = {Dai, Yusheng and Chen, Zehua and Jiang, Yuxuan and Gao, Baolong and Ke, Qiuhong and Cai, Jianfei and Zhu, Jun},
  journal = {arXiv preprint arXiv:2601.02731},
  year    = {2026}
}
```

## License

Released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) (non-commercial use only).

Provided by:
Dalision