Dalision/Omni2Sound_Result

Name: Dalision/Omni2Sound_Result
Creator: Dalision
Published: 2026-04-24 14:57:20
License: CC BY-NC 4.0 (as stated in the dataset card below)

Source: Hugging Face. Updated 2026-04-24. Indexed 2026-05-03.

Download link:

https://hf-mirror.com/datasets/Dalision/Omni2Sound_Result

Resource description:

---
license: cc-by-nc-4.0
language:
- en
tags:
- audio-generation
- evaluation
- video-to-audio
- text-to-audio
- benchmark-results
task_categories:
- text-to-audio
---

<h1 align="center">Omni2Sound Evaluation Results</h1>

<p align="center">
  <a href="https://arxiv.org/pdf/2601.02731"><img src="https://img.shields.io/badge/arXiv-2601.02731-red"></a>
  <a href="https://omni2sound.github.io/"><img src="https://img.shields.io/badge/Project-Page-blue"></a>
  <a href="https://github.com/omni2sound/Omni2Sound"><img src="https://img.shields.io/badge/GitHub-Code-black"></a>
  <a href="https://huggingface.co/Dalision/Omni2Sound"><img src="https://img.shields.io/badge/HF-Model-yellow"></a>
</p>

<p align="center">
  <b>CVPR 2026 (Highlight)</b>
</p>

## Overview

This repository contains the evaluation results of [Omni2Sound](https://huggingface.co/Dalision/Omni2Sound) on three sub-tasks:

- **VT2A** (Video + Text → Audio)
- **V2A** (Video → Audio)
- **T2A** (Text → Audio)

All results are evaluated on the [VGGSound-Omni benchmark](https://huggingface.co/datasets/Dalision/Omni2Sound_Benchmark) and stored as JSON files for reproducibility.

## Evaluation Setup

**Benchmark**: [Dalision/Omni2Sound_Benchmark](https://huggingface.co/datasets/Dalision/Omni2Sound_Benchmark) (VGGSound-Omni)

**Evaluation Toolkit**: [AV-Benchmark](https://github.com/hkchengrex/av-benchmark), the standardized evaluation toolkit from MMAudio, applied on 8-second clips following prior work.

**Metrics** cover four dimensions:

| Dimension | Metrics |
|---|---|
| Distribution Matching | FAD, FD, FD_PaSST, KL, KL_PaSST |
| Audio Quality | IS, IS_PaSST, PQ (Production Quality) |
| Semantic Alignment | CLAP, MS-CLAP (text-audio), IB / ImageBind (video-audio) |
| Temporal Alignment | DS / Desynchronization Score (Synchformer) |

All baseline models are re-evaluated using their official checkpoints with the same standardized toolkit and identical video/text conditions for fair comparison.
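Because the results are stored as JSON files, they can be inspected with a few lines of Python. The following is a minimal sketch, not the authors' tooling: the file layout and JSON schema of this repository are not documented here, so the flat `{metric: value}` shape passed to `group_by_dimension` is an assumption, and `fetch_results`, `load_json_files`, and `group_by_dimension` are hypothetical helper names. Only `snapshot_download` from `huggingface_hub` is a real library call.

```python
import json
from pathlib import Path

# Metric groups copied from the table above.
METRIC_DIMENSIONS = {
    "Distribution Matching": ["FAD", "FD", "FD_PaSST", "KL", "KL_PaSST"],
    "Audio Quality": ["IS", "IS_PaSST", "PQ"],
    "Semantic Alignment": ["CLAP", "MS-CLAP", "IB"],
    "Temporal Alignment": ["DS"],
}


def fetch_results(repo_id="Dalision/Omni2Sound_Result"):
    """Download the dataset repo and return its local path (needs network access)."""
    # Imported lazily so the other helpers work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, repo_type="dataset"))


def load_json_files(root):
    """Map each JSON file's path (relative to root) to its parsed content."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(root.rglob("*.json"))
    }


def group_by_dimension(scores):
    """Arrange a flat {metric: value} dict by dimension, skipping absent metrics.

    ASSUMPTION: the real files may nest scores differently or use other key
    names; adapt the lookup after inspecting an actual file.
    """
    return {
        dim: {name: scores[name] for name in names if name in scores}
        for dim, names in METRIC_DIMENSIONS.items()
    }
```

A typical session would call `load_json_files(fetch_results())`, print the keys of one entry to learn the actual schema, and only then decide whether `group_by_dimension` applies as written.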

## Links

- **Model**: [Dalision/Omni2Sound](https://huggingface.co/Dalision/Omni2Sound)
- **Benchmark & Dataset**: [Dalision/Omni2Sound_Benchmark](https://huggingface.co/datasets/Dalision/Omni2Sound_Benchmark)
- **Evaluation Toolkit**: [hkchengrex/av-benchmark](https://github.com/hkchengrex/av-benchmark)
- **Paper**: [arXiv:2601.02731](https://arxiv.org/pdf/2601.02731)
- **Project Page**: [omni2sound.github.io](https://omni2sound.github.io/)
- **Code**: [github.com/omni2sound/Omni2Sound](https://github.com/omni2sound/Omni2Sound)

## Citation

```bibtex
@article{dai2026omni2sound,
  title   = {Omni2Sound: Towards Unified Video-Text-to-Audio Generation},
  author  = {Dai, Yusheng and Chen, Zehua and Jiang, Yuxuan and Gao, Baolong and Ke, Qiuhong and Cai, Jianfei and Zhu, Jun},
  journal = {arXiv preprint arXiv:2601.02731},
  year    = {2026}
}
```

## License

Released under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) (non-commercial use only).

Provider:

Dalision
