AF-Chat

Name: AF-Chat
Creator: maas
Published: 2026-01-06 16:38:36
License: No description provided

ModelScope community; updated 2026-01-06; indexed 2025-07-19

Download link:

https://modelscope.cn/datasets/nv-community/AF-Chat


Resource description:

# AF-Chat Dataset

[Project page](https://research.nvidia.com/labs/adlr/AF3/) | [Paper](https://huggingface.co/papers/2507.08128) | [Code](https://github.com/NVIDIA/audio-flamingo/tree/audio_flamingo_3)

## Dataset Description

**AF-Chat** is a high-quality fine-tuning dataset of **~75K** multi-turn, multi-audio conversations (avg. 4.6 clips & 6.2 turns; range 2–8 clips & 2–10 turns) spanning speech, environmental sounds, and music. The dataset is partitioned into subsets based on each audio's source dataset:

1. **Sound (`sound.json`)**
   - Domain: Sound and Speech
   - Additional Note: Audios are primarily sourced from YouTube-8M and AudioSet, both of which can be downloaded from https://github.com/JishengBai/AudioSetCaps. If any audio is not found, please contact the corresponding authors.
2. **Music4All (`Music4ALL.json`)**
   - Domain: Music
   - Link to original dataset: https://github.com/amaai-lab/Music4All
   - Additional Note: To access this JSON, please email the corresponding authors with an approved license.
3. **Million Song Dataset (`MSD.json`)**
   - Domain: Music
   - Link to original dataset: http://millionsongdataset.com/

Releasing AF-Chat enables researchers to train models for multi-turn, multi-audio chat.

**Please note: we only provide the text QA annotations, not the audio files themselves. You must download each clip from its original source (e.g., YouTube-8M, AudioSet, Music4All) using the file name in the `"sound"` field of the JSON. In conversations, a tag like `<sound-i>` refers to the *i*-th item in that list. We recognize this lookup can be cumbersome; if you run into issues, please open an issue or contact the corresponding authors for assistance.**

## Dataset Owner(s)

NVIDIA Corporation

## Dataset Creation Date

2025/07/10

## License / Terms of Use

The use of AF-Chat is governed by the [NVIDIA OneWay Noncommercial License](licenses/NVIDIA%20OneWay%20Noncommercial%20License.docx). Synthetic data generation may be subject to OpenAI's [Terms of Use](https://openai.com/policies/terms-of-use) and the [Qwen Research License](https://huggingface.co/Qwen/Qwen2.5-7B/blob/main/LICENSE). Additionally, each audio may be governed by its own dataset license, which users should review before downloading or using the audio content.

## Intended Usage

AF-Chat is intended to support:

- Training and fine-tuning (large) audio-language models for multi-turn, multi-audio chat/dialogue.

## Dataset Characterization

The dataset has no special characterization. Each example is a pair of a long clip and a corresponding QA item. Audio encompasses environmental sounds, speech (primarily English), and music. Audios are sourced from open-source datasets (see Table 8 in the paper). Text QA is generated using a variety of methods described in the paper. Metadata from the original datasets (if available) is used for QA generation.

## Data Curation Method

- Audio is drawn from open-source datasets.
- Metadata (captions, transcripts, tags) is gathered from each source. Additional metadata, if required, is generated.
- For each seed audio, we retrieve its top 8 semantically similar and 8 dissimilar clips using NV-Embed-v2 embeddings and FAISS clustering.
- An LLM is prompted with expert exemplars and clustering constraints to produce natural multi-turn, multi-audio dialogues.
- Human-in-the-loop refinement: clustering parameters, prompts, and data sources are iteratively tuned based on model outputs and qualitative feedback.

## Data Collection Method

Hybrid: Human, Synthetic and Automated

## Labeling Method

Synthetic

## Dataset Format

- **Modality**: Audio (WAV/MP3/FLAC) + Text (JSON)
- **JSON Schema Example**:

```json
[
  {
    "id": "Arbitrary ID",
    "sound": "List of wav files.",
    "conversations": [
      {
        "from": "human",
        "value": "<sound-i> The Question."
      },
      {
        "from": "gpt",
        "value": "The Answer."
      }
    ]
  }
]
```

## Reference(s):

- Audio Flamingo 3

```
@misc{goel2025audioflamingo3advancing,
  title={Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models},
  author={Arushi Goel and Sreyan Ghosh and Jaehyeon Kim and Sonal Kumar and Zhifeng Kong and Sang-gil Lee and Chao-Han Huck Yang and Ramani Duraiswami and Dinesh Manocha and Rafael Valle and Bryan Catanzaro},
  year={2025},
  eprint={2507.08128},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2507.08128},
}
```

- Audio Flamingo

```
@inproceedings{kong2024audio,
  title={Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities},
  author={Kong, Zhifeng and Goel, Arushi and Badlani, Rohan and Ping, Wei and Valle, Rafael and Catanzaro, Bryan},
  booktitle={International Conference on Machine Learning},
  pages={25125--25148},
  year={2024},
  organization={PMLR}
}
```

- Audio Flamingo 2

```
@article{ghosh2025audio,
  title={Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities},
  author={Ghosh, Sreyan and Kong, Zhifeng and Kumar, Sonal and Sakshi, S and Kim, Jaehyeon and Ping, Wei and Valle, Rafael and Manocha, Dinesh and Catanzaro, Bryan},
  journal={arXiv preprint arXiv:2503.03983},
  year={2025}
}
```

## Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
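
The `<sound-i>` lookup described in the dataset description can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the AF-Chat authors: the function name `resolve_sound_tags`, the `index_base` parameter, and the sample record (including its file names and dialogue text) are hypothetical. In particular, the README does not say whether *i* starts at 0 or 1, so the default of 1 below is a guess that should be checked against a few real records.

```python
import json
import re

# Matches tags such as <sound-1>; the shape follows the README's `<sound-i>` notation.
SOUND_TAG = re.compile(r"<sound-(\d+)>")


def resolve_sound_tags(record, index_base=1):
    """Return a (speaker, [file names]) pair for every turn in one record.

    index_base is an ASSUMPTION: the README does not state whether the
    tag index starts at 0 or 1, so verify it on real data first.
    """
    sounds = record["sound"]
    resolved = []
    for turn in record["conversations"]:
        files = []
        for match in SOUND_TAG.finditer(turn["value"]):
            position = int(match.group(1)) - index_base
            if not 0 <= position < len(sounds):
                raise IndexError(
                    f"{match.group(0)} is out of range in record {record['id']}"
                )
            files.append(sounds[position])
        resolved.append((turn["from"], files))
    return resolved


# Hypothetical record shaped like the schema example; all values are made up.
example = json.loads("""
{
  "id": "demo-0",
  "sound": ["clip_a.wav", "clip_b.wav"],
  "conversations": [
    {"from": "human", "value": "<sound-1> What is happening here?"},
    {"from": "gpt", "value": "A dog is barking."},
    {"from": "human", "value": "How does <sound-2> differ from <sound-1>?"},
    {"from": "gpt", "value": "The second clip is music."}
  ]
}
""")

for speaker, files in resolve_sound_tags(example):
    print(speaker, files)
```

The returned file names are what you would then fetch from the original source datasets. Raising on an out-of-range tag, rather than skipping it, makes a wrong `index_base` guess show up early.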


Provider:

maas

Created:

2025-07-12
