AF-Think
Download link: https://modelscope.cn/datasets/nv-community/AF-Think
# AF-Think Dataset
[Project page](https://research.nvidia.com/labs/adlr/AF3/) | [Paper](https://huggingface.co/papers/2507.08128) | [Code](https://github.com/NVIDIA/audio-flamingo)
## Dataset Description
**AF-Think** is a lightweight, on-demand reasoning dataset designed to teach concise chain-of-thought (CoT)-type reasoning to (large) audio-language models. It contains **500K** multiple-choice and open-ended audio QA triplets, where each answer is augmented with a short thought prefix that precedes the answer and a special suffix to trigger thinking only when requested. AF-Think examples are sampled from both AudioSkills-XL and LongAudio-XL to cover diverse audio lengths and reasoning skills. The dataset is partitioned into subsets based on each audio’s source dataset:
1. **UrbanSound8K (`UrbanSound8K.json`)**
- Domain: Sound
- Link to original dataset: https://urbansounddataset.weebly.com/urbansound8k.html
2. **MusicCaps (`MusicCaps.json`)**
- Domain: Sound
- Link to original dataset: https://huggingface.co/datasets/google/MusicCaps
3. **MSD (`MSD.json`)**
- Domain: Music
- Link to original dataset: http://millionsongdataset.com/
4. **Freesound (`Freesound.json`)**
- Domain: Sound
- Link to original dataset: https://freesound.org
- Additional Note: Can also be downloaded from https://github.com/XinhaoMei/WavCaps
5. **CochlScene (`CochlScene.json`)**
- Domain: Sound
- Link to original dataset: https://github.com/cochlearai/cochlscene
6. **AudioSet_SL (`AudioSet_SL.json`)**
- Domain: Sound
- Link to original dataset: https://research.google.com/audioset/
- Additional Note: Can also be downloaded from https://github.com/JishengBai/AudioSetCaps
7. **WavText5K (`WavText5K.json`)**
- Domain: Sound
- Link to original dataset: https://github.com/microsoft/WavText5K
8. **MELD (`MELD.json`)**
- Domain: Speech
- Link to original dataset: https://github.com/declare-lab/MELD
- Additional Note: The entire non-segmented original episodes are treated as the corresponding audios.
9. **AudioSet (`AudioSet.json`)**
- Domain: Sound
- Link to original dataset: https://research.google.com/audioset/
- Additional Note: Can also be downloaded from https://github.com/JishengBai/AudioSetCaps
10. **TUT_Urban (`TUT_Urban.json`)**
- Domain: Sound
- Link to original dataset: https://dcase-repo.github.io/dcase_datalist/datasets/scenes/tut_asc_2018_mobile_eval.html
11. **Switchboard (`Switchboard.json`)**
- Domain: Speech
- Link to original dataset: https://catalog.ldc.upenn.edu/LDC97S62
- Additional Note: Concatenate the audio files in the list, in the exact order given, to form the corresponding audio.
12. **SoundDescs (`SoundDescs.json`)**
- Domain: Sound
- Link to original dataset: https://github.com/akoepke/audio-retrieval-benchmark
13. **Fisher (`Fisher.json`)**
- Domain: Speech
- Link to original dataset: https://catalog.ldc.upenn.edu/LDC2004T19
- Additional Note: Each audio file is named in the format `file_start_end.wav`. Segment the original wav by the start and end time.
14. **ESC-50 (`ESC-50.json`)**
- Domain: Sound
- Link to original dataset: https://github.com/karolpiczak/ESC-50
15. **Clotho-v2 (`Clotho-v2.json`)**
- Domain: Sound
- Link to original dataset: https://zenodo.org/records/4783391
16. **BBC Sound Effects (`BBC_Sound_Effects.json`)**
- Domain: Sound
- Link to original dataset: https://sound-effects.bbcrewind.co.uk/
17. **YouTube-8M (`YouTube8M.json`)**
- Domain: Sound, Speech
- Link to original dataset: https://research.google.com/youtube8m/
- Additional Note: Can also be downloaded from https://github.com/JishengBai/AudioSetCaps
18. **Medley-solos-DB (`Medley-solos-DB.json`)**
- Domain: Music
- Link to original dataset: https://zenodo.org/records/3464194
19. **MACS (`MACS.json`)**
- Domain: Sound
- Link to original dataset: https://zenodo.org/records/5114771
20. **Europarl (`Europarl.json`)**
- Domain: Speech
- Link to original dataset: https://www.statmt.org/europarl/
- Additional Note: Concatenate the audio files in the list, in the exact order given, to form the corresponding audio.
21. **VoxPopuli (`VoxPopuli.json`)**
- Domain: Speech
- Link to original dataset: https://github.com/facebookresearch/voxpopuli
- Additional Note: Concatenate the audio files in the list, in the exact order given, to form the corresponding audio.
22. **Music4ALL (`Music4ALL.json`)**
- Domain: Music
- Link to original dataset: https://github.com/amaai-lab/Music4All
- Additional Note: Please email the corresponding authors with your approved license to request access to this JSON.
23. **MultiDialog (`MultiDialog.json`)**
- Domain: Speech
- Link to original dataset: https://huggingface.co/datasets/IVLLab/MultiDialog
- Additional Note: The entire original dialogues are treated as the corresponding audios.
24. **Medley-Pitch-DB (`Medley-Pitch-DB.json`)**
- Domain: Music
- Link to original dataset: https://zenodo.org/records/3464194
25. **LibriSpeech (`LibriSpeech.json`)**
- Domain: Speech
- Link to original dataset: https://www.openslr.org/12/
- Additional Note: Concatenate the audio files in the list, in the exact order given, to form the corresponding audio.
26. **IEMOCAP (`IEMOCAP.json`)**
- Domain: Speech
- Link to original dataset: https://sail.usc.edu/iemocap/
- Additional Note: The entire non-segmented original wav files are treated as the corresponding audios.
27. **FSD50k (`FSD50k.json`)**
- Domain: Sound
- Link to original dataset: https://zenodo.org/records/4060432
28. **FMA (`FMA.json`)**
- Domain: Music
- Link to original dataset: https://github.com/mdeff/fma
29. **DailyTalk (`DailyTalk.json`)**
- Domain: Speech
- Link to original dataset: https://github.com/keonlee9420/DailyTalk
- Additional Note: The entire non-segmented original wav files are treated as the corresponding audios.
30. **VGGSound (`VGG.json`)**
- Domain: Sound
- Link to original dataset: https://github.com/amirabd/vggsound
31. **SONNISS (`SONNISS.json`)**
- Domain: Sound
- Link to original dataset: https://sonniss.com/
32. **MagnaTagATune (`MagnaTagATune.json`)**
- Domain: Music
- Link to original dataset: http://mirg.city.ac.uk/codeapps/the-magnatagatune-dataset
33. **GTZAN (`GTZAN.json`)**
- Domain: Music
- Link to original dataset: https://github.com/chittalpatel/Music-Genre-Classification-GTZAN
34. **WavCaps (`WavCaps.json`)**
- Domain: Sound
- Link to original dataset: https://github.com/XinhaoMei/WavCaps
35. **MusicBench (`MusicBench.json`)**
- Domain: Music
- Link to original dataset: https://huggingface.co/datasets/amaai-lab/MusicBench
36. **Chime-Home (`Chime-Home.json`)**
- Domain: Sound
- Link to original dataset: https://archive.org/details/chime-home
37. **Clotho-AQA (`Clotho-AQA.json`)**
- Domain: Sound
- Link to original dataset: https://zenodo.org/records/6473207
38. **NonSpeech7K (`NonSpeech7K.json`)**
- Domain: Sound
- Link to original dataset: https://zenodo.org/records/6967442
39. **SoundBible (`SoundBible.json`)**
- Domain: Sound
- Link to original dataset: http://soundbible.com
With AF-Think released, researchers can train models on a broad spectrum of audio reasoning tasks. **Please note that we only provide the text QA annotations. Due to licensing constraints, we do not host the original audio files. Users are responsible for retrieving the corresponding audio clips from their original sources (e.g., YouTube8M, Music4All) by downloading each dataset from the URLs listed above and locating files by the wav file name given in the "sound" field of the JSONs.**
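The notes for Fisher and for the list-based speech subsets (Switchboard, Europarl, VoxPopuli, LibriSpeech) describe two preparation steps: cutting a segment out of a longer recording, and joining several files in order. The following is a minimal sketch of both, using only Python's standard `wave` module. It assumes that the start and end values in `file_start_end.wav` are the last two underscore-separated fields and are given in seconds, and that all files to be joined share the same channel count, sample width, and sample rate. Neither assumption is confirmed by this card, and the function names are hypothetical.

```python
import wave
from pathlib import Path


def parse_segment_name(name: str) -> tuple[str, float, float]:
    """Split 'file_start_end.wav' into (file, start, end).

    Assumption: start and end are the last two '_'-separated fields, in seconds.
    """
    base, start, end = Path(name).stem.rsplit("_", 2)
    return base, float(start), float(end)


def cut_wav(src: str, dst: str, start: float, end: float) -> None:
    """Write the [start, end) portion of src (times in seconds) to dst."""
    with wave.open(src, "rb") as reader:
        rate = reader.getframerate()
        first = round(start * rate)
        last = round(end * rate)
        reader.setpos(first)
        frames = reader.readframes(last - first)
        params = reader.getparams()
    with wave.open(dst, "wb") as writer:
        # The frame count in the header is corrected when the file is closed.
        writer.setparams(params)
        writer.writeframes(frames)


def concat_wavs(sources: list[str], dst: str) -> None:
    """Join WAV files in the given order into a single file at dst."""
    if not sources:
        raise ValueError("no source files given")
    with wave.open(dst, "wb") as writer:
        expected = None
        for src in sources:
            with wave.open(src, "rb") as reader:
                fmt = (
                    reader.getnchannels(),
                    reader.getsampwidth(),
                    reader.getframerate(),
                )
                if expected is None:
                    expected = fmt
                    writer.setnchannels(fmt[0])
                    writer.setsampwidth(fmt[1])
                    writer.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError(f"{src} has a different audio format")
                writer.writeframes(reader.readframes(reader.getnframes()))
```

Compressed formats such as MP3 or FLAC are outside the scope of the `wave` module and would need a separate audio library.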
## Sample Usage
You can load the dataset using the Hugging Face `datasets` library:
```python
from datasets import load_dataset
# Load the 'afthink' configuration of the AF-Think dataset
# This will load all specified data files under the 'afthink' config as separate splits.
dataset = load_dataset("nvidia/AF-Think", "afthink")
# Access a specific split by its name, for example 'urbansound8k':
print(dataset)
print(dataset["urbansound8k"][0])
# Note: The dataset provides JSON annotations. The actual audio files need to be downloaded
# separately from their original sources as described in the "Dataset Description" section.
```
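Because only the annotations are distributed, a useful first check after downloading the source audio is to see which referenced files are still missing locally. The sketch below assumes a subset JSON file saved locally with a top-level list of entries, a `"sound"` value that is either a single file name or a list of names (inferred from the notes about combining audios in a list, not confirmed), and a hypothetical `audio_root` directory holding the audio. The function names are illustrative only.

```python
import json
from pathlib import Path


def sound_names(entry: dict) -> list[str]:
    """Return the audio file name(s) referenced by one annotation entry.

    Assumption: "sound" holds either a single name or a list of names.
    """
    sound = entry["sound"]
    return [sound] if isinstance(sound, str) else list(sound)


def find_missing_audio(annotation_path: str, audio_root: str) -> list[str]:
    """List referenced audio files that do not exist under audio_root."""
    with open(annotation_path, encoding="utf-8") as handle:
        entries = json.load(handle)
    root = Path(audio_root)
    missing = []
    seen = set()
    for entry in entries:
        for name in sound_names(entry):
            # Report each missing file once, in order of first appearance.
            if name not in seen and not (root / name).is_file():
                missing.append(name)
            seen.add(name)
    return missing
```

A call such as `find_missing_audio("UrbanSound8K.json", "/data/audio")` (both paths hypothetical) returns the file names that still need to be fetched from the original source.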
## Dataset Owner(s)
NVIDIA Corporation
## Dataset Creation Date
2025/07/10
## License / Terms of Use
The use of AF-Think is governed by the [NVIDIA OneWay Noncommercial License](licenses/NVIDIA-OneWay-Noncommercial-License_22Mar2022-research.docx).
Synthetic data generation may be subject to OpenAI’s [Terms of Use](https://openai.com/policies/terms-of-use) and the [Qwen Research License](https://huggingface.co/Qwen/Qwen2.5-7B/blob/main/LICENSE). Additionally, each audio file may be governed by the license of its source dataset, which users should review before downloading or using the audio content.
## Intended Usage
AF-Think is intended to support:
- Training and fine-tuning (large) audio-language models to reason over audio and equipping them with thinking abilities.
## Dataset Characterization
AF-Think examples are sampled from both AudioSkills-XL and LongAudio-XL to cover diverse audio lengths and reasoning skills. There is no separate characterization involved. Each example is a pair of a short audio clip (≤30 s) and a corresponding QA item. Audio encompasses environmental sounds, speech (primarily English), and music. Audios are sourced from open-source datasets (see Table 7 in paper). Text QA is generated using a variety of methods mentioned in the paper. Metadata from the original datasets (if available) is used for QA generation.
## Data Curation Method
- Audio is drawn from several open-source datasets. Some audios are synthetically generated.
- Available metadata (e.g., captions, transcripts, etc.) from respective datasets is curated. Additional meta-data (if required) is generated (see paper for details).
- LLMs are used to generate QA pairs from the meta-data using expert-designed reasoning prompts.
- Dataset curation was human-in-the-loop: prompts and data sources were iteratively refined based on model outputs.
## Data Collection Method
Hybrid: Human, Synthetic and Automated
## Labeling Method
Synthetic
## Dataset Format
- **Modality**: Audio (WAV/MP3/FLAC) + Text (JSON)
- **JSON Schema Example**:
```json
[
  {
    "id": "Arbitrary ID",
    "sound": "Name of the wav file.",
    "conversations": [
      {
        "from": "human",
        "value": "<sound>\nThe Question."
      },
      {
        "from": "gpt",
        "value": "The Answer."
      }
    ]
  }
]
```
**Note:** While the `duration` field is accurate in most cases, it may be incorrect in some files and should be treated as a placeholder. If your code relies on audio durations, we recommend recalculating them. Please also note that all QA pairs are intended to correspond to the entire audio clip, not just a segment.
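As a minimal sketch of working with this schema, the helpers below pull the question and answer out of one entry and recompute an audio duration from the file itself, as the note recommends. They assume each entry has at least one `human` turn and one `gpt` turn as in the example, and that the audio is an uncompressed WAV file readable by Python's standard `wave` module (MP3 or FLAC files would need a separate audio library). The function names are hypothetical.

```python
import wave

SOUND_TOKEN = "<sound>"


def question_answer(entry: dict) -> tuple[str, str]:
    """Return (question, answer) from the first human and gpt turns."""
    turns = entry["conversations"]
    question = next(t["value"] for t in turns if t["from"] == "human")
    answer = next(t["value"] for t in turns if t["from"] == "gpt")
    # Drop the audio placeholder and the newline that follows it.
    question = question.replace(SOUND_TOKEN, "", 1).strip()
    return question, answer.strip()


def wav_duration_seconds(path: str) -> float:
    """Compute the duration from the WAV header instead of trusting metadata."""
    with wave.open(path, "rb") as reader:
        return reader.getnframes() / reader.getframerate()
```

Computing the duration from the frame count and sample rate avoids relying on the `duration` field, which the note above says may be inaccurate.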
## Reference(s):
- Audio Flamingo 3
```
@misc{goel2025audioflamingo3advancing,
title={Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models},
author={Arushi Goel and Sreyan Ghosh and Jaehyeon Kim and Sonal Kumar and Zhifeng Kong and Sang-gil Lee and Chao-Han Huck Yang and Ramani Duraiswami and Dinesh Manocha and Rafael Valle and Bryan Catanzaro},
year={2025},
eprint={2507.08128},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2507.08128},
}
```
- Audio Flamingo
```
@inproceedings{kong2024audio,
title={Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities},
author={Kong, Zhifeng and Goel, Arushi and Badlani, Rohan and Ping, Wei and Valle, Rafael and Catanzaro, Bryan},
booktitle={International Conference on Machine Learning},
pages={25125--25148},
year={2024},
organization={PMLR}
}
```
- Audio Flamingo 2
```
@article{ghosh2025audio,
title={Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities},
author={Ghosh, Sreyan and Kong, Zhifeng and Kumar, Sonal and Sakshi, S and Kim, Jaehyeon and Ping, Wei and Valle, Rafael and Manocha, Dinesh and Catanzaro, Bryan},
journal={arXiv preprint arXiv:2503.03983},
year={2025}
}
```
## Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
Provider: maas
Created: 2025-07-12



