typhoon-audio-preview-data
ModelScope community, updated 2025-08-15, indexed 2025-05-24
Download link:
https://modelscope.cn/datasets/scb10x/typhoon-audio-preview-data
Description:
# Typhoon Audio Preview Data
## Overview
- This dataset is for aligning speech/audio representations with textual representations. It consists of {audio, instruction, response} examples in both Thai and English. This repository provides the {instruction, response} pairs that we generated for Typhoon-Audio training. We do not own the original data sources (e.g., CommonVoice, LibriSpeech), so please download those datasets from their original sources, or contact `{potsawee, kunat}@scb10x.com`.
- Please refer to our technical report for more information about the dataset: https://arxiv.org/abs/2409.10999
## Data Splits
1. **Pretrained**: 1.8M examples consisting of ASR (automatic speech recognition) and audio captioning data
2. **SFT**: 665K supervised fine-tuning examples covering a range of audio tasks
## Attributes
- `path`: path to the local WAV file; change the directory to match where the audio is stored on your machine.
- `instruction`: text instruction (which can be null, i.e., the instruction is in the audio)
- `response`: target answer
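
The attributes above can be handled with a few lines of Python. The following is a minimal sketch, not the authors' loading code: the `Example` type, the helpers `relocate` and `has_spoken_instruction`, the sample records, and the `/data/audio` directory are all hypothetical names introduced for illustration. It assumes each example is a mapping with the three fields listed above, and that only the file name of `path` needs to be kept when pointing it at a local directory.

```python
import os
from typing import Optional, TypedDict


class Example(TypedDict):
    # Hypothetical record type mirroring the three documented attributes.
    path: str
    instruction: Optional[str]
    response: str


def relocate(example: Example, audio_root: str) -> Example:
    """Return a copy of `example` whose `path` points into `audio_root`.

    Assumption: only the file name is kept. Adapt this if your local copy
    preserves sub-directories from the original source.
    """
    filename = os.path.basename(example["path"])
    return {**example, "path": os.path.join(audio_root, filename)}


def has_spoken_instruction(example: Example) -> bool:
    """True when `instruction` is null, i.e. the instruction is in the audio."""
    return example["instruction"] is None


# Made-up records for illustration; they are not taken from the dataset.
records: list[Example] = [
    {
        "path": "/original/dir/clip_0001.wav",
        "instruction": "Transcribe this audio.",
        "response": "hello world",
    },
    {
        "path": "/original/dir/clip_0002.wav",
        "instruction": None,
        "response": "The capital of Thailand is Bangkok.",
    },
]

# Point every record at the (hypothetical) local audio directory.
local = [relocate(r, "/data/audio") for r in records]
```

Returning a copy rather than editing the record in place keeps the original paths available, which may help when matching examples back to their source datasets.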
## Citation
If you find this work useful, please consider citing:
```
@article{manakul2024enhancing,
  title={Enhancing low-resource language and instruction following capabilities of audio language models},
  author={Manakul, Potsawee and Sun, Guangzhi and Sirichotedumrong, Warit and Tharnpipitchai, Kasima and Pipatanakul, Kunat},
  journal={arXiv preprint arXiv:2409.10999},
  year={2024}
}
```
Provider:
maas
Created:
2025-05-23



