scb10x/typhoon-audio-preview-data

Name: scb10x/typhoon-audio-preview-data
Creator: scb10x
Published: 2024-12-20 09:17:45
License: Not specified

Source: Hugging Face. Updated 2024-12-20; indexed 2024-12-21.

Download link:

https://hf-mirror.com/datasets/scb10x/typhoon-audio-preview-data

Description:

This dataset is for aligning speech/audio representations with textual representations. It consists of {audio, instruction, response} examples in both Thai and English. It provides the {instruction, response} pairs that the dataset authors generated for training Typhoon-Audio. The authors do not own the original data sources (e.g., CommonVoice, LibriSpeech); the audio can be downloaded from those original sources, or you can contact `{potsawee, kunat}@scb10x.com`. The dataset is divided into two parts: 1. **Pretrained**: 1.8M examples covering ASR and audio captioning data; 2. **SFT**: 665K examples covering a range of audio tasks. Each example has the attributes path, instruction, and response.
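The three attributes named in the description (path, instruction, response) can be modeled as a small record type. The sketch below is illustrative only: it assumes each example is available as one JSON object per line, which this listing does not confirm, and the `AudioExample` class, the `parse_example` helper, and the sample values are hypothetical placeholders rather than data from the dataset.

```python
import json
from dataclasses import dataclass

# Field names come from the dataset description; everything else is assumed.
REQUIRED_FIELDS = ("path", "instruction", "response")


@dataclass(frozen=True)
class AudioExample:
    """One {audio, instruction, response} example (hypothetical container)."""
    path: str         # location of the audio file in the original source corpus
    instruction: str  # text prompt paired with the audio
    response: str     # target text the model should produce


def parse_example(line: str) -> AudioExample:
    """Parse one JSON line and check that all three attributes are present."""
    record = json.loads(line)
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return AudioExample(*(str(record[name]) for name in REQUIRED_FIELDS))


# Placeholder record for illustration; not taken from the dataset.
sample_line = json.dumps({
    "path": "placeholder/audio_0001.wav",
    "instruction": "Transcribe this audio.",
    "response": "placeholder transcript",
})
example = parse_example(sample_line)
print(example.instruction)
```

Because the listing says the original audio is not distributed with the dataset, `path` would in practice need to be resolved against a local copy of the source corpus (CommonVoice, LibriSpeech, and so on).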

Provider:

scb10x
