doof-ferb/VietMed_labeled

Name: doof-ferb/VietMed_labeled
Creator: doof-ferb
Published: 2024-07-06 23:13:21
License: cc-by-4.0

Hugging Face. Updated 2024-07-06. Indexed 2024-07-22.

Download link:

https://hf-mirror.com/datasets/doof-ferb/VietMed_labeled

Overview:

The VietMed labeled set is a Vietnamese speech dataset of about 9.2k samples (9,207 across all splits), intended primarily for automatic speech recognition and text-to-speech. It is divided into training, validation, and test splits; each sample consists of an audio file, its transcription, and a speaker ID. The download size is 183,555,285 bytes (about 184 MB) and the total dataset size is 185,279,995.896 bytes (about 185 MB). The README also provides a code example for loading the dataset with Hugging Face, and mentions the need to check for misspellings and to restore foreign words that were phonetized into Vietnamese.
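The README's own loading example is not reproduced on this page. As a rough sketch of what loading one split might look like, assuming the Hugging Face `datasets` package is installed and the dataset is reachable under the ID shown above (the helper name `load_vietmed_split` is hypothetical, not part of any library):

```python
DATASET_ID = "doof-ferb/VietMed_labeled"
SPLITS = ("train", "validation", "test")  # split names listed on this page


def load_vietmed_split(split="train", streaming=True):
    """Return one split of the dataset (hypothetical helper).

    With streaming=True the audio is fetched lazily instead of
    downloading the full ~184 MB up front.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the module can be imported without `datasets`.
    from datasets import load_dataset  # pip install datasets

    return load_dataset(DATASET_ID, split=split, streaming=streaming)


# Example use (requires network access):
#   ds = load_vietmed_split("test")
#   sample = next(iter(ds))
#   print(sample.keys())  # inspect the actual column names
```

Printing the keys of the first sample is a safer starting point than hard-coding column names, since this page lists the features only by description.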

Provided by:

doof-ferb

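Summary of original information

VietMed labeled set: dataset overview

Basic information

License: cc-by-4.0
Task categories:
- Automatic speech recognition
- Text-to-speech
Language: Vietnamese
Dataset name: VietMed labeled set
Size category (number of samples): 1K<n<10K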

Dataset features

Audio:
- Data type: audio
Transcription:
- Data type: string
Speaker ID:
- Data type: string

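Dataset splits

Training set:
- Number of samples: 2858
- Size in bytes: 58513440.578
Validation set:
- Number of samples: 2912
- Size in bytes: 56714850.712
Test set:
- Number of samples: 3437
- Size in bytes: 70051704.606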

Dataset configuration

Configuration name: default
- Data file paths:
  - Training set: data/train-*
  - Validation set: data/validation-*
  - Test set: data/test-*
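The data file paths above are glob patterns, so each split can span several shard files. A small sketch of how such patterns assign a file to a split, using Python's standard `fnmatch` module; the shard filenames in the usage note are hypothetical and only illustrate the matching:

```python
from fnmatch import fnmatch

# Glob patterns copied from the "default" configuration above.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def split_for(path):
    """Return the split whose pattern matches `path`, or None."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None
```

For instance, a hypothetical shard named `data/test-00000-of-00001.parquet` would map to `test`, while a file such as `README.md` matches none of the patterns.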

Dataset size

Download size: 183555285 bytes
Total dataset size: 185279995.896 bytes
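The per-split figures can be cross-checked against the totals. The sketch below uses only numbers listed on this page and `decimal.Decimal` to avoid floating-point rounding on the fractional byte counts:

```python
from decimal import Decimal

# (number of samples, size in bytes) per split, as listed above.
SPLITS = {
    "train": (2858, Decimal("58513440.578")),
    "validation": (2912, Decimal("56714850.712")),
    "test": (3437, Decimal("70051704.606")),
}

total_samples = sum(n for n, _ in SPLITS.values())
total_bytes = sum(size for _, size in SPLITS.values())

print(total_samples)  # 9207, i.e. the "9.2k samples" in the overview
print(total_bytes)    # 185279995.896, the listed total dataset size

# Share of samples per split, as a percentage.
for name, (n, _) in SPLITS.items():
    print(f"{name}: {100 * n / total_samples:.1f}%")
```

The split sizes sum exactly to the listed total dataset size, and the test split is the largest of the three.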
