AdoCleanCode/portuguese-multi-mfa-train_v0

Name: AdoCleanCode/portuguese-multi-mfa-train_v0
Creator: AdoCleanCode
Published: 2025-12-15 09:50:38
License: No description available

Hugging Face. Updated 2025-12-15; indexed 2025-12-20.

Download link:

https://hf-mirror.com/datasets/AdoCleanCode/portuguese-multi-mfa-train_v0

Resource description:

---
dataset_info:
  features:
  - name: sequence
    dtype: string
  - name: transcription_full
    dtype: string
  - name: transcription_original
    dtype: string
  - name: removed_words
    dtype: string
  - name: phonemes_annotated
    dtype: string
  - name: phonemes_actual
    dtype: string
  - name: xcodec2_tokens
    dtype: string
  - name: left_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: removed_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: right_audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: full_audio
    dtype:
      audio:
        sampling_rate: 16000
  splits:
  - name: batch_001
    num_bytes: 41687312.0
    num_examples: 64
  download_size: 40055476
  dataset_size: 41687312.0
configs:
- config_name: default
  data_files:
  - split: batch_001
    path: data/batch_001-*
---
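A minimal sketch of loading the single split declared in this metadata, assuming the Hugging Face `datasets` library is installed and the repository is publicly reachable. The names `EXPECTED_COLUMNS`, `AUDIO_COLUMNS`, `missing_columns`, and `load_batch` are hypothetical helpers introduced here for illustration; they are not part of the dataset or the library.

```python
# Column names copied from the dataset_info metadata of this listing.
EXPECTED_COLUMNS = [
    "sequence", "transcription_full", "transcription_original",
    "removed_words", "phonemes_annotated", "phonemes_actual",
    "xcodec2_tokens", "left_audio", "removed_audio", "right_audio",
    "full_audio",
]

# The four columns declared with an audio dtype (16000 Hz sampling rate).
AUDIO_COLUMNS = [name for name in EXPECTED_COLUMNS if name.endswith("_audio")]


def missing_columns(actual_columns):
    """Return the expected column names that are absent from actual_columns."""
    present = set(actual_columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def load_batch(split="batch_001"):
    """Load one split; needs `pip install datasets` and network access."""
    # Imported lazily so the helpers above stay usable offline.
    from datasets import load_dataset

    ds = load_dataset("AdoCleanCode/portuguese-multi-mfa-train_v0", split=split)
    absent = missing_columns(ds.column_names)
    if absent:
        raise ValueError(f"columns missing from split {split!r}: {absent}")
    return ds
```

Calling `load_batch()` should return a `Dataset` for the `batch_001` split. The column check is a cheap guard against the card metadata drifting from the actual Parquet files; decoding the audio columns may need additional audio dependencies that this sketch does not install.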

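Dataset information:

Features:
 - sequence: sequence (string)
 - transcription_full: full transcription (string)
 - transcription_original: original transcription (string)
 - removed_words: removed words (string)
 - phonemes_annotated: annotated phonemes (string)
 - phonemes_actual: actual phonemes (string)
 - xcodec2_tokens: xcodec2 tokens (string)
 - left_audio: left audio (audio, 16000 Hz sampling rate). The listing glosses this as "left-channel audio"; alongside removed_audio it may instead denote the audio preceding the removed segment.
 - removed_audio: audio of the removed segment (audio, 16000 Hz sampling rate)
 - right_audio: right audio (audio, 16000 Hz sampling rate). The listing glosses this as "right-channel audio"; it may instead denote the audio following the removed segment.
 - full_audio: full audio (audio, 16000 Hz sampling rate)

Splits:
 - batch_001: 41687312.0 bytes, 64 examples

Download size: 40055476 bytes
Dataset size: 41687312.0 bytes

Configs:
 - default: split batch_001, file path data/batch_001-*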
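The size figures above allow a quick sanity check before downloading. The sketch below derives the average size per example and the ratio of download size to dataset size; `split_stats` is a hypothetical helper introduced here, and the constants are copied from this listing.

```python
# Figures copied from the split metadata in this listing.
NUM_BYTES = 41_687_312       # num_bytes / dataset_size of batch_001
DOWNLOAD_SIZE = 40_055_476   # download_size
NUM_EXAMPLES = 64            # num_examples


def split_stats(num_bytes, download_size, num_examples):
    """Return average bytes per example and the download/dataset size ratio."""
    return {
        "bytes_per_example": num_bytes / num_examples,
        "download_ratio": download_size / num_bytes,
    }


stats = split_stats(NUM_BYTES, DOWNLOAD_SIZE, NUM_EXAMPLES)
print(f"{stats['bytes_per_example']:.2f} bytes per example")
print(f"download is {stats['download_ratio']:.1%} of the dataset size")
```

This works out to 651364.25 bytes per example, with the download at about 96.1% of the in-memory dataset size. Each example carries four audio columns, so the per-example figure covers all four clips plus the text fields.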

Provider:

AdoCleanCode
