anilkeshwani/GigaSpeech_aligned_hubert

Name: anilkeshwani/GigaSpeech_aligned_hubert
Creator: anilkeshwani
Published: 2024-08-30 17:35:37
License: No description available

Source: Hugging Face. Updated: 2024-08-30. Indexed: 2024-12-14.

Download link:

https://hf-mirror.com/datasets/anilkeshwani/GigaSpeech_aligned_hubert

Resource description:

This dataset provides the following features: segment_id, text, text_processed, audio_id, path, speaker, begin_time, end_time, title, url, source, category, original_full_path, tokenized, normalized, uroman, speech_tokens, aligned_token_start_time, and aligned_token_end_time. Together they cover the transcript text, audio references, timestamps, source, and category of each segment. The dataset has a single training split containing 2,266,371 samples, with a total size of 6,154,412,303 bytes (about 6.15 GB).
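The alignment fields listed above could be used roughly as follows. This is a minimal sketch, not the creator's own tooling: it assumes the split is named "train", that the third-party `datasets` package is installed, and that aligned_token_start_time and aligned_token_end_time are parallel lists of numeric timestamps. None of these details are confirmed by the listing. The helper names and the sample record are hypothetical.

```python
from typing import Any, Dict, List

REPO_ID = "anilkeshwani/GigaSpeech_aligned_hubert"  # repository name from the listing


def stream_first_example() -> Dict[str, Any]:
    """Fetch one example without downloading the full split.

    Needs the third-party `datasets` package and network access.
    The split name "train" is an assumption based on the listing.
    """
    # Imported lazily so the helper below works without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset(REPO_ID, split="train", streaming=True)
    return next(iter(dataset))


def token_durations(example: Dict[str, Any]) -> List[float]:
    """Per-token durations, assuming parallel lists of start and end times."""
    starts = example["aligned_token_start_time"]
    ends = example["aligned_token_end_time"]
    if len(starts) != len(ends):
        raise ValueError("start and end time lists differ in length")
    return [end - start for start, end in zip(starts, ends)]


# Hypothetical record for illustration only; the values are invented,
# not taken from the dataset.
sample = {
    "aligned_token_start_time": [0.0, 0.5, 1.25],
    "aligned_token_end_time": [0.5, 1.25, 2.0],
}
durations = token_durations(sample)
```

Streaming is used here because the split is several gigabytes; calling `token_durations(stream_first_example())` would apply the same helper to a real record if the field assumptions hold.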

Provided by:

anilkeshwani
