amuvarma/2m-fac-raw-1dups-proc-train
Source: Hugging Face. Updated 2024-12-02; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/amuvarma/2m-fac-raw-1dups-proc-train
Resource summary:
This dataset pairs audio with text and is intended mainly for speech recognition and other audio-processing tasks. Each example has the following features: a text transcript (transcript), six audio codec sequences (facodec_0 through facodec_5), the tokenized text (tokenised_text), and input IDs (input_ids). The dataset has a single training split containing 2,282,634 examples, with a total size of 149,975,162,675 bytes (about 150 GB).
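As a rough illustration, the Python sketch below does three things. It encodes the column list from the summary, checks an example against that list, and computes the average size per example from the reported totals. The streaming loader assumes that the Hugging Face datasets library is installed and that the repository ID amuvarma/2m-fac-raw-1dups-proc-train resolves on the Hub. If you use the mirror above, this also assumes the HF_ENDPOINT environment variable points at it. The helper names are hypothetical and are not part of the dataset.

```python
# Hypothetical helpers for inspecting amuvarma/2m-fac-raw-1dups-proc-train.
# Column names and totals are taken from the dataset summary.
FACODEC_COLUMNS = [f"facodec_{i}" for i in range(6)]  # facodec_0 .. facodec_5
EXPECTED_COLUMNS = {"transcript", *FACODEC_COLUMNS, "tokenised_text", "input_ids"}

NUM_EXAMPLES = 2_282_634
TOTAL_BYTES = 149_975_162_675


def average_bytes_per_example(total_bytes: int = TOTAL_BYTES,
                              n: int = NUM_EXAMPLES) -> float:
    """Back-of-envelope average size of one example, in bytes."""
    return total_bytes / n


def missing_columns(example: dict) -> set:
    """Return the expected feature names that this example lacks."""
    return EXPECTED_COLUMNS - set(example)


def stream_examples(limit: int = 3):
    """Yield the first `limit` training examples without downloading everything.

    Assumes `pip install datasets`; the import is deferred so the helpers
    above work even when the library is not installed.
    """
    from datasets import load_dataset

    ds = load_dataset("amuvarma/2m-fac-raw-1dups-proc-train",
                      split="train", streaming=True)
    for example in ds.take(limit):
        yield example


if __name__ == "__main__":
    print(f"~{average_bytes_per_example():,.0f} bytes per example")
    for ex in stream_examples(limit=1):
        print("missing:", missing_columns(ex))
```

Streaming avoids downloading the full ~150 GB up front. From the reported totals, the average works out to roughly 65.7 kB per example.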
Provided by:
amuvarma



