Masternlp/stt-dataset-phoenix-600h-600k-row
Source: Hugging Face
Updated: 2025-02-13
Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/Masternlp/stt-dataset-phoenix-600h-600k-row
Summary:
This dataset pairs audio recordings with their transcripts and is intended for speech recognition and other natural language processing tasks. Each sample contains the following fields: audio file path, text, previous text, unique identifier, client identifier, audio duration, sentence, creation time, original sentence identifier, sentence clip count, upvote count, downvote count, report count, report reasons, skipped clip count, gender, accent region, native language, and year of birth. The dataset has a single training split, and the card reports its size in bytes and its number of examples.
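The card lists each record's fields only in prose; it gives neither column names nor the unit of the duration field. The following is a minimal Python sketch, assuming hypothetical field names (`path`, `sentence`, `duration_ms`, `up_votes`, `down_votes`, `gender`, `accent`) and durations stored in milliseconds, of how such records might be filtered by vote margin and summed into hours. The vote filter is an assumed heuristic, not a criterion stated by the dataset. The optional loader uses the Hugging Face `datasets` library's `load_dataset`, which needs the package installed and network access; inspect `ds.column_names` to confirm the real columns before mapping rows onto `Clip`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Clip:
    # Hypothetical field names: the card describes fields in prose, not as column names.
    path: str
    sentence: str
    duration_ms: int  # assumption: duration is stored in milliseconds
    up_votes: int
    down_votes: int
    gender: Optional[str] = None
    accent: Optional[str] = None


def is_validated(clip: Clip, min_margin: int = 1) -> bool:
    """Assumed heuristic: keep clips whose upvotes exceed downvotes by at least min_margin."""
    return clip.up_votes - clip.down_votes >= min_margin


def total_hours(clips: Iterable[Clip]) -> float:
    """Sum clip durations and convert milliseconds to hours."""
    return sum(c.duration_ms for c in clips) / 3_600_000


def load_train_split():
    """Optional: fetch the train split (requires `pip install datasets` and network access)."""
    from datasets import load_dataset
    return load_dataset("Masternlp/stt-dataset-phoenix-600h-600k-row", split="train")


if __name__ == "__main__":
    # Toy records for illustration only, not real rows from the dataset.
    clips = [
        Clip("a.mp3", "hello", 1_800_000, up_votes=2, down_votes=0),
        Clip("b.mp3", "world", 1_800_000, up_votes=0, down_votes=1),
    ]
    kept = [c for c in clips if is_validated(c)]
    print(len(kept), total_hours(clips), total_hours(kept))  # 1 1.0 0.5
```

Keeping the vote margin as a parameter makes it easy to compare a stricter subset against the full set once the actual duration unit and column names have been confirmed.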
Provided by:
Masternlp



