CS5647Team3/data_mini
Source: Hugging Face. Updated 2023-11-24; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/CS5647Team3/data_mini
Resource description:
---
task_categories:
- token-classification
language:
- zh
tags:
- tone
- pinyin
- sentence
- audio
size_categories:
- 100M<n<1B
---
## Dataset Details
- This is a single-speaker Mandarin audio dataset, a curated subset extracted from a larger collection. Each audio file is accompanied by linguistic annotations: Pinyin transcriptions, tone information, and onset and offset details.
### Dataset Description
- **Speaker:** The dataset exclusively features recordings of a single Mandarin speaker, providing consistency for various linguistic analyses and applications.
- **Pinyin Transcriptions:** Each audio file comes with a corresponding Pinyin transcription, offering a phonetic representation of the spoken Mandarin.
- **Tone Information:** Tone annotations are included to capture the tonal characteristics of the spoken language. This feature is essential for tone-related studies and applications.
- **Onset and Offset Details:** Precise information about the onset and offset of each audio segment is provided. This allows for accurate segmentation and analysis of the spoken content.
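
The onset and offset annotations described above could be used to cut a recording into per-segment clips. The following is a minimal sketch only: it assumes the boundaries are expressed in seconds and that the audio has already been decoded into a flat sequence of samples. The function name `slice_segments`, the 16 kHz sample rate, and the example boundaries are hypothetical and are not taken from the dataset itself.

```python
from typing import List, Sequence, Tuple


def slice_segments(
    samples: Sequence[float],
    sample_rate: int,
    boundaries: Sequence[Tuple[float, float]],
) -> List[Sequence[float]]:
    """Cut a mono waveform into segments given (onset, offset) pairs.

    Assumption: onset and offset are in seconds. If the dataset stores
    them in frames or milliseconds, convert before calling this.
    """
    segments = []
    for onset, offset in boundaries:
        if onset < 0 or offset < onset:
            raise ValueError(f"invalid boundary: ({onset}, {offset})")
        # Convert seconds to sample indices, clamping the end to the clip length.
        start = int(round(onset * sample_rate))
        end = min(int(round(offset * sample_rate)), len(samples))
        segments.append(samples[start:end])
    return segments


# Hypothetical data: one second of silence at 16 kHz with two annotated segments.
waveform = [0.0] * 16000
segment_bounds = [(0.10, 0.35), (0.40, 0.80)]
segments = slice_segments(waveform, 16000, segment_bounds)
segment_lengths = [len(s) for s in segments]
```

Each returned segment can then be paired with its Pinyin and tone label for evaluation. Check the dataset's actual column names and time units before adapting this.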
### Dataset Sources
- Subset of the Original Kaggle Dataset
## Uses
- Intended for model evaluation or demos.
Provided by:
CS5647Team3



