czyzi0/pwr-azon-speech-dataset

Name: czyzi0/pwr-azon-speech-dataset
Creator: czyzi0
Published: 2024-03-27 21:31:37
License: cc-by-sa-4.0

Source: Hugging Face. Updated 2024-03-27; indexed 2024-06-11.

Download link:

https://hf-mirror.com/datasets/czyzi0/pwr-azon-speech-dataset

Description:

---
license: cc-by-sa-4.0
task_categories:
- automatic-speech-recognition
language:
- pl
pretty_name: PWr AZON
size_categories:
- 10K<n<100K
dataset_info:
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 44100
  - name: transcript
    dtype: string
  - name: gender
    dtype: string
  - name: id
    dtype: string
  - name: id_og
    dtype: string
  splits:
  - name: train
    num_bytes: 8585221408.406
    num_examples: 14491
  - name: unsup
    num_bytes: 1128648882
    num_examples: 841
  download_size: 9746452069
  dataset_size: 9713870290.406
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: unsup
    path: data/unsup-*
---

This speech dataset consists of 15332 short audio clips of multiple speakers speaking in Polish. Transcription is provided for 14491 audio clips (`train` split), and it is missing for 841 audio clips (`unsup` split). The gender of the speaker is provided for the whole dataset. The clips have a total length of almost 31 hours.

This dataset was created from _Korpus nagrań próbek mowy do celów budowy modeli akustycznych dla automatycznego rozpoznawania mowy w języku polskim_. The dataset was repackaged into an easier-to-use format. If you are interested in the original data, please visit https://zasobynauki.pl/zasoby/korpus-nagran-probek-mowy-do-celow-budowy-modeli-akustycznych-dla-automatycznego-rozpoznawania-mowy,53293/

Also, if you find this resource helpful, kindly consider leaving a like.

Provider:

czyzi0

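Summary of original information

Dataset overview

Basic information

License: cc-by-sa-4.0
Task category: automatic-speech-recognition
Language: pl
Dataset name: PWr AZON
Size category: 10K<n<100K

Dataset details

Features

Audio:
- Sampling rate: 44100 Hz
Transcript: string
Gender: string
ID: string
Original ID: string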

Data splits

Training split (train):
- Bytes: 8585221408.406
- Examples: 14491
Unsupervised split (unsup):
- Bytes: 1128648882
- Examples: 841

Download and dataset size

Download size: 9746452069 bytes
Dataset size: 9713870290.406 bytes
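
As a quick consistency check on the figures above, the per-split byte counts should add up to the reported dataset size, and the per-split example counts to the 15332 clips mentioned in the description. The sketch below uses only the numbers listed on this page; the variable names and the `to_gib` helper are illustrative and not part of the dataset card.

```python
from decimal import Decimal

# Figures copied from the dataset card metadata (sizes are in bytes).
SPLIT_STATS = {
    "train": {"num_bytes": Decimal("8585221408.406"), "num_examples": 14491},
    "unsup": {"num_bytes": Decimal("1128648882"), "num_examples": 841},
}
DATASET_SIZE = Decimal("9713870290.406")
DOWNLOAD_SIZE = Decimal("9746452069")

# Decimal avoids floating-point rounding when adding the fractional byte count.
total_bytes = sum(s["num_bytes"] for s in SPLIT_STATS.values())
total_examples = sum(s["num_examples"] for s in SPLIT_STATS.values())


def to_gib(num_bytes):
    """Convert a byte count to gibibytes (hypothetical helper for readability)."""
    return float(num_bytes) / 2**30


print("splits sum to dataset size:", total_bytes == DATASET_SIZE)
print("total examples:", total_examples)
print(f"dataset size:  {to_gib(DATASET_SIZE):.2f} GiB")
print(f"download size: {to_gib(DOWNLOAD_SIZE):.2f} GiB")
```

Both sizes come out at roughly 9 GiB, so a full download needs at least that much free disk space.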

Configuration

Default configuration:
- Training split path: data/train-*
- Unsupervised split path: data/unsup-*
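
Because the default configuration maps the two splits to files under `data/`, the dataset can presumably be loaded by split name with the Hugging Face `datasets` library. The following is a minimal sketch, assuming `datasets` is installed and the repository is reachable; the `load_split` wrapper and its constants are hypothetical names introduced here, not something the dataset card provides.

```python
REPO_ID = "czyzi0/pwr-azon-speech-dataset"
SPLITS = ("train", "unsup")  # split names taken from the card's configuration


def load_split(split="train", streaming=True):
    """Load one split of the dataset (illustrative wrapper, not an official API).

    With streaming=True the examples are fetched lazily instead of
    downloading the whole dataset up front.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the module can be imported without the dependency.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=streaming)
```

A call such as `ds = load_split("train")` followed by `example = next(iter(ds))` should yield a dictionary with the fields listed under Features (`audio`, `transcript`, `gender`, `id`, `id_og`). Decoding the `audio` field additionally requires an audio backend, which depends on the installed `datasets` version.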

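Dataset description

This speech dataset contains 15332 short audio clips recorded by multiple Polish speakers. Transcripts are provided for 14491 clips (the train split) and are missing for 841 clips (the unsup split). Speaker gender is provided for the whole dataset, and the total duration is almost 31 hours.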
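
The description's numbers also give a rough sense of clip length and transcript coverage. The sketch below treats "almost 31 hours" as exactly 31 hours, which is an assumption, so the average duration it prints is an upper bound rather than a measured value.

```python
TOTAL_CLIPS = 15332
TRANSCRIBED_CLIPS = 14491   # train split
UNTRANSCRIBED_CLIPS = 841   # unsup split
TOTAL_HOURS_UPPER = 31      # assumption: "almost 31 hours" rounded up to 31

# Upper bound on the mean clip duration, in seconds.
avg_clip_seconds = TOTAL_HOURS_UPPER * 3600 / TOTAL_CLIPS

# Fraction of clips that come with a transcript.
transcribed_fraction = TRANSCRIBED_CLIPS / TOTAL_CLIPS

print(f"average clip length: at most {avg_clip_seconds:.1f} s")
print(f"clips with transcripts: {transcribed_fraction:.1%}")
```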
