imvladikon/hebrew_speech_coursera

Name: imvladikon/hebrew_speech_coursera
Creator: imvladikon
Published: 2023-05-05 09:05:00
License: not specified

Source: Hugging Face; updated 2023-05-05; indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/imvladikon/hebrew_speech_coursera
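Below is a minimal sketch for reading the dataset from Python. It assumes the Hugging Face `datasets` library is installed. `load_dataset(..., split=..., streaming=...)` is a real `datasets` call. The helpers `use_mirror` and `load_split` are hypothetical names. Pointing `huggingface_hub` at the mirror through the `HF_ENDPOINT` environment variable is an assumption about your setup, and it only takes effect if the variable is set before the library is first imported in the process.

```python
import os

DATASET_ID = "imvladikon/hebrew_speech_coursera"
MIRROR_ENDPOINT = "https://hf-mirror.com"  # the mirror linked above (assumption: you want to use it)


def use_mirror() -> str:
    """Set HF_ENDPOINT to the mirror unless it is already set; return the active value.

    huggingface_hub reads HF_ENDPOINT when it is first imported, so call this
    before importing `datasets` or `huggingface_hub` anywhere in the process.
    """
    os.environ.setdefault("HF_ENDPOINT", MIRROR_ENDPOINT)
    return os.environ["HF_ENDPOINT"]


def load_split(split: str = "train", streaming: bool = True):
    """Load one split ("train" or "validation").

    streaming=True iterates over examples without first downloading the
    full ~7.7 GB archive.
    """
    # Lazy import, so an HF_ENDPOINT set by use_mirror() is honoured.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split, streaming=streaming)
```

For example, `use_mirror()` followed by `next(iter(load_split("validation")))["sentence"]` should yield one Hebrew transcript. Streaming is the default here so that a quick inspection does not require the full download.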


Resource description:

---
task_categories:
- automatic-speech-recognition
language:
- he
dataset_info:
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: sentence
    dtype: string
  splits:
  - name: train
    num_bytes: 6670706136.352
    num_examples: 20306
  - name: validation
    num_bytes: 1648062261.28
    num_examples: 5076
  download_size: 7726933856
  dataset_size: 8318768397.632
size_categories:
- 1K<n<10K
---

# Dataset Card for Dataset Name

## Dataset Description

- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

This dataset card aims to be a base template for new datasets. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md?plain=1).

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

```python
{'audio': {'path': '/root/.cache/huggingface/datasets/downloads/extracted/89efd3a0fa3ead3f0b8e432e8796697a738d4561b24ff91f4fb2cc25d86e9fb0/train/ccef55189b7843d49110228cb0a71bfa115.wav',
           'array': array([-0.01217651, -0.04351807, -0.06278992, ..., -0.00018311, -0.00146484, -0.00349426]),
           'sampling_rate': 16000},
 'sentence': 'מצד אחד ובתנועה הציונית הצעירה'}
```

### Data Fields

[More Information Needed]

### Data Splits

|                   | train | validation |
| ----------------- | ----- | ---------- |
| number of samples | 20306 | 5076       |
| hours             | 28.88 | 7.23       |

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

```
@misc{imvladikon2022hebrew_speech_coursera,
  author       = {Gurevich, Vladimir},
  title        = {Hebrew Speech Recognition Dataset: Coursera},
  year         = {2022},
  howpublished = {\url{https://huggingface.co/datasets/imvladikon/hebrew_speech_coursera}},
}
```

### Contributions

[More Information Needed]

Provider:

imvladikon

Summary of Original Information

Dataset Overview

Task Category

Automatic speech recognition

Language

Hebrew (he)

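Dataset Information

Features:
- audio:
  - sampling rate: 16000 Hz
- sentence: string
Data splits:
- train:
  - number of examples: 20306
  - size: 6670706136.352 bytes (≈6.67 GB)
- validation:
  - number of examples: 5076
  - size: 1648062261.28 bytes (≈1.65 GB)
Download size: 7726933856 bytes (≈7.73 GB)
Total dataset size: 8318768397.632 bytes (≈8.32 GB)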
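The size figures above are internally consistent: the train and validation byte counts add up exactly to the total dataset size. The following stdlib-only sketch checks this and converts the byte counts to decimal GB (10^9 bytes) and binary GiB (2^30 bytes). The constant and function names are illustrative, and the numbers are copied from the metadata.

```python
import math

# Byte counts copied from the dataset_info metadata above.
SPLIT_BYTES = {"train": 6670706136.352, "validation": 1648062261.28}
DOWNLOAD_BYTES = 7726933856
DATASET_BYTES = 8318768397.632


def to_gb(n_bytes: float) -> float:
    """Decimal gigabytes (10**9 bytes)."""
    return n_bytes / 1e9


def to_gib(n_bytes: float) -> float:
    """Binary gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


def splits_match_total(split_bytes: dict, total: float) -> bool:
    """True if the per-split sizes sum to the reported dataset size."""
    return math.isclose(sum(split_bytes.values()), total)


for name, size in [("download", DOWNLOAD_BYTES), ("dataset", DATASET_BYTES)]:
    print(f"{name}: {to_gb(size):.2f} GB / {to_gib(size):.2f} GiB")
print("splits sum to dataset_size:", splits_match_total(SPLIT_BYTES, DATASET_BYTES))
```

The GB/GiB distinction explains why tools may report the same download as about 7.7 GB or about 7.2 GiB.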

Dataset Structure

Data instance:

```python
{
    "audio": {
        "path": "/root/.cache/huggingface/datasets/downloads/extracted/89efd3a0fa3ead3f0b8e432e8796697a738d4561b24ff91f4fb2cc25d86e9fb0/train/ccef55189b7843d49110228cb0a71bfa115.wav",
        "array": array([-0.01217651, -0.04351807, -0.06278992, ..., -0.00018311, -0.00146484, -0.00349426]),
        "sampling_rate": 16000,
    },
    "sentence": "מצד אחד ובתנועה הציונית הצעירה",
}
```

Data fields:
- audio (contains path, array, and sampling_rate)
- sentence (text)
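An example's duration follows directly from its `audio` fields: the number of samples divided by `sampling_rate`. The sketch below assumes the decoded example has the dict shape shown in the data instance above. Newer releases of `datasets` may decode audio into a different object. `clip_duration_seconds` and `has_expected_fields` are hypothetical helpers, and the test input is synthetic silence, not real dataset audio.

```python
REQUIRED_AUDIO_KEYS = {"path", "array", "sampling_rate"}


def has_expected_fields(example: dict) -> bool:
    """True if the example has the audio sub-fields listed above and a string sentence."""
    audio = example.get("audio")
    return (
        isinstance(audio, dict)
        and REQUIRED_AUDIO_KEYS.issubset(audio)
        and isinstance(example.get("sentence"), str)
    )


def clip_duration_seconds(example: dict) -> float:
    """Duration in seconds = number of audio samples / sampling rate (Hz)."""
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"]


# Synthetic stand-in: 2 seconds of silence at 16 kHz (not real dataset audio).
fake = {
    "audio": {"path": "example.wav", "array": [0.0] * 32000, "sampling_rate": 16000},
    "sentence": "מצד אחד ובתנועה הציונית הצעירה",
}
print(has_expected_fields(fake), clip_duration_seconds(fake))
```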
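Data split details:

|                   | Train | Validation |
| ----------------- | ----- | ---------- |
| Number of samples | 20306 | 5076       |
| Hours             | 28.88 | 7.23       |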
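The average clip length and the size of each split relative to the whole follow from the table by simple arithmetic: hours × 3600 / samples. Below is a stdlib-only sketch. The numbers are copied from the table, and the names are illustrative.

```python
# Split statistics from the table above: (number of samples, hours of audio).
SPLITS = {"train": (20306, 28.88), "validation": (5076, 7.23)}


def mean_clip_seconds(n_samples: int, hours: float) -> float:
    """Average clip length in seconds = total seconds / number of clips."""
    return hours * 3600 / n_samples


total_samples = sum(n for n, _ in SPLITS.values())
total_hours = sum(h for _, h in SPLITS.values())

for name, (n, h) in SPLITS.items():
    print(f"{name}: {mean_clip_seconds(n, h):.2f} s/clip, "
          f"{n / total_samples:.1%} of samples, {h / total_hours:.1%} of hours")
print(f"total: {total_samples} clips, {total_hours:.2f} h")
```

Both splits average roughly 5.1 s per clip. The validation split is about 20% of the data whether it is measured by sample count or by hours.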

Citation Information

@misc{imvladikon2022hebrew_speech_coursera,
  author       = {Gurevich, Vladimir},
  title        = {Hebrew Speech Recognition Dataset: Coursera},
  year         = {2022},
  howpublished = {\url{https://huggingface.co/datasets/imvladikon/hebrew_speech_coursera}},
}
