imvladikon/hebrew_speech_coursera
Source: Hugging Face | Updated: 2023-05-05 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/imvladikon/hebrew_speech_coursera
Resource description:
---
task_categories:
- automatic-speech-recognition
language:
- he
dataset_info:
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: sentence
    dtype: string
  splits:
  - name: train
    num_bytes: 6670706136.352
    num_examples: 20306
  - name: validation
    num_bytes: 1648062261.28
    num_examples: 5076
  download_size: 7726933856
  dataset_size: 8318768397.632
size_categories:
- 10K<n<100K
---
# Dataset Card for hebrew_speech_coursera
## Dataset Description
- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**
### Dataset Summary
A Hebrew automatic speech recognition dataset, cited as "Hebrew Speech Recognition Dataset: Coursera". Each example pairs a 16 kHz audio clip with its Hebrew transcription. The data is divided into a train split of 20306 examples (28.88 hours) and a validation split of 5076 examples (7.23 hours).
### Supported Tasks and Leaderboards
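- `automatic-speech-recognition`: the audio clips and their transcriptions can be used to train and evaluate Hebrew speech-to-text models.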
### Languages
Hebrew (`he`).
## Dataset Structure
### Data Instances
```python
{'audio': {'path': '/root/.cache/huggingface/datasets/downloads/extracted/89efd3a0fa3ead3f0b8e432e8796697a738d4561b24ff91f4fb2cc25d86e9fb0/train/ccef55189b7843d49110228cb0a71bfa115.wav',
           'array': array([-0.01217651, -0.04351807, -0.06278992, ..., -0.00018311,
                           -0.00146484, -0.00349426]),
           'sampling_rate': 16000},
 'sentence': 'מצד אחד ובתנועה הציונית הצעירה'}
```
### Data Fields
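- `audio`: a dictionary holding the file `path`, the decoded waveform `array`, and the `sampling_rate` (16000 Hz).
- `sentence`: the Hebrew transcription of the clip, as a string.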
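
As a minimal sketch of how the fields fit together, the helper below computes a clip's duration from the instance layout shown under Data Instances. The function names `clip_duration_seconds` and `load_splits` are hypothetical. The loader assumes the Hugging Face `datasets` library is installed and that the repository is reachable under the ID given in the citation. Depending on the library version, the decoded audio object may differ from the plain dictionary shown above.

```python
from typing import Any, Mapping


def clip_duration_seconds(example: Mapping[str, Any]) -> float:
    """Return the clip length in seconds for one example.

    Assumes the layout shown under Data Instances:
    example["audio"] has an "array" of samples and a "sampling_rate".
    """
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"]


def load_splits(repo_id: str = "imvladikon/hebrew_speech_coursera"):
    """Load all splits from the Hugging Face Hub (a multi-gigabyte download)."""
    # Imported lazily so the helper above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id)


if __name__ == "__main__":
    # Synthetic stand-in for a real example: silence at 16 kHz
    # with a placeholder transcript, so nothing is downloaded here.
    fake_example = {
        "audio": {"path": None, "array": [0.0] * 16000, "sampling_rate": 16000},
        "sentence": "placeholder",
    }
    print(clip_duration_seconds(fake_example))
```

Calling `load_splits()` is left out of the demo on purpose, since it fetches the full dataset. Once loaded, `clip_duration_seconds(ds["train"][0])` would apply the same calculation to a real example, provided the audio decodes to the dictionary layout above.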
### Data Splits
| | train | validation |
| ---- | ----- | ---------- |
| number of samples | 20306 | 5076 |
| hours | 28.88 | 7.23 |
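
The figures in the table allow a quick sanity check of clip length and split proportions. The sketch below uses only the numbers from the table; the names `SPLITS`, `mean_clip_seconds`, `total_hours`, and `validation_share` are illustrative and not part of the dataset.

```python
# Sample counts and durations copied from the Data Splits table.
SPLITS = {
    "train": {"num_examples": 20306, "hours": 28.88},
    "validation": {"num_examples": 5076, "hours": 7.23},
}


def mean_clip_seconds(split: str) -> float:
    """Average clip duration in seconds for the given split."""
    info = SPLITS[split]
    return info["hours"] * 3600 / info["num_examples"]


def total_hours() -> float:
    """Total audio duration across all splits, in hours."""
    return sum(info["hours"] for info in SPLITS.values())


def validation_share() -> float:
    """Fraction of all examples that fall in the validation split."""
    total = sum(info["num_examples"] for info in SPLITS.values())
    return SPLITS["validation"]["num_examples"] / total


if __name__ == "__main__":
    for name in SPLITS:
        print(f"{name}: {mean_clip_seconds(name):.2f} s per clip on average")
    print(f"total: {total_hours():.2f} h")
    print(f"validation share: {validation_share():.1%}")
```

By this calculation the clips average a little over five seconds in both splits, and the validation split holds roughly one fifth of the examples.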
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
```
@misc{imvladikon2022hebrew_speech_coursera,
  author = {Gurevich, Vladimir},
  title = {Hebrew Speech Recognition Dataset: Coursera},
  year = {2022},
  howpublished = {\url{https://huggingface.co/datasets/imvladikon/hebrew_speech_coursera}},
}
```
### Contributions
[More Information Needed]
Provider:
imvladikon
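Summary of original information
Dataset overview
Task categories
- Automatic speech recognition
Languages
- Hebrew (he)
Dataset information
- Features:
  - Audio: sampling rate 16000 Hz
  - Sentence: string
- Splits:
  - Train: 20306 examples, 6670706136.352 bytes
  - Validation: 5076 examples, 1648062261.28 bytes
- Download size: 7726933856 bytes
- Total dataset size: 8318768397.632 bytes
Dataset structure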
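- Data fields:
  - Audio (path, array, sampling rate)
  - Sentence (text)
- Split details:

| | Train | Validation |
| ---- | ----- | ---------- |
| Number of samples | 20306 | 5076 |
| Hours | 28.88 | 7.23 |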



