
imvladikon/hebrew_speech_coursera

Hugging Face · Updated 2023-05-05 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/imvladikon/hebrew_speech_coursera
Resource description:
---
task_categories:
- automatic-speech-recognition
language:
- he
dataset_info:
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: sentence
    dtype: string
  splits:
  - name: train
    num_bytes: 6670706136.352
    num_examples: 20306
  - name: validation
    num_bytes: 1648062261.28
    num_examples: 5076
  download_size: 7726933856
  dataset_size: 8318768397.632
size_categories:
- 1K<n<10K
---

# Dataset Card for Dataset Name

## Dataset Description

- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

This dataset card aims to be a base template for new datasets. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md?plain=1).

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

```json
{'audio': {'path': '/root/.cache/huggingface/datasets/downloads/extracted/89efd3a0fa3ead3f0b8e432e8796697a738d4561b24ff91f4fb2cc25d86e9fb0/train/ccef55189b7843d49110228cb0a71bfa115.wav',
           'array': array([-0.01217651, -0.04351807, -0.06278992, ..., -0.00018311, -0.00146484, -0.00349426]),
           'sampling_rate': 16000},
 'sentence': 'מצד אחד ובתנועה הציונית הצעירה'}
```

### Data Fields

[More Information Needed]

### Data Splits

|                   | train | validation |
| ----------------- | ----- | ---------- |
| number of samples | 20306 | 5076       |
| hours             | 28.88 | 7.23       |

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

```
@misc{imvladikon2022hebrew_speech_coursera,
  author = {Gurevich, Vladimir},
  title = {Hebrew Speech Recognition Dataset: Coursera},
  year = {2022},
  howpublished = \url{https://huggingface.co/datasets/imvladikon/hebrew_speech_coursera},
}
```

### Contributions

[More Information Needed]
Provider:
imvladikon
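Summary of original information

Dataset overview

Task categories

  • Automatic speech recognition

Language

  • Hebrew (he)

Dataset information

  • Features

    • Audio
      • Sampling rate: 16000 Hz
    • Sentence: string
  • Data splits

    • Training set
      • Number of examples: 20306
      • Data size: 6670706136.352 bytes
    • Validation set
      • Number of examples: 5076
      • Data size: 1648062261.28 bytes
  • Download size: 7726933856 bytes

  • Total dataset size: 8318768397.632 bytes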
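
The features listed above (an audio column at 16000 Hz and a string sentence column) suggest how a single example might be read. The following is a minimal sketch, not the dataset author's code: it assumes the Hugging Face `datasets` package and a working audio decoding backend are installed, and the helper names `clip_seconds` and `first_validation_example` are hypothetical, introduced only for illustration.

```python
from typing import Any, Dict

# Repository name as given on this page.
DATASET_ID = "imvladikon/hebrew_speech_coursera"


def clip_seconds(example: Dict[str, Any]) -> float:
    """Duration of one example in seconds: sample count / sampling rate."""
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"]


def first_validation_example() -> Dict[str, Any]:
    """Stream a single validation example instead of downloading every split.

    Assumption: network access and the `datasets` package are available.
    The import is local so that `clip_seconds` has no third-party dependency.
    """
    from datasets import load_dataset

    stream = load_dataset(DATASET_ID, split="validation", streaming=True)
    return next(iter(stream))
```

Calling `clip_seconds(first_validation_example())` would report the length of one clip. Streaming is used here only to avoid fetching the full download of about 7.7 GB; loading without `streaming=True` is equally valid when the data is needed locally.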

Dataset structure

  • Data instance (JSON-like):

    {
      "audio": {
        "path": "/root/.cache/huggingface/datasets/downloads/extracted/89efd3a0fa3ead3f0b8e432e8796697a738d4561b24ff91f4fb2cc25d86e9fb0/train/ccef55189b7843d49110228cb0a71bfa115.wav",
        "array": array([-0.01217651, -0.04351807, -0.06278992, ..., -0.00018311, -0.00146484, -0.00349426]),
        "sampling_rate": 16000
      },
      "sentence": "מצד אחד ובתנועה הציונית הצעירה"
    }

  • Data fields

    • Audio (contains path, array, and sampling rate)
    • Sentence (text)
  • Data split details

                          Training set   Validation set
    Number of examples    20306          5076
    Hours                 28.88          7.23
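
The split figures can be cross-checked with a little arithmetic. The sketch below uses only numbers stated on this page (example counts, hours, and byte sizes); the function names are hypothetical. It derives the mean clip length and the number of stored bytes per second of audio for each split.

```python
# Figures copied from the dataset card: examples, hours, and num_bytes per split.
SPLITS = {
    "train": {"examples": 20306, "hours": 28.88, "num_bytes": 6670706136.352},
    "validation": {"examples": 5076, "hours": 7.23, "num_bytes": 1648062261.28},
}


def mean_clip_seconds(split: str) -> float:
    """Average clip length: total seconds divided by number of examples."""
    s = SPLITS[split]
    return s["hours"] * 3600 / s["examples"]


def bytes_per_audio_second(split: str) -> float:
    """Stored bytes per second of audio, from num_bytes and total hours."""
    s = SPLITS[split]
    return s["num_bytes"] / (s["hours"] * 3600)


for name in SPLITS:
    print(name, round(mean_clip_seconds(name), 2), round(bytes_per_audio_second(name)))
```

Both splits come out at roughly 5.1 seconds per clip and roughly 64 kB per second of audio. For comparison, 16-bit mono PCM at 16000 Hz needs 32000 bytes per second; the card does not describe the storage format, so the reason for the difference is left open here.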

Citation information

@misc{imvladikon2022hebrew_speech_coursera,
  author = {Gurevich, Vladimir},
  title = {Hebrew Speech Recognition Dataset: Coursera},
  year = {2022},
  howpublished = \url{https://huggingface.co/datasets/imvladikon/hebrew_speech_coursera},
}
