imvladikon/hebrew_speech_kan

Name: imvladikon/hebrew_speech_kan
Creator: imvladikon
Published: 2023-05-05 09:12:15
License: No description available

Source: Hugging Face. Updated: 2023-05-05. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/imvladikon/hebrew_speech_kan

Resource description:

---
task_categories:
- automatic-speech-recognition
language:
- he
size_categories:
- 1K<n<10K
dataset_info:
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: sentence
    dtype: string
  splits:
  - name: train
    num_bytes: 1569850175.0
    num_examples: 8000
  - name: validation
    num_bytes: 394275049.0
    num_examples: 2000
  download_size: 1989406585
  dataset_size: 1964125224.0
---

# Dataset Card for Dataset Name

## Dataset Description

- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

Hebrew Dataset for ASR

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

```json
{'audio': {'path': '/root/.cache/huggingface/datasets/downloads/extracted/8ce7402f6482c6053251d7f3000eec88668c994beb48b7ca7352e77ef810a0b6/train/e429593fede945c185897e378a5839f4198.wav',
  'array': array([-0.00265503, -0.0018158 , -0.00149536, ..., -0.00135803,
         -0.00231934, -0.00190735]),
  'sampling_rate': 16000},
 'sentence': 'היא מבינה אותי יותר מכל אחד אחר'}
```

### Data Fields

[More Information Needed]

### Data Splits

|                   | train | validation |
| ----------------- | ----- | ---------- |
| number of samples | 8000  | 2000       |
| hours             | 6.92  | 1.73       |

## Dataset Creation

### Curation Rationale

Data scraped from YouTube (channel כאן), with outliers removed (by length, and by the ratio between the length of the audio and the length of the sentence).

### Source Data

#### Initial Data Collection and Normalization

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

```
@misc{imvladikon2022hebrew_speech_kan,
  author = {Gurevich, Vladimir},
  title = {Hebrew Speech Recognition Dataset: Kan},
  year = {2022},
  howpublished = {\url{https://huggingface.co/datasets/imvladikon/hebrew_speech_kan}},
}
```

### Contributions

[More Information Needed]
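
The Curation Rationale in the card says outliers were removed by length and by the ratio between audio length and sentence length, but it gives no thresholds or code. The following is a minimal sketch of what such a filter could look like, not the author's actual procedure. The `Clip` class, the function names, and every threshold value (`min_s`, `max_s`, `min_cps`, `max_cps`) are hypothetical and chosen only for illustration; the 16000 Hz default comes from the card's `sampling_rate`.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Clip:
    """One audio/transcript pair, reduced to what the filter needs (hypothetical type)."""
    duration_s: float  # audio length in seconds
    sentence: str      # transcript text


def duration_from_samples(num_samples: int, sampling_rate: int = 16000) -> float:
    """Convert a sample count to seconds; 16000 Hz matches the card's sampling_rate."""
    return num_samples / sampling_rate


def is_outlier(
    clip: Clip,
    min_s: float = 1.0,     # assumed lower bound on clip duration
    max_s: float = 15.0,    # assumed upper bound on clip duration
    min_cps: float = 3.0,   # assumed lower bound on characters per second
    max_cps: float = 25.0,  # assumed upper bound on characters per second
) -> bool:
    """Flag a clip whose duration or text-to-audio ratio falls outside the bounds."""
    if clip.duration_s <= 0 or not (min_s <= clip.duration_s <= max_s):
        return True
    text = clip.sentence.strip()
    if not text:
        return True
    chars_per_second = len(text) / clip.duration_s
    return not (min_cps <= chars_per_second <= max_cps)


def remove_outliers(clips: Iterable[Clip], **thresholds: float) -> List[Clip]:
    """Keep only the clips that pass both the length check and the ratio check."""
    return [c for c in clips if not is_outlier(c, **thresholds)]


if __name__ == "__main__":
    # The first sentence is the example from Data Instances; its 3.0 s duration is made up.
    clips = [
        Clip(3.0, "היא מבינה אותי יותר מכל אחד אחר"),
        Clip(0.2, "שלום"),   # too short
        Clip(10.0, "כן"),    # far too little text for the audio length
    ]
    print(len(remove_outliers(clips)))
```

A characters-per-second ratio is only one way to compare audio and text length; a words-per-second ratio or a statistical cutoff over the observed distribution would fit the card's description equally well.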

Provided by:

imvladikon

Summary of original information