spygaurad/bhashini_nepali_unlabelled_pseudo

Name: spygaurad/bhashini_nepali_unlabelled_pseudo
Creator: spygaurad
Published: 2024-02-01 08:41:28
License: afl-3.0

Hugging Face | Updated 2024-02-01 | Indexed 2024-06-22

Download link:

https://hf-mirror.com/datasets/spygaurad/bhashini_nepali_unlabelled_pseudo

Resource description:

---
dataset_info:
  features:
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: text
    dtype: string
  - name: whisper_transcript
    sequence: int64
  - name: sentence
    dtype: string
  splits:
  - name: train
    num_bytes: 4232990183.932
    num_examples: 33524
  download_size: 3329250560
  dataset_size: 4232990183.932
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: afl-3.0
language:
- ne
pretty_name: Nepali ASR Pseudo Labelled
size_categories:
- 10K<n<100K
---

# Dataset Card for Dataset Name

This dataset is the unlabelled ASR Dataset for Nepali language downloaded from the Bhashini project.

## Dataset Details

### Dataset Description

Total of 32k files that are pseudo labelled with Nepali trained Whisper.

- **Curated by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]

### Dataset Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

### Direct Use

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Dataset Structure

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Data Collection and Processing

[More Information Needed]

#### Who are the source data producers?

[More Information Needed]

### Annotations [optional]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

#### Personal and Sensitive Information

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.

## Citation [optional]

**BibTeX:** [More Information Needed]

**APA:** [More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Dataset Card Authors [optional]

[More Information Needed]

## Dataset Card Contact

[More Information Needed]

Provided by:

spygaurad

Summary of original information

Dataset overview

Dataset description

Name: Nepali ASR Pseudo Labelled
Language: Nepali (ne)
Size category: 10K<n<100K
License: afl-3.0

Dataset structure

Features:
- audio: sampling rate 16000
- text: string
- whisper_transcript: sequence of int64
- sentence: string
Splits:
- train: 33524 examples, 4232990183.932 bytes
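
The feature list above can be checked against the hosted data. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable; the helper name `peek_first_example` is hypothetical and not part of the dataset card. It streams the `train` split rather than downloading it, and disables audio decoding so no audio backend is required.

```python
REPO_ID = "spygaurad/bhashini_nepali_unlabelled_pseudo"

# Column names as listed in the dataset card's feature section.
EXPECTED_COLUMNS = ("audio", "text", "whisper_transcript", "sentence")


def peek_first_example(repo_id=REPO_ID):
    """Stream the train split and return its first example (hypothetical helper)."""
    # Imported here so this module can be imported without `datasets` installed.
    from datasets import Audio, load_dataset

    # streaming=True iterates over the remote files without a full download.
    stream = load_dataset(repo_id, split="train", streaming=True)

    # decode=False yields the raw audio bytes/path instead of a decoded waveform.
    stream = stream.cast_column("audio", Audio(decode=False))

    example = next(iter(stream))

    # Fail loudly if the hosted schema differs from the card.
    missing = [name for name in EXPECTED_COLUMNS if name not in example]
    if missing:
        raise KeyError(f"columns missing from the first example: {missing}")
    return example
```

Calling `peek_first_example()` returns a dict keyed by the four columns. The card does not say what the `int64` values in `whisper_transcript` represent, so treat any interpretation of them (for example, as token IDs) as an assumption to verify.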

Dataset size

Download size: 3329250560 bytes
Dataset size: 4232990183.932 bytes
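
The raw byte counts are easier to reason about in larger units. This is a small sketch of that arithmetic using only the figures from the card; the helper `to_gib` and the variable names are illustrative, not part of the dataset.

```python
# Figures copied from the dataset card metadata.
DOWNLOAD_SIZE_BYTES = 3329250560
DATASET_SIZE_BYTES = 4232990183.932
NUM_EXAMPLES = 33524

BYTES_PER_GIB = 1024 ** 3


def to_gib(num_bytes):
    """Convert a byte count to gibibytes (hypothetical helper)."""
    return num_bytes / BYTES_PER_GIB


download_gib = to_gib(DOWNLOAD_SIZE_BYTES)
dataset_gib = to_gib(DATASET_SIZE_BYTES)

# Mean size of one train example in the prepared dataset.
avg_bytes_per_example = DATASET_SIZE_BYTES / NUM_EXAMPLES

# Ratio of the download size to the prepared dataset size.
download_ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES

print(f"download: {download_gib:.2f} GiB")
print(f"dataset:  {dataset_gib:.2f} GiB")
print(f"average example size: {avg_bytes_per_example:.0f} bytes")
print(f"download / dataset ratio: {download_ratio:.3f}")
```

The average is a mean over all columns in an example, so it should not be read as an audio duration without knowing how the audio is encoded.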

Dataset details

Description: 32k files, pseudo-labelled with a Nepali-trained Whisper model.
