doof-ferb/infore1_25hours

Name: doof-ferb/infore1_25hours
Creator: doof-ferb
Published: 2024-02-10 11:23:22
License: cc-by-4.0

Source: Hugging Face. Updated 2024-02-10; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/doof-ferb/infore1_25hours

Resource description:

---
license: cc-by-4.0
task_categories:
- automatic-speech-recognition
- text-to-speech
language:
- vi
pretty_name: InfoRe Technology public dataset №1
size_categories:
- 10K<n<100K
dataset_info:
  features:
  - name: audio
    dtype: audio
  - name: transcription
    dtype: string
  splits:
  - name: train
    num_bytes: 7370428827.92
    num_examples: 14935
  download_size: 7832947140
  dataset_size: 7370428827.92
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# unofficial mirror of InfoRe Technology public dataset №1

official announcement: https://www.facebook.com/groups/j2team.community/permalink/1010834009248719/

25h, 14.9k samples, InfoRe paid a contractor to read text

official download: `magnet:?xt=urn:btih:1cbe13fb14a390c852c016a924b4a5e879d85f41&dn=25hours.zip&tr=http%3A%2F%2Foffice.socials.vn%3A8725%2Fannounce`

mirror: https://files.huylenguyen.com/25hours.zip

unzip password: `BroughtToYouByInfoRe`

pre-process: none

to do: check for misspellings

usage with HuggingFace:

```python
# pip install -q "datasets[audio]"
from datasets import load_dataset
from torch.utils.data import DataLoader

# stream the train split instead of downloading all ~7.8 GB up front
dataset = load_dataset("doof-ferb/infore1_25hours", split="train", streaming=True)
# a streaming (iterable) dataset has no set_format(); with_format() returns a formatted copy
dataset = dataset.with_format("torch")
dataloader = DataLoader(dataset, batch_size=4)
```
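The usage snippet above batches four samples at a time, but read-speech clips usually differ in length, so the default batching may fail to stack them. The following is a minimal sketch of a padding step in plain Python. It assumes each decoded sample looks like `{"audio": {"array": [...]}, "transcription": "..."}`; that layout, the helper name `pad_batch`, and the toy values are assumptions for illustration, not something the card specifies.

```python
from typing import Dict, List, Sequence


def pad_batch(samples: Sequence[Dict], pad_value: float = 0.0) -> Dict[str, List]:
    """Right-pad every waveform to the length of the longest one in the batch."""
    # Assumed sample layout: sample["audio"]["array"] holds the waveform values.
    waveforms = [list(sample["audio"]["array"]) for sample in samples]
    lengths = [len(waveform) for waveform in waveforms]
    max_len = max(lengths)
    padded = [w + [pad_value] * (max_len - len(w)) for w in waveforms]
    return {
        "audio": padded,        # equal-length waveforms
        "lengths": lengths,     # original lengths, so padding can be masked later
        "transcription": [sample["transcription"] for sample in samples],
    }


# Toy rows standing in for real dataset samples (values are made up).
toy_samples = [
    {"audio": {"array": [0.1, 0.2, 0.3]}, "transcription": "xin chào"},
    {"audio": {"array": [0.5]}, "transcription": "cảm ơn"},
]
batch = pad_batch(toy_samples)
print(batch["lengths"])
```

A function of this shape could be passed to `DataLoader` through its `collate_fn` argument; a real pipeline would likely convert the padded lists to tensors at the end.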

Provider:

doof-ferb

Summary of original information

InfoRe Technology public dataset №1

Basic information

License: cc-by-4.0
Task categories:
- automatic speech recognition
- text-to-speech
Language: Vietnamese
Dataset name: InfoRe Technology public dataset №1
Dataset size category: 10K<n<100K

Dataset details

Features:
- audio: audio data
- transcription: string data
Splits:
- train:
  - Bytes: 7370428827.92
  - Examples: 14935
Download size: 7832947140 bytes
Dataset size: 7370428827.92 bytes
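
The raw byte counts are easier to judge as per-sample averages. A rough sketch, using only the figures listed on this page; the 25-hour total is the approximate duration quoted in the description, so the averages are estimates rather than measured values:

```python
# Figures copied from the dataset card.
TOTAL_HOURS = 25                 # approximate total duration ("25h")
NUM_EXAMPLES = 14935             # train split
DATASET_BYTES = 7370428827.92    # decoded dataset size
DOWNLOAD_BYTES = 7832947140      # size of the files to download

avg_seconds = TOTAL_HOURS * 3600 / NUM_EXAMPLES      # mean clip length
avg_mib = DATASET_BYTES / NUM_EXAMPLES / 2**20       # mean sample size in MiB
download_gib = DOWNLOAD_BYTES / 2**30                # total download in GiB

print(f"average clip length: {avg_seconds:.2f} s")
print(f"average sample size: {avg_mib:.2f} MiB")
print(f"download size:       {download_gib:.2f} GiB")
```

This works out to roughly six seconds of audio and a little under half a MiB per sample.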

Configuration

Configuration name: default
Data files:
- split: train
  path: data/train-*
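
The `data/train-*` entry is a glob pattern that selects the shard files belonging to the train split. A small sketch of how such a pattern selects files, using Python's `fnmatch` as a stand-in for the hub's own pattern resolution; the file names below are hypothetical, since the page does not list the actual shards:

```python
from fnmatch import fnmatch

PATTERN = "data/train-*"

# Hypothetical repository listing; the real shard names are not given in the card.
candidates = [
    "data/train-00000-of-00016.parquet",
    "data/train-00001-of-00016.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]

# Keep only the paths that match the train-split pattern.
train_files = [name for name in candidates if fnmatch(name, PATTERN)]
print(train_files)
```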
