
xaviviro/Variants-catala-cv16_1

Hugging Face | Updated 2024-01-29 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/xaviviro/Variants-catala-cv16_1
Resource description:
---
dataset_info:
- config_name: balear
  features:
  - name: client_id
    dtype: string
  - name: path
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 48000
  - name: sentence
    dtype: string
  - name: up_votes
    dtype: int64
  - name: down_votes
    dtype: int64
  - name: age
    dtype: string
  - name: gender
    dtype: string
  - name: accent
    dtype: string
  - name: locale
    dtype: string
  - name: segment
    dtype: string
  - name: variant
    dtype: string
  splits:
  - name: train
    num_bytes: 571949427.9563617
    num_examples: 15601
  - name: test
    num_bytes: 9785506.517906336
    num_examples: 268
  download_size: 537570970
  dataset_size: 581734934.474268
- config_name: central
  features:
  - name: client_id
    dtype: string
  - name: path
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 48000
  - name: sentence
    dtype: string
  - name: up_votes
    dtype: int64
  - name: down_votes
    dtype: int64
  - name: age
    dtype: string
  - name: gender
    dtype: string
  - name: accent
    dtype: string
  - name: locale
    dtype: string
  - name: segment
    dtype: string
  - name: variant
    dtype: string
  splits:
  - name: train
    num_bytes: 19128245726.625427
    num_examples: 521759
  - name: test
    num_bytes: 61670598.91322314
    num_examples: 1689
  download_size: 18768643598
  dataset_size: 19189916325.53865
- config_name: nord-occidental
  features:
  - name: client_id
    dtype: string
  - name: path
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 48000
  - name: sentence
    dtype: string
  - name: up_votes
    dtype: int64
  - name: down_votes
    dtype: int64
  - name: age
    dtype: string
  - name: gender
    dtype: string
  - name: accent
    dtype: string
  - name: locale
    dtype: string
  - name: segment
    dtype: string
  - name: variant
    dtype: string
  splits:
  - name: train
    num_bytes: 1023833835.9423954
    num_examples: 27927
  - name: test
    num_bytes: 8105904.652892562
    num_examples: 222
  download_size: 940662501
  dataset_size: 1031939740.595288
- config_name: septentrional
  features:
  - name: client_id
    dtype: string
  - name: path
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 48000
  - name: sentence
    dtype: string
  - name: up_votes
    dtype: int64
  - name: down_votes
    dtype: int64
  - name: age
    dtype: string
  - name: gender
    dtype: string
  - name: accent
    dtype: string
  - name: locale
    dtype: string
  - name: segment
    dtype: string
  - name: variant
    dtype: string
  splits:
  - name: train
    num_bytes: 643438523.8165568
    num_examples: 17551
  - name: test
    num_bytes: 2738481.3016528925
    num_examples: 75
  download_size: 532590002
  dataset_size: 646177005.1182097
- config_name: valencià
  features:
  - name: client_id
    dtype: string
  - name: path
    dtype: string
  - name: audio
    dtype:
      audio:
        sampling_rate: 48000
  - name: sentence
    dtype: string
  - name: up_votes
    dtype: int64
  - name: down_votes
    dtype: int64
  - name: age
    dtype: string
  - name: gender
    dtype: string
  - name: accent
    dtype: string
  - name: locale
    dtype: string
  - name: segment
    dtype: string
  - name: variant
    dtype: string
  splits:
  - name: train
    num_bytes: 932364454.3161457
    num_examples: 25432
  - name: test
    num_bytes: 7375642.97245179
    num_examples: 202
  download_size: 1008157848
  dataset_size: 939740097.2885975
configs:
- config_name: balear
  data_files:
  - split: train
    path: balear/train-*
  - split: test
    path: balear/test-*
- config_name: central
  data_files:
  - split: train
    path: central/train-*
  - split: test
    path: central/test-*
- config_name: nord-occidental
  data_files:
  - split: train
    path: nord-occidental/train-*
  - split: test
    path: nord-occidental/test-*
- config_name: septentrional
  data_files:
  - split: train
    path: septentrional/train-*
  - split: test
    path: septentrional/test-*
- config_name: valencià
  data_files:
  - split: train
    path: valencià/train-*
  - split: test
    path: valencià/test-*
---
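The metadata above declares five configurations, each with a train and a test split. As a minimal sketch of how one of them might be loaded, the helper below wraps `load_dataset` from the Hugging Face `datasets` library. It assumes that library is installed and that the repository is still reachable on the Hub under the id shown in the title; `load_variant` and its constants are hypothetical names introduced only for this illustration.

```python
# Configuration and split names copied from the dataset card metadata.
CONFIG_NAMES = ("balear", "central", "nord-occidental", "septentrional", "valencià")
SPLITS = ("train", "test")

# Assumption: the dataset is hosted on the Hugging Face Hub under this id.
REPO_ID = "xaviviro/Variants-catala-cv16_1"


def load_variant(config, split="test", streaming=True):
    """Load one dialect configuration (hypothetical helper, not part of the dataset)."""
    if config not in CONFIG_NAMES:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIG_NAMES}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the validation above works without the third-party package.
    from datasets import load_dataset
    # streaming=True iterates over the data without downloading the whole
    # configuration first, which matters for the large "central" config.
    return load_dataset(REPO_ID, config, split=split, streaming=streaming)
```

A call such as `load_variant("balear", "test")` would then return an iterable dataset whose records are expected to carry the fields listed in the metadata, for example `sentence` and `variant`.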
Provider:
xaviviro
Summary of original information

Dataset overview

Dataset configurations

Configuration name: balear

  • Features
    • client_id: string
    • path: string
    • audio: audio, sampling rate 48000 Hz
    • sentence: string
    • up_votes: integer (int64)
    • down_votes: integer (int64)
    • age: string
    • gender: string
    • accent: string
    • locale: string
    • segment: string
    • variant: string
  • Splits
    • train: 571949427.9563617 bytes, 15601 examples
    • test: 9785506.517906336 bytes, 268 examples
  • Download size: 537570970 bytes
  • Dataset size: 581734934.474268 bytes

Configuration name: central

  • Features
    • client_id: string
    • path: string
    • audio: audio, sampling rate 48000 Hz
    • sentence: string
    • up_votes: integer (int64)
    • down_votes: integer (int64)
    • age: string
    • gender: string
    • accent: string
    • locale: string
    • segment: string
    • variant: string
  • Splits
    • train: 19128245726.625427 bytes, 521759 examples
    • test: 61670598.91322314 bytes, 1689 examples
  • Download size: 18768643598 bytes
  • Dataset size: 19189916325.53865 bytes

Configuration name: nord-occidental

  • Features
    • client_id: string
    • path: string
    • audio: audio, sampling rate 48000 Hz
    • sentence: string
    • up_votes: integer (int64)
    • down_votes: integer (int64)
    • age: string
    • gender: string
    • accent: string
    • locale: string
    • segment: string
    • variant: string
  • Splits
    • train: 1023833835.9423954 bytes, 27927 examples
    • test: 8105904.652892562 bytes, 222 examples
  • Download size: 940662501 bytes
  • Dataset size: 1031939740.595288 bytes

Configuration name: septentrional

  • Features
    • client_id: string
    • path: string
    • audio: audio, sampling rate 48000 Hz
    • sentence: string
    • up_votes: integer (int64)
    • down_votes: integer (int64)
    • age: string
    • gender: string
    • accent: string
    • locale: string
    • segment: string
    • variant: string
  • Splits
    • train: 643438523.8165568 bytes, 17551 examples
    • test: 2738481.3016528925 bytes, 75 examples
  • Download size: 532590002 bytes
  • Dataset size: 646177005.1182097 bytes

Configuration name: valencià

  • Features
    • client_id: string
    • path: string
    • audio: audio, sampling rate 48000 Hz
    • sentence: string
    • up_votes: integer (int64)
    • down_votes: integer (int64)
    • age: string
    • gender: string
    • accent: string
    • locale: string
    • segment: string
    • variant: string
  • Splits
    • train: 932364454.3161457 bytes, 25432 examples
    • test: 7375642.97245179 bytes, 202 examples
  • Download size: 1008157848 bytes
  • Dataset size: 939740097.2885975 bytes
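
The per-configuration figures above can be combined to see how the examples are distributed across the five variants. The sketch below uses only the example counts and download sizes listed in this section; the helper names (`split_total`, `train_share`, `total_download_bytes`) are hypothetical and exist only for this illustration.

```python
# Example counts and download sizes (bytes) copied from the listing above.
CONFIGS = {
    "balear":          {"train": 15601,  "test": 268,  "download_size": 537570970},
    "central":         {"train": 521759, "test": 1689, "download_size": 18768643598},
    "nord-occidental": {"train": 27927,  "test": 222,  "download_size": 940662501},
    "septentrional":   {"train": 17551,  "test": 75,   "download_size": 532590002},
    "valencià":        {"train": 25432,  "test": 202,  "download_size": 1008157848},
}


def split_total(split):
    """Number of examples in the given split, summed over all configurations."""
    return sum(cfg[split] for cfg in CONFIGS.values())


def train_share(name):
    """Fraction of all training examples that belong to one configuration."""
    return CONFIGS[name]["train"] / split_total("train")


def total_download_bytes():
    """Download size of all configurations together, in bytes."""
    return sum(cfg["download_size"] for cfg in CONFIGS.values())


# Print one summary row per configuration, then the overall totals.
for name, cfg in CONFIGS.items():
    print(f"{name:16s} train={cfg['train']:>7d} test={cfg['test']:>5d} "
          f"share={train_share(name):.1%}")
print(f"total train={split_total('train')} test={split_total('test')} "
      f"download={total_download_bytes() / 1e9:.2f} GB")
```

The `central` configuration accounts for the large majority of the training examples, so a model trained on the pooled data without reweighting or subsampling would likely be dominated by that variant.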

Data file paths

Configuration name: balear

  • train: balear/train-*
  • test: balear/test-*

Configuration name: central

  • train: central/train-*
  • test: central/test-*

Configuration name: nord-occidental

  • train: nord-occidental/train-*
  • test: nord-occidental/test-*

Configuration name: septentrional

  • train: septentrional/train-*
  • test: septentrional/test-*

Configuration name: valencià

  • train: valencià/train-*
  • test: valencià/test-*
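
Each path above is a glob pattern of the form `<config>/<split>-*`. As a rough illustration of how such patterns assign files to a configuration and split, the sketch below matches them with Python's standard `fnmatch` module. This approximates the idea only and is not necessarily how the Hub resolves patterns; the shard file names in the usage note are hypothetical, since the actual file names are not listed in this document.

```python
from fnmatch import fnmatchcase

# Configuration and split names taken from the path listing above.
CONFIG_NAMES = ("balear", "central", "nord-occidental", "septentrional", "valencià")
SPLITS = ("train", "test")


def data_file_patterns(config):
    """Return the split -> glob pattern mapping for one configuration."""
    return {split: f"{config}/{split}-*" for split in SPLITS}


def classify(path):
    """Return (config, split) for a repository path, or None if no pattern matches."""
    for config in CONFIG_NAMES:
        for split, pattern in data_file_patterns(config).items():
            # Case-sensitive match, independent of the host operating system.
            if fnmatchcase(path, pattern):
                return config, split
    return None
```

For a hypothetical shard name such as `balear/train-00000-of-00002.parquet`, `classify` would return `("balear", "train")`, while a path outside these patterns, such as `README.md`, would return `None`.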