thanhduycao/soict_train_dataset_filter

Name: thanhduycao/soict_train_dataset_filter
Creator: thanhduycao
Published: 2023-10-27 01:02:51
License: No description available

Source: Hugging Face. Updated 2023-10-27; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/thanhduycao/soict_train_dataset_filter

Resource description:

---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
dataset_info:
  features:
  - name: id
    dtype: string
  - name: sentence
    dtype: string
  - name: intent
    dtype: string
  - name: sentence_annotation
    dtype: string
  - name: entities
    list:
    - name: type
      dtype: string
    - name: filler
      dtype: string
  - name: file
    dtype: string
  - name: audio
    struct:
    - name: array
      sequence: float64
    - name: path
      dtype: string
    - name: sampling_rate
      dtype: int64
  - name: origin_transcription
    dtype: string
  - name: sentence_norm
    dtype: string
  - name: sentence_norm_v2
    dtype: string
  - name: w2v2_large_transcription
    dtype: string
  - name: wer
    dtype: float64
  splits:
  - name: train
    num_bytes: 3205296038.433596
    num_examples: 6184
  - name: test
    num_bytes: 566006350.9006286
    num_examples: 1092
  download_size: 902006355
  dataset_size: 3771302389.3342247
---
# Dataset Card for "soict_train_dataset_filter"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

Provider:

thanhduycao

Summary of original information

Dataset overview

Configuration

Default configuration:
- Train split: path data/train-*
- Test split: path data/test-*
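The two path patterns above determine which files belong to each split. The following is a minimal sketch of that mapping using Python's standard `fnmatch` module. It only approximates how a loader might resolve the patterns; the helper name `assign_splits` and the file names in `example_paths` are hypothetical, since the listing does not give the actual shard names.

```python
from fnmatch import fnmatch

# Split patterns copied from the default configuration in the dataset card.
DATA_FILES = {"train": "data/train-*", "test": "data/test-*"}


def assign_splits(paths, data_files=DATA_FILES):
    """Group file paths by the first split pattern they match.

    Paths that match no pattern are left out of the result.
    """
    grouped = {split: [] for split in data_files}
    for path in paths:
        for split, pattern in data_files.items():
            if fnmatch(path, pattern):
                grouped[split].append(path)
                break
    return grouped


# Hypothetical file listing (assumption): the real shard names are not
# stated anywhere in the listing.
example_paths = [
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]

splits = assign_splits(example_paths)
for split, files in splits.items():
    print(split, files)
```

Files that match neither pattern, such as `README.md` in this made-up listing, are not assigned to any split.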

Data features

id: string
sentence: string
intent: string
sentence_annotation: string
entities: list, with the following fields:
- type: string
- filler: string
file: string
audio: struct, with the following fields:
- array: sequence of float64
- path: string
- sampling_rate: int64
origin_transcription: string
sentence_norm: string
sentence_norm_v2: string
w2v2_large_transcription: string
wer: float64
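To make the feature list concrete, here is a minimal sketch of a record checker written against the fields listed above. The function name `validate_record` and every value in `example_record` are hypothetical placeholders chosen for illustration; none of them come from the dataset itself, and the card does not document what each field contains beyond its type.

```python
# Top-level string fields, transcribed from the feature list in the card.
STRING_FIELDS = [
    "id", "sentence", "intent", "sentence_annotation", "file",
    "origin_transcription", "sentence_norm", "sentence_norm_v2",
    "w2v2_large_transcription",
]


def validate_record(record):
    """Return a list of schema problems; an empty list means the record conforms."""
    problems = []
    for name in STRING_FIELDS:
        if not isinstance(record.get(name), str):
            problems.append(f"{name}: expected string")

    entities = record.get("entities")
    if not isinstance(entities, list):
        problems.append("entities: expected list")
    else:
        for i, ent in enumerate(entities):
            for key in ("type", "filler"):
                if not isinstance(ent, dict) or not isinstance(ent.get(key), str):
                    problems.append(f"entities[{i}].{key}: expected string")

    audio = record.get("audio")
    if not isinstance(audio, dict):
        problems.append("audio: expected struct")
    else:
        array = audio.get("array")
        if not isinstance(array, (list, tuple)) or not all(
            isinstance(x, float) for x in array
        ):
            problems.append("audio.array: expected sequence of floats")
        if not isinstance(audio.get("path"), str):
            problems.append("audio.path: expected string")
        if not isinstance(audio.get("sampling_rate"), int):
            problems.append("audio.sampling_rate: expected integer")

    if not isinstance(record.get("wer"), float):
        problems.append("wer: expected float")
    return problems


# Hypothetical record (assumption): placeholder values only, not real data.
example_record = {
    "id": "example-0001",
    "sentence": "placeholder sentence",
    "intent": "placeholder intent",
    "sentence_annotation": "placeholder annotation",
    "entities": [{"type": "placeholder type", "filler": "placeholder filler"}],
    "file": "example-0001.wav",
    "audio": {"array": [0.0, 0.1, -0.1], "path": "example-0001.wav", "sampling_rate": 16000},
    "origin_transcription": "placeholder",
    "sentence_norm": "placeholder",
    "sentence_norm_v2": "placeholder",
    "w2v2_large_transcription": "placeholder",
    "wer": 0.0,
}

print(validate_record(example_record))
```

A checker like this can catch a record whose numeric fields were serialized as strings before it reaches downstream code.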

Data splits

Train split:
- Bytes: 3205296038.433596
- Examples: 6184
Test split:
- Bytes: 566006350.9006286
- Examples: 1092

Data size

Download size: 902006355 bytes
Dataset size: 3771302389.3342247 bytes
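The split and size figures above can be cross-checked with a few lines of arithmetic. This is a small sketch that uses only the numbers reported in the card; the variable names are illustrative.

```python
import math

# Figures copied from the dataset card.
TRAIN_BYTES = 3205296038.433596
TEST_BYTES = 566006350.9006286
TRAIN_EXAMPLES = 6184
TEST_EXAMPLES = 1092
DOWNLOAD_SIZE = 902006355
DATASET_SIZE = 3771302389.3342247

total_examples = TRAIN_EXAMPLES + TEST_EXAMPLES
train_fraction = TRAIN_EXAMPLES / total_examples

# The per-split byte counts should add up to the reported dataset size
# (compared with a tolerance because the values are floating point).
sizes_consistent = math.isclose(TRAIN_BYTES + TEST_BYTES, DATASET_SIZE, rel_tol=1e-9)

# Ratio of the reported dataset size to the reported download size.
size_ratio = DATASET_SIZE / DOWNLOAD_SIZE

# Average reported size per example, in bytes.
avg_bytes_per_example = DATASET_SIZE / total_examples

print(f"total examples: {total_examples}")
print(f"train fraction: {train_fraction:.4f}")
print(f"split sizes sum to dataset size: {sizes_consistent}")
print(f"dataset size / download size: {size_ratio:.2f}")
print(f"average bytes per example: {avg_bytes_per_example:.0f}")
```

By these figures the dataset holds 7276 examples, roughly 85% of them in the train split, and the reported dataset size is a little over four times the download size.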
