kotoba-speech/wiki40b_lines_zh-cn
Source: Hugging Face. Updated 2025-12-10; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/kotoba-speech/wiki40b_lines_zh-cn
Description:
---
dataset_info:
- config_name: shard_01
  features: &id001
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 143239786
    num_examples: 200000
  download_size: 97098249
  dataset_size: 143239786
- config_name: shard_02
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 141928965
    num_examples: 200000
  download_size: 96461542
  dataset_size: 141928965
- config_name: shard_03
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 142649790
    num_examples: 200000
  download_size: 96698932
  dataset_size: 142649790
- config_name: shard_04
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 141095924
    num_examples: 200000
  download_size: 95772220
  dataset_size: 141095924
- config_name: shard_05
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 142289985
    num_examples: 200000
  download_size: 96544920
  dataset_size: 142289985
- config_name: shard_06
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 135622401
    num_examples: 192704
  download_size: 92058123
  dataset_size: 135622401
- config_name: subset_400K
  features: *id001
  splits:
  - name: train
    num_examples: 400000
- config_name: subset_1M
  features: *id001
  splits:
  - name: train
    num_examples: 1000000
configs:
- config_name: shard_01
  data_files:
  - split: train
    path: shard_01/train-*
- config_name: shard_02
  data_files:
  - split: train
    path: shard_02/train-*
- config_name: shard_03
  data_files:
  - split: train
    path: shard_03/train-*
- config_name: shard_04
  data_files:
  - split: train
    path: shard_04/train-*
- config_name: shard_05
  data_files:
  - split: train
    path: shard_05/train-*
- config_name: shard_06
  data_files:
  - split: train
    path: shard_06/train-*
- config_name: subset_400K
  data_files:
  - split: train
    path:
    - shard_01/train-*
    - shard_02/train-*
- config_name: subset_1M
  data_files:
  - split: train
    path:
    - shard_01/train-*
    - shard_02/train-*
    - shard_03/train-*
    - shard_04/train-*
    - shard_05/train-*
---
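The counts declared in the metadata can be cross-checked with simple arithmetic. The sketch below copies the per-shard numbers by hand from `dataset_info` and the shard lists from `configs`, then sums them. The names `SHARDS`, `SUBSETS`, `subset_examples`, and `totals` are illustrative and are not part of the dataset or of any library.

```python
# Per-shard metadata copied from dataset_info:
# (num_examples, dataset_size in bytes, download_size in bytes)
SHARDS = {
    "shard_01": (200_000, 143_239_786, 97_098_249),
    "shard_02": (200_000, 141_928_965, 96_461_542),
    "shard_03": (200_000, 142_649_790, 96_698_932),
    "shard_04": (200_000, 141_095_924, 95_772_220),
    "shard_05": (200_000, 142_289_985, 96_544_920),
    "shard_06": (192_704, 135_622_401, 92_058_123),
}

# Which shards each subset config reads, copied from the configs section.
SUBSETS = {
    "subset_400K": ["shard_01", "shard_02"],
    "subset_1M": ["shard_01", "shard_02", "shard_03", "shard_04", "shard_05"],
}


def subset_examples(name):
    """Sum num_examples over the shards that make up a subset config."""
    return sum(SHARDS[shard][0] for shard in SUBSETS[name])


def totals():
    """Return (examples, dataset bytes, download bytes) over all six shards."""
    examples = sum(v[0] for v in SHARDS.values())
    dataset_bytes = sum(v[1] for v in SHARDS.values())
    download_bytes = sum(v[2] for v in SHARDS.values())
    return examples, dataset_bytes, download_bytes


if __name__ == "__main__":
    for name in SUBSETS:
        print(name, subset_examples(name))
    examples, dataset_bytes, download_bytes = totals()
    print("all shards:", examples, "examples")
    print("download/dataset size ratio:", round(download_bytes / dataset_bytes, 3))
```

Both subset configs come out equal to their declared `num_examples` (400000 and 1000000). `shard_06` belongs to neither subset, so the six shards together hold 1,192,704 examples.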
Provider:
kotoba-speech
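A minimal sketch of reading one config with the Hugging Face `datasets` library, assuming the library is installed and the repository is reachable under the ID in the title. The helper `load_config` and the constant names are hypothetical; only `datasets.load_dataset` is a real library call. Streaming is used so that nothing is downloaded in full before the first record is read.

```python
REPO_ID = "kotoba-speech/wiki40b_lines_zh-cn"

# Config names as listed in the metadata's configs section.
CONFIG_NAMES = [
    "shard_01", "shard_02", "shard_03", "shard_04", "shard_05", "shard_06",
    "subset_400K", "subset_1M",
]


def load_config(config_name="shard_01", streaming=True):
    """Load the train split of one config (hypothetical helper)."""
    if config_name not in CONFIG_NAMES:
        raise ValueError(f"unknown config: {config_name!r}")
    # Imported lazily so the name check above works without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="train", streaming=streaming)


if __name__ == "__main__":
    ds = load_config("subset_400K")
    first = next(iter(ds))
    # Each record has two string fields, `text` and `key`, per the metadata.
    print(first["key"], first["text"][:80])
```

Every config exposes a single `train` split, so the split is fixed in the helper rather than passed as a parameter.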



