kotoba-speech/wiki40b_lines_ja
Source: Hugging Face | Updated: 2025-12-10 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/kotoba-speech/wiki40b_lines_ja
Resource description:
---
dataset_info:
- config_name: shard_01
  features: &id001
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 212440404
    num_examples: 200000
  download_size: 124570858
  dataset_size: 212440404
- config_name: shard_02
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 211274275
    num_examples: 200000
  download_size: 123945124
  dataset_size: 211274275
- config_name: shard_03
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 211251979
    num_examples: 200000
  download_size: 123902395
  dataset_size: 211251979
- config_name: shard_04
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 213941084
    num_examples: 200000
  download_size: 125227026
  dataset_size: 213941084
- config_name: shard_05
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 211003952
    num_examples: 200000
  download_size: 123663665
  dataset_size: 211003952
- config_name: shard_06
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 212069269
    num_examples: 200000
  download_size: 124377718
  dataset_size: 212069269
- config_name: shard_07
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 210877893
    num_examples: 200000
  download_size: 123789666
  dataset_size: 210877893
- config_name: shard_08
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 210994556
    num_examples: 200000
  download_size: 123705972
  dataset_size: 210994556
- config_name: shard_09
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 211494812
    num_examples: 200000
  download_size: 124078944
  dataset_size: 211494812
- config_name: shard_10
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 71696376
    num_examples: 67228
  download_size: 41988287
  dataset_size: 71696376
- config_name: subset_2M
  features: *id001
  splits:
  - name: train
    num_examples: 1867228
configs:
- config_name: shard_01
  data_files:
  - split: train
    path: shard_01/train-*
- config_name: shard_02
  data_files:
  - split: train
    path: shard_02/train-*
- config_name: shard_03
  data_files:
  - split: train
    path: shard_03/train-*
- config_name: shard_04
  data_files:
  - split: train
    path: shard_04/train-*
- config_name: shard_05
  data_files:
  - split: train
    path: shard_05/train-*
- config_name: shard_06
  data_files:
  - split: train
    path: shard_06/train-*
- config_name: shard_07
  data_files:
  - split: train
    path: shard_07/train-*
- config_name: shard_08
  data_files:
  - split: train
    path: shard_08/train-*
- config_name: shard_09
  data_files:
  - split: train
    path: shard_09/train-*
- config_name: shard_10
  data_files:
  - split: train
    path: shard_10/train-*
- config_name: subset_2M
  data_files:
  - split: train
    path:
    - shard_01/train-*
    - shard_02/train-*
    - shard_03/train-*
    - shard_04/train-*
    - shard_05/train-*
    - shard_06/train-*
    - shard_07/train-*
    - shard_08/train-*
    - shard_09/train-*
    - shard_10/train-*
---
Provider:
kotoba-speech
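
The metadata above defines ten per-shard configs plus a combined `subset_2M` config whose data files are the union of all ten shards. The following is a minimal sketch, not an official usage guide: it checks that the per-shard example counts sum to the `subset_2M` count, and shows how one config might be loaded. It assumes the Hugging Face `datasets` library is installed and the Hub is reachable; the helper names `total_examples` and `load_config` are hypothetical and introduced only for illustration.

```python
# Per-shard train example counts, copied from the dataset_info metadata.
SHARD_EXAMPLES = {f"shard_{i:02d}": 200_000 for i in range(1, 10)}
SHARD_EXAMPLES["shard_10"] = 67_228

# Repository ID as it appears in the listing title and download link.
REPO_ID = "kotoba-speech/wiki40b_lines_ja"


def total_examples(shards=SHARD_EXAMPLES):
    """Sum the per-shard train example counts."""
    return sum(shards.values())


def load_config(config_name="shard_01", streaming=True):
    """Load the train split of one config (hypothetical helper).

    Assumes `datasets` is installed and the Hub is reachable.
    With streaming=True, rows are fetched lazily instead of
    downloading the whole shard up front.
    """
    # Imported lazily so the arithmetic above works without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="train", streaming=streaming)


# The shard counts add up to the num_examples listed for subset_2M.
print(total_examples())  # → 1867228
```

Each row exposes the two string features listed in the metadata, `text` and `key`, so iterating over `load_config("shard_10")` should yield dictionaries with those two fields. Passing `"subset_2M"` as the config name would select all ten shards at once.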



