
kotoba-speech/wiki40b_lines_zh-cn

Source: Hugging Face. Updated 2025-12-10. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/kotoba-speech/wiki40b_lines_zh-cn
Resource description:
---
dataset_info:
- config_name: shard_01
  features: &id001
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 143239786
    num_examples: 200000
  download_size: 97098249
  dataset_size: 143239786
- config_name: shard_02
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 141928965
    num_examples: 200000
  download_size: 96461542
  dataset_size: 141928965
- config_name: shard_03
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 142649790
    num_examples: 200000
  download_size: 96698932
  dataset_size: 142649790
- config_name: shard_04
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 141095924
    num_examples: 200000
  download_size: 95772220
  dataset_size: 141095924
- config_name: shard_05
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 142289985
    num_examples: 200000
  download_size: 96544920
  dataset_size: 142289985
- config_name: shard_06
  features:
  - name: text
    dtype: string
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 135622401
    num_examples: 192704
  download_size: 92058123
  dataset_size: 135622401
- config_name: subset_400K
  features: *id001
  splits:
  - name: train
    num_examples: 400000
- config_name: subset_1M
  features: *id001
  splits:
  - name: train
    num_examples: 1000000
configs:
- config_name: shard_01
  data_files:
  - split: train
    path: shard_01/train-*
- config_name: shard_02
  data_files:
  - split: train
    path: shard_02/train-*
- config_name: shard_03
  data_files:
  - split: train
    path: shard_03/train-*
- config_name: shard_04
  data_files:
  - split: train
    path: shard_04/train-*
- config_name: shard_05
  data_files:
  - split: train
    path: shard_05/train-*
- config_name: shard_06
  data_files:
  - split: train
    path: shard_06/train-*
- config_name: subset_400K
  data_files:
  - split: train
    path:
    - shard_01/train-*
    - shard_02/train-*
- config_name: subset_1M
  data_files:
  - split: train
    path:
    - shard_01/train-*
    - shard_02/train-*
    - shard_03/train-*
    - shard_04/train-*
    - shard_05/train-*
---
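
The card above lists six shard configs and two subset configs that reuse shard files. A minimal sketch of how those pieces fit together: the counts and shard lists are copied from the card, while the names `SHARD_EXAMPLES`, `SUBSET_SHARDS`, `subset_size`, and `load_config` are illustrative helpers, not part of the dataset. It assumes the Hugging Face `datasets` library is installed for the optional loading step; the size checks themselves run offline.

```python
# Per-shard train example counts, copied from the dataset card.
SHARD_EXAMPLES = {
    "shard_01": 200_000,
    "shard_02": 200_000,
    "shard_03": 200_000,
    "shard_04": 200_000,
    "shard_05": 200_000,
    "shard_06": 192_704,
}

# Shards each subset config points at, per the `configs` section of the card.
SUBSET_SHARDS = {
    "subset_400K": ["shard_01", "shard_02"],
    "subset_1M": ["shard_01", "shard_02", "shard_03", "shard_04", "shard_05"],
}


def subset_size(name: str) -> int:
    """Sum the example counts of the shards that make up a subset config."""
    return sum(SHARD_EXAMPLES[shard] for shard in SUBSET_SHARDS[name])


def load_config(config_name: str, streaming: bool = True):
    """Load one config's train split from the Hugging Face Hub.

    Hypothetical helper; requires network access. The import is deferred
    so the size checks in this file work without `datasets` installed.
    """
    from datasets import load_dataset

    return load_dataset(
        "kotoba-speech/wiki40b_lines_zh-cn",
        config_name,
        split="train",
        streaming=streaming,
    )


if __name__ == "__main__":
    # The subset names should match the summed shard counts.
    for name in SUBSET_SHARDS:
        print(name, subset_size(name))
    print("all shards", sum(SHARD_EXAMPLES.values()))
```

Per the card, each record has two string fields, `text` and `key`, so iterating over the result of `load_config("shard_01")` should yield dictionaries with those two keys. Streaming is used as the default here only to avoid downloading a full shard up front.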
Provider:
kotoba-speech