
dododo1234/fineweb

Hugging Face · Updated 2024-05-24 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/dododo1234/fineweb
Resource overview:
--- language: - en dataset_info: - config_name: sample10bt-0 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 979947523 num_examples: 290008 download_size: 590161503 dataset_size: 979947523 - config_name: sample10bt-10150280 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 993074330 num_examples: 290008 download_size: 596140547 dataset_size: 993074330 - config_name: sample10bt-10440288 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 988913778 num_examples: 290008 download_size: 594008653 dataset_size: 988913778 - config_name: sample10bt-10730296 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 992590689 num_examples: 290008 download_size: 596255798 dataset_size: 992590689 - config_name: sample10bt-11020304 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 
- name: token_count dtype: int64 splits: - name: train num_bytes: 1000994216 num_examples: 290008 download_size: 600446360 dataset_size: 1000994216 - config_name: sample10bt-11310312 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 1003260462 num_examples: 290008 download_size: 602303904 dataset_size: 1003260462 - config_name: sample10bt-1160032 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 984660083 num_examples: 290008 download_size: 593684839 dataset_size: 984660083 - config_name: sample10bt-11600320 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 1003701977 num_examples: 290008 download_size: 602486295 dataset_size: 1003701977 - config_name: sample10bt-11890328 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 1005174566 num_examples: 290008 download_size: 603592910 dataset_size: 1005174566 - config_name: sample10bt-12180336 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string 
- name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 1001479206 num_examples: 290008 download_size: 601526731 dataset_size: 1001479206 - config_name: sample10bt-12470344 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 997656177 num_examples: 290008 download_size: 599795270 dataset_size: 997656177 - config_name: sample10bt-12760352 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 995369007 num_examples: 290008 download_size: 598308150 dataset_size: 995369007 - config_name: sample10bt-13050360 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 995696120 num_examples: 290008 download_size: 598678925 dataset_size: 995696120 - config_name: sample10bt-13340368 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 985424533 num_examples: 290008 download_size: 593220864 dataset_size: 985424533 - config_name: 
sample10bt-13630376 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 986689008 num_examples: 290008 download_size: 594972485 dataset_size: 986689008 - config_name: sample10bt-13920384 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 983042536 num_examples: 290008 download_size: 592682782 dataset_size: 983042536 - config_name: sample10bt-14210392 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 974595216 num_examples: 290008 download_size: 587924013 dataset_size: 974595216 - config_name: sample10bt-1450040 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 979367659 num_examples: 290008 download_size: 590238554 dataset_size: 979367659 - config_name: sample10bt-1740048 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - 
name: train num_bytes: 980964151 num_examples: 290008 download_size: 591922978 dataset_size: 980964151 - config_name: sample10bt-2030056 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 979597732 num_examples: 290008 download_size: 591009136 dataset_size: 979597732 - config_name: sample10bt-2320064 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 977164918 num_examples: 290008 download_size: 589114003 dataset_size: 977164918 - config_name: sample10bt-2610072 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 979829645 num_examples: 290008 download_size: 590546192 dataset_size: 979829645 - config_name: sample10bt-290008 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 977396405 num_examples: 290008 download_size: 589367664 dataset_size: 977396405 - config_name: sample10bt-2900080 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: 
string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 980144418 num_examples: 290008 download_size: 591075940 dataset_size: 980144418 - config_name: sample10bt-3190088 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 981462351 num_examples: 290008 download_size: 592293366 dataset_size: 981462351 - config_name: sample10bt-3480096 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 981505785 num_examples: 290008 download_size: 591388473 dataset_size: 981505785 - config_name: sample10bt-3770104 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 990661398 num_examples: 290008 download_size: 596809014 dataset_size: 990661398 - config_name: sample10bt-4060112 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 986106212 num_examples: 290008 download_size: 594370141 dataset_size: 986106212 - config_name: sample10bt-4350120 features: - name: text dtype: string - name: id 
dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 980994956 num_examples: 290008 download_size: 591229442 dataset_size: 980994956 - config_name: sample10bt-4640128 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 986543200 num_examples: 290008 download_size: 593865155 dataset_size: 986543200 - config_name: sample10bt-4930136 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 979453845 num_examples: 290008 download_size: 589568442 dataset_size: 979453845 - config_name: sample10bt-5220144 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 981062571 num_examples: 290008 download_size: 590870288 dataset_size: 981062571 - config_name: sample10bt-5510152 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 981563924 num_examples: 290008 download_size: 
591012138 dataset_size: 981563924 - config_name: sample10bt-580016 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 973592252 num_examples: 290008 download_size: 587456628 dataset_size: 973592252 - config_name: sample10bt-5800160 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 979367823 num_examples: 290008 download_size: 589524522 dataset_size: 979367823 - config_name: sample10bt-6090168 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 986494575 num_examples: 290008 download_size: 593063121 dataset_size: 986494575 - config_name: sample10bt-6380176 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 985063594 num_examples: 290008 download_size: 592892070 dataset_size: 985063594 - config_name: sample10bt-6670184 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: 
float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 984476301 num_examples: 290008 download_size: 592780129 dataset_size: 984476301 - config_name: sample10bt-6960192 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 980895711 num_examples: 290008 download_size: 590544189 dataset_size: 980895711 - config_name: sample10bt-7250200 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 987200399 num_examples: 290008 download_size: 593909623 dataset_size: 987200399 - config_name: sample10bt-7540208 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 984340891 num_examples: 290008 download_size: 592165422 dataset_size: 984340891 - config_name: sample10bt-7830216 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 982026628 num_examples: 290008 download_size: 591036590 dataset_size: 982026628 - config_name: sample10bt-8120224 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - 
name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 985505505 num_examples: 290008 download_size: 592868517 dataset_size: 985505505 - config_name: sample10bt-8410232 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 994961977 num_examples: 290008 download_size: 597775294 dataset_size: 994961977 - config_name: sample10bt-870024 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 969199330 num_examples: 290008 download_size: 584786536 dataset_size: 969199330 - config_name: sample10bt-8700240 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 996668327 num_examples: 290008 download_size: 598511850 dataset_size: 996668327 - config_name: sample10bt-8990248 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 994382609 num_examples: 290008 download_size: 597511094 dataset_size: 994382609 - config_name: sample10bt-9280256 
features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 993179135 num_examples: 290008 download_size: 597223946 dataset_size: 993179135 - config_name: sample10bt-9570264 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 990235851 num_examples: 290008 download_size: 595184122 dataset_size: 990235851 - config_name: sample10bt-9860272 features: - name: text dtype: string - name: id dtype: string - name: dump dtype: string - name: url dtype: string - name: date dtype: string - name: file_path dtype: string - name: language dtype: string - name: language_score dtype: float64 - name: token_count dtype: int64 splits: - name: train num_bytes: 991242320 num_examples: 290008 download_size: 594897638 dataset_size: 991242320 configs: - config_name: sample10bt-0 data_files: - split: train path: sample10bt-0/train-* - config_name: sample10bt-10150280 data_files: - split: train path: sample10bt-10150280/train-* - config_name: sample10bt-10440288 data_files: - split: train path: sample10bt-10440288/train-* - config_name: sample10bt-10730296 data_files: - split: train path: sample10bt-10730296/train-* - config_name: sample10bt-11020304 data_files: - split: train path: sample10bt-11020304/train-* - config_name: sample10bt-11310312 data_files: - split: train path: sample10bt-11310312/train-* - config_name: sample10bt-1160032 data_files: - split: train path: sample10bt-1160032/train-* - config_name: sample10bt-11600320 data_files: - split: train path: sample10bt-11600320/train-* - 
config_name: sample10bt-11890328 data_files: - split: train path: sample10bt-11890328/train-* - config_name: sample10bt-12180336 data_files: - split: train path: sample10bt-12180336/train-* - config_name: sample10bt-12470344 data_files: - split: train path: sample10bt-12470344/train-* - config_name: sample10bt-12760352 data_files: - split: train path: sample10bt-12760352/train-* - config_name: sample10bt-13050360 data_files: - split: train path: sample10bt-13050360/train-* - config_name: sample10bt-13340368 data_files: - split: train path: sample10bt-13340368/train-* - config_name: sample10bt-13630376 data_files: - split: train path: sample10bt-13630376/train-* - config_name: sample10bt-13920384 data_files: - split: train path: sample10bt-13920384/train-* - config_name: sample10bt-14210392 data_files: - split: train path: sample10bt-14210392/train-* - config_name: sample10bt-1450040 data_files: - split: train path: sample10bt-1450040/train-* - config_name: sample10bt-1740048 data_files: - split: train path: sample10bt-1740048/train-* - config_name: sample10bt-2030056 data_files: - split: train path: sample10bt-2030056/train-* - config_name: sample10bt-2320064 data_files: - split: train path: sample10bt-2320064/train-* - config_name: sample10bt-2610072 data_files: - split: train path: sample10bt-2610072/train-* - config_name: sample10bt-290008 data_files: - split: train path: sample10bt-290008/train-* - config_name: sample10bt-2900080 data_files: - split: train path: sample10bt-2900080/train-* - config_name: sample10bt-3190088 data_files: - split: train path: sample10bt-3190088/train-* - config_name: sample10bt-3480096 data_files: - split: train path: sample10bt-3480096/train-* - config_name: sample10bt-3770104 data_files: - split: train path: sample10bt-3770104/train-* - config_name: sample10bt-4060112 data_files: - split: train path: sample10bt-4060112/train-* - config_name: sample10bt-4350120 data_files: - split: train path: sample10bt-4350120/train-* - 
config_name: sample10bt-4640128 data_files: - split: train path: sample10bt-4640128/train-* - config_name: sample10bt-4930136 data_files: - split: train path: sample10bt-4930136/train-* - config_name: sample10bt-5220144 data_files: - split: train path: sample10bt-5220144/train-* - config_name: sample10bt-5510152 data_files: - split: train path: sample10bt-5510152/train-* - config_name: sample10bt-580016 data_files: - split: train path: sample10bt-580016/train-* - config_name: sample10bt-5800160 data_files: - split: train path: sample10bt-5800160/train-* - config_name: sample10bt-6090168 data_files: - split: train path: sample10bt-6090168/train-* - config_name: sample10bt-6380176 data_files: - split: train path: sample10bt-6380176/train-* - config_name: sample10bt-6670184 data_files: - split: train path: sample10bt-6670184/train-* - config_name: sample10bt-6960192 data_files: - split: train path: sample10bt-6960192/train-* - config_name: sample10bt-7250200 data_files: - split: train path: sample10bt-7250200/train-* - config_name: sample10bt-7540208 data_files: - split: train path: sample10bt-7540208/train-* - config_name: sample10bt-7830216 data_files: - split: train path: sample10bt-7830216/train-* - config_name: sample10bt-8120224 data_files: - split: train path: sample10bt-8120224/train-* - config_name: sample10bt-8410232 data_files: - split: train path: sample10bt-8410232/train-* - config_name: sample10bt-870024 data_files: - split: train path: sample10bt-870024/train-* - config_name: sample10bt-8700240 data_files: - split: train path: sample10bt-8700240/train-* - config_name: sample10bt-8990248 data_files: - split: train path: sample10bt-8990248/train-* - config_name: sample10bt-9280256 data_files: - split: train path: sample10bt-9280256/train-* - config_name: sample10bt-9570264 data_files: - split: train path: sample10bt-9570264/train-* - config_name: sample10bt-9860272 data_files: - split: train path: sample10bt-9860272/train-* ---

The dataset includes 50 configurations, each with the same nine features: text, id, dump, url, date, file_path, language, language_score, and token_count. Every configuration has a single train split containing 290008 examples. The dataset is tagged as English (language: en) and is intended for English text processing.
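The configuration names in the metadata follow a regular pattern: every numeric suffix is a multiple of 290008 (the per-config example count), running from 0 up to 14210392 (49 × 290008). The sketch below rebuilds the names from that observation. Reading the suffix as a starting row offset is an assumption, and `config_names` and `load_config` are hypothetical helper names. The loader assumes the Hugging Face `datasets` package is installed and that network access is available; it streams by default because each configuration's download is roughly 590 MB.

```python
EXAMPLES_PER_CONFIG = 290008  # num_examples of every train split in the card
NUM_CONFIGS = 50              # number of configs listed in the card metadata


def config_names(n=NUM_CONFIGS, step=EXAMPLES_PER_CONFIG):
    """Rebuild the config names sample10bt-0, sample10bt-290008, ..."""
    return [f"sample10bt-{i * step}" for i in range(n)]


def load_config(name, streaming=True):
    """Load one config's train split (needs network access and `datasets`)."""
    # Imported lazily so config_names() works without the package installed.
    from datasets import load_dataset

    return load_dataset("dododo1234/fineweb", name, split="train", streaming=streaming)


if __name__ == "__main__":
    names = config_names()
    print(names[0], names[1], names[-1])
    # prints: sample10bt-0 sample10bt-290008 sample10bt-14210392
```

Note that the metadata lists the names in lexicographic order, not numeric order, so sorting by the integer suffix is needed if the offset order matters.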
Provider:
dododo1234
Summary of original information

Dataset overview

Dataset configuration

  • config_name: 50 configurations, each named sample10bt- plus a numeric suffix, for example sample10bt-0, sample10bt-10150280, and sample10bt-10440288.

Dataset features

  • Names: text, id, dump, url, date, file_path, language, language_score, token_count
  • Data types: string for every feature except language_score (float64) and token_count (int64)
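As an illustration of the schema above, the following sketch checks that a record carries exactly the nine listed fields with matching Python types. `EXPECTED_TYPES` and `matches_schema` are hypothetical names, the mapping of float64 to `float` and int64 to `int` is an assumption about how the values arrive in Python, and the sample record is an invented placeholder, not data from the dataset.

```python
# Feature names and dtypes from the card, mapped to Python types (assumed mapping).
EXPECTED_TYPES = {
    "text": str,
    "id": str,
    "dump": str,
    "url": str,
    "date": str,
    "file_path": str,
    "language": str,
    "language_score": float,  # float64 in the card
    "token_count": int,       # int64 in the card
}


def matches_schema(record):
    """Return True if the record has exactly the expected fields and types."""
    if set(record) != set(EXPECTED_TYPES):
        return False
    return all(isinstance(record[key], typ) for key, typ in EXPECTED_TYPES.items())


if __name__ == "__main__":
    # Invented placeholder values, for illustration only.
    sample = {
        "text": "Hello world",
        "id": "placeholder-id",
        "dump": "placeholder-dump",
        "url": "https://example.com",
        "date": "2024-01-01",
        "file_path": "placeholder/path",
        "language": "en",
        "language_score": 0.99,
        "token_count": 2,
    }
    print(matches_schema(sample))
```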

Dataset splits

  • Split name: train in every configuration
  • Size: each configuration's train split has num_examples of 290008 and num_bytes of roughly 0.97 to 1.01 billion bytes.

Dataset size

  • Download size: about 585 to 604 million bytes per configuration.
  • Dataset size: about 0.97 to 1.01 billion bytes per configuration.
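The raw byte counts are easier to compare once converted to megabytes and ratios. The sketch below does that arithmetic for three configurations whose figures are copied from the card metadata; `summarize` is a hypothetical helper, and megabytes are taken as 10^6 bytes. For these three configurations the download size works out to about 60% of the dataset size, and each example averages a little under 3.4 KB.

```python
NUM_EXAMPLES = 290008  # num_examples of every train split in the card

# (num_bytes, download_size) copied from the card for three configurations
SIZES = {
    "sample10bt-0": (979947523, 590161503),
    "sample10bt-10150280": (993074330, 596140547),
    "sample10bt-10440288": (988913778, 594008653),
}


def summarize(num_bytes, download_size, num_examples=NUM_EXAMPLES):
    """Convert raw byte counts into megabytes, a download ratio, and bytes per example."""
    return {
        "dataset_mb": num_bytes / 1e6,
        "download_mb": download_size / 1e6,
        "download_ratio": download_size / num_bytes,
        "bytes_per_example": num_bytes / num_examples,
    }


if __name__ == "__main__":
    for name, (num_bytes, download_size) in SIZES.items():
        s = summarize(num_bytes, download_size)
        print(
            f"{name}: dataset {s['dataset_mb']:.0f} MB, "
            f"download {s['download_mb']:.0f} MB, "
            f"ratio {s['download_ratio']:.2f}, "
            f"{s['bytes_per_example']:.0f} bytes/example"
        )
```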

Dataset details

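Configuration sample10bt-0

  • Features: same as above.
  • Split: the train split has num_bytes of 979947523 bytes and num_examples of 290008.
  • Size: download_size is 590161503 bytes; dataset_size is 979947523 bytes.

Configuration sample10bt-10150280

  • Features: same as above.
  • Split: the train split has num_bytes of 993074330 bytes and num_examples of 290008.
  • Size: download_size is 596140547 bytes; dataset_size is 993074330 bytes.

Configuration sample10bt-10440288

  • Features: same as above.
  • Split: the train split has num_bytes of 988913778 bytes and num_examples of 290008.
  • Size: download_size is 594008653 bytes; dataset_size is 988913778 bytes.

Configuration sample10bt-10730296

  • Features: same as above.
  • Split: the train split has num_bytes of 992590689 bytes and num_examples of 290008.
  • Size: download_size is 596255798 bytes; dataset_size is 992590689 bytes.

Configuration sample10bt-11020304

  • Features: same as above.
  • Split: the train split has num_bytes of 1000994216 bytes and num_examples of 290008.
  • Size: download_size is 600446360 bytes; dataset_size is 1000994216 bytes.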

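Configuration sample10bt-11310312

  • Features: same as above.
  • Split: the train split has num_bytes of 1003260462 bytes and num_examples of 290008.
  • Size: download_size is 602303904 bytes; dataset_size is 1003260462 bytes.

Configuration sample10bt-1160032

  • Features: same as above.
  • Split: the train split has num_bytes of 984660083 bytes and num_examples of 290008.
  • Size: download_size is 593684839 bytes; dataset_size is 984660083 bytes.

Configuration sample10bt-11600320

  • Features: same as above.
  • Split: the train split has num_bytes of 1003701977 bytes and num_examples of 290008.
  • Size: download_size is 602486295 bytes; dataset_size is 1003701977 bytes.

Configuration sample10bt-11890328

  • Features: same as above.
  • Split: the train split has num_bytes of 1005174566 bytes and num_examples of 290008.
  • Size: download_size is 603592910 bytes; dataset_size is 1005174566 bytes.

Configuration sample10bt-12180336

  • Features: same as above.
  • Split: the train split has num_bytes of 1001479206 bytes and num_examples of 290008.
  • Size: download_size is 601526731 bytes; dataset_size is 1001479206 bytes.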

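Configuration sample10bt-12470344

  • Features: same as above.
  • Split: the train split has num_bytes of 997656177 bytes and num_examples of 290008.
  • Size: download_size is 599795270 bytes; dataset_size is 997656177 bytes.

Configuration sample10bt-12760352

  • Features: same as above.
  • Split: the train split has num_bytes of 995369007 bytes and num_examples of 290008.
  • Size: download_size is 598308150 bytes; dataset_size is 995369007 bytes.

Configuration sample10bt-13050360

  • Features: same as above.
  • Split: the train split has num_bytes of 995696120 bytes and num_examples of 290008.
  • Size: download_size is 598678925 bytes; dataset_size is 995696120 bytes.

Configuration sample10bt-13340368

  • Features: same as above.
  • Split: the train split has num_bytes of 985424533 bytes and num_examples of 290008.
  • Size: download_size is 593220864 bytes; dataset_size is 985424533 bytes.

Configuration sample10bt-13630376

  • Features: same as above.
  • Split: the train split has num_bytes of 986689008 bytes and num_examples of 290008.
  • Size: download_size is 594972485 bytes; dataset_size is 986689008 bytes.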

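Configuration sample10bt-13920384

  • Features: same as above.
  • Split: the train split has num_bytes of 983042536 bytes and num_examples of 290008.
  • Size: download_size is 592682782 bytes; dataset_size is 983042536 bytes.

Configuration sample10bt-14210392

  • Features: same as above.
  • Split: the train split has num_bytes of 974595216 bytes and num_examples of 290008.
  • Size: download_size is 587924013 bytes; dataset_size is 974595216 bytes.

Configuration sample10bt-1450040

  • Features: same as above.
  • Split: the train split has num_bytes of 979367659 bytes and num_examples of 290008.
  • Size: download_size is 590238554 bytes; dataset_size is 979367659 bytes.

Configuration sample10bt-1740048

  • Features: same as above.
  • Split: the train split has num_bytes of 980964151 bytes and num_examples of 290008.
  • Size: download_size is 591922978 bytes; dataset_size is 980964151 bytes.

Configuration sample10bt-2030056

  • Features: same as above.
  • Split: the train split has num_bytes of 979597732 bytes and num_examples of 290008.
  • Size: download_size is 591009136 bytes; dataset_size is 979597732 bytes.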

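Configuration sample10bt-2320064

  • Features: same as above.
  • Split: the train split has num_bytes of 977164918 bytes and num_examples of 290008.
  • Size: download_size is 589114003 bytes; dataset_size is 977164918 bytes.

Configuration sample10bt-2610072

  • Features: same as above.
  • Split: the train split has num_bytes of 979829645 bytes and num_examples of 290008.
  • Size: download_size is 590546192 bytes; dataset_size is 979829645 bytes.

Configuration sample10bt-290008

  • Features: same as above.
  • Split: the train split has num_bytes of 977396405 bytes and num_examples of 290008.
  • Size: download_size is 589367664 bytes; dataset_size is 977396405 bytes.

Configuration sample10bt-2900080

  • Features: same as above.
  • Split: the train split has num_bytes of 980144418 bytes and num_examples of 290008.
  • Size: download_size is 591075940 bytes; dataset_size is 980144418 bytes.

Configuration sample10bt-3190088

  • Features: same as above.
  • Split: the train split has num_bytes of 981462351 bytes and num_examples of 290008.
  • Size: download_size is 592293366 bytes; dataset_size is 981462351 bytes.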

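Configuration sample10bt-3480096

  • Features: same as above.
  • Split: the train split has num_bytes of 981505785 bytes and num_examples of 290008.
  • Size: download_size is 591388473 bytes; dataset_size is 981505785 bytes.

Configuration sample10bt-3770104

  • Features: same as above.
  • Split: the train split has num_bytes of 990661398 bytes and num_examples of 290008.
  • Size: download_size is 596809014 bytes; dataset_size is 990661398 bytes.

Configuration sample10bt-4060112

  • Features: same as above.
  • Split: the train split has num_bytes of 986106212 bytes and num_examples of 290008.
  • Size: download_size is 594370141 bytes; dataset_size is 986106212 bytes.

Configuration sample10bt-4350120

  • Features: same as above.
  • Split: the train split has num_bytes of 980994956 bytes and num_examples of 290008.
  • Size: download_size is 591229442 bytes; dataset_size is 980994956 bytes.

Configuration sample10bt-4640128

  • Features: same as above.
  • Split: the train split has num_bytes of 986543200 bytes and num_examples of 290008.
  • Size: download_size is 593865155 bytes; dataset_size is 986543200 bytes.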

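Configuration sample10bt-4930136

  • Features: same as above.
  • Split: the train split has num_bytes of 979453845 bytes and num_examples of 290008.
  • Size: download_size is 589568442 bytes; dataset_size is 979453845 bytes.

Configuration sample10bt-5220144

  • Features: same as above.
  • Split: the train split has num_bytes of 981062571 bytes and num_examples of 290008.
  • Size: download_size is 590870288 bytes; dataset_size is 981062571 bytes.

Configuration sample10bt-5510152

  • Features: same as above.
  • Split: the train split has num_bytes of 981563924 bytes and num_examples of 290008.
  • Size: download_size is 591012138 bytes; dataset_size is 981563924 bytes.

Configuration sample10bt-580016

  • Features: same as above.
  • Split: the train split has num_bytes of 973592252 bytes and num_examples of 290008.
  • Size: download_size is 587456628 bytes; dataset_size is 973592252 bytes.

Configuration sample10bt-5800160

  • Features: same as above.
  • Split: the train split has num_bytes of 979367823 bytes and num_examples of 290008.
  • Size: download_size is 58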