
ibm-aimc/phi3-dataset-collection

Source: Hugging Face. Updated: 2024-06-19. Indexed: 2026-04-05.
Download link:
https://hf-mirror.com/datasets/ibm-aimc/phi3-dataset-collection
Dataset description:
---
dataset_info:
- config_name: auto_math_text
  features:
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int8
  splits:
  - name: train
    num_bytes: 11396962200
    num_examples: 556275
  - name: test
    num_bytes: 114855728
    num_examples: 5606
  download_size: 3990157337
  dataset_size: 11511817928
- config_name: math_dataset
  features:
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int8
  splits:
  - name: train
    num_bytes: 24660709008
    num_examples: 1203666
  - name: test
    num_bytes: 132311504
    num_examples: 6458
  download_size: 3809009276
  dataset_size: 24793020512
- config_name: math_dataset_10_subsets
  features:
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int8
  splits:
  - name: train
    num_bytes: 5029926928
    num_examples: 245506
  - name: test
    num_bytes: 27228552
    num_examples: 1329
  download_size: 812191113
  dataset_size: 5057155480
- config_name: math_dataset_20_subsets
  features:
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int8
  splits:
  - name: train
    num_bytes: 8866202488
    num_examples: 432751
  - name: test
    num_bytes: 48331192
    num_examples: 2359
  download_size: 1368856155
  dataset_size: 8914533680
- config_name: math_dataset_30_subsets
  features:
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int8
  splits:
  - name: train
    num_bytes: 12844193544
    num_examples: 626913
  - name: test
    num_bytes: 69618224
    num_examples: 3398
  download_size: 1949570318
  dataset_size: 12913811768
- config_name: math_dataset_40_subsets
  features:
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int8
  splits:
  - name: train
    num_bytes: 16593681936
    num_examples: 809922
  - name: test
    num_bytes: 89430120
    num_examples: 4365
  download_size: 2514779153
  dataset_size: 16683112056
- config_name: math_dataset_50_subsets
  features:
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int8
  splits:
  - name: train
    num_bytes: 21544176888
    num_examples: 1051551
  - name: test
    num_bytes: 115695736
    num_examples: 5647
  download_size: 3302155206
  dataset_size: 21659872624
- config_name: mathinstruct
  features:
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int8
  splits:
  - name: train
    num_bytes: 330000216
    num_examples: 16107
  - name: test
    num_bytes: 3319056
    num_examples: 162
  download_size: 97127809
  dataset_size: 333319272
- config_name: metamathQA
  features:
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int8
  splits:
  - name: train
    num_bytes: 492408592
    num_examples: 24034
  - name: test
    num_bytes: 4917120
    num_examples: 240
  download_size: 138979849
  dataset_size: 497325712
- config_name: synthetic_phi3_100m
  features:
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int32
  - name: special_tokens_mask
    sequence: int32
  splits:
  - name: train
    num_bytes: 1199257452
    num_examples: 24393
  - name: test
    num_bytes: 12143508
    num_examples: 247
  download_size: 188991055
  dataset_size: 1211400960
- config_name: synthetic_phi3_1b
  features:
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int32
  - name: special_tokens_mask
    sequence: int32
  splits:
  - name: train
    num_bytes: 11852899596
    num_examples: 241089
  - name: test
    num_bytes: 119763504
    num_examples: 2436
  download_size: 1865151479
  dataset_size: 11972663100
- config_name: synthetic_phi3_1b_rgs
  features:
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int32
  splits:
  - name: train
    num_bytes: 7420584728
    num_examples: 226403
  - name: test
    num_bytes: 74958712
    num_examples: 2287
  download_size: 1675463479
  dataset_size: 7495543440
- config_name: synthetic_phi3_1b_rgs_bos
  features:
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int32
  splits:
  - name: train
    num_bytes: 7420584728
    num_examples: 226403
  - name: test
    num_bytes: 74958712
    num_examples: 2287
  download_size: 1675509454
  dataset_size: 7495543440
- config_name: synthetic_phi3_1b_sgs
  features:
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int32
  splits:
  - name: train
    num_bytes: 5621641192
    num_examples: 171517
  - name: test
    num_bytes: 56800808
    num_examples: 1733
  download_size: 1281009960
  dataset_size: 5678442000
- config_name: synthetic_phi3_1b_sgs_bos
  features:
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int32
  splits:
  - name: train
    num_bytes: 7943329152
    num_examples: 242352
  - name: test
    num_bytes: 80235648
    num_examples: 2448
  download_size: 1859803721
  dataset_size: 8023564800
- config_name: synthetic_phi3_1b_sss
  features:
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int32
  splits:
  - name: train
    num_bytes: 7322748368
    num_examples: 223418
  - name: test
    num_bytes: 73975432
    num_examples: 2257
  download_size: 1670989148
  dataset_size: 7396723800
- config_name: synthetic_phi3_1b_sss_bos
  features:
  - name: input_ids
    sequence: int32
  - name: attention_mask
    sequence: int32
  splits:
  - name: train
    num_bytes: 7322748368
    num_examples: 223418
  - name: test
    num_bytes: 73975432
    num_examples: 2257
  download_size: 1671132336
  dataset_size: 7396723800
configs:
- config_name: auto_math_text
  data_files:
  - split: train
    path: auto_math_text/train-*
  - split: test
    path: auto_math_text/test-*
- config_name: math_dataset
  data_files:
  - split: train
    path: math_dataset/train-*
  - split: test
    path: math_dataset/test-*
- config_name: math_dataset_10_subsets
  data_files:
  - split: train
    path: math_dataset_10_subsets/train-*
  - split: test
    path: math_dataset_10_subsets/test-*
- config_name: math_dataset_20_subsets
  data_files:
  - split: train
    path: math_dataset_20_subsets/train-*
  - split: test
    path: math_dataset_20_subsets/test-*
- config_name: math_dataset_30_subsets
  data_files:
  - split: train
    path: math_dataset_30_subsets/train-*
  - split: test
    path: math_dataset_30_subsets/test-*
- config_name: math_dataset_40_subsets
  data_files:
  - split: train
    path: math_dataset_40_subsets/train-*
  - split: test
    path: math_dataset_40_subsets/test-*
- config_name: math_dataset_50_subsets
  data_files:
  - split: train
    path: math_dataset_50_subsets/train-*
  - split: test
    path: math_dataset_50_subsets/test-*
- config_name: mathinstruct
  data_files:
  - split: train
    path: mathinstruct/train-*
  - split: test
    path: mathinstruct/test-*
- config_name: metamathQA
  data_files:
  - split: train
    path: metamathQA/train-*
  - split: test
    path: metamathQA/test-*
- config_name: synthetic_phi3_100m
  data_files:
  - split: train
    path: synthetic_phi3_100m/train-*
  - split: test
    path: synthetic_phi3_100m/test-*
- config_name: synthetic_phi3_1b
  data_files:
  - split: train
    path: synthetic_phi3_1b/train-*
  - split: test
    path: synthetic_phi3_1b/test-*
- config_name: synthetic_phi3_1b_rgs
  data_files:
  - split: train
    path: synthetic_phi3_1b_rgs/train-*
  - split: test
    path: synthetic_phi3_1b_rgs/test-*
- config_name: synthetic_phi3_1b_rgs_bos
  data_files:
  - split: train
    path: synthetic_phi3_1b_rgs_bos/train-*
  - split: test
    path: synthetic_phi3_1b_rgs_bos/test-*
- config_name: synthetic_phi3_1b_sgs
  data_files:
  - split: train
    path: synthetic_phi3_1b_sgs/train-*
  - split: test
    path: synthetic_phi3_1b_sgs/test-*
- config_name: synthetic_phi3_1b_sgs_bos
  data_files:
  - split: train
    path: synthetic_phi3_1b_sgs_bos/train-*
  - split: test
    path: synthetic_phi3_1b_sgs_bos/test-*
- config_name: synthetic_phi3_1b_sss
  data_files:
  - split: train
    path: synthetic_phi3_1b_sss/train-*
  - split: test
    path: synthetic_phi3_1b_sss/test-*
- config_name: synthetic_phi3_1b_sss_bos
  data_files:
  - split: train
    path: synthetic_phi3_1b_sss_bos/train-*
  - split: test
    path: synthetic_phi3_1b_sss_bos/test-*
---
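The card metadata above can be read programmatically. The following is a minimal sketch, not an official loader: it copies the split sizes of three configs from the card, rebuilds the `<config>/<split>-*` file patterns the card declares, and checks that the split byte counts add up to the declared `dataset_size`. The helper names (`split_patterns`, `sizes_consistent`, `load_config`) and the `CARD` dictionary are hypothetical names introduced for illustration. `load_config` assumes the Hugging Face `datasets` library is installed and that the repository is still reachable.

```python
from typing import Dict

REPO_ID = "ibm-aimc/phi3-dataset-collection"

# (num_bytes, num_examples) per split, copied from the card for three configs.
CARD = {
    "auto_math_text": {
        "splits": {"train": (11396962200, 556275), "test": (114855728, 5606)},
        "dataset_size": 11511817928,
    },
    "mathinstruct": {
        "splits": {"train": (330000216, 16107), "test": (3319056, 162)},
        "dataset_size": 333319272,
    },
    "synthetic_phi3_100m": {
        "splits": {"train": (1199257452, 24393), "test": (12143508, 247)},
        "dataset_size": 1211400960,
    },
}


def split_patterns(config_name: str) -> Dict[str, str]:
    """Return the glob patterns the card lists under `configs` for one config."""
    return {split: f"{config_name}/{split}-*" for split in ("train", "test")}


def sizes_consistent(config_name: str) -> bool:
    """True when the split byte counts sum to the declared dataset_size."""
    entry = CARD[config_name]
    total = sum(num_bytes for num_bytes, _ in entry["splits"].values())
    return total == entry["dataset_size"]


def load_config(config_name: str, split: str = "test"):
    """Stream one split of one config (needs `datasets` and network access)."""
    # Imported lazily so the helpers above also work without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split, streaming=True)


if __name__ == "__main__":
    for name in CARD:
        print(name, split_patterns(name), sizes_consistent(name))
```

If the repository is available, `next(iter(load_config("mathinstruct")))` should return one pre-tokenized record. Going by the card, it would carry `input_ids` and `attention_mask` fields rather than raw text. Streaming is used here because several configs exceed 10 GB on disk.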
Provider:
ibm-aimc