zhongshupeng/dataset_4090_2
Source: Hugging Face | Updated: 2023-10-27 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/zhongshupeng/dataset_4090_2
Resource summary:
# Disclaimer:
This dataset was curated for the NeurIPS 2023 LLM Efficiency Challenge and is currently a work in progress. Please use it at your own risk.
# Data composition:
All data were derived from the training splits of the open-source datasets listed below.
**gsm2k_dolly15k_cnnadd6k_mmlulog1.7w_bbqabc8k.json**:
- gsm8k_2000: https://huggingface.co/datasets/gsm8k
- dolly_15000: https://huggingface.co/datasets/databricks/databricks-dolly-15k
- cnn_dailymail_6000: https://huggingface.co/datasets/cnn_dailymail
- mmlu_17000: https://huggingface.co/datasets/cais/mmlu
- bbq_8000: https://huggingface.co/datasets/tasksource/bigbench
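The first file's name encodes the per-source sample sizes (2k, 15k, 6k, 1.7w = 17,000, 8k; "w" is the Chinese shorthand for 万, i.e. 10,000). The card does not say how each subset was drawn, so the sketch below simply assumes uniform random sampling without replacement from each training split, followed by a shuffle of the combined list. `build_mixture`, `MIXTURE_COUNTS`, and the source keys are hypothetical names, not part of the dataset.

```python
import random
from typing import Any, Dict, List

# Per-source sample sizes listed in the card for
# gsm2k_dolly15k_cnnadd6k_mmlulog1.7w_bbqabc8k.json (keys are hypothetical labels).
MIXTURE_COUNTS: Dict[str, int] = {
    "gsm8k": 2000,
    "dolly": 15000,
    "cnn_dailymail": 6000,
    "mmlu": 17000,
    "bbq": 8000,
}


def build_mixture(
    sources: Dict[str, List[Any]],
    counts: Dict[str, int],
    seed: int = 0,
) -> List[Any]:
    """Hypothetical helper: draw counts[name] records without replacement from
    each source's training split, then shuffle the combined list.

    This is an assumed procedure; the dataset card does not document how the
    subsets were actually selected.
    """
    rng = random.Random(seed)  # seeded so the mixture is reproducible
    mixture: List[Any] = []
    for name, n in counts.items():
        pool = sources[name]
        if n > len(pool):
            raise ValueError(f"{name}: requested {n} records, only {len(pool)} available")
        mixture.extend(rng.sample(pool, n))  # uniform sample without replacement
    rng.shuffle(mixture)  # interleave sources so training batches are mixed
    return mixture


if __name__ == "__main__":
    # Toy placeholders standing in for the real training splits.
    toy_sources = {
        name: [f"{name}-{i}" for i in range(n + 100)]
        for name, n in MIXTURE_COUNTS.items()
    }
    mix = build_mixture(toy_sources, MIXTURE_COUNTS, seed=42)
    print(len(mix))  # 48000
```

If the listed counts are exact and no records were dropped or deduplicated, the file would hold 2,000 + 15,000 + 6,000 + 17,000 + 8,000 = 48,000 examples.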
**lima_4kall.json**:
- lima_1000: https://huggingface.co/datasets/GAIR/lima
- 3000 records from gsm8k_dolly15k_cnnadd8k_mmlulog1.7w_bbqabc8k.json: https://huggingface.co/datasets/zhongshupeng/dataset_4090_1
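The second file's name suggests 4,000 records in total (1,000 from LIMA plus 3,000 reused from dataset_4090_1). A quick sanity check on a downloaded copy might look like the following. It assumes, without confirmation from the card, that each `.json` file is a single top-level JSON array; if a file is actually JSON Lines, `json.load` will typically fail and the lines would need to be parsed one by one instead. `count_records` and `EXPECTED_LIMA_4KALL` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Union

# Expected size of lima_4kall.json per the card: 1,000 LIMA records plus
# 3,000 records reused from dataset_4090_1.
EXPECTED_LIMA_4KALL = 1000 + 3000


def count_records(path: Union[str, Path]) -> int:
    """Hypothetical helper: count examples in a JSON file, assuming (unverified)
    that the file is a single top-level JSON array."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)  # multi-line JSON Lines input raises json.JSONDecodeError
    if not isinstance(data, list):
        raise TypeError(f"expected a top-level JSON array, got {type(data).__name__}")
    return len(data)
```

On a local copy, `count_records("lima_4kall.json") == EXPECTED_LIMA_4KALL` should hold if the card's counts are exact.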
Provider:
zhongshupeng
Summary of original information
Dataset overview
Data composition
All data come from the training splits of open-source datasets.
Data files
- gsm2k_dolly15k_cnnadd6k_mmlulog1.7w_bbqabc8k.json
  - gsm8k_2000
  - dolly_15000
  - cnn_dailymail_6000
  - mmlu_17000
  - bbq_8000
- lima_4kall.json
  - lima_1000
  - 3000 records from gsm8k_dolly15k_cnnadd8k_mmlulog1.7w_bbqabc8k.json