taskydata/realtasky
Source: Hugging Face. Updated 2023-03-22; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/taskydata/realtasky
Resource description:
---
language:
- en
---
|Dataset|Bytes|Samples|Capping|
|-------|-----|-------|-------|
|[Unnatural Instructions](https://huggingface.co/datasets/mrm8488/unnatural-instructions-full) | 27M | 66010 | / |
|[Big-Bench](https://huggingface.co/datasets/bigbench) | 1.7G | 2631238| / |
|[FLAN](https://huggingface.co/datasets/Muennighoff/flan) | 3.1G | 3354260 | [30K examples per dataset max with 10 templates total (So 3K / template)](https://github.com/Muennighoff/FLAN/blob/main/flan/tasks.py) |
|[SuperNatural-Instructions](https://huggingface.co/datasets/Muennighoff/natural-instructions) | 7.4G | 7101558 | / |
|[StackOverflow](https://huggingface.co/datasets/flax-sentence-embeddings/stackexchange_titlebody_best_voted_answer_jsonl) | 9.0G | 4730542 | / |
|[xP3-EN](https://huggingface.co/datasets/bigscience/xP3) | 37G | 31495184 | [100K examples per data subset per prompt allowed (So 100K / template)](https://github.com/bigscience-workshop/bigscience/blob/e848657707a549dda35c8b3cc63a96d2064b2983/data/xp3/prepare_xp3_train.py#L15) |
|Total|58GB|49378792| |
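The capping rules in the table (3K examples per template for FLAN, 100K per template for xP3-EN) can be illustrated with a minimal sketch. This is not the code from the linked `tasks.py` or `prepare_xp3_train.py` scripts; the function `cap_per_template`, the `(template, example)` input format, and the random subsampling strategy are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Caps as stated in the table above.
FLAN_PER_TEMPLATE = 30_000 // 10   # 30K per dataset over 10 templates -> 3K each
XP3_PER_TEMPLATE = 100_000         # 100K per data subset per prompt


def cap_per_template(examples, max_per_template, seed=0):
    """Keep at most `max_per_template` examples for each template.

    `examples` is an iterable of (template_name, example) pairs.
    Hypothetical helper: random subsampling is an assumption, the
    original scripts may select examples differently.
    """
    grouped = defaultdict(list)
    for template, example in examples:
        grouped[template].append(example)

    rng = random.Random(seed)
    capped = {}
    for template, items in grouped.items():
        if len(items) > max_per_template:
            items = rng.sample(items, max_per_template)
        capped[template] = items
    return capped


# Toy data: template "a" has 5 examples, template "b" has 2.
toy = [("a", i) for i in range(5)] + [("b", i) for i in range(2)]
result = cap_per_template(toy, max_per_template=3)
```

With the toy data, template `a` is cut down to 3 examples while template `b` keeps both of its examples, since it is already under the cap.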
Provider:
taskydata
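Original information summary
Dataset overview
Unnatural Instructions
- Size: 27M
- Samples: 66010
- Capping: none
Big-Bench
- Size: 1.7G
- Samples: 2631238
- Capping: none
FLAN
- Size: 3.1G
- Samples: 3354260
- Capping: at most 30K examples per dataset across 10 templates (i.e. 3K examples per template)
SuperNatural-Instructions
- Size: 7.4G
- Samples: 7101558
- Capping: none
StackOverflow
- Size: 9.0G
- Samples: 4730542
- Capping: none
xP3-EN
- Size: 37G
- Samples: 31495184
- Capping: at most 100K examples per data subset per prompt (i.e. 100K examples per template)
Total
- Total size: 58GB
- Total samples: 49378792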
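
The totals can be cross-checked against the per-dataset figures. The short sketch below sums the sample counts and sizes listed above; treating "M" as 0.001 GB and "G" as 1 GB is an assumption, since the source does not say whether the units are decimal or binary.

```python
# Per-dataset figures copied from the summary: (size in GB, sample count).
# Assumption: "27M" is read as 0.027 GB.
datasets = {
    "Unnatural Instructions": (0.027, 66010),
    "Big-Bench": (1.7, 2631238),
    "FLAN": (3.1, 3354260),
    "SuperNatural-Instructions": (7.4, 7101558),
    "StackOverflow": (9.0, 4730542),
    "xP3-EN": (37.0, 31495184),
}

total_gb = sum(size for size, _ in datasets.values())
total_samples = sum(count for _, count in datasets.values())

print(f"Total size: about {round(total_gb)} GB")
print(f"Total samples: {total_samples}")
```

The summed sample count matches the stated total of 49378792 exactly, and the sizes add up to roughly 58 GB after rounding.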



