BadDepartment/FLAN-Small
Source: Hugging Face · Updated 2023-12-28 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/BadDepartment/FLAN-Small
Resource description:
# FLAN-Small
This repository is a reduced version of the data at https://huggingface.co/datasets/imone/OpenOrca_FLAN, made available through the hard work of its maintainers.
FLAN-Small contains roughly 10M examples, sampled to approximately match FLAN's final "submix" proportions:
```
{
'flan': 0.4,
't0': 0.32,
'niv2': 0.20,
'cot': 0.05,
'dialog': 0.03
}
```
Since the `cot` data is rather small, it was sampled with replacement; consequently, there are some duplicates.
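The dataset card does not include the sampling code, so the following is only a minimal sketch of how a weighted submix with a with-replacement fallback could be drawn. The function name `sample_submix`, the toy `pools`, and the rule "sample with replacement only when a pool is smaller than its quota" are assumptions made for illustration, not the author's actual procedure.

```python
import random

# Mixture weights copied from the submix listed above.
SUBMIX = {"flan": 0.4, "t0": 0.32, "niv2": 0.20, "cot": 0.05, "dialog": 0.03}


def sample_submix(pools, total, weights=SUBMIX, seed=0):
    """Draw about `total` examples from per-source pools according to `weights`.

    A source is sampled without replacement when its pool is large enough,
    and with replacement (which can produce duplicates) when it is not.
    """
    rng = random.Random(seed)
    sampled = {}
    for name, weight in weights.items():
        quota = round(total * weight)
        pool = pools[name]
        if quota <= len(pool):
            sampled[name] = rng.sample(pool, quota)
        else:
            sampled[name] = rng.choices(pool, k=quota)
    return sampled


# Hypothetical toy pools; "cot" is deliberately smaller than its quota.
pools = {
    "flan": [f"flan-{i}" for i in range(1000)],
    "t0": [f"t0-{i}" for i in range(1000)],
    "niv2": [f"niv2-{i}" for i in range(1000)],
    "cot": [f"cot-{i}" for i in range(10)],
    "dialog": [f"dialog-{i}" for i in range(1000)],
}
result = sample_submix(pools, total=1000)
```

In this toy run, the `cot` quota is larger than its pool, so its sample necessarily repeats items, which mirrors the duplicates mentioned above.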
Some token length statistics:
inputs:
```
{'min': 4, 'max': 176203, 'median': 215, '99_percentile': 1611, '75_percentile': 448, '90_percentile': 732}
```
targets:
```
{'min': 0, 'max': 71437, 'median': 7, '99_percentile': 266, '75_percentile': 30, '90_percentile': 67}
```
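Summaries with the same keys as the ones above could be computed along the following lines. This is a sketch under stated assumptions: the card does not say which tokenizer or percentile method was used, so the whitespace split in `token_lengths` is a hypothetical stand-in for a real tokenizer, and `length_stats` uses the nearest-rank percentile definition.

```python
import statistics


def token_lengths(texts):
    """Count tokens per text. Whitespace splitting is an assumed stand-in
    for whatever tokenizer produced the statistics on the card."""
    return [len(text.split()) for text in texts]


def length_stats(lengths):
    """Summarise a collection of lengths using the same keys as the card."""
    ordered = sorted(lengths)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank percentile: the value at rank ceil(p * n / 100),
        # computed with integer arithmetic to avoid float rounding.
        rank = max(1, (p * n + 99) // 100)
        return ordered[rank - 1]

    return {
        "min": ordered[0],
        "max": ordered[-1],
        "median": statistics.median(ordered),
        "99_percentile": percentile(99),
        "75_percentile": percentile(75),
        "90_percentile": percentile(90),
    }


# Toy usage: an empty target has length 0, matching the 'min': 0 above.
example_lengths = token_lengths(["a short input", "", "one two three four"])
example_stats = length_stats(example_lengths)
```

The gap between the 99th percentile and the maximum in the reported numbers suggests a few extreme outliers, so a length cap near a high percentile may be worth considering before training.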
Provider:
BadDepartment
Summary of the original information
FLAN-Small dataset overview
Dataset size
- The dataset contains about 10 million examples.
Data distribution
- The dataset is sampled according to the following proportions: `{"flan": 0.4, "t0": 0.32, "niv2": 0.20, "cot": 0.05, "dialog": 0.03}`
- Because the `cot` data is small, it was sampled with replacement, so some duplicates exist.
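Input length statistics (tokens)
- Minimum: 4
- Maximum: 176203
- Median: 215
- 99th percentile: 1611
- 75th percentile: 448
- 90th percentile: 732
Target length statistics (tokens)
- Minimum: 0
- Maximum: 71437
- Median: 7
- 99th percentile: 266
- 75th percentile: 30
- 90th percentile: 67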



