xzuyn/open-instruct-uncensored-alpaca
Hugging Face · Updated 2023-07-31 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/xzuyn/open-instruct-uncensored-alpaca
Resource description:
---
language:
- en
tags:
- allenai
- open-instruct
- ehartford
- alpaca
size_categories:
- 100K<n<1M
---
[Original dataset page from ehartford.](https://huggingface.co/datasets/ehartford/open-instruct-uncensored)
810,102 entries. Sourced from `open-instruct-uncensored.jsonl`.
Converted the JSONL file to a single JSON file that can be loaded into tools such as LLaMa-LoRA-Tuner.
I've also included smaller versions with fewer entries, so you can pick one that fits the memory you have available.
Each file is shuffled before conversion, so every dataset has its own unique ordering.
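The conversion described above (JSONL to one JSON array, shuffled, plus smaller subsets) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the author's actual script: the `convert` helper, the output file-name prefix, and the per-file seeds are hypothetical. Records are copied through unchanged; no remapping to specific Alpaca field names is attempted, because the source schema is not described here.

```python
import json
import random


def parse_jsonl(lines):
    """Parse an iterable of JSONL lines into a list of records, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def shuffled_subset(records, size=None, seed=0):
    """Return a shuffled copy of `records`, truncated to `size` entries
    (all entries when size is None). The input list is not modified."""
    rng = random.Random(seed)
    shuffled = list(records)  # copy so the caller's list keeps its order
    rng.shuffle(shuffled)
    return shuffled if size is None else shuffled[:size]


def convert(src_path, sizes, prefix="open-instruct-uncensored-alpaca"):
    """Hypothetical helper: write one JSON array file per requested size.
    Each file uses its own seed, so every output gets a different ordering."""
    with open(src_path, encoding="utf-8") as f:
        records = parse_jsonl(f)
    written = {}
    for seed, size in enumerate(sizes):
        subset = shuffled_subset(records, size=size, seed=seed)
        label = "full" if size is None else str(size)
        out_path = f"{prefix}-{label}.json"
        with open(out_path, "w", encoding="utf-8") as out:
            json.dump(subset, out, ensure_ascii=False)
        written[out_path] = len(subset)
    return written
```

For example, `convert("open-instruct-uncensored.jsonl", [None, 100_000, 10_000])` would write a full-size file plus two smaller ones. These sizes are placeholders, not the sizes of the files shipped with this dataset.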
```
Count of each Dataset:
code_alpaca: 19991
unnatural_instructions: 68231
baize: 166096
self_instruct: 81512
oasst1: 49433
flan_v2: 97519
stanford_alpaca: 50098
sharegpt: 46733
super_ni: 96157
dolly: 14624
cot: 73946
gpt4_alpaca: 45774
```
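The per-dataset counts above can be checked against a converted file by tallying a source field on each record. This sketch assumes every record stores its source name under a key called `dataset`; that key name is an assumption, not something this card documents. Summing the listed counts gives 810,114, which is 12 more than the 810,102 total stated above.

```python
from collections import Counter

# Counts copied from the table above.
LISTED_COUNTS = {
    "code_alpaca": 19991,
    "unnatural_instructions": 68231,
    "baize": 166096,
    "self_instruct": 81512,
    "oasst1": 49433,
    "flan_v2": 97519,
    "stanford_alpaca": 50098,
    "sharegpt": 46733,
    "super_ni": 96157,
    "dolly": 14624,
    "cot": 73946,
    "gpt4_alpaca": 45774,
}


def count_by_source(records, key="dataset"):
    """Tally records per source name. `key` is a hypothetical field name;
    records without it are counted under "<missing>"."""
    return Counter(r.get(key, "<missing>") for r in records)


def compare(listed, observed):
    """Return {source: (listed, observed)} for every source whose counts differ."""
    names = set(listed) | set(observed)
    return {
        n: (listed.get(n, 0), observed.get(n, 0))
        for n in sorted(names)
        if listed.get(n, 0) != observed.get(n, 0)
    }


total_listed = sum(LISTED_COUNTS.values())  # 810114
```

Calling `compare(LISTED_COUNTS, count_by_source(records))` on the loaded JSON array would return an empty dict if every per-source count matches the table.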
Provided by:
xzuyn
Summary of original information
Dataset overview
Basic information
- Language: English
- Tags: allenai, open-instruct, ehartford, alpaca
- Size category: 100K<n<1M
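Dataset details
- Total entries: 810,102
- Source file: open-instruct-uncensored.jsonl
- Conversion: converted to JSON format, usable with tools such as LLaMa-LoRA-Tuner
- Subsets: several smaller datasets are included; choose one based on available memory
- Ordering: each dataset is shuffled before conversion, so each has a unique order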
Per-dataset entry counts
- code_alpaca: 19991
- unnatural_instructions: 68231
- baize: 166096
- self_instruct: 81512
- oasst1: 49433
- flan_v2: 97519
- stanford_alpaca: 50098
- sharegpt: 46733
- super_ni: 96157
- dolly: 14624
- cot: 73946
- gpt4_alpaca: 45774



