xzuyn/tulu-uncensored
Source: Hugging Face. Updated 2023-07-31; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/xzuyn/tulu-uncensored
Resource description:
---
language:
- en
tags:
- allenai
- tulu
- ehartford
- alpaca
size_categories:
- 100K<n<1M
---
[How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources](https://arxiv.org/abs/2306.04751)
[Original dataset page from ehartford.](https://huggingface.co/datasets/ehartford/open-instruct-uncensored)
348,020 entries. Sourced from `open-instruct-uncensored.jsonl`. Uses only these dataset subsets:
1. Flan V2
2. CoT
3. Dolly
4. OASST1
5. GPT4-Alpaca
6. Code-Alpaca
7. ShareGPT
```
Count of each Dataset:
code_alpaca: 19991
oasst1: 49433
flan_v2: 97519
sharegpt: 46733
dolly: 14624
cot: 73946
gpt4_alpaca: 45774
```
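The per-subset counts above can be reproduced with a short tally over the JSONL file. The following is a minimal sketch, not the author's script: it assumes each record is a JSON object whose source subset is stored under a `dataset` key, which this page does not confirm. The helper name `count_by_subset` and the `<missing>` placeholder are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable

# Per-subset counts exactly as listed on this page.
LISTED_COUNTS = {
    "code_alpaca": 19991,
    "oasst1": 49433,
    "flan_v2": 97519,
    "sharegpt": 46733,
    "dolly": 14624,
    "cot": 73946,
    "gpt4_alpaca": 45774,
}


def count_by_subset(lines: Iterable[str], key: str = "dataset") -> Counter:
    """Tally JSONL records by subset name.

    `key` is an assumed field name; adjust it to match the actual file.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record.get(key, "<missing>")] += 1
    return counts


# Sanity check: the listed counts add up to the stated 348,020 entries.
assert sum(LISTED_COUNTS.values()) == 348_020
```

To run it against the real data, pass an open file handle, for example `count_by_subset(open("open-instruct-uncensored.jsonl", encoding="utf-8"))`, and compare the result with `LISTED_COUNTS`.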
Provider:
xzuyn
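Summary of original information
Dataset overview
Basic information
- Language: English
- Tags: allenai, tulu, ehartford, alpaca
- Size: 100,000 to 1,000,000 records
Data source
- The dataset contains 348,020 records, sourced from the `open-instruct-uncensored.jsonl` file.
Data subsets
- Flan V2: 97,519 records
- CoT: 73,946 records
- Dolly: 14,624 records
- OASST1: 49,433 records
- GPT4-Alpaca: 45,774 records
- Code-Alpaca: 19,991 records
- ShareGPT: 46,733 records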



