xzuyn/open-instruct-uncensored-alpaca

Name: xzuyn/open-instruct-uncensored-alpaca
Creator: xzuyn
Published: 2023-07-31 22:23:20
License: No description available

Hugging Face. Updated 2023-07-31; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/xzuyn/open-instruct-uncensored-alpaca

Description:

---
language:
- en
tags:
- allenai
- open-instruct
- ehartford
- alpaca
size_categories:
- 100K<n<1M
---

[Original dataset page from ehartford.](https://huggingface.co/datasets/ehartford/open-instruct-uncensored)

810,102 entries. Sourced from `open-instruct-uncensored.jsonl`.

Converted the jsonl to a json which can be loaded into something like LLaMa-LoRA-Tuner.

I've also included smaller datasets that include fewer entries, depending on how much memory you have to work with.

Each one is randomized before being converted, so each dataset is unique in order.

```
Count of each Dataset:
code_alpaca: 19991
unnatural_instructions: 68231
baize: 166096
self_instruct: 81512
oasst1: 49433
flan_v2: 97519
stanford_alpaca: 50098
sharegpt: 46733
super_ni: 96157
dolly: 14624
cot: 73946
gpt4_alpaca: 45774
```
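The conversion described above (jsonl to a single json, shuffled first, with optional smaller subsets) could look roughly like the sketch below. It is not the author's actual script: the function name `jsonl_to_json` and the `limit` and `seed` parameters are hypothetical, and since the field names of the records are not stated here, each record is passed through unchanged.

```python
import json
import random


def jsonl_to_json(src, dst, limit=None, seed=None):
    """Read a JSONL file, shuffle the records, and write them as one JSON array.

    `limit` (hypothetical) keeps only the first N records after shuffling,
    which is one way to produce a smaller subset for low-memory setups.
    Returns the number of records written.
    """
    # One JSON object per non-empty line.
    with open(src, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]

    # Shuffle before converting, so every output file has its own order.
    random.Random(seed).shuffle(records)

    if limit is not None:
        records = records[:limit]

    # Write a single JSON array instead of one object per line.
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)

    return len(records)
```

A call such as `jsonl_to_json("open-instruct-uncensored.jsonl", "subset.json", limit=100_000)` would write a 100,000-entry subset; the output file name and the subset size are placeholders, not values taken from the dataset.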

Provider:

xzuyn

Summary of original information

Dataset overview

Basic information

Language: English
Tags: allenai, open-instruct, ehartford, alpaca
Size category: 100K<n<1M

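Dataset details

Total entries: 810,102
Source file: open-instruct-uncensored.jsonl
Converted format: converted to JSON, suitable for tools such as LLaMa-LoRA-Tuner
Sub-datasets: several smaller datasets are included; choose one according to available memory
Data order: each dataset is shuffled before conversion, so each has a unique order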

Sub-dataset statistics

code_alpaca: 19991
unnatural_instructions: 68231
baize: 166096
self_instruct: 81512
oasst1: 49433
flan_v2: 97519
stanford_alpaca: 50098
sharegpt: 46733
super_ni: 96157
dolly: 14624
cot: 73946
gpt4_alpaca: 45774
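
To get a feel for how the sources are weighted, the per-source counts listed above can be tallied into shares. This is only an illustrative sketch; the variable names are arbitrary and the numbers are copied from the list as given.

```python
# Per-source entry counts, copied from the list above.
counts = {
    "code_alpaca": 19991,
    "unnatural_instructions": 68231,
    "baize": 166096,
    "self_instruct": 81512,
    "oasst1": 49433,
    "flan_v2": 97519,
    "stanford_alpaca": 50098,
    "sharegpt": 46733,
    "super_ni": 96157,
    "dolly": 14624,
    "cot": 73946,
    "gpt4_alpaca": 45774,
}

total = sum(counts.values())

# Print sources from largest to smallest with their share of the total.
for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24}{n:>8}  {n / total:6.1%}")
print(f"{'total':<24}{total:>8}")
```

Summing the counts as listed gives 810,114, which is 12 more than the 810,102 entries quoted for the dataset; the source does not explain the difference.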
