
xzuyn/open-instruct-uncensored-alpaca

Source: Hugging Face | Updated: 2023-07-31 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/xzuyn/open-instruct-uncensored-alpaca
Resource description:
---
language:
- en
tags:
- allenai
- open-instruct
- ehartford
- alpaca
size_categories:
- 100K<n<1M
---

[Original dataset page from ehartford.](https://huggingface.co/datasets/ehartford/open-instruct-uncensored)

810,102 entries. Sourced from `open-instruct-uncensored.jsonl`.

Converted the JSONL to a JSON file that can be loaded into something like LLaMa-LoRA-Tuner.

I've also included smaller datasets that contain fewer entries, depending on how much memory you have to work with.

Each one is randomized before being converted, so each dataset is unique in order.

```
Count of each Dataset:
code_alpaca: 19991
unnatural_instructions: 68231
baize: 166096
self_instruct: 81512
oasst1: 49433
flan_v2: 97519
stanford_alpaca: 50098
sharegpt: 46733
super_ni: 96157
dolly: 14624
cot: 73946
gpt4_alpaca: 45774
```
Provider:
xzuyn
Summary of original information

Dataset overview

Basic information

  • Language: English
  • Tags: allenai, open-instruct, ehartford, alpaca
  • Size category: 100K<n<1M

Dataset details

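  • Total entries: 810,102
  • Source file: open-instruct-uncensored.jsonl
  • Converted format: JSON, loadable by tools such as LLaMa-LoRA-Tuner
  • Subsets: several smaller datasets are included; choose one according to available memory
  • Data order: each dataset is shuffled before conversion, so each has its own record order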
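
The conversion described above (JSONL in, shuffled JSON array out, with optional smaller subsets) can be sketched roughly as follows. This is a minimal illustration, not the uploader's actual script: the function name `jsonl_to_shuffled_json`, the `limit` and `seed` parameters, and the output file names in the usage note are all hypothetical, and records are passed through unchanged because the source does not document their fields.

```python
import json
import random


def jsonl_to_shuffled_json(src, dst, limit=None, seed=None):
    """Read a JSONL file, shuffle its records, and write one JSON array.

    Hypothetical helper: `limit` keeps only the first N shuffled records
    (to build a smaller subset); `seed` makes the shuffle reproducible.
    Returns the number of records written.
    """
    # One JSON object per non-empty line.
    with open(src, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]

    # Shuffle before writing, so each output file has its own order.
    random.Random(seed).shuffle(records)

    if limit is not None:
        records = records[:limit]

    # Write a single JSON array instead of one object per line.
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False)

    return len(records)
```

Under these assumptions, a call such as `jsonl_to_shuffled_json("open-instruct-uncensored.jsonl", "full.json")` would produce the full file, and passing `limit=100_000` with a different `dst` would produce a smaller subset; both output names and the subset size are placeholders.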

Sub-dataset statistics

  • code_alpaca: 19991
  • unnatural_instructions: 68231
  • baize: 166096
  • self_instruct: 81512
  • oasst1: 49433
  • flan_v2: 97519
  • stanford_alpaca: 50098
  • sharegpt: 46733
  • super_ni: 96157
  • dolly: 14624
  • cot: 73946
  • gpt4_alpaca: 45774
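
As a quick sanity check on the figures above, the per-source counts can be totaled and turned into shares. The sketch below only uses the numbers listed here; the variable names are illustrative. The listed counts add up to 810,114, which is 12 more than the stated total of 810,102, and the source does not explain the difference.

```python
# Per-source entry counts as listed on this page.
counts = {
    "code_alpaca": 19991,
    "unnatural_instructions": 68231,
    "baize": 166096,
    "self_instruct": 81512,
    "oasst1": 49433,
    "flan_v2": 97519,
    "stanford_alpaca": 50098,
    "sharegpt": 46733,
    "super_ni": 96157,
    "dolly": 14624,
    "cot": 73946,
    "gpt4_alpaca": 45774,
}

stated_total = 810_102          # total given in the dataset description
total = sum(counts.values())    # total implied by the per-source counts

# Print sources from largest to smallest with their share of the summed total.
for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24}{n:>8}{n / total:>8.1%}")

print(f"{'sum of counts':<24}{total:>8}")
print(f"{'stated total':<24}{stated_total:>8}")
print(f"{'difference':<24}{total - stated_total:>8}")
```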