ewof/koishi-instruct-metharme

Hugging Face · Updated 2024-02-02 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/ewof/koishi-instruct-metharme
Description:
---
license: apache-2.0
language:
- en
pretty_name: koishi instruct metharme
viewer: false
size_categories:
- 100K<n<1M
---

koishi instruct metharme dataset, currently 414862 lines

- oasst is from ewof/oasst-convo-unfiltered-deduped
- sharegpt (vicuna) is from ewof/sharegpt-instruct-unfiltered-deduped
- dolly is from ewof/dolly-instruct-unfiltered-deduped
- hh-rlhf is from ewof/hh-rlhf-instruct-unfiltered-deduped
- self_instruct is from ewof/self-instruct-unfiltered-deduped
- hf_instruction is from ewof/hf-instruction-unfiltered
- gpteacher is from ewof/gpteacher-unfiltered
- asss is from ewof/asss-unfiltered-deduped
- code_alpaca is from ewof/code-alpaca-instruct-unfiltered
- synthetic_instruct is from ewof/synthetic-instruct-unfiltered-deduped
- flan is from ewof/flan_unfiltered

these each have their own READMEs that explain how i parsed them

- evol instruct code is from nickrosh/Evol-Instruct-Code-80k-v1
- wizard is from ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered
- airoboros is from jondurbin/airoboros-2.2.1 (i filtered out orca entries since orca has flan prompts and koishi already has flan)
- llamini is from MBZUAI/LaMini-instruction

i ran llamini_to_metharme.py then i ran llamini_merge_dedupe.py with koishi_data_metharme.jsonl (generated with merge.py and everything in subsets folder except llamini_data_metharme.jsonl) as k file and llamini_data_metharme.jsonl as lm file
Provider:
ewof
Summary of original information

Dataset overview

Basic information

  • License: Apache-2.0
  • Language: English
  • Dataset name: koishi instruct metharme
  • Dataset size: 100K<n<1M; currently 414,862 lines

Dataset composition

The dataset is made up of several subsets, each drawn from a different source. Each source repository listed here has its own README explaining how it was parsed:

  • oasst: from ewof/oasst-convo-unfiltered-deduped
  • sharegpt (vicuna): from ewof/sharegpt-instruct-unfiltered-deduped
  • dolly: from ewof/dolly-instruct-unfiltered-deduped
  • hh-rlhf: from ewof/hh-rlhf-instruct-unfiltered-deduped
  • self_instruct: from ewof/self-instruct-unfiltered-deduped
  • hf_instruction: from ewof/hf-instruction-unfiltered
  • gpteacher: from ewof/gpteacher-unfiltered
  • asss: from ewof/asss-unfiltered-deduped
  • code_alpaca: from ewof/code-alpaca-instruct-unfiltered
  • synthetic_instruct: from ewof/synthetic-instruct-unfiltered-deduped
  • flan: from ewof/flan_unfiltered

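Other source datasets

  • evol instruct code: from nickrosh/Evol-Instruct-Code-80k-v1
  • wizard: from ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered
  • airoboros: from jondurbin/airoboros-2.2.1, with orca entries filtered out because orca contains flan prompts and koishi already includes flan
  • llamini: from MBZUAI/LaMini-instruction, converted with llamini_to_metharme.py and then merged and deduplicated with llamini_merge_dedupe.py, using koishi_data_metharme.jsonl (generated by merge.py from everything in the subsets folder except llamini_data_metharme.jsonl) as the k file and llamini_data_metharme.jsonl as the lm file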
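
The llamini merge and dedupe step is only named here, and the scripts themselves are not shown. As a rough illustration of what such a step could look like, the following is a minimal sketch and not the author's actual llamini_merge_dedupe.py. It assumes each JSONL record carries a `prompt` field that serves as the deduplication key; the field name, the functions `read_jsonl` and `merge_dedupe`, and the output file are all hypothetical.

```python
import json


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def merge_dedupe(k_path, lm_path, out_path, key="prompt"):
    """Merge two JSONL files, keeping the first record seen for each key.

    The k file is read first, so its records win over lm records that
    share the same key. The `key` field name is an assumption, not
    something the dataset card specifies.
    """
    seen = set()
    kept = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in (k_path, lm_path):
            for record in read_jsonl(path):
                value = record.get(key)
                if value in seen:
                    continue  # duplicate key: skip this record
                seen.add(value)
                out.write(json.dumps(record, ensure_ascii=False) + "\n")
                kept += 1
    return kept
```

Under these assumptions, a call such as `merge_dedupe("koishi_data_metharme.jsonl", "llamini_data_metharme.jsonl", "merged.jsonl")` would write the combined file and return the number of records kept. Reading the k file first means existing koishi entries take precedence over llamini entries with the same prompt, which is one plausible reading of the k file and lm file roles.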