ewof/koishi-instruct-metharme
Source: Hugging Face. Updated 2024-02-02, indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/ewof/koishi-instruct-metharme
Description:
---
license: apache-2.0
language:
- en
pretty_name: koishi instruct metharme
viewer: false
size_categories:
- 100K<n<1M
---
koishi instruct metharme dataset, currently 414862 lines
- oasst is from ewof/oasst-convo-unfiltered-deduped
- sharegpt (vicuna) is from ewof/sharegpt-instruct-unfiltered-deduped
- dolly is from ewof/dolly-instruct-unfiltered-deduped
- hh-rlhf is from ewof/hh-rlhf-instruct-unfiltered-deduped
- self_instruct is from ewof/self-instruct-unfiltered-deduped
- hf_instruction is from ewof/hf-instruction-unfiltered
- gpteacher is from ewof/gpteacher-unfiltered
- asss is from ewof/asss-unfiltered-deduped
- code_alpaca is from ewof/code-alpaca-instruct-unfiltered
- synthetic_instruct is from ewof/synthetic-instruct-unfiltered-deduped
- flan is from ewof/flan_unfiltered
each of these has its own README that explains how i parsed it
- evol instruct code is from nickrosh/Evol-Instruct-Code-80k-v1
- wizard is from ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered
- airoboros is from jondurbin/airoboros-2.2.1 (i filtered out orca entries since orca has flan prompts and koishi already has flan)
- llamini is from MBZUAI/LaMini-instruction. i ran llamini_to_metharme.py, then ran llamini_merge_dedupe.py with koishi_data_metharme.jsonl (generated with merge.py from everything in the subsets folder except llamini_data_metharme.jsonl) as the k file and llamini_data_metharme.jsonl as the lm file
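
the llamini merge/dedupe step can be pictured roughly like the sketch below. this is not the actual llamini_merge_dedupe.py: it assumes each jsonl line is an object with a `prompt` field (a hypothetical key, the real files may use a different one) and that "dedupe" means dropping lm rows whose prompt already appears in the k file. the function names `read_jsonl` and `merge_dedupe` are made up for illustration.

```python
import json


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def merge_dedupe(k_path, lm_path, out_path, key="prompt"):
    """Write every k record, then each lm record whose key was not seen yet.

    `key` is an assumed field name, not taken from the real scripts.
    Returns the number of lm records that were kept.
    """
    seen = set()
    kept = 0
    with open(out_path, "w", encoding="utf-8") as out:
        # pass 1: copy the k file through and remember its keys
        for rec in read_jsonl(k_path):
            seen.add(rec[key])
            out.write(json.dumps(rec, ensure_ascii=False) + "\n")
        # pass 2: append only lm rows that add a new key
        for rec in read_jsonl(lm_path):
            if rec[key] in seen:
                continue  # already covered by the k file or an earlier lm row
            seen.add(rec[key])
            out.write(json.dumps(rec, ensure_ascii=False) + "\n")
            kept += 1
    return kept
```

under those assumptions, `merge_dedupe("koishi_data_metharme.jsonl", "llamini_data_metharme.jsonl", "merged.jsonl")` would write the koishi rows first and then only the llamini rows that add a new prompt (`merged.jsonl` is a placeholder output name). exact-match comparison is the simplest choice here; the real script may normalize text or compare other fields.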
Provider:
ewof
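Summary of original information
Dataset overview
Basic information
- License: Apache-2.0
- Language: English
- Dataset name: koishi instruct metharme
- Dataset size: 100K<n<1M, currently 414,862 lines
Dataset composition
The dataset is made up of several subsets, each from a different source: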
- oasst: from ewof/oasst-convo-unfiltered-deduped
- sharegpt (vicuna): from ewof/sharegpt-instruct-unfiltered-deduped
- dolly: from ewof/dolly-instruct-unfiltered-deduped
- hh-rlhf: from ewof/hh-rlhf-instruct-unfiltered-deduped
- self_instruct: from ewof/self-instruct-unfiltered-deduped
- hf_instruction: from ewof/hf-instruction-unfiltered
- gpteacher: from ewof/gpteacher-unfiltered
- asss: from ewof/asss-unfiltered-deduped
- code_alpaca: from ewof/code-alpaca-instruct-unfiltered
- synthetic_instruct: from ewof/synthetic-instruct-unfiltered-deduped
- flan: from ewof/flan_unfiltered
Other related datasets
- evol instruct code: from nickrosh/Evol-Instruct-Code-80k-v1
- wizard: from ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered
- airoboros: from jondurbin/airoboros-2.2.1, with orca entries filtered out
- llamini: from MBZUAI/LaMini-instruction, processed with llamini_to_metharme.py and llamini_merge_dedupe.py, which merged koishi_data_metharme.jsonl and llamini_data_metharme.jsonl



