Crystalcareai/MoD-150k
Hugging Face · updated 2024-02-22 · indexed 2024-03-04
Download link: https://hf-mirror.com/datasets/Crystalcareai/MoD-150k
Description:
---
license: apache-2.0
datasets:
- jsonifize/Tested-188k-Python-Alpaca_stringified-jsonifize
- Norquinal/WizardLM_alpaca_claude_evol_instruct_70k
- allenai/ai2_arc
- Squish42/bluemoon-fandom-1-1-rp-cleaned
- google/boolq
- LDJnr/Capybara
- mattpscott/airoboros-summarization
- Locutusque/Hercules-v1.0
- lmsys/lmsys-chat-1m
- Muennighoff/natural-instructions
- HuggingFaceH4/no_robots
- grimulkan/PIPPA-augmented-dedup
- euclaise/reddit-instruct
- teknium/OpenHermes-2.5
- ropes
- Open-Orca/SlimOrca-Dedup
- migtissera/Synthia-v1.3
- HuggingFaceH4/ultrachat_200k
- winogrande
- CollectiveCognition/chats-data-2023-09-22
- CollectiveCognition/chats-data-2023-09-27
- CollectiveCognition/chats-data-2023-10-16
- Locutusque/GPT4-LLM-Cleaned-chatml
- Locutusque/GPT4-roleplay-chatml
- Locutusque/GPT4-roleplay-v2-chatml
- Locutusque/WizardLM_evol_instruct_70k_chatml
- Locutusque/camel-chatml
- Locutusque/code-assistant-chatml
- Locutusque/code-assistant-v2-chatml
- Locutusque/dolphin-gpt4-chatml
- Locutusque/function-calling-chatml
- Locutusque/general-instruct-chatml
- Locutusque/lmsys-chat-1m-best
- Locutusque/medtext-chatml
- Locutusque/metamathqa-chatml
- Locutusque/platypus-chatml
- Locutusque/pubmedqa-chatml
- Locutusque/unnatural-instructions-chatml
---
## Introduction
I'm excited to share the MoD 150k subset, a selection from the broader Mixture of Data (MoD) project I've been working on. It is intended for fine-tuning models with either Mixture of Experts (MoE) or standard architectures, and it is sized to stay accessible to people with limited computational resources.
## My Experimentation
After diving deep into MoEs and running various experiments, I've found that this 150k subset not only helps models adapt to MoE but also significantly benefits standard architectures. Training a 7B-parameter model on this dataset for three epochs produced a diverse and effective model.
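For anyone budgeting a comparable run, the number of optimizer updates follows from the dataset size, the epoch count, and the effective batch size. The sketch below is a back-of-the-envelope calculation only: the 150,000 example count is read off the dataset's name, and the batch size of 64 is a hypothetical value, not a setting reported for the experiment above.

```python
import math


def optimizer_steps(num_examples: int, epochs: int, effective_batch_size: int) -> int:
    """Total optimizer updates, keeping the last partial batch of each epoch."""
    steps_per_epoch = math.ceil(num_examples / effective_batch_size)
    return steps_per_epoch * epochs


# Assumptions: 150_000 examples (from the "150k" in the name) and a
# hypothetical effective batch size of 64; three epochs as in the text.
print(optimizer_steps(num_examples=150_000, epochs=3, effective_batch_size=64))  # 7032
```

Here the effective batch size means per-device batch size times gradient accumulation steps times the number of devices, so the same function covers single-GPU and multi-GPU setups.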
## The Dataset
Although originally curated for MoE, the dataset has proven equally useful for standard model architectures. This subset, distilled from a wide range of sources, aims to foster innovation and exploration within our community, particularly among those without extensive compute resources.
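A minimal sketch of pulling the subset from the Hugging Face Hub with the `datasets` library, assuming the library is installed and the repository id shown on this page is current. The helper names `load_mod_subset` and `describe` are made up for illustration, and no split or column names are assumed; the code reports whatever the repository provides.

```python
REPO_ID = "Crystalcareai/MoD-150k"  # repository id as listed on this page


def load_mod_subset(repo_id: str = REPO_ID):
    """Download every split of the dataset from the Hugging Face Hub.

    Needs `pip install datasets` and network access. The import lives inside
    the function so this module can be imported without the library.
    """
    from datasets import load_dataset

    return load_dataset(repo_id)


def describe(splits) -> dict:
    """Map each split name to its row count and column names.

    Accepts any mapping of split name to an object exposing `num_rows` and
    `column_names`, which is the shape `load_dataset` returns when no split
    is requested.
    """
    return {
        name: {"rows": split.num_rows, "columns": list(split.column_names)}
        for name, split in splits.items()
    }
```

Calling `describe(load_mod_subset())` downloads the data and returns the row count and column names of each split, which is a quick check before wiring the dataset into a training script.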
## Acknowledgments
I'm grateful for the contributions from the community and the insights from various datasets and researchers. Their dedication has inspired this project, and I look forward to seeing how it is used and adapted.
Thank you for your support,
Lucas
## Datasets Used
- jsonifize/Tested-188k-Python-Alpaca_stringified-jsonifize
- Norquinal/WizardLM_alpaca_claude_evol_instruct_70k
- allenai/ai2_arc
- Squish42/bluemoon-fandom-1-1-rp-cleaned
- google/boolq
- LDJnr/Capybara
- mattpscott/airoboros-summarization
- Locutusque/Hercules-v1.0
- lmsys/lmsys-chat-1m
- Muennighoff/natural-instructions
- HuggingFaceH4/no_robots
- grimulkan/PIPPA-augmented-dedup
- euclaise/reddit-instruct
- teknium/OpenHermes-2.5
- ropes
- Open-Orca/SlimOrca-Dedup
- migtissera/Synthia-v1.3
- HuggingFaceH4/ultrachat_200k
- winogrande
- CollectiveCognition/chats-data-2023-09-22
- CollectiveCognition/chats-data-2023-09-27
- CollectiveCognition/chats-data-2023-10-16
- Locutusque/GPT4-LLM-Cleaned-chatml
- Locutusque/GPT4-roleplay-chatml
- Locutusque/GPT4-roleplay-v2-chatml
- Locutusque/WizardLM_evol_instruct_70k_chatml
- Locutusque/camel-chatml
- Locutusque/code-assistant-chatml
- Locutusque/code-assistant-v2-chatml
- Locutusque/dolphin-gpt4-chatml
- Locutusque/function-calling-chatml
- Locutusque/general-instruct-chatml
- Locutusque/lmsys-chat-1m-best
- Locutusque/medtext-chatml
- Locutusque/metamathqa-chatml
- Locutusque/platypus-chatml
- Locutusque/pubmedqa-chatml
- Locutusque/unnatural-instructions-chatml
Provider: Crystalcareai



