rombodawg/LimitlessMegaCodeTraining
Hugging Face · Updated 2023-10-19 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/rombodawg/LimitlessMegaCodeTraining
Description:
---
license: mit
---
_________________
----- BREAK THROUGH YOUR LIMITS -----
_________________

LimitlessCodeTraining is the direct sequel to MegaCodeTraining, which is now called Legacy_MegaCodeTraining200k.
This dataset contains just over 646k lines of pure, refined coding data.
It is the pinnacle of open-source code training. It combines the Megacode training dataset as filtered by shahules786 (shoutout to him) with the bigcode commitpackft dataset, which I converted to Alpaca format.
The datasets used to create this dataset are linked below:
- https://huggingface.co/datasets/rombodawg/Rombodawgs_commitpackft_Evolinstruct_Converted
- https://huggingface.co/datasets/shahules786/megacode-best
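The card does not show what a record looks like. As a rough sketch, assuming the combined data uses the common Alpaca fields `instruction`, `input`, and `output` (an assumption, not confirmed by this card), merging the two sources and rendering each record into a fine-tuning prompt might look like the Python below. The helpers `combine` and `to_training_text` are hypothetical, and the prompt wording is modeled on the Stanford Alpaca template rather than anything this dataset specifies.

```python
from typing import TypedDict


class AlpacaRecord(TypedDict):
    """One training example in the ASSUMED Alpaca layout (field names not confirmed by the card)."""
    instruction: str
    input: str   # may be empty when the task needs no extra context
    output: str


# Prompt templates modeled on the Stanford Alpaca project (assumption, not from this card).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def combine(*sources: list) -> list:
    """Hypothetical helper: concatenate record lists, keeping only the three Alpaca fields.

    A missing "input" field is normalized to an empty string.
    """
    combined = []
    for source in sources:
        for rec in source:
            combined.append(
                AlpacaRecord(
                    instruction=rec["instruction"],
                    input=rec.get("input", ""),
                    output=rec["output"],
                )
            )
    return combined


def to_training_text(record: AlpacaRecord) -> str:
    """Hypothetical helper: render a record as prompt followed by the target response."""
    # Use the shorter template when the record carries no extra context.
    template = PROMPT_WITH_INPUT if record["input"].strip() else PROMPT_NO_INPUT
    # str.format ignores the unused "input" keyword for the no-input template.
    prompt = template.format(instruction=record["instruction"], input=record["input"])
    return prompt + record["output"]


# Usage with made-up toy records standing in for the two source datasets.
megacode_like = [{"instruction": "Reverse a string in Python.", "input": "", "output": "s[::-1]"}]
commitpack_like = [
    {"instruction": "Fix the off-by-one error.", "input": "for i in range(1, n):", "output": "for i in range(n):"}
]
data = combine(megacode_like, commitpack_like)
for rec in data:
    print(to_training_text(rec))
    print("-" * 20)
```

Plain concatenation is used here only because the card describes the dataset as a combination of the two sources; whether any deduplication or extra filtering happened during the merge is not stated.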
Provided by:
rombodawg
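Original Information Summary
LimitlessCodeTraining Dataset Overview
Dataset Introduction
- Name: LimitlessCodeTraining
- Predecessor: Legacy_MegaCodeTraining200k
- Size: just over 646k lines of coding data
- Features: pure, refined coding data; described as the pinnacle of open-source code training
Data Sources
- Source datasets:
  - Rombodawgs_commitpackft_Evolinstruct_Converted: the bigcode commitpackft dataset, converted to Alpaca format by rombodawg
  - megacode-best: the filtered Megacode training dataset provided by shahules786
License
- MIT License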



