
rombodawg/LimitlessMegaCodeTraining

Hugging Face · Updated 2023-10-19 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/rombodawg/LimitlessMegaCodeTraining
Description:
License: MIT

BREAK THROUGH YOUR LIMITS

![image/png](https://cdn-uploads.huggingface.co/production/uploads/642cc1c253e76b4c2286c58e/FPna59yMG52VSq_5xbaHI.png)

LimitlessCodeTraining is the direct sequel to Megacodetraining, which is now called Legacy_MegaCodeTraining200k. This dataset contains just over 646k lines of pure, refined coding data. It is the pinnacle of open-source code training. It combines the Megacode training dataset as filtered by shahules786 (shoutout to him) with the bigcode commitpackft dataset, which I converted to Alpaca format. The datasets used to create this dataset are linked below:
- https://huggingface.co/datasets/rombodawg/Rombodawgs_commitpackft_Evolinstruct_Converted
- https://huggingface.co/datasets/shahules786/megacode-best
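The card does not explain how commitpackft commits were turned into Alpaca records. What follows is a hypothetical mapping that assumes the commitpackft fields `message`, `old_contents` and `new_contents`. The commit message becomes the instruction, the pre-commit file becomes the input, and the post-commit file becomes the output. The function name `commit_to_alpaca` is illustrative only. It is not part of either dataset, and it does not describe the author's actual conversion.

```python
def commit_to_alpaca(record: dict) -> dict:
    """Hypothetical conversion of one commitpackft-style commit into an
    Alpaca-style record (instruction / input / output).

    Assumed mapping (not documented by the dataset author):
      message      -> instruction
      old_contents -> input   (empty string if the file is new)
      new_contents -> output
    """
    return {
        "instruction": record["message"].strip(),
        "input": record.get("old_contents", ""),
        "output": record["new_contents"],
    }


# Illustrative commit: a one-character bug fix.
example = {
    "old_contents": "def add(a, b):\n    return a - b\n",
    "new_contents": "def add(a, b):\n    return a + b\n",
    "message": "Fix subtraction bug in add()\n",
    "lang": "Python",
}

converted = commit_to_alpaca(example)
print(converted["instruction"])
```

Fields that have no Alpaca equivalent, such as `lang` here, are simply dropped in this sketch.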
Provided by:
rombodawg
Summary of the Original Information

LimitlessCodeTraining Dataset Overview

Dataset Introduction

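  • Name: LimitlessCodeTraining
  • Predecessor: Legacy_MegaCodeTraining200k
  • Size: just over 646k lines of coding data
  • Features: pure, refined coding data, described by the author as the pinnacle of open-source code training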

Data Sources

  • Source datasets
    • Rombodawgs_commitpackft_Evolinstruct_Converted: bigcode's commitpackft dataset, converted to Alpaca format by rombodawg
    • megacode-best: the Megacode training dataset as filtered by shahules786
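In the Alpaca format, each record holds an instruction, an optional input and an output. For fine-tuning, records are usually rendered into a fixed prompt template. The minimal sketch below uses the standard Stanford Alpaca template and assumes records with `instruction`, `input` and `output` keys. The card does not list its column names, and it does not say which template, if any, the author trained with. `render_alpaca` is a hypothetical helper name.

```python
# Standard Stanford Alpaca prompt templates (with and without an input field).
ALPACA_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
ALPACA_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def render_alpaca(record: dict) -> str:
    """Render one Alpaca-style record into a single training string.

    Uses the no-input template when the input field is missing or empty.
    """
    template = ALPACA_WITH_INPUT if record.get("input") else ALPACA_NO_INPUT
    # Extra keyword arguments are ignored by str.format, so passing `input`
    # to the no-input template is harmless.
    prompt = template.format(
        instruction=record["instruction"], input=record.get("input", "")
    )
    return prompt + record["output"]


print(render_alpaca({
    "instruction": "Fix the bug.",
    "input": "return a - b",
    "output": "return a + b",
}))
```

Switching between two templates keeps empty "### Input:" sections out of the training text. This matters because many coding instructions carry no separate input.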

License

  • MIT License