pnkvalavala/Labyrinth
收藏Hugging Face2023-11-05 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/pnkvalavala/Labyrinth
下载链接
链接失效反馈官方服务:
资源简介:
---
license: mit
language:
- en
tags:
- code
size_categories:
- 100K<n<1M
---
# Labyrinth Dataset
Labyrinth is a code dataset that combines three existing datasets without modifying the data itself but adapting the structure/format to streamline fine-tuning for [Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) on code.
## Dataset Sources
Labyrinth is composed of code examples and instructions from the following three datasets:
1. [CodeAlpaca](https://github.com/sahil280114/codealpaca/blob/master/data/code_alpaca_20k.json) by [Sahil Chaudhary](https://huggingface.co/sahil2801).
2. [Codegen-instruct](https://github.com/teknium1/GPTeacher/blob/main/Codegen/codegen-instruct.json) by [Teknium](https://huggingface.co/teknium).
3. [llama-2-instruct-121k-code](https://huggingface.co/datasets/emre/llama-2-instruct-121k-code) by [Davut Emre TASAR](https://huggingface.co/emre).
提供机构:
pnkvalavala
原始信息汇总
Labyrinth 数据集
Labyrinth 是一个代码数据集,它结合了三个现有的数据集,没有修改数据本身,而是调整了结构/格式,以便于在代码上对 Zephyr 进行微调。
数据集来源
Labyrinth 由以下三个数据集的代码示例和指令组成:
- CodeAlpaca 由 Sahil Chaudhary 提供。
- Codegen-instruct 由 Teknium 提供。
- llama-2-instruct-121k-code 由 Davut Emre TASAR 提供。



