candenizkocak/code-alpaca-297k
Source: Hugging Face · Updated 2024-04-23 · Indexed 2024-06-15
Download link:
https://hf-mirror.com/datasets/candenizkocak/code-alpaca-297k
Resource description:
---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 446548774
    num_examples: 297097
  download_size: 216063284
  dataset_size: 446548774
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
task_categories:
- question-answering
- text-generation
language:
- en
tags:
- code
size_categories:
- 100K<n<1M
---
# code-alpaca-297k
This dataset was generated by combining the code instruction datasets below, with minimal changes:
- [m-a-p/CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction).
- [AdithyaSK/TokenBender_code_instructions_122k_alpaca_style_LoRA](https://huggingface.co/AdithyaSK/TokenBender_code_instructions_122k_alpaca_style_LoRA).
- [iamtarun/python_code_instructions_18k_alpaca](https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca).
## Dataset Details
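This dataset contains 297,097 rows (about 297k) in a single `train` split. Each row has four string fields: `instruction`, `input`, `output`, and `source`.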
- **Curated by:** Can Deniz Koçak
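The field layout above matches the usual Alpaca-style record, so a row can be turned into a training prompt with a few lines of Python. The following is a minimal sketch, not the curator's own pipeline: the prompt template and the helper names `load_train_split` and `build_prompt` are assumptions made for illustration, and loading relies on the third-party `datasets` library and network access.

```python
from typing import Mapping

DATASET_ID = "candenizkocak/code-alpaca-297k"


def load_train_split():
    """Fetch the train split from the Hugging Face Hub (needs network access)."""
    # Imported lazily so build_prompt stays usable without the library installed.
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(DATASET_ID, split="train")


def build_prompt(row: Mapping[str, str]) -> str:
    """Render one row as an Alpaca-style prompt.

    The template below is an assumption; the dataset card does not prescribe one.
    """
    instruction = row["instruction"].strip()
    context = (row.get("input") or "").strip()
    if context:
        # Rows with a non-empty `input` get an extra context section.
        return (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            "### Response:\n"
        )
    return f"### Instruction:\n{instruction}\n\n### Response:\n"
```

With the library installed, `build_prompt(load_train_split()[0])` would render the first row; the `output` field would then serve as the target completion.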
### Dataset Sources
- [m-a-p/CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction).
- [AdithyaSK/TokenBender_code_instructions_122k_alpaca_style_LoRA](https://huggingface.co/AdithyaSK/TokenBender_code_instructions_122k_alpaca_style_LoRA).
- [iamtarun/python_code_instructions_18k_alpaca](https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca).
## Contact
Please contact me on LinkedIn: [linkedin.com/in/candenizkocak](https://www.linkedin.com/in/candenizkocak/).
Provider:
candenizkocak
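Original information summary
Dataset overview
Dataset information
- Features:
  - instruction: string
  - input: string
  - output: string
  - source: string
- Splits:
  - train: 297,097 examples, 446,548,774 bytes
- Download size: 216,063,284 bytes
- Dataset size: 446,548,774 bytes
Configuration
- Default configuration:
  - Data file path: data/train-*
Task categories
- Question answering
- Text generation
Languages
- English
Tags
- Code
Size category
- 100K < n < 1M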
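As a quick sanity check on the sizes listed above, the reported figures can be combined into an average record size and a download-to-dataset ratio. This is only a sketch built from the metadata values; the variable names are illustrative, and it makes no claim about how the files are stored on the Hub.

```python
# Figures copied from the dataset metadata.
NUM_BYTES = 446_548_774      # dataset_size / num_bytes of the train split
NUM_EXAMPLES = 297_097       # num_examples of the train split
DOWNLOAD_SIZE = 216_063_284  # download_size

# Average size of one example in the loaded dataset.
avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES

# Fraction of the dataset size that has to be downloaded.
download_ratio = DOWNLOAD_SIZE / NUM_BYTES

print(f"{avg_bytes_per_example:.0f} bytes per example on average")
print(f"download is {download_ratio:.1%} of the reported dataset size")
```

By these numbers an example averages roughly 1.5 KB, and the download is a little under half of the reported dataset size.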



