katenil/pipeline2code
Source: Hugging Face. Updated 2023-06-16; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/katenil/pipeline2code
Resource description:
---
license: cc-by-4.0
tags:
- code
---
# Dataset Card for Pipeline2Code
## Dataset origin
[Code4ML: a Large-scale Dataset of annotated Machine Learning Code](https://zenodo.org/record/7733823)
## Dataset Summary
This dataset is designed for the iterative generation of Machine Learning (ML) code based on high-level ML pipeline descriptions.
It consists of code snippets extracted from Kaggle kernels, organized as Jupyter Notebook snippets.
Each kernel includes a set of prompts and completions, and each prompt is associated with a code snippet completion.
The initial prompt contains an `<SOS>` token, meta-information about the task the notebook aims to solve, and the semantic type of the code snippet.
Subsequent prompts include the previously generated completions and the semantic type of the snippet.
The final prompt of each kernel consists of the semantic type of the code snippet followed by an `<EOS>` token.
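
The prompt layout described above can be illustrated with a minimal sketch. The card does not specify the exact serialization, so the newline separator, the field order, the helper name `build_pairs`, and the semantic-type labels used here are all assumptions made for illustration; the sketch also assumes that the final prompt carries the previous completions like any other subsequent prompt.

```python
from typing import List, Tuple

SOS, EOS = "<SOS>", "<EOS>"


def build_pairs(
    task_meta: str,
    snippets: List[Tuple[str, str]],
    sep: str = "\n",  # assumed separator; the card does not define one
) -> List[Tuple[str, str]]:
    """Turn one kernel into (prompt, completion) pairs.

    `snippets` is the ordered list of (semantic_type, code) for the kernel.
    """
    pairs: List[Tuple[str, str]] = []
    previous_code: List[str] = []
    last = len(snippets) - 1
    for i, (semantic_type, code) in enumerate(snippets):
        if i == 0:
            # Initial prompt: <SOS> token plus task meta-information.
            parts = [SOS, task_meta]
        else:
            # Subsequent prompts: completions produced so far.
            parts = list(previous_code)
        parts.append(semantic_type)
        if i == last:
            # Final prompt: semantic type followed by <EOS>.
            parts.append(EOS)
        pairs.append((sep.join(parts), code))
        previous_code.append(code)
    return pairs


# Hypothetical kernel with made-up semantic types and snippets.
kernel = [
    ("data loading", "df = pd.read_csv('train.csv')"),
    ("model training", "model.fit(X, y)"),
    ("prediction", "preds = model.predict(X_test)"),
]
pairs = build_pairs("binary classification on tabular data", kernel)
for prompt, completion in pairs:
    print(repr(prompt), "->", repr(completion))
```

The actual field layout in the dataset may differ; inspect a few records before relying on any particular format.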
Provider:
katenil
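Summary of original information
Dataset overview
Dataset name
Pipeline2Code
Dataset source
Code4ML: a Large-scale Dataset of annotated Machine Learning Code
Dataset purpose
The dataset is intended for iteratively generating ML code from high-level machine learning (ML) pipeline descriptions.
Dataset contents
- The dataset consists of code snippets extracted from Kaggle kernels, organized as Jupyter Notebook snippets.
- Each kernel includes a set of prompts and completions.
- The initial prompt contains an `<SOS>` token, meta-information about the task the notebook aims to solve, and the semantic type of the code snippet.
- The final prompt of each kernel consists of the semantic type of the code snippet followed by an `<EOS>` token.
- Each prompt is associated with a code snippet completion.
- Subsequent prompts include the previously generated completions and the semantic type of the snippet.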
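
At inference time, the same scheme suggests an iterative loop in which each generated snippet is fed back into the next prompt. The following is only a sketch under stated assumptions: `generate_pipeline` and `fake_model` are hypothetical names, the newline separator is a guess, and `fake_model` is a stand-in for whatever model is trained on the dataset.

```python
from typing import Callable, List


def generate_pipeline(
    task_meta: str,
    semantic_types: List[str],
    generate: Callable[[str], str],
    sep: str = "\n",  # assumed separator; not specified by the dataset card
) -> List[str]:
    """Generate one code snippet per semantic type, step by step."""
    completions: List[str] = []
    last = len(semantic_types) - 1
    for i, semantic_type in enumerate(semantic_types):
        # First step starts from <SOS> and the task description;
        # later steps start from everything generated so far.
        parts = ["<SOS>", task_meta] if i == 0 else list(completions)
        parts.append(semantic_type)
        if i == last:
            parts.append("<EOS>")
        completions.append(generate(sep.join(parts)))
    return completions


seen: List[str] = []


def fake_model(prompt: str) -> str:
    """Stub generator that records the prompt and returns placeholder code."""
    seen.append(prompt)
    return f"# code for step {len(seen)}"


outputs = generate_pipeline(
    "house price regression",  # hypothetical task description
    ["data loading", "model training", "prediction"],  # hypothetical types
    fake_model,
)
print(outputs)
```

Swapping `fake_model` for a real model call leaves the loop unchanged, since the loop only needs a function from prompt string to code string.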
License
cc-by-4.0



