Code4ML
Source: arXiv | Updated: 2022-10-28 | Indexed: 2024-06-21
Download link:
https://zenodo.org/record/6559329#.Yv9p_GgzZPY
Description:
Code4ML is a large-scale machine learning code dataset created by the Department of Computer Science at the National Research University Higher School of Economics. It contains approximately 2.5 million Python code snippets drawn from approximately 100,000 public Jupyter notebooks on Kaggle. Human annotators labeled the snippets through a user-friendly interface, so the dataset can support a range of software engineering and data science tasks. Applications include semantic code classification, code autocompletion, and natural-language-to-code generation for machine learning tasks.
Provided by:
Department of Computer Science, National Research University Higher School of Economics, Moscow
Created:
2022-10-28



