
Etherll/code-FIM

Source: Hugging Face | Updated: 2024-07-20 | Indexed: 2024-07-22
Download link:
https://hf-mirror.com/datasets/Etherll/code-FIM
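Summary:
This dataset aims to address the lack of good code-infilling (fill-in-the-middle) datasets and models in the open-source community. It contains 10k code files drawn from several programming languages: Python 3,000, JavaScript 1,500, TypeScript 1,250, Rust 1,750, Java 1,500, Go 500 and C++ 500. It supports completions of up to 4 lines and provides training recommendations and a link to a Colab notebook.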

The dataset has two features, prompt and completion, both of the large_string type. It is divided into two splits, train and variation, containing 4,472,370 and 10,000 samples respectively. The code files come from multiple programming languages: Python, JavaScript, TypeScript, Rust, Java, Go and C++. The dataset is intended primarily for code completion, with each completion spanning at most 4 lines.
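The card does not show the exact prompt layout, so the following is only a minimal sketch of how a code file can be turned into a prompt/completion pair whose hidden middle spans at most 4 lines. It assumes a common prefix-suffix-middle (PSM) ordering and uses hypothetical sentinel strings (<fim_prefix>, <fim_suffix>, <fim_middle>) and a hypothetical function name, make_fim_example; the real dataset may format its prompts differently.

```python
import random

# Hypothetical sentinel strings: the actual prompt format used by
# Etherll/code-FIM is not documented here and may differ.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"

MAX_MIDDLE_LINES = 4  # the dataset card states completions of up to 4 lines


def make_fim_example(source: str, rng: random.Random) -> dict[str, str]:
    """Split a code file into a FIM prompt and a completion of 1 to 4 lines."""
    lines = source.splitlines(keepends=True)
    if not lines:
        raise ValueError("source must contain at least one line")
    # Choose how many lines to hide, never more than the file contains.
    span = rng.randint(1, min(MAX_MIDDLE_LINES, len(lines)))
    # Choose where the hidden block starts so that it fits inside the file.
    start = rng.randint(0, len(lines) - span)
    prefix = "".join(lines[:start])
    middle = "".join(lines[start:start + span])
    suffix = "".join(lines[start + span:])
    # PSM ordering: prefix, then suffix; the model is asked to generate the middle.
    prompt = f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"
    return {"prompt": prompt, "completion": middle}


if __name__ == "__main__":
    code = "def add(a, b):\n    return a + b\n\nprint(add(1, 2))\n"
    example = make_fim_example(code, random.Random(0))
    print(example["prompt"])
    print(example["completion"])
```

Passing a seeded random.Random makes the split reproducible; concatenating prefix, completion and suffix always recovers the original file, which is a useful sanity check when building training data of this shape.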
Provided by:
Etherll