
dreamorg/csn_truncated

Source: Hugging Face. Updated: 2025-01-24. Indexed: 2025-02-15.
Download link:
https://hf-mirror.com/datasets/dreamorg/csn_truncated
Description:
The dataset contains source code text in six programming languages (Go, Java, JavaScript, PHP, Python, Ruby). Each language has a single training split, with the data stored as strings. Training split sizes:
- Go: 317,832 examples, 189,012,810 bytes
- Java: 454,451 examples, 421,533,871 bytes
- JavaScript: 123,889 examples, 129,604,662 bytes
- PHP: 523,712 examples, 444,914,551 bytes
- Python: 412,178 examples, 543,300,673 bytes
- Ruby: 48,791 examples, 37,039,237 bytes
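
The per-language figures above can be compared more easily once they are reduced to an average size per example. The following is a minimal Python sketch, not part of the dataset itself: the counts and byte sizes are copied from the description, while the helper names (`TRAIN_STATS`, `mean_bytes_per_example`, `totals`, `load_language`) are hypothetical. The `load_language` helper assumes that each language is exposed as a separate configuration of the dataset and that the `datasets` library is installed; the listing does not state the configuration names, so a name such as "python" should be checked against the dataset page before use.

```python
# Figures copied from the description: (number of examples, total bytes)
# for the training split of each language.
TRAIN_STATS = {
    "Go": (317_832, 189_012_810),
    "Java": (454_451, 421_533_871),
    "JavaScript": (123_889, 129_604_662),
    "PHP": (523_712, 444_914_551),
    "Python": (412_178, 543_300_673),
    "Ruby": (48_791, 37_039_237),
}


def mean_bytes_per_example(stats):
    """Return the average size in bytes of one example, per language."""
    return {lang: size / count for lang, (count, size) in stats.items()}


def totals(stats):
    """Return (total examples, total bytes) summed over all languages."""
    total_examples = sum(count for count, _ in stats.values())
    total_bytes = sum(size for _, size in stats.values())
    return total_examples, total_bytes


def load_language(config_name):
    """Load one language's training split from the Hugging Face Hub.

    ASSUMPTION: the language is available as a configuration called
    `config_name` (for example "python"); the listing does not confirm
    the configuration names. Requires the `datasets` package and
    network access, so it is not called in this sketch.
    """
    from datasets import load_dataset  # imported lazily on purpose

    return load_dataset("dreamorg/csn_truncated", config_name, split="train")


if __name__ == "__main__":
    for lang, avg in mean_bytes_per_example(TRAIN_STATS).items():
        print(f"{lang}: about {avg:.0f} bytes per example")
    n_examples, n_bytes = totals(TRAIN_STATS)
    print(f"All languages: {n_examples:,} examples, {n_bytes:,} bytes")
```

On these figures, Python has the largest average example size and Go the smallest, which may matter when budgeting tokens or memory per language.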
Provider:
dreamorg