dreamorg/csn_truncated
Source: Hugging Face | Updated: 2025-01-24 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/dreamorg/csn_truncated
Description:
The dataset contains code text in six programming languages (Go, Java, JavaScript, PHP, Python, Ruby). Each language has a single training split, with the data stored as strings. Per-language details:
- Go: the training set contains 317,832 examples, totaling 189,012,810 bytes.
- Java: the training set contains 454,451 examples, totaling 421,533,871 bytes.
- JavaScript: the training set contains 123,889 examples, totaling 129,604,662 bytes.
- PHP: the training set contains 523,712 examples, totaling 444,914,551 bytes.
- Python: the training set contains 412,178 examples, totaling 543,300,673 bytes.
- Ruby: the training set contains 48,791 examples, totaling 37,039,237 bytes.
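To put the per-language figures side by side, the following minimal sketch computes overall totals and the average reported size per example. The numbers are copied from the list above; the dictionary keys and variable names are illustrative choices, not identifiers defined by the dataset, and the byte counts are the sizes as reported on this page rather than measured code lengths.

```python
# Per-language training-set figures as listed above: (examples, bytes).
# The lowercase keys are illustrative labels, not confirmed config names.
SPLITS = {
    "go": (317_832, 189_012_810),
    "java": (454_451, 421_533_871),
    "javascript": (123_889, 129_604_662),
    "php": (523_712, 444_914_551),
    "python": (412_178, 543_300_673),
    "ruby": (48_791, 37_039_237),
}

# Totals across all six languages.
total_examples = sum(n for n, _ in SPLITS.values())
total_bytes = sum(b for _, b in SPLITS.values())

# Average reported bytes per example, per language.
avg_bytes = {lang: b / n for lang, (n, b) in SPLITS.items()}

# Language whose examples are largest on average.
largest_avg_lang = max(avg_bytes, key=avg_bytes.get)

if __name__ == "__main__":
    print(f"total examples: {total_examples:,}")
    print(f"total bytes:    {total_bytes:,}")
    for lang, avg in sorted(avg_bytes.items(), key=lambda kv: -kv[1]):
        print(f"{lang:<11} {avg:8.1f} bytes/example")
```

Averages like these give a rough sense of how long a typical example is in each language, which can help when choosing a sequence length or batch size.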
Provider:
dreamorg



