AISE-TUDelft/leading-comments

Name: AISE-TUDelft/leading-comments
Creator: AISE-TUDelft
Published: 2024-05-29 22:40:32
License: No description available

Source: Hugging Face. Updated 2024-05-29. Indexed 2024-06-11.

Download link:

https://hf-mirror.com/datasets/AISE-TUDelft/leading-comments

Resource description:

---
dataset_info:
- config_name: CodeParrot
  features:
  - name: comments
    dtype: string
  splits:
  - name: train
    num_bytes: 6713883371
    num_examples: 14372397
  download_size: 1861971823
  dataset_size: 6713883371
- config_name: CodeParrotComments
  features:
  - name: comments
    dtype: string
  splits:
  - name: train
    num_bytes: 6740538054
    num_examples: 14372397
  download_size: 1871456803
  dataset_size: 6740538054
- config_name: GitHubCode
  features:
  - name: comments
    dtype: string
  splits:
  - name: train
    num_bytes: 21743488635
    num_examples: 45301797
  download_size: 6273669540
  dataset_size: 21743488635
- config_name: RedPajama
  features:
  - name: comments
    dtype: string
  splits:
  - name: train
    num_bytes: 509970374
    num_examples: 2281378
  download_size: 232594972
  dataset_size: 509970374
- config_name: RedPajamaComments
  features:
  - name: comments
    dtype: string
  splits:
  - name: train
    num_bytes: 510819441
    num_examples: 2281378
  download_size: 234861060
  dataset_size: 510819441
- config_name: ThePile
  features:
  - name: comments
    dtype: string
  splits:
  - name: train
    num_bytes: 3450632528
    num_examples: 6794995
  download_size: 1229945960
  dataset_size: 3450632528
- config_name: ThePileComments
  features:
  - name: comments
    dtype: string
  splits:
  - name: train
    num_bytes: 3462162891
    num_examples: 6794995
  download_size: 1245174054
  dataset_size: 3462162891
- config_name: TheStack
  features:
  - name: comments
    dtype: string
  splits:
  - name: train
    num_bytes: 36267169085
    num_examples: 77595559
  download_size: 9666361191
  dataset_size: 36267169085
configs:
- config_name: CodeParrot
  data_files:
  - split: train
    path: data/CodeParrotComments/train-*
- config_name: CodeParrotComments
  data_files:
  - split: train
    path: data/CodeParrot_Comments/train-*
- config_name: GitHubCode
  data_files:
  - split: train
    path: data/GitHubCodeComments/train-*
- config_name: RedPajama
  data_files:
  - split: train
    path: data/RedPajamaComments/train-*
- config_name: RedPajamaComments
  data_files:
  - split: train
    path: data/RedPajama_Comments/train-*
- config_name: ThePile
  data_files:
  - split: train
    path: data/ThePileComments/train-*
- config_name: ThePileComments
  data_files:
  - split: train
    path: data/ThePile_Comments/train-*
- config_name: TheStack
  data_files:
  - split: train
    path: data/TheStackComments/train-*
---

# Dataset Card for Dataset Name

**Disclaimer:** This dataset will be released shortly.

## Dataset Details

### Dataset Description

- **Curated by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]

### Dataset Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

### Direct Use

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Dataset Structure

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Data Collection and Processing

[More Information Needed]

#### Who are the source data producers?

[More Information Needed]

### Annotations [optional]

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

#### Personal and Sensitive Information

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.

## Citation [optional]

**BibTeX:** [More Information Needed]

**APA:** [More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Dataset Card Authors [optional]

[More Information Needed]

## Dataset Card Contact

[More Information Needed]

This dataset provides eight configurations. Each has a single string feature, comments, that holds code comments, and each has only a train split. The card lists the size in bytes and the number of examples for every configuration. Details such as source, language, and license are not yet provided.
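As an illustration of how one configuration might be read, here is a minimal sketch using the Hugging Face `datasets` library in streaming mode, so nothing is downloaded in full. The repository id and configuration names are taken from this page; the helper name `stream_comments` and its `limit` parameter are hypothetical. Since the card states that the dataset "will be released shortly", the call may fail until the data is actually published.

```python
from itertools import islice

# Configuration names as listed in the dataset card metadata.
CONFIG_NAMES = [
    "CodeParrot", "CodeParrotComments", "GitHubCode", "RedPajama",
    "RedPajamaComments", "ThePile", "ThePileComments", "TheStack",
]


def stream_comments(config_name, limit=5):
    """Return the first `limit` strings of the `comments` feature (hypothetical helper)."""
    if config_name not in CONFIG_NAMES:
        raise ValueError(f"unknown configuration: {config_name!r}")
    # Imported lazily so this module still imports when `datasets` is not installed.
    from datasets import load_dataset

    # streaming=True yields rows one at a time instead of downloading the whole split.
    ds = load_dataset(
        "AISE-TUDelft/leading-comments",
        config_name,
        split="train",
        streaming=True,
    )
    return [row["comments"] for row in islice(ds, limit)]
```

A call such as `stream_comments("RedPajama")` targets the configuration with the smallest listed download size, which makes it a reasonable first test.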

Provided by:

AISE-TUDelft

Summary of original information

Dataset overview

CodeParrot

Features:
- comments: string
Splits:
- train: 14372397 examples, 6713883371 bytes
Download size: 1861971823 bytes
Dataset size: 6713883371 bytes

CodeParrotComments

Features:
- comments: string
Splits:
- train: 14372397 examples, 6740538054 bytes
Download size: 1871456803 bytes
Dataset size: 6740538054 bytes

GitHubCode

Features:
- comments: string
Splits:
- train: 45301797 examples, 21743488635 bytes
Download size: 6273669540 bytes
Dataset size: 21743488635 bytes

RedPajama

Features:
- comments: string
Splits:
- train: 2281378 examples, 509970374 bytes
Download size: 232594972 bytes
Dataset size: 509970374 bytes

RedPajamaComments

Features:
- comments: string
Splits:
- train: 2281378 examples, 510819441 bytes
Download size: 234861060 bytes
Dataset size: 510819441 bytes

ThePile

Features:
- comments: string
Splits:
- train: 6794995 examples, 3450632528 bytes
Download size: 1229945960 bytes
Dataset size: 3450632528 bytes

ThePileComments

Features:
- comments: string
Splits:
- train: 6794995 examples, 3462162891 bytes
Download size: 1245174054 bytes
Dataset size: 3462162891 bytes

TheStack

Features:
- comments: string
Splits:
- train: 77595559 examples, 36267169085 bytes
Download size: 9666361191 bytes
Dataset size: 36267169085 bytes
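The raw byte counts above are hard to compare at a glance. The following sketch derives a few ratios from them: dataset size in gigabytes (taking 1 GB as 10^9 bytes), average bytes per example, and download size relative to dataset size. The figures are copied from the listing; the names `STATS` and `summarize` are hypothetical and used only for illustration.

```python
# (num_examples, dataset_size_bytes, download_size_bytes) per configuration,
# copied from the overview above.
STATS = {
    "CodeParrot": (14372397, 6713883371, 1861971823),
    "CodeParrotComments": (14372397, 6740538054, 1871456803),
    "GitHubCode": (45301797, 21743488635, 6273669540),
    "RedPajama": (2281378, 509970374, 232594972),
    "RedPajamaComments": (2281378, 510819441, 234861060),
    "ThePile": (6794995, 3450632528, 1229945960),
    "ThePileComments": (6794995, 3462162891, 1245174054),
    "TheStack": (77595559, 36267169085, 9666361191),
}


def summarize(stats):
    """Derive per-configuration ratios from the raw counts."""
    summary = {}
    for name, (num_examples, dataset_size, download_size) in stats.items():
        summary[name] = {
            "dataset_gb": dataset_size / 1e9,
            "avg_bytes_per_example": dataset_size / num_examples,
            "download_ratio": download_size / dataset_size,
        }
    return summary


if __name__ == "__main__":
    for name, row in summarize(STATS).items():
        print(
            f"{name:<20}{row['dataset_gb']:>8.2f} GB"
            f"{row['avg_bytes_per_example']:>10.1f} B/example"
            f"{row['download_ratio']:>8.2f} download/dataset"
        )
```

Each configuration and its "Comments" variant list the same number of examples with slightly different byte counts. The card does not say how the two variants differ.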
