ZorraZabb/full_coding_sampling_xml_fitered

Name: ZorraZabb/full_coding_sampling_xml_fitered
Creator: ZorraZabb
Published: 2024-09-13 06:46:05
License: Not specified

Source: Hugging Face (updated 2024-09-13, indexed 2024-12-14)

Download link:

https://hf-mirror.com/datasets/ZorraZabb/full_coding_sampling_xml_fitered

Overview:

This dataset stores text data drawn from code repositories and may be useful for natural language processing or code analysis tasks. Each record has nine fields: text content (text), directory (dir), language (lang), creation date (created_date), update date (updated_date), repository name (repo_name), full repository name (repo_full_name), star count (star), and token length (len_tokens). The dataset has a single training split (train) containing 1,958,391 samples, with a reported total size of 18,846,193,457.238945 bytes (about 18.8 GB).
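The field list above can be sketched in code. The following is a minimal, unofficial example of selecting records by language, star count, and token length. The field names come from the description; the field types (integers for star and len_tokens, a string for lang), the helper names filter_records and stream_train_split, and any concrete lang value are assumptions made for illustration, not something the dataset card confirms. The streaming loader relies on the third-party Hugging Face `datasets` package and is not called by the example itself.

```python
from typing import Any, Dict, Iterable, Iterator, Optional

DATASET_ID = "ZorraZabb/full_coding_sampling_xml_fitered"

# Field names as listed in the dataset description.
FIELDS = (
    "text", "dir", "lang", "created_date", "updated_date",
    "repo_name", "repo_full_name", "star", "len_tokens",
)

Record = Dict[str, Any]


def filter_records(
    records: Iterable[Record],
    lang: str,
    min_stars: int = 0,
    max_tokens: Optional[int] = None,
) -> Iterator[Record]:
    """Yield records matching a language, a star floor, and a token ceiling.

    Assumes `star` and `len_tokens` are numeric and `lang` is a string;
    missing or None values are treated as 0 for the numeric checks.
    """
    for rec in records:
        if rec.get("lang") != lang:
            continue
        if (rec.get("star") or 0) < min_stars:
            continue
        if max_tokens is not None and (rec.get("len_tokens") or 0) > max_tokens:
            continue
        yield rec


def stream_train_split():
    """Stream the train split without downloading all ~18.8 GB up front.

    Requires `pip install datasets`; imported lazily so the filter above
    can be used on plain dictionaries without that dependency.
    """
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split="train", streaming=True)
```

A typical use would be to pass the stream to the filter, for example `filter_records(stream_train_split(), lang="Python", min_stars=100, max_tokens=4096)`, where "Python" stands in for whatever values the lang column actually holds. Streaming is suggested here only because of the dataset's size; loading the full split into memory is an equally valid choice on a machine with enough disk space.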

Provider:

ZorraZabb
