ZorraZabb/full_coding_sampling_xml_fitered
Source: Hugging Face. Updated: 2024-09-13. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/ZorraZabb/full_coding_sampling_xml_fitered
Description:
Each record has nine fields: text content (text), directory (dir), language (lang), creation date (created_date), update date (updated_date), repository name (repo_name), full repository name (repo_full_name), star count (star), and token length (len_tokens). The dataset stores text from code repositories and may be suited to natural language processing or code analysis tasks. It provides a training split (train) with 1,958,391 samples and a reported total size of 18,846,193,457.238945 bytes (roughly 18.8 GB).
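To make the schema concrete, here is a minimal Python sketch of one record and a simple filter over the lang, star, and len_tokens fields. The field names come from the description; the Python types, the `CodeSample` and `filter_samples` names, and the sample rows (including the language values) are assumptions for illustration only and are not taken from the dataset.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class CodeSample:
    """One row, using the field names listed in the description.

    The types are assumptions: the description does not state them.
    """
    text: str
    dir: str
    lang: str
    created_date: str
    updated_date: str
    repo_name: str
    repo_full_name: str
    star: int
    len_tokens: int


def filter_samples(
    rows: Iterable[CodeSample],
    lang: str,
    min_star: int = 0,
    max_tokens: Optional[int] = None,
) -> Iterator[CodeSample]:
    """Yield rows in the given language that meet the star and token limits."""
    for row in rows:
        if row.lang != lang:
            continue
        if row.star < min_star:
            continue
        if max_tokens is not None and row.len_tokens > max_tokens:
            continue
        yield row


# Hypothetical rows, invented for this example (not real dataset content).
rows = [
    CodeSample("print('hi')", "src", "Python", "2020-01-01", "2021-01-01",
               "demo", "alice/demo", star=25, len_tokens=6),
    CodeSample("int main() {}", "src", "C", "2019-05-01", "2019-06-01",
               "cdemo", "bob/cdemo", star=300, len_tokens=8),
    CodeSample("x = 1\n" * 500, "lib", "Python", "2018-03-01", "2022-02-01",
               "big", "carol/big", star=40, len_tokens=2500),
    CodeSample("pass", "tests", "Python", "2021-07-01", "2021-07-02",
               "tiny", "dave/tiny", star=2, len_tokens=1),
]

selected = list(filter_samples(rows, lang="Python", min_star=10, max_tokens=1000))
```

Filtering lazily with a generator keeps memory use flat, which matters for a split of about 1.96 million samples; the same predicate could be applied to rows read from whatever loader is used for the real data.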
Provider:
ZorraZabb



