D4vidHuang/LLM_Of_Babel_ZH_Result
Hugging Face | Updated 2024-05-16 | Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/D4vidHuang/LLM_Of_Babel_ZH_Result
Resource summary:
---
dataset_info:
  features:
  - name: hexsha
    dtype: string
  - name: size
    dtype: int64
  - name: ext
    dtype: string
  - name: lang
    dtype: string
  - name: max_stars_repo_path
    dtype: string
  - name: max_stars_repo_name
    dtype: string
  - name: max_stars_repo_head_hexsha
    dtype: string
  - name: max_stars_repo_licenses
    sequence: string
  - name: max_stars_count
    dtype: float64
  - name: max_stars_repo_stars_event_min_datetime
    dtype: string
  - name: max_stars_repo_stars_event_max_datetime
    dtype: string
  - name: max_issues_repo_path
    dtype: string
  - name: max_issues_repo_name
    dtype: string
  - name: max_issues_repo_head_hexsha
    dtype: string
  - name: max_issues_repo_licenses
    sequence: string
  - name: max_issues_count
    dtype: float64
  - name: max_issues_repo_issues_event_min_datetime
    dtype: string
  - name: max_issues_repo_issues_event_max_datetime
    dtype: string
  - name: max_forks_repo_path
    dtype: string
  - name: max_forks_repo_name
    dtype: string
  - name: max_forks_repo_head_hexsha
    dtype: string
  - name: max_forks_repo_licenses
    sequence: string
  - name: max_forks_count
    dtype: float64
  - name: max_forks_repo_forks_event_min_datetime
    dtype: string
  - name: max_forks_repo_forks_event_max_datetime
    dtype: string
  - name: content
    dtype: string
  - name: avg_line_length
    dtype: float64
  - name: max_line_length
    dtype: int64
  - name: alphanum_fraction
    dtype: float64
  - name: language
    struct:
    - name: lang
      dtype: string
    - name: score
      dtype: float64
  - name: score
    dtype: float64
  - name: comment
    dtype: string
  - name: replaced_content
    dtype: string
  - name: __index_level_0__
    dtype: int64
  - name: predict
    list:
    - name: generated_text
      dtype: string
  splits:
  - name: train
    num_bytes: 1375002
    num_examples: 200
  download_size: 268611
  dataset_size: 1375002
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
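This dataset pairs source files from GitHub repositories with file and repository metadata: the file's hash, size, extension, and language, plus the path, name, head commit, licenses, count, and event timestamps of the repositories associated with its star, issue, and fork counts. Each example also carries the file content, derived line statistics, a comment field, a replaced_content field, and model outputs under predict. The dataset has a single training split of 200 examples totaling 1,375,002 bytes.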
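As a quick sanity check that a downloaded row matches the schema above, the sketch below lists the 35 column names transcribed from the dataset_info block and reports any that are missing from one example. The helper names missing_columns and load_train are hypothetical; load_train only wraps the real datasets.load_dataset call and is not executed here, since it needs the datasets package and network access.

```python
# Column names transcribed from the dataset_info block of this card (35 columns).
EXPECTED_COLUMNS = [
    "hexsha", "size", "ext", "lang",
    "max_stars_repo_path", "max_stars_repo_name", "max_stars_repo_head_hexsha",
    "max_stars_repo_licenses", "max_stars_count",
    "max_stars_repo_stars_event_min_datetime", "max_stars_repo_stars_event_max_datetime",
    "max_issues_repo_path", "max_issues_repo_name", "max_issues_repo_head_hexsha",
    "max_issues_repo_licenses", "max_issues_count",
    "max_issues_repo_issues_event_min_datetime", "max_issues_repo_issues_event_max_datetime",
    "max_forks_repo_path", "max_forks_repo_name", "max_forks_repo_head_hexsha",
    "max_forks_repo_licenses", "max_forks_count",
    "max_forks_repo_forks_event_min_datetime", "max_forks_repo_forks_event_max_datetime",
    "content", "avg_line_length", "max_line_length", "alphanum_fraction",
    "language", "score", "comment", "replaced_content", "__index_level_0__", "predict",
]


def missing_columns(record: dict) -> list:
    """Hypothetical helper: expected column names absent from one example, in card order."""
    return [name for name in EXPECTED_COLUMNS if name not in record]


def load_train():
    """Hypothetical wrapper: load the train split (requires `datasets` and network access)."""
    # Imported lazily so missing_columns() works without the datasets package installed.
    from datasets import load_dataset
    return load_dataset("D4vidHuang/LLM_Of_Babel_ZH_Result", split="train")
```

Usage, assuming network access: ds = load_train(); len(ds) should be 200 and missing_columns(ds[0]) should be an empty list if the hosted data still matches this card. To download through the mirror listed above, setting the HF_ENDPOINT environment variable to https://hf-mirror.com before importing datasets is the usual approach.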
Provider:
D4vidHuang
Summary of original information
Dataset feature overview
Basic features
- hexsha: string
- size: integer
- ext: string
- lang: string
- max_stars_repo_path: string
- max_stars_repo_name: string
- max_stars_repo_head_hexsha: string
- max_stars_repo_licenses: sequence of strings
- max_stars_count: float
- max_stars_repo_stars_event_min_datetime: string
- max_stars_repo_stars_event_max_datetime: string
- max_issues_repo_path: string
- max_issues_repo_name: string
- max_issues_repo_head_hexsha: string
- max_issues_repo_licenses: sequence of strings
- max_issues_count: float
- max_issues_repo_issues_event_min_datetime: string
- max_issues_repo_issues_event_max_datetime: string
- max_forks_repo_path: string
- max_forks_repo_name: string
- max_forks_repo_head_hexsha: string
- max_forks_repo_licenses: sequence of strings
- max_forks_count: float
- max_forks_repo_forks_event_min_datetime: string
- max_forks_repo_forks_event_max_datetime: string
- content: string
- avg_line_length: float
- max_line_length: integer
- alphanum_fraction: float
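The card does not define avg_line_length, max_line_length, or alphanum_fraction. The sketch below recomputes them from content under the assumption that they follow the conventions common in The Stack-style code datasets: mean and maximum line length in characters, and the share of characters that are alphanumeric. The function name line_stats is hypothetical, and the exact preprocessing behind this dataset may differ.

```python
def line_stats(content: str) -> tuple:
    """Hypothetical helper returning (avg_line_length, max_line_length, alphanum_fraction).

    Assumption: definitions follow common Stack-style conventions; not confirmed by the card.
    """
    lines = content.splitlines() or [""]  # treat an empty file as a single empty line
    lengths = [len(line) for line in lines]
    avg_line_length = sum(lengths) / len(lengths)
    max_line_length = max(lengths)
    # str.isalnum() is Unicode-aware, so CJK characters also count as alphanumeric.
    alphanum_fraction = (
        sum(ch.isalnum() for ch in content) / len(content) if content else 0.0
    )
    return avg_line_length, max_line_length, alphanum_fraction
```

Comparing line_stats(row["content"]) with the stored columns for a few rows is one way to check whether the assumption holds for this dataset.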
Structured features
- language: struct containing:
  - lang: string
  - score: float
- score: float
- comment: string
- replaced_content: string
- __index_level_0__: integer
- predict: list containing:
  - generated_text: string
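The two nested columns, language (a struct of lang and score) and predict (a list of items holding generated_text), are easiest to analyze once pulled into flat keys. The sketch below assumes each row arrives as a plain Python dict, as it does when indexing a split loaded with the datasets library; the output key names are hypothetical choices for illustration.

```python
from typing import Any, Dict


def flatten_nested(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: flatten the nested language struct and predict list of one row."""
    # language is a struct {lang, score}; missing or null values become None.
    language = record.get("language") or {}
    # predict is a list of {generated_text} items; a missing list becomes empty.
    predictions = record.get("predict") or []
    return {
        "language_lang": language.get("lang"),
        "language_score": language.get("score"),
        "generated_texts": [p.get("generated_text") for p in predictions],
    }
```

Keeping every generated_text in a list, rather than only the first, avoids assuming that each row holds exactly one prediction.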
Dataset information
- Training set (train):
  - Examples: 200
  - Data size: 1,375,002 bytes
  - Download size: 268,611 bytes
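The split figures above support two quick derived numbers: average size per example and how much smaller the download is than the prepared data. This assumes the usual Hugging Face reading, where download_size is the size of the fetched files and dataset_size is the size of the data once prepared; the ratio says nothing about any individual file.

```python
NUM_EXAMPLES = 200
DATASET_SIZE = 1_375_002  # dataset_size / num_bytes from the card, in bytes
DOWNLOAD_SIZE = 268_611   # download_size from the card, in bytes

bytes_per_example = DATASET_SIZE / NUM_EXAMPLES   # 6875.01 bytes
compression_ratio = DATASET_SIZE / DOWNLOAD_SIZE  # roughly 5.12x

print(f"{bytes_per_example:.2f} bytes/example, {compression_ratio:.2f}x")
# prints "6875.01 bytes/example, 5.12x"
```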



