hjerpe/github-kubeflow-issues
收藏Hugging Face2023-09-13 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/hjerpe/github-kubeflow-issues
下载链接
链接失效反馈官方服务:
资源简介:
---
annotations_creators:
- no-annotation
language_creators:
- other
language:
- en
license: []
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- semantic-similarity-classification
- semantic-similarity-scoring
- topic-classification
- intent-classification
- multi-label-classification
- multi-class-classification
pretty_name: github-kubeflow-pipelines-issues
tags:
- GitHub-Issues
dataset_info:
features:
- name: url
dtype: string
- name: repository_url
dtype: string
- name: labels_url
dtype: string
- name: comments_url
dtype: string
- name: events_url
dtype: string
- name: html_url
dtype: string
- name: id
dtype: int64
- name: node_id
dtype: string
- name: number
dtype: int64
- name: title
dtype: string
- name: user
struct:
- name: login
dtype: string
- name: id
dtype: int64
- name: node_id
dtype: string
- name: avatar_url
dtype: string
- name: gravatar_id
dtype: string
- name: url
dtype: string
- name: html_url
dtype: string
- name: followers_url
dtype: string
- name: following_url
dtype: string
- name: gists_url
dtype: string
- name: starred_url
dtype: string
- name: subscriptions_url
dtype: string
- name: organizations_url
dtype: string
- name: repos_url
dtype: string
- name: events_url
dtype: string
- name: received_events_url
dtype: string
- name: type
dtype: string
- name: site_admin
dtype: bool
- name: labels
list:
- name: id
dtype: int64
- name: node_id
dtype: string
- name: url
dtype: string
- name: name
dtype: string
- name: color
dtype: string
- name: default
dtype: bool
- name: description
dtype: string
- name: state
dtype: string
- name: locked
dtype: bool
- name: assignee
struct:
- name: login
dtype: string
- name: id
dtype: int64
- name: node_id
dtype: string
- name: avatar_url
dtype: string
- name: gravatar_id
dtype: string
- name: url
dtype: string
- name: html_url
dtype: string
- name: followers_url
dtype: string
- name: following_url
dtype: string
- name: gists_url
dtype: string
- name: starred_url
dtype: string
- name: subscriptions_url
dtype: string
- name: organizations_url
dtype: string
- name: repos_url
dtype: string
- name: events_url
dtype: string
- name: received_events_url
dtype: string
- name: type
dtype: string
- name: site_admin
dtype: bool
- name: assignees
list:
- name: login
dtype: string
- name: id
dtype: int64
- name: node_id
dtype: string
- name: avatar_url
dtype: string
- name: gravatar_id
dtype: string
- name: url
dtype: string
- name: html_url
dtype: string
- name: followers_url
dtype: string
- name: following_url
dtype: string
- name: gists_url
dtype: string
- name: starred_url
dtype: string
- name: subscriptions_url
dtype: string
- name: organizations_url
dtype: string
- name: repos_url
dtype: string
- name: events_url
dtype: string
- name: received_events_url
dtype: string
- name: type
dtype: string
- name: site_admin
dtype: bool
- name: milestone
struct:
- name: url
dtype: string
- name: html_url
dtype: string
- name: labels_url
dtype: string
- name: id
dtype: int64
- name: node_id
dtype: string
- name: number
dtype: int64
- name: title
dtype: string
- name: description
dtype: string
- name: creator
struct:
- name: login
dtype: string
- name: id
dtype: int64
- name: node_id
dtype: string
- name: avatar_url
dtype: string
- name: gravatar_id
dtype: string
- name: url
dtype: string
- name: html_url
dtype: string
- name: followers_url
dtype: string
- name: following_url
dtype: string
- name: gists_url
dtype: string
- name: starred_url
dtype: string
- name: subscriptions_url
dtype: string
- name: organizations_url
dtype: string
- name: repos_url
dtype: string
- name: events_url
dtype: string
- name: received_events_url
dtype: string
- name: type
dtype: string
- name: site_admin
dtype: bool
- name: open_issues
dtype: int64
- name: closed_issues
dtype: int64
- name: state
dtype: string
- name: created_at
dtype: timestamp[s]
- name: updated_at
dtype: timestamp[s]
- name: due_on
dtype: timestamp[s]
- name: closed_at
dtype: 'null'
- name: comments
sequence: string
- name: created_at
dtype: timestamp[s]
- name: updated_at
dtype: timestamp[s]
- name: closed_at
dtype: timestamp[s]
- name: author_association
dtype: string
- name: active_lock_reason
dtype: 'null'
- name: body
dtype: string
- name: reactions
struct:
- name: url
dtype: string
- name: total_count
dtype: int64
- name: '+1'
dtype: int64
- name: '-1'
dtype: int64
- name: laugh
dtype: int64
- name: hooray
dtype: int64
- name: confused
dtype: int64
- name: heart
dtype: int64
- name: rocket
dtype: int64
- name: eyes
dtype: int64
- name: timeline_url
dtype: string
- name: performed_via_github_app
dtype: 'null'
- name: state_reason
dtype: string
- name: draft
dtype: bool
- name: pull_request
struct:
- name: url
dtype: string
- name: html_url
dtype: string
- name: diff_url
dtype: string
- name: patch_url
dtype: string
- name: merged_at
dtype: timestamp[s]
- name: is_pull_request
dtype: bool
splits:
- name: train
num_bytes: 9230693
num_examples: 1567
download_size: 0
dataset_size: 9230693
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
---
# Dataset Card for Dataset Name
## Dataset Description
- **Point of Contact:** [Adam Hjerpe](hjerpeadam5@gmail.com)
### Dataset Summary
GitHub Issues is a dataset consisting of the top 5_000 GitHub issues, as of 2023-09-02, associated with the KubeFlow Pipelines [repository](https://github.com/kubeflow/pipelines). It is intended for educational purposes and can be used for semantic search or multilabel text classification. The contents of each GitHub issue are in English and concern the domain of datasets for NLP, computer vision, and beyond.
### Languages
Contains language commonly found in English software development.
### Contributions
Thanks to [@hjerpe](https://github.com/hjerpe) for adding this dataset.
提供机构:
hjerpe
原始信息汇总
数据集卡片
数据集描述
数据集概述
GitHub Issues 数据集包含截至2023-09-02与 KubeFlow Pipelines 仓库相关的5000个顶级GitHub问题。该数据集旨在用于教育目的,可用于语义搜索或多标签文本分类。每个GitHub问题的内容为英语,涉及NLP、计算机视觉等领域的数据集。
语言
包含英语软件开发中常见的语言。
数据集特征
数据集包含以下特征:
- url: 字符串类型
- repository_url: 字符串类型
- labels_url: 字符串类型
- comments_url: 字符串类型
- events_url: 字符串类型
- html_url: 字符串类型
- id: 64位整数类型
- node_id: 字符串类型
- number: 64位整数类型
- title: 字符串类型
- user: 结构体类型,包含以下字段:
- login: 字符串类型
- id: 64位整数类型
- node_id: 字符串类型
- avatar_url: 字符串类型
- gravatar_id: 字符串类型
- url: 字符串类型
- html_url: 字符串类型
- followers_url: 字符串类型
- following_url: 字符串类型
- gists_url: 字符串类型
- starred_url: 字符串类型
- subscriptions_url: 字符串类型
- organizations_url: 字符串类型
- repos_url: 字符串类型
- events_url: 字符串类型
- received_events_url: 字符串类型
- type: 字符串类型
- site_admin: 布尔类型
- labels: 列表类型,包含以下字段:
- id: 64位整数类型
- node_id: 字符串类型
- url: 字符串类型
- name: 字符串类型
- color: 字符串类型
- default: 布尔类型
- description: 字符串类型
- state: 字符串类型
- locked: 布尔类型
- assignee: 结构体类型,包含以下字段:
- login: 字符串类型
- id: 64位整数类型
- node_id: 字符串类型
- avatar_url: 字符串类型
- gravatar_id: 字符串类型
- url: 字符串类型
- html_url: 字符串类型
- followers_url: 字符串类型
- following_url: 字符串类型
- gists_url: 字符串类型
- starred_url: 字符串类型
- subscriptions_url: 字符串类型
- organizations_url: 字符串类型
- repos_url: 字符串类型
- events_url: 字符串类型
- received_events_url: 字符串类型
- type: 字符串类型
- site_admin: 布尔类型
- assignees: 列表类型,包含以下字段:
- login: 字符串类型
- id: 64位整数类型
- node_id: 字符串类型
- avatar_url: 字符串类型
- gravatar_id: 字符串类型
- url: 字符串类型
- html_url: 字符串类型
- followers_url: 字符串类型
- following_url: 字符串类型
- gists_url: 字符串类型
- starred_url: 字符串类型
- subscriptions_url: 字符串类型
- organizations_url: 字符串类型
- repos_url: 字符串类型
- events_url: 字符串类型
- received_events_url: 字符串类型
- type: 字符串类型
- site_admin: 布尔类型
- milestone: 结构体类型,包含以下字段:
- url: 字符串类型
- html_url: 字符串类型
- labels_url: 字符串类型
- id: 64位整数类型
- node_id: 字符串类型
- number: 64位整数类型
- title: 字符串类型
- description: 字符串类型
- creator: 结构体类型,包含以下字段:
- login: 字符串类型
- id: 64位整数类型
- node_id: 字符串类型
- avatar_url: 字符串类型
- gravatar_id: 字符串类型
- url: 字符串类型
- html_url: 字符串类型
- followers_url: 字符串类型
- following_url: 字符串类型
- gists_url: 字符串类型
- starred_url: 字符串类型
- subscriptions_url: 字符串类型
- organizations_url: 字符串类型
- repos_url: 字符串类型
- events_url: 字符串类型
- received_events_url: 字符串类型
- type: 字符串类型
- site_admin: 布尔类型
- open_issues: 64位整数类型
- closed_issues: 64位整数类型
- state: 字符串类型
- created_at: 时间戳类型
- updated_at: 时间戳类型
- due_on: 时间戳类型
- closed_at: 时间戳类型
- comments: 字符串序列类型
- created_at: 时间戳类型
- updated_at: 时间戳类型
- closed_at: 时间戳类型
- author_association: 字符串类型
- active_lock_reason: 空类型
- body: 字符串类型
- reactions: 结构体类型,包含以下字段:
- url: 字符串类型
- total_count: 64位整数类型
- +1: 64位整数类型
- -1: 64位整数类型
- laugh: 64位整数类型
- hooray: 64位整数类型
- confused: 64位整数类型
- heart: 64位整数类型
- rocket: 64位整数类型
- eyes: 64位整数类型
- timeline_url: 字符串类型
- performed_via_github_app: 空类型
- state_reason: 字符串类型
- draft: 布尔类型
- pull_request: 结构体类型,包含以下字段:
- url: 字符串类型
- html_url: 字符串类型
- diff_url: 字符串类型
- patch_url: 字符串类型
- merged_at: 时间戳类型
- is_pull_request: 布尔类型
数据集分割
- train: 包含1567个样本,总字节数为9230693。
数据集大小
- 下载大小: 0
- 数据集大小: 9230693
配置
- config_name: default
- data_files:
- split: train
- path: data/train-*
- data_files:



