
dfalbel/github-r-repos

Hugging Face, updated 2023-07-11, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/dfalbel/github-r-repos
Resource description:
---
license: other
task_categories:
- text-generation
language:
- code
pretty_name: github-r-repos
size_categories:
- 100K<n<1M
---

## GitHub R repositories dataset

R source files from GitHub.

This dataset has been created using the public GitHub datasets from Google BigQuery.
This is the actual query that has been used to export the data:

```
EXPORT DATA
  OPTIONS (
    uri = 'gs://your-bucket/gh-r/*.parquet',
    format = 'PARQUET') as
(
select
  f.id,
  f.repo_name,
  f.path,
  c.content,
  c.size
from (
  SELECT distinct
    id, repo_name, path
  FROM `bigquery-public-data.github_repos.files`
  where ends_with(path, ".R")
) as f
left join `bigquery-public-data.github_repos.contents` as c on f.id = c.id
);

EXPORT DATA
  OPTIONS (
    uri = 'gs://your-bucket/licenses.parquet',
    format = 'PARQUET') as
(select * from `bigquery-public-data.github_repos.licenses`)
```

Files were then exported and processed locally using the files in the root of this repository.

Datasets in this repository contain data from repositories with different licenses.

The data schema is:

```
id: string
repo_name: string
path: string
content: string
size: int32
license: string
```

Last updated: Jun 6th 2023
Provider:
dfalbel
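Summary of original information

GitHub R repositories dataset overview

Basic dataset information

  • License: other
  • Task category: text-generation
  • Language: code
  • Dataset size: 100K<n<1M

Data source and processing

  • The dataset was created from the public GitHub datasets on Google BigQuery.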

  • The data was exported with the following SQL query:

      EXPORT DATA
        OPTIONS (
          uri = 'gs://your-bucket/gh-r/*.parquet',
          format = 'PARQUET') as
      (
      select
        f.id,
        f.repo_name,
        f.path,
        c.content,
        c.size
      from (
        SELECT distinct
          id, repo_name, path
        FROM `bigquery-public-data.github_repos.files`
        where ends_with(path, ".R")
      ) as f
      left join `bigquery-public-data.github_repos.contents` as c on f.id = c.id
      )

  • The dataset contains data from repositories with different licenses.
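
To make the shape of the export query easier to follow, here is a minimal pure-Python sketch of the same logic: keep distinct (id, repo_name, path) triples whose path ends in ".R", then left join them to file contents on id. The function name `select_r_files`, the dict-based inputs, and the sample rows are all hypothetical and exist only for illustration; the sketch also assumes at most one contents row per id. It does not talk to BigQuery.

```python
def select_r_files(files, contents):
    """Mimic the export query on in-memory data.

    files:    list of dicts with keys id, repo_name, path
    contents: dict mapping id -> (content, size); assumed one entry per id
    """
    seen = set()
    rows = []
    for f in files:
        # The suffix test is case-sensitive, like ends_with(path, ".R").
        if not f["path"].endswith(".R"):
            continue
        key = (f["id"], f["repo_name"], f["path"])
        if key in seen:  # SELECT distinct id, repo_name, path
            continue
        seen.add(key)
        # Left join: keep the file row even when no content is found.
        content, size = contents.get(f["id"], (None, None))
        rows.append({
            "id": f["id"],
            "repo_name": f["repo_name"],
            "path": f["path"],
            "content": content,
            "size": size,
        })
    return rows


# Hypothetical sample rows, not taken from the dataset.
example_files = [
    {"id": "a1", "repo_name": "user/pkg", "path": "R/x.R"},
    {"id": "a1", "repo_name": "user/pkg", "path": "R/x.R"},      # duplicate
    {"id": "b2", "repo_name": "user/pkg", "path": "script.r"},   # lowercase
    {"id": "c3", "repo_name": "user/other", "path": "R/y.R"},    # no content
]
example_contents = {"a1": ("x <- 1\n", 7)}

rows = select_r_files(example_files, example_contents)
```

Two properties of the query stand out in this form: files with a lowercase ".r" extension are not matched, and a file with no matching contents row is still exported, with empty content and size.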

Data structure

  • id: string
  • repo_name: string
  • path: string
  • content: string
  • size: int32
  • license: string
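
Because the dataset mixes repositories with different licenses, a consumer will often want to group or filter records by the license field. The sketch below models the six-field schema above as a Python dataclass and shows one way to do that. The class name `RFileRecord`, the helper names, the sample records, and the license strings used are hypothetical; check the actual values in the data before relying on any of them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RFileRecord:
    """One row of the dataset, mirroring the schema listed above."""
    id: str
    repo_name: str
    path: str
    content: str
    size: int      # int32 in the Parquet schema
    license: str


def count_by_license(records):
    """Return how many records carry each license value."""
    return Counter(r.license for r in records)


def keep_licenses(records, allowed):
    """Keep only records whose license is in the allowed set."""
    allowed = set(allowed)
    return [r for r in records if r.license in allowed]


# Hypothetical sample records, not taken from the dataset.
sample = [
    RFileRecord("a1", "user/pkg", "R/x.R", "x <- 1\n", 7, "mit"),
    RFileRecord("b2", "user/pkg", "R/y.R", "y <- 2\n", 7, "mit"),
    RFileRecord("c3", "user/other", "R/z.R", "z <- 3\n", 7, "gpl-3.0"),
]

counts = count_by_license(sample)
permissive = keep_licenses(sample, {"mit"})
```

Filtering on an explicit allow-list, rather than excluding a few known values, keeps records with unexpected or missing license strings out of the result by default.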

Update date

  • Last updated: Jun 6th 2023