dfalbel/cran-packages
Source: Hugging Face. Updated 2023-07-11; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/dfalbel/cran-packages
Description:
---
license: other
task_categories:
- text-generation
language:
- code
pretty_name: cran-packages
size_categories:
- 100K<n<1M
---
## CRAN packages dataset
R and Rmd source code from CRAN packages.
The dataset was constructed using the following steps:
- Downloaded the latest version of every package on CRAN (see "Last updated"). The source code was downloaded from the [GitHub mirror](https://github.com/cran).
- Identified each package's license from its DESCRIPTION file and classified it into a `license_code`. See the licenses.csv file.
- Extracted the R and Rmd source files from all packages and joined them with the package licenses.
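The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used to build the dataset: the `LICENSE_CODES` mapping, the `SourceRow` type, and the helper names are hypothetical, the real classification lives in licenses.csv, and `size` is assumed here to be the UTF-8 byte length of the file content.

```python
from dataclasses import dataclass

# Hypothetical mapping from raw License strings to a short license code.
# The dataset's actual mapping is in licenses.csv and is not reproduced here.
LICENSE_CODES = {
    "MIT + file LICENSE": "mit",
    "GPL-3": "gpl-3",
    "GPL (>= 2)": "gpl-2+",
}


def parse_description(text: str) -> dict[str, str]:
    """Parse the 'Key: value' fields of a DESCRIPTION file."""
    fields: dict[str, str] = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and key is not None:
            # Indented line: continuation of the previous field.
            fields[key] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


def license_code(description_text: str) -> str:
    """Classify a package license; unmapped licenses become 'unknown'."""
    raw = parse_description(description_text).get("License", "")
    return LICENSE_CODES.get(raw, "unknown")


def is_r_source(path: str) -> bool:
    """Keep only R and Rmd files (case-insensitive extension match)."""
    return path.lower().endswith((".r", ".rmd"))


@dataclass
class SourceRow:
    package: str
    path: str
    content: str
    size: float  # assumption: UTF-8 byte length of the content
    license: str


def build_rows(package: str, files: dict[str, str], description_text: str) -> list[SourceRow]:
    """Join every R/Rmd file of a package with the package-level license."""
    code = license_code(description_text)
    return [
        SourceRow(package, path, content, float(len(content.encode("utf-8"))), code)
        for path, content in sorted(files.items())
        if is_r_source(path)
    ]
```

Calling `build_rows("demo", files, description_text)` with a dictionary of file paths and contents would return one row per R or Rmd file, each carrying the same package-level license code.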
Datasets are provided as parquet files containing the following columns:
```
FileSystemDataset with 1 Parquet file
package: string
path: string
content: large_string
size: double
license: string
```
Last updated: Jun 6th 2023
## Changelog
- v1: Initial version
- dev: Added all CRAN files and a license field that allows filtering by license. Also removed some unused columns.
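As a rough illustration of filtering on the license field, the sketch below works on plain Python dictionaries that mirror the documented columns. The sample rows, the license values, and the helper names are made up for the example and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical rows following the documented columns:
# package, path, content, size, license.
rows = [
    {"package": "pkgA", "path": "R/a.R", "content": "x <- 1\n", "size": 7.0, "license": "mit"},
    {"package": "pkgB", "path": "R/b.R", "content": "y <- 2\n", "size": 7.0, "license": "gpl-3"},
    {"package": "pkgB", "path": "vignettes/b.Rmd", "content": "# B\n", "size": 4.0, "license": "gpl-3"},
]


def filter_by_license(rows, allowed):
    """Return only the rows whose license is in the allowed collection."""
    allowed_set = set(allowed)
    return [row for row in rows if row["license"] in allowed_set]


def count_by_license(rows):
    """Count files per license value."""
    return Counter(row["license"] for row in rows)


permissive = filter_by_license(rows, {"mit"})
counts = count_by_license(rows)
```

With the actual parquet files, the same condition could be applied to a table loaded with a reader such as `pandas.read_parquet`, using the license values that appear in licenses.csv.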
Provided by:
dfalbel
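Summary of the original information
CRAN packages dataset overview
Basic dataset information
- License: other
- Task category: text-generation
- Language: code
- Pretty name: cran-packages
- Size category: 100K<n<1M
Dataset contents
- Data source: R and Rmd source code from CRAN packages. The dataset was built using the following steps:
  - Downloaded the latest source code of every package on CRAN.
  - Identified the license from each package's DESCRIPTION file and classified it into a license_code.
  - Extracted the R and Rmd source files from all packages and joined them with the package licenses.
- Data format: provided as parquet files with the following columns:
  - package: string
  - path: string
  - content: large_string
  - size: double
  - license: string
Update information
- Last updated: Jun 6th 2023
- Version changes:
  - v1: Initial version
  - dev: Added all CRAN files and a license field that allows filtering by license. Removed some unused columns.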



