bigcode/programming-languages-keywords
Source: Hugging Face. Updated 2023-03-06; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/bigcode/programming-languages-keywords
Description:
---
dataset_info:
  features:
  - name: language
    dtype: string
  - name: keywords
    sequence: string
  splits:
  - name: train
    num_bytes: 20307
    num_examples: 36
  download_size: 8838
  dataset_size: 20307
---
# Dataset Card for "programming-languages-keywords"
Structured version of https://github.com/e3b0c442/keywords
Generated using:
```python
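import re
import pandas as pd
import requests
from datasets import Dataset

# Fetch the upstream README and split it into one chunk per "### " heading.
r = requests.get("https://raw.githubusercontent.com/e3b0c442/keywords/main/README.md")
keywords = r.text.split("### ")[1:]
keywords = [i for i in keywords if not i.startswith("Sources")]
# Map each heading line to the alphabetic tokens found in the rest of its chunk.
keywords = {i.split("\n")[0]: [j for j in re.findall("[a-zA-Z]*", i.split("\n", 1)[1]) if j] for i in keywords}
keywords = pd.DataFrame(pd.Series(keywords)).reset_index().rename(columns={"index": "language", 0: "keywords"})
# Trim each heading at the first ") ", then deduplicate and sort each keyword list.
keywords['language'] = keywords['language'].str.split(r"\) ").str[0]
keywords['keywords'] = keywords['keywords'].apply(lambda x: sorted(set(x)))
ds = Dataset.from_pandas(keywords)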
```
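The parsing steps in the snippet above can be tried offline on a small invented string. This is a minimal sketch, not the author's code: `SAMPLE` and the `parse_keywords` helper are hypothetical, and the sample only imitates the "### heading, then keyword list" layout that the snippet assumes for the upstream README.

```python
import re

# Hypothetical sample imitating the assumed README layout (not real upstream content).
SAMPLE = """# Keywords

### Go (1.18) 25 keywords
- break
- case
- break

### C (C99) 37 keywords
- _Bool
- auto

### Sources
- https://example.com
"""


def parse_keywords(text):
    """Apply the same split / filter / regex steps as the generation snippet."""
    sections = text.split("### ")[1:]  # drop everything before the first heading
    sections = [s for s in sections if not s.startswith("Sources")]
    result = {}
    for section in sections:
        heading, body = section.split("\n", 1)
        # Splitting on ") " keeps the text before it, so the closing parenthesis is dropped.
        language = heading.split(") ")[0]
        # Only runs of ASCII letters survive; empty matches are filtered out.
        words = [w for w in re.findall("[a-zA-Z]*", body) if w]
        result[language] = sorted(set(words))
    return result


parsed = parse_keywords(SAMPLE)
print(parsed)
```

Two behaviors are worth noticing in this sketch: the language label loses its closing parenthesis (`"Go (1.18"`), and because the pattern matches letters only, a token such as `_Bool` is stored as `Bool`.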
Provided by:
bigcode
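Summary of original information
Dataset overview
Dataset name
- Name: programming-languages-keywords
Data structure
- Features:
  - language: string
  - keywords: sequence of strings
Data splits
- Train:
  - Size: 20307 bytes
  - Examples: 36
Dataset size
- Download size: 8838 bytes
- Total size: 20307 bytes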
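
Each record pairs one `language` string with a list of `keywords` strings. As a rough illustration of working with that shape, the sketch below inverts a few records into a keyword-to-languages lookup. The rows and the `build_index` helper are hypothetical and are not taken from the dataset.

```python
from typing import Dict, List

# Hypothetical rows shaped like the dataset's two features (not actual dataset content).
rows = [
    {"language": "Go", "keywords": ["break", "case", "func"]},
    {"language": "Python", "keywords": ["break", "class", "def"]},
]


def build_index(records) -> Dict[str, List[str]]:
    """Map each keyword to the languages whose list contains it, in row order."""
    index: Dict[str, List[str]] = {}
    for record in records:
        for keyword in record["keywords"]:
            index.setdefault(keyword, []).append(record["language"])
    return index


index = build_index(rows)
print(index["break"])
```

The same function should accept any iterable of dict-like rows with these two fields.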



