HelloImSteven/applescript-lines-100k-non-annotated
Source: Hugging Face. Updated 2023-04-17. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/HelloImSteven/applescript-lines-100k-non-annotated
Resource description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 8452105
    num_examples: 100000
  download_size: 2718505
  dataset_size: 8452105
license: mit
task_categories:
- text-classification
tags:
- code
- applescript
size_categories:
- 100K<n<1M
---
# Dataset Card for "applescript-lines-100k-non-annotated"
## Description
Dataset of 100,000 unique lines of AppleScript code scraped from GitHub and GitHub Gists. The dataset has been de-duplicated, comments (both single-line and multi-line) have been removed, and an effort has been made to merge multi-line structures such as records into a single line (expect some variability in this regard).
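The cleaning steps described above (comment removal and de-duplication) can be illustrated with a minimal Python sketch. This is not the author's actual pipeline, which the card does not publish; `strip_comments` and `unique_lines` are hypothetical helpers, and the sketch assumes that `--` and `#` start single-line comments, that `(* ... *)` block comments may nest, and that string literals are double-quoted with backslash escapes. It does not attempt the multi-line record merging mentioned above.

```python
def strip_comments(source: str) -> str:
    """Remove AppleScript comments while leaving string literals untouched."""
    out = []
    i, n = 0, len(source)
    depth = 0          # nesting depth of (* ... *) block comments
    in_string = False  # True while inside a double-quoted string literal
    while i < n:
        ch = source[i]
        two = source[i:i + 2]
        if depth > 0:
            if two == "(*":
                depth += 1
                i += 2
            elif two == "*)":
                depth -= 1
                i += 2
            else:
                if ch == "\n":
                    out.append(ch)  # keep line breaks so line structure survives
                i += 1
        elif in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(source[i + 1])  # copy the escaped character verbatim
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif two == "(*":
            depth = 1
            i += 2
        elif two == "--" or ch == "#":
            # Single-line comment: skip up to (not including) the newline.
            while i < n and source[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def unique_lines(sources):
    """Return non-empty, comment-free lines in first-seen order, without duplicates."""
    seen = set()
    result = []
    for src in sources:
        for line in strip_comments(src).splitlines():
            line = line.strip()
            if line and line not in seen:
                seen.add(line)
                result.append(line)
    return result


# Made-up sample script, used only to exercise the helpers.
sample = '''-- greet the user
display dialog "Hello -- not a comment" (* inline (* nested *) note *)
set x to 1 # trailing
set x to 1
'''
print(unique_lines([sample]))
```

Tracking string state is the main design choice here: a plain regular expression would also delete the `--` that appears inside the quoted dialog text.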
The dataset is constructed as an intermediate step to a fully-annotated AppleScript dataset.
Each row has fields for `text` and `source`, with text being the raw text of the line and source being the file name and extension from which the line was obtained. Full source links have been omitted for anonymity.
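As a rough illustration of the row schema, the sketch below counts lines per source file extension. The rows are hypothetical stand-ins (the text and file names are made up), and `count_by_extension` is an illustrative helper rather than part of the dataset.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical rows in the card's schema: `text` is one line of code,
# `source` is the originating file name with its extension.
rows = [
    {"text": 'tell application "Finder" to activate', "source": "open_finder.applescript"},
    {"text": "set theCount to 0", "source": "counter.scpt"},
    {"text": 'display dialog "Done"', "source": "notify.applescript"},
]


def count_by_extension(rows):
    """Count rows per lower-cased file extension of the `source` field."""
    counts = Counter()
    for row in rows:
        ext = PurePosixPath(row["source"]).suffix.lower() or "(none)"
        counts[ext] += 1
    return counts


print(count_by_extension(rows))
```

With the Hugging Face `datasets` library installed, the real rows could presumably be obtained with `load_dataset("HelloImSteven/applescript-lines-100k-non-annotated", split="train")` and passed to the same function, since each row is a mapping with `text` and `source` keys.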
Provider:
HelloImSteven
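Summary of original information
Dataset overview
Basic information
- Name: applescript-lines-100k-non-annotated
- Size: 100K<n<1M
- License: MIT
Data structure
- Features:
  - text: string; the raw line of AppleScript code.
  - source: string; the file name and extension of the file from which the line was obtained.
Data splits
- Train:
  - Number of examples: 100000
  - Data size: 8452105 bytes
Download information
- Download size: 2718505 bytes
- Dataset size: 8452105 bytes
Tasks and tags
- Task category: text classification
- Tags:
  - code
  - AppleScript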
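
The size figures listed above imply an average line length and a download-to-dataset size ratio. A short sketch of that arithmetic, using only the numbers from the metadata (the constant and variable names are illustrative):

```python
# Figures copied from the dataset metadata.
NUM_BYTES = 8_452_105      # dataset size of the train split, in bytes
NUM_EXAMPLES = 100_000     # number of rows (lines of code)
DOWNLOAD_SIZE = 2_718_505  # size of the downloadable files, in bytes

avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES
download_ratio = DOWNLOAD_SIZE / NUM_BYTES

print(f"average size per example: {avg_bytes_per_example:.2f} bytes")
print(f"download size relative to dataset size: {download_ratio:.1%}")
```

The per-example average includes the `source` field as well as the `text` field, so it slightly overstates the length of a typical code line.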



