fasterinnerlooper/lcc_csharp
Source: Hugging Face. Updated 2024-01-21. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/fasterinnerlooper/lcc_csharp
Resource description:
---
language:
- en
license: mit
size_categories:
- 100K<n<1M
task_categories:
- mask-generation
- fill-mask
- text-generation
pretty_name: LCC_csharp dataset modified for infilling
dataset_info:
  features:
  - name: prefix
    dtype: string
  - name: suffix
    dtype: string
  - name: prediction
    dtype: string
  splits:
  - name: train
    num_bytes: 1852197668
    num_examples: 100000
  download_size: 531853418
  dataset_size: 1852197668
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
tags:
- code
---
This dataset is derived from the microsoft/LCC_csharp dataset and adapted to give Code Llama infilling tasks in the style of the original fill-in-the-middle paper, where the text that needs to be filled in is moved to the end of each example, thus taking advantage of the left-to-right generative behavior of GPT-style models.
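The prefix-suffix-middle arrangement described above can be sketched as follows. This is a minimal illustration, not the author's preprocessing code: the sentinel strings `<PRE>`, `<SUF>`, `<MID>`, and `<EOT>`, their spacing, the helper name `build_fim_example`, and the sample record are all assumptions made for the example. Check the special tokens of the tokenizer you actually use before relying on this layout.

```python
from typing import Dict

# Assumed sentinel strings; the exact tokens and spacing depend on the tokenizer.
PRE, SUF, MID, EOT = "<PRE>", "<SUF>", "<MID>", "<EOT>"


def build_fim_example(record: Dict[str, str]) -> Dict[str, str]:
    """Arrange one record in prefix-suffix-middle order.

    The span to be filled in (the `prediction` column) is placed last,
    so a left-to-right model generates it after seeing both contexts.
    """
    prompt = f"{PRE} {record['prefix']} {SUF}{record['suffix']} {MID}"
    target = f"{record['prediction']} {EOT}"
    return {"prompt": prompt, "target": target, "text": prompt + target}


# Hypothetical record, made up for illustration (not taken from the dataset).
sample = {
    "prefix": "public int Add(int a, int b)\n{\n    return ",
    "suffix": ";\n}\n",
    "prediction": "a + b",
}

example = build_fim_example(sample)
print(example["text"])
```

Keeping `prompt` and `target` separate makes it straightforward to mask the prompt tokens out of the loss during fine-tuning, if that is the training setup you want.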
Provided by:
fasterinnerlooper
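Summary of original information
Dataset overview
Basic information
- Language: English
- License: MIT
- Size category: 100K<n<1M
- Task categories:
  - Mask generation
  - Fill mask
  - Text generation
- Pretty name: LCC_csharp dataset modified for infilling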
Dataset information
- Features:
  - prefix: string
  - suffix: string
  - prediction: string
- Splits:
  - train:
    - Bytes: 1852197668
    - Examples: 100000
- Download size: 531853418 bytes
- Dataset size: 1852197668 bytes
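As a quick sanity check on the figures above, the short sketch below derives the average example size and the ratio of download size to in-memory size. The variable names are illustrative; the numbers are copied directly from the dataset information.

```python
# Figures copied from the dataset information above.
NUM_BYTES = 1_852_197_668      # size of the train split in bytes
NUM_EXAMPLES = 100_000         # number of examples in the train split
DOWNLOAD_SIZE = 531_853_418    # size of the downloaded files in bytes

# Average uncompressed size of one example (prefix + suffix + prediction).
avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES

# Fraction of the full dataset size that has to be downloaded.
download_ratio = DOWNLOAD_SIZE / NUM_BYTES

print(f"average bytes per example: {avg_bytes_per_example:,.0f}")
print(f"download size / dataset size: {download_ratio:.1%}")
```

Each example averages roughly 18.5 KB of text, so long-context handling or truncation is worth planning for before tokenizing.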
Configuration
- Config name: default
- Data files:
  - Split: train
  - Path: data/train-*
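With a single `default` config and a single `train` split, loading the data with the Hugging Face `datasets` library might look like the sketch below. It assumes the repository is still published under the ID shown in the title, that `datasets` is installed, and that network access is available; the helper name `load_train_split` is hypothetical.

```python
REPO_ID = "fasterinnerlooper/lcc_csharp"  # repository ID from the page title
SPLIT = "train"                           # the only split listed in the config
EXPECTED_COLUMNS = ("prefix", "suffix", "prediction")


def load_train_split(streaming: bool = True):
    """Load the train split of the dataset.

    Requires `pip install datasets` and network access. Streaming avoids
    downloading all files before the first example can be read.
    """
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=SPLIT, streaming=streaming)
```

Calling `next(iter(load_train_split()))` should then yield a dictionary with the `prefix`, `suffix`, and `prediction` keys listed under the dataset features.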
Tags
- Code



