nuprl/stack-dedup-python-testgen-starcoder-filter-v2
Source: Hugging Face. Updated: 2024-02-29. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/nuprl/stack-dedup-python-testgen-starcoder-filter-v2
Resource description:
---
dataset_info:
  features:
  - name: content
    dtype: string
  - name: content_with_types
    dtype: 'null'
  - name: sha1
    dtype: string
  - name: id
    dtype: int64
  - name: entrypoint
    dtype: string
  - name: tests
    sequence: string
  - name: coverage
    dtype: int64
  - name: tests_failed
    sequence: string
  splits:
  - name: train
    num_bytes: 305655078
    num_examples: 157767
  download_size: 118032344
  dataset_size: 305655078
extra_gated_prompt: |
  If you use this dataset, you agree to cite the paper (see below for citation).
---
# MultiPL-T Python Sources
## Citation
**If you use this dataset we request that you cite our work:**
```
@misc{cassano:multipl-t,
  title={Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs},
  author={Federico Cassano and John Gouwar and Francesca Lucchetti and Claire Schlesinger and Anders Freeman and Carolyn Jane Anderson and Molly Q Feldman and Michael Greenberg and Abhinav Jangda and Arjun Guha},
  year={2024},
  eprint={2308.09895},
  archivePrefix={arXiv},
  primaryClass={cs.PL}
}
```
Provided by:
nuprl
Summary of original information
Dataset overview
Dataset name
MultiPL-T Python Sources
Features
- content: string.
- content_with_types: null type (the column holds no values).
- sha1: string.
- id: integer (int64).
- entrypoint: string.
- tests: sequence of strings.
- coverage: integer (int64).
- tests_failed: sequence of strings.
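The feature list above can be mirrored as a typed record. The sketch below is illustrative only: the `SourceRecord` and `run_tests` names are hypothetical, and it assumes (this card does not confirm it) that `content` holds Python source, that `entrypoint` names a function defined there, and that each entry in `tests` is a standalone Python statement such as an `assert`. It runs code with plain `exec` and no sandboxing, so only use it on trusted input.

```python
from typing import List, Optional, TypedDict


class SourceRecord(TypedDict):
    """Hypothetical typed view of one row, mirroring the card's feature list."""
    content: str                       # source text (assumed to be Python code)
    content_with_types: Optional[str]  # declared with dtype 'null'
    sha1: str
    id: int
    entrypoint: str                    # assumed: name of the function under test
    tests: List[str]                   # assumed: standalone Python statements
    coverage: int                      # unit not documented on this card
    tests_failed: List[str]


def run_tests(record: SourceRecord) -> List[str]:
    """Execute `content`, then each test string; return the tests that raise.

    No sandboxing: exec runs arbitrary code in this process.
    """
    namespace: dict = {}
    exec(record["content"], namespace)
    if record["entrypoint"] not in namespace:
        raise KeyError(f"entrypoint {record['entrypoint']!r} is not defined")
    failed: List[str] = []
    for test in record["tests"]:
        try:
            # Run each test in a shallow copy so tests cannot rebind names for each other.
            exec(test, dict(namespace))
        except Exception:
            failed.append(test)
    return failed


# Hypothetical record for illustration; not taken from the dataset.
example: SourceRecord = {
    "content": "def add(a, b):\n    return a + b\n",
    "content_with_types": None,
    "sha1": "0" * 40,  # placeholder, not a real hash
    "id": 0,
    "entrypoint": "add",
    "tests": ["assert add(1, 2) == 3", "assert add(-1, 1) == 0"],
    "coverage": 100,
    "tests_failed": [],
}

print(run_tests(example))  # []

broken = {**example, "tests": ["assert add(2, 2) == 5"]}
print(run_tests(broken))  # ['assert add(2, 2) == 5']
```

Keeping the schema in a `TypedDict` lets a type checker catch misspelled column names without adding any runtime dependency.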
Splits
- train: 157767 examples, 305655078 bytes in total.
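The split figures give a rough per-row size. This is simple arithmetic on the two numbers above. It assumes the usual Hugging Face meaning of `num_bytes`, where all columns are counted, so the result is an average over whole rows and not over `content` alone.

```python
NUM_BYTES = 305_655_078   # train split size in bytes, from the card
NUM_EXAMPLES = 157_767    # train split row count, from the card

avg_bytes = NUM_BYTES / NUM_EXAMPLES
print(f"{avg_bytes:.1f} bytes per example")  # 1937.4 bytes per example
```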
Dataset size
- Download size: 118032344 bytes
- Dataset size: 305655078 bytes
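Comparing the two size figures shows how much smaller the download is than the prepared dataset. This sketch assumes the standard Hugging Face reading of `download_size` (files fetched) and `dataset_size` (prepared data). On that reading, the ratio reflects storage compression of the downloaded files.

```python
DOWNLOAD_SIZE = 118_032_344  # bytes, from the card
DATASET_SIZE = 305_655_078   # bytes, from the card

ratio = DOWNLOAD_SIZE / DATASET_SIZE
print(f"download is {ratio:.1%} of dataset size")  # download is 38.6% of dataset size

# The same figures in mebibytes (2**20 bytes).
print(f"{DATASET_SIZE / 2**20:.1f} MiB prepared vs {DOWNLOAD_SIZE / 2**20:.1f} MiB downloaded")
# 291.5 MiB prepared vs 112.6 MiB downloaded
```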



