PyTorrent

Name: PyTorrent
Creator: North Carolina State University, USA
Published: 2021-10-05 04:48:31
License: Not specified

arXiv | Updated 2021-10-05 | Indexed 2024-06-21

Download link:

https://github.com/fla-sil/PyTorrent

Description:

PyTorrent is a large-scale corpus of Python libraries created by researchers at North Carolina State University (USA) and Fujitsu. It contains 218,814 Python packages collected from PyPI and Anaconda environments. The dataset is intended to support software engineering research, such as studies of code reuse and code comprehensibility, by providing high-quality, well-documented code. To build it, the team used Scrapy crawlers to collect package metadata from the PyPI and Anaconda APIs and compiled the results into a JSONL dataset. Intended applications include code retrieval, code generation, and defect prediction, with the broader goal of improving development efficiency and software quality.
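Since the dataset is distributed in JSONL format (one JSON object per line), a minimal sketch of reading such a file in Python might look like the following. The file name `packages.jsonl` and the `license` field are hypothetical placeholders chosen for illustration; the actual file layout and field names should be checked against the PyTorrent repository.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines between records
                yield json.loads(line)


def count_by_key(records: Iterable[dict], key: str) -> dict:
    """Count how often each value of `key` occurs; records lacking the key count under None."""
    counts: dict = {}
    for record in records:
        value = record.get(key)
        counts[value] = counts.get(value, 0) + 1
    return counts
```

Under those assumptions, `count_by_key(iter_jsonl("packages.jsonl"), "license")` would tally packages per license value. Reading line by line keeps memory use low, which matters for a corpus of this size.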

Provided by:

North Carolina State University, USA

Created:

2021-10-05

Collection Summary

Dataset Introduction