ManyTypes4Py
arXiv, 2021-04-10 · Updated 2024-06-21
Download link:
https://zenodo.org/record/4479714
Description:
ManyTypes4Py is a large-scale Python dataset for training and evaluating machine-learning models for type inference, created by Delft University of Technology. It comprises 5,382 Python projects containing more than 869,000 type annotations in total. Duplicate source files were removed to eliminate the effect of duplication bias, and the dataset is split into training, validation, and test sets. To build it, the authors developed a lightweight static-analysis pipeline that extracts type information from the abstract syntax trees (ASTs) of source files and stores the results as JSON files. The dataset's primary application is machine-learning-based type inference, which helps address the lack of static type information in dynamically typed languages such as Python.
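To illustrate the kind of processing described above, here is a minimal sketch of an AST-based type extractor with a deduplication step and JSON output, using only Python's standard `ast`, `hashlib`, and `json` modules (Python 3.9+ for `ast.unparse`). It is not the dataset's actual pipeline: the function names `extract_types`, `dedupe`, and `analyze` and the JSON keys (`funcs`, `params`, `ret_type`, `variables`) are hypothetical, and deduplication here is simplified to exact-match hashing of file contents, which may differ from how duplicates were detected for ManyTypes4Py.

```python
import ast
import hashlib
import json


def _ann(node):
    """Render an annotation AST node back to source text, or None if absent."""
    return ast.unparse(node) if node is not None else None


def extract_types(source: str) -> dict:
    """Collect function signatures and annotated variables from one file.

    Hypothetical output schema for illustration only; it is not the JSON
    format shipped with ManyTypes4Py.
    """
    tree = ast.parse(source)
    funcs, variables = [], {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Positional-only, regular, and keyword-only parameters (*args/**kwargs skipped).
            args = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
            funcs.append({
                "name": node.name,
                "params": {a.arg: _ann(a.annotation) for a in args},
                "ret_type": _ann(node.returns),
            })
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            # Simple annotated names such as `count: int = 0` (attributes like self.x skipped).
            variables[node.target.id] = _ann(node.annotation)
    return {"funcs": funcs, "variables": variables}


def dedupe(files: dict) -> dict:
    """Keep the first file (in path order) for each distinct content hash.

    Simplification: only byte-identical files are treated as duplicates.
    """
    seen, kept = set(), {}
    for path, src in sorted(files.items()):
        digest = hashlib.sha256(src.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept[path] = src
    return kept


def analyze(files: dict) -> str:
    """Deduplicate files, extract type info per file, and serialize it as JSON."""
    result = {path: extract_types(src) for path, src in dedupe(files).items()}
    return json.dumps(result, indent=2, sort_keys=True)


if __name__ == "__main__":
    sample = "def add(x: int, y: int) -> int:\n    return x + y\n\ncount: int = 0\n"
    # b.py is an exact copy of a.py, so only a.py survives deduplication.
    print(analyze({"a.py": sample, "b.py": sample}))
```

Extracting from the AST rather than matching text means annotations are read exactly as the parser sees them, and storing per-file results as JSON keeps the output language-neutral for downstream training code. A real pipeline would also need to handle files that fail to parse, which this sketch does not.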
Provided by:
Department of Software Technology, Delft University of Technology
Created:
2021-04-10



