five

Dataset - MLinter: Learning Coding Practices from Examples—Dream or Reality?

收藏
NIAID Data Ecosystem2026-03-14 收录
下载链接:
https://zenodo.org/record/7341455
下载链接
链接失效反馈
官方服务:
资源简介:
Contains all artefacts generated to write our paper: MLinter: Learning Coding Practices from Examples—Dream or Reality? Contents: output: Datasets and all needed files used to create them results: CSV files generated by our MLinter evaluation for each rule analysis: Pictures and files generated to interpret our results Paper abstract: Coding practices are increasingly used by software companies. Their use promotes consistency, readability, and maintainability, which contribute to software quality. Coding practices were initially enforced by general-purpose linters, but companies now tend to design and adopt their own company- specific practices. However, these company-specific practices are often not automated, making it challenging to ensure they are shared and used by developers. Converting these practices into linter rules is a complex task that requires extensive static analysis and language engineering expertise. In this paper, we seek to answer the following question: can coding practices be learned automatically from examples manually tagged by developers? We conduct a feasibility study using CodeBERT, a state-of-the-art machine learning approach, to learn linter rules. Our results show that, although the resulting classifiers reach high precision and recall scores when evaluated on balanced synthetic datasets, their application on real-world, unbalanced codebases, while maintaining excellent recall, suffers from a severe drop in precision that hinders their usability.
创建时间:
2023-01-26
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作